Aidena

Document Loaders

Parsers and connectors for ingesting documents and data into LLM pipelines

13 tools

AWS Textract

Paid

Managed AWS service that automatically extracts text, handwriting, tables, and form fields from scanned documents and PD...

Azure Document Intelligence

Freemium

Microsoft's cloud service for intelligent document processing — extracts structured data from forms, invoices, receipts,...

Camelot

Open Source

Python library that extracts tables from text-based PDF files into structured data formats. Offers two parsing methods (...

Crawl4AI

Open Source

Open-source async web crawler that extracts clean markdown and structured data from web pages for LLM consumption. Diffe...

Docling

Open Source

Open-source document parsing toolkit by IBM that converts PDFs, DOCX, PPTX, and images into structured JSON or Markdown ...

Google Document AI

Freemium

Google Cloud's document understanding platform with pre-trained processors for contracts, invoices, identity documents, ...

LlamaParse

Freemium

Proprietary document parser optimized for LLM ingestion with superior handling of complex PDFs, tables, and mixed media ...

Marker

Open Source

Fast open-source PDF-to-Markdown converter that outperforms Nougat on most documents, supporting OCR, complex layouts, e...

PyMuPDF

Open Source

High-performance Python binding for MuPDF — a lightweight PDF and XPS viewer — enabling fast text extraction, page rende...

Reducto

Freemium

API-first document parsing service designed for production RAG pipelines, offering high-accuracy extraction from complex...

Spider

Freemium

Web scraping and crawling API that extracts structured data from any URL for AI agents, RAG pipelines, and LLMs. Differe...

Unstructured

Freemium

ETL platform that ingests unstructured documents (PDFs, images, HTML, etc.) and transforms them into clean, structured d...

Zerox

Open Source

Open-source PDF OCR tool that uses vision-capable LLMs (GPT-4o, Claude) to convert any PDF page into clean Markdown via ...
