Document Loaders
Parsers and connectors for ingesting documents and data into LLM pipelines
13 tools
AWS Textract (Paid)
Managed AWS service that automatically extracts text, handwriting, tables, and form fields from scanned documents and PD...
Azure Document Intelligence (Freemium)
Microsoft's cloud service for intelligent document processing: extracts structured data from forms, invoices, receipts,...
Camelot (Open Source)
Python library that extracts tables from text-based PDF files into structured data formats. Offers two parsing methods (...
Crawl4AI (Open Source)
Open-source async web crawler that extracts clean markdown and structured data from web pages for LLM consumption. Diffe...
Docling (Open Source)
Open-source document parsing toolkit by IBM that converts PDFs, DOCX, PPTX, and images into structured JSON or Markdown ...
Google Document AI (Freemium)
Google Cloud's document understanding platform with pre-trained processors for contracts, invoices, identity documents, ...
LlamaParse (Freemium)
Proprietary document parser optimized for LLM ingestion with superior handling of complex PDFs, tables, and mixed media ...
Marker (Open Source)
Fast open-source PDF-to-Markdown converter that outperforms Nougat on most documents, supporting OCR, complex layouts, e...
PyMuPDF (Open Source)
High-performance Python binding for MuPDF, a lightweight PDF and XPS viewer, enabling fast text extraction, page rende...
Reducto (Freemium)
API-first document parsing service designed for production RAG pipelines, offering high-accuracy extraction from complex...
Spider (Freemium)
Web scraping and crawling API that extracts structured data from any URL for AI agents, RAG pipelines, and LLMs. Differe...
Unstructured (Freemium)
ETL platform that ingests unstructured documents (PDFs, images, HTML, etc.) and transforms them into clean, structured d...
Zerox (Open Source)
Open-source PDF OCR tool that uses vision-capable LLMs (GPT-4o, Claude) to convert any PDF page into clean Markdown via ...