Data Pipelines
ETL and data orchestration tools for feeding AI systems
18 tools
Airbyte
Freemium: Open-source data integration platform with 300+ connectors for syncing data from any source to any destination. Used in ...
Apache Airflow
Open Source: Open-source platform to programmatically author, schedule, and monitor workflows using Python DAGs. Differentiates with ...
Apache Kafka
Open Source: Open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, and data inte...
Crawlee
Open Source: Web scraping and browser automation library. Handles anti-bot protections. Available for TypeScript and Python.
Dagster
Freemium: Data orchestration platform built around the concept of software-defined assets. Designed for ML and analytics teams, it...
dbt (data build tool)
Freemium: SQL-first transformation tool that enables data analysts and engineers to transform, test, and document data using modul...
Embedchain
Open Source: RAG framework by Mem0. Create AI apps over any data in minutes. Supports 20+ data source types.
Flyte
Open Source: Open-source workflow orchestration platform for building and scaling AI/ML pipelines and data workflows. Differentiates ...
Jina Reader
Freemium: Converts any URL to LLM-friendly text. Simple API: prefix the URL with r.jina.ai. Free tier available.
Kedro
Open Source: Open-source Python framework for creating reproducible, maintainable, and modular data science and ML pipelines. Develop...
Kubeflow
Open Source: Open-source ML platform on Kubernetes. Provides Pipelines (DAG orchestration), Notebooks, Model Training Operator, KServ...
LlamaParse
Freemium: LlamaIndex's document parser. Handles complex PDFs with tables, charts, and mixed layouts for RAG.
Mage AI
Freemium: Open-source data pipeline tool built for ML engineers with a block-based notebook interface. Designed to make building, ...
MegaParse
Open Source: Universal document parser. Supports PDF, DOCX, PPTX, and more. Integrates with LangChain and LlamaIndex.
Metaflow
Open Source: Python framework for building and managing ML, AI, and data science workflows with built-in versioning and orchestration...
Prefect
Freemium: Python-native workflow orchestration platform that turns functions into observable workflows using decorators. Different...
R2R (SciPhi)
Open Source: Production-ready RAG engine. Ingestion, search, and generation in one system with knowledge graph support.
ZenML
Freemium: Open-source MLOps framework for building portable, production-ready ML pipelines. Abstracts infrastructure complexity so...
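As a rough illustration of the URL-prefix pattern described in the Jina Reader entry, the sketch below builds a reader URL from a target page URL. It assumes the prefix is served over HTTPS at https://r.jina.ai/ and does not perform any network request; the helper name to_reader_url is hypothetical and not part of any Jina library.

```python
from urllib.parse import urlsplit

# Assumption: the reader endpoint is reached by prepending this prefix
# to the full target URL, as the listing describes.
READER_PREFIX = "https://r.jina.ai/"


def to_reader_url(target_url: str) -> str:
    """Return the reader URL for a page (hypothetical helper, no network I/O)."""
    parts = urlsplit(target_url)
    # Reject inputs that are not absolute http(s) URLs, since a bare
    # hostname or relative path would produce an ambiguous reader URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    return READER_PREFIX + target_url


if __name__ == "__main__":
    print(to_reader_url("https://example.com/docs/page"))
```

The resulting URL can then be fetched with any HTTP client; rate limits and authentication for the free tier are not covered here.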