This n8n workflow converts various file formats (.pdf, .doc, .png, .jpg, .webp) to clean markdown text using the datalab.to API. Perfect for AI agents, LLM processing, and RAG (Retrieval Augmented Generation) data preparation for vector databases.
Workflow Description
Input
- Trigger Node: Form trigger or webhook to accept file uploads
- Supported Formats: PDF documents, Word documents (.doc/.docx), and images (PNG, JPG, WEBP)
Processing Steps
- File Validation: Check file type and size constraints
- HTTP Request Node:
- Method: POST to
https://api.datalab.to/v1/marker
- Headers:
X-API-Key
with your datalab.to API key
- Body: Multipart form data with the file
- Response Processing: Extract the converted markdown text
- Output Formatting: Clean and structure the markdown for downstream use
Output
- Clean, structured markdown text ready for:
- LLM prompt injection
- Vector database ingestion
- AI agent knowledge base processing
- Document analysis workflows
Setup Instructions
- Get API Access: Sign up at datalab.to to obtain your API key
- Configure Credentials:
- Create a new credential in n8n
- Add Generic Header:
X-API-Key
with your API key as the value
- Import Workflow: Ready to process files immediately
Use Cases
- AI Workflows: Convert documents for LLM analysis and processing
- RAG Systems: Prepare clean text for vector database ingestion
- Content Management: Batch convert files to searchable markdown format
- Document Processing: Extract text from mixed file types in automated pipelines
The workflow handles the complexity of different file formats while delivering consistent, AI-ready markdown output for your automation needs.