Document Processing with Context Recovery
Smart section-based processing with full context retrieval
System Architecture
Section-Based Processing
Documents are split into meaningful sections, and each section keeps its place in the document hierarchy.
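As a rough illustration of hierarchy-preserving splitting, the sketch below cuts a Markdown document at its `#` headings and records the chain of parent titles for each section. It assumes Markdown input with ATX-style headings; the `Section` class and `split_sections` function are hypothetical names, and the system's actual splitting rules may differ.

```python
import re
from dataclasses import dataclass

# Hypothetical model of one section; not taken from the system's code.
@dataclass
class Section:
    title: str
    level: int
    path: list[str]  # titles from the outermost heading down to this one
    body: str = ""

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def split_sections(markdown: str) -> list[Section]:
    """Split Markdown at ATX headings, keeping each section's ancestry.

    Simplifications: text before the first heading is dropped, and
    '#' lines inside fenced code blocks are not treated specially.
    """
    sections: list[Section] = []
    stack: list[Section] = []  # currently open headings, outermost first
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close every open heading at the same or a deeper level.
            while stack and stack[-1].level >= level:
                stack.pop()
            title = match.group(2)
            section = Section(title, level, [s.title for s in stack] + [title])
            stack.append(section)
            sections.append(section)
        elif sections:
            # Body text belongs to the most recently opened heading.
            sections[-1].body += line + "\n"
    return sections
```

Storing the full `path` with each section is one way to let a later lookup show where a match sits in the document without re-parsing the file.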
Context Recovery
Search runs over small chunks, but every match can be traced back to the full original section it came from.
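The idea can be sketched with a back-reference from each chunk to its parent section. In the sketch below, an in-memory dictionary stands in for the section store, the fixed-size character chunking is only a placeholder strategy, and `Chunk`, `chunk_section`, and `recover_sections` are hypothetical names rather than the system's own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    section_id: str  # back-reference to the parent section
    text: str

def chunk_section(section_id: str, content: str, size: int = 40) -> list[Chunk]:
    """Cut one section into fixed-size character chunks (placeholder strategy)."""
    return [
        Chunk(f"{section_id}:{i // size}", section_id, content[i:i + size])
        for i in range(0, len(content), size)
    ]

def recover_sections(hits: list[Chunk], section_store: dict[str, str]) -> list[str]:
    """Map chunk hits back to full sections, de-duplicated, in hit order."""
    seen: set[str] = set()
    full_sections: list[str] = []
    for hit in hits:
        if hit.section_id not in seen:
            seen.add(hit.section_id)
            full_sections.append(section_store[hit.section_id])
    return full_sections
```

De-duplicating by `section_id` matters because several chunks of the same section often match a single query, and returning the section once keeps the result compact.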
Hybrid Search
Combines vector similarity with BM25 keyword scoring, so results reflect both semantic closeness and exact term matches.
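How the two result lists are merged is not stated here. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the system's actual method. The function name, the chunk ids, and the constant `k = 60` are illustrative choices.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists: each list adds 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from the two retrievers, best match first.
vector_hits = ["c1", "c2", "c3"]
bm25_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF works on ranks only, so it avoids having to normalize cosine similarities and BM25 scores onto a common scale. A weighted sum of normalized scores would be an equally plausible alternative.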
Workflow Demonstration
Process Steps
Document Upload
Users upload documents through the API endpoint:
POST /index-file/
{
  "list_path": [
    {
      "id": "uuid-file-1",
      "path": "extracted/document.md"
    }
  ]
}
The system assigns a unique job ID to the request and returns immediately, without waiting for processing to finish.
System Benefits
Full Context Recovery
Retrieve complete section content even when searching small chunks, ensuring no context is lost.
Hybrid Search
Combine semantic vector search with keyword-based BM25 for more accurate results.
Optimal Performance
Vector search runs on small chunks for speed, and the matching full sections are then fetched from PostgreSQL.