Document Processing with Context Recovery

Smart section-based processing with full context retrieval

System Architecture

System Architecture Diagram

Section-Based Processing

Documents are split into sections that follow the document's own structure, and the section hierarchy is preserved.
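As a rough illustration of section-based splitting, the sketch below assumes the input is Markdown (the upload example uses a `.md` path) and that section boundaries are `#` headings. The `Section` class and `split_sections` function are hypothetical names, not part of the system, and the sketch does not handle `#` lines inside fenced code blocks.

```python
import re
from dataclasses import dataclass, field

# Assumption: sections are delimited by ATX-style Markdown headings (# to ######).
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    path: list[str]  # heading titles from the top level down to this section
    level: int       # heading depth (0 for text before the first heading)
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines).strip()


def split_sections(markdown: str) -> list[Section]:
    """Split Markdown into sections, recording each section's heading path."""
    sections: list[Section] = []
    stack: list[tuple[int, str]] = []    # (level, title) of the open ancestors
    current = Section(path=[], level=0)  # preamble before the first heading

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            current.lines.append(line)
            continue
        # Close the section in progress (skip an empty preamble).
        if current.path or current.text:
            sections.append(current)
        level, title = len(match.group(1)), match.group(2)
        # Drop headings at the same or a deeper level: they are not ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        current = Section(path=[t for _, t in stack], level=level)

    if current.path or current.text:
        sections.append(current)
    return sections
```

Keeping the full heading path on every section means a deeply nested section can still be shown or stored with its parent titles.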

Context Recovery

Retrieve the full original section content even when searching small chunks.
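One way to picture context recovery: every chunk keeps a reference to its parent section, so a hit on a chunk can be resolved to the whole section. The sketch below is a minimal in-memory version. The `Chunk` class, the fixed-size character chunker, and the dictionary used as a section store are all illustrative assumptions, not the system's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    section_id: str  # back-reference to the section this chunk was cut from
    text: str


def chunk_section(section_id: str, text: str, size: int = 40) -> list[Chunk]:
    """Cut a section into fixed-size character windows (a stand-in chunker)."""
    return [
        Chunk(f"{section_id}:{start // size}", section_id, text[start:start + size])
        for start in range(0, len(text), size)
    ]


def recover_sections(hits: list[Chunk], section_store: dict[str, str]) -> list[str]:
    """Resolve chunk hits to full sections, de-duplicated, in hit order."""
    seen: set[str] = set()
    recovered: list[str] = []
    for hit in hits:
        if hit.section_id not in seen:
            seen.add(hit.section_id)
            recovered.append(section_store[hit.section_id])
    return recovered
```

Two hits from the same section return that section once, so the caller gets full context without duplicates.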

Hybrid Search

Combines vector similarity and BM25 for precise and relevant results.
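The text does not say how the two rankings are merged. The sketch below assumes reciprocal rank fusion (RRF), one common choice, alongside a plain Okapi BM25 scorer and cosine similarity over precomputed embeddings. All function names, the BM25 parameters `k1` and `b`, and the RRF constant `k = 60` are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document against the query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ranking(scores: list[float]) -> list[int]:
    """Document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused: Counter = Counter()
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def hybrid_search(query_tokens: list[str], query_vec: list[float],
                  docs_tokens: list[list[str]], doc_vecs: list[list[float]],
                  top_k: int = 3) -> list[int]:
    keyword_rank = ranking(bm25_scores(query_tokens, docs_tokens))
    vector_rank = ranking([cosine(query_vec, v) for v in doc_vecs])
    return rrf_fuse([keyword_rank, vector_rank])[:top_k]
```

RRF works on ranks rather than raw scores, which sidesteps the fact that BM25 scores and cosine similarities live on different scales.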

Workflow Demonstration

Process Steps

Document Upload

Users upload documents through the API endpoint:

                                    
POST /index-file/
{
  "list_path": [
    {
      "id": "uuid-file-1",
      "path": "extracted/document.md"
    }
  ]
}

The system creates a unique job ID for processing and returns immediately.
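The "create a job ID and return immediately" pattern can be sketched as a handler that only validates the payload, records a job, and queues it, leaving the heavy work to a separate worker. Everything below is a hypothetical in-memory illustration: the `submit_index_job` and `process_next` names, the `job_id` and `status` response fields, and the status values are assumptions, not the documented API.

```python
import queue
import uuid

# Hypothetical in-memory job registry and work queue.
jobs: dict[str, dict] = {}
pending: "queue.Queue[str]" = queue.Queue()


def submit_index_job(payload: dict) -> dict:
    """Validate the request body, register a job, and return without indexing."""
    files = payload["list_path"]
    for entry in files:
        if "id" not in entry or "path" not in entry:
            raise ValueError("each list_path entry needs 'id' and 'path'")
    job_id = str(uuid.uuid4())  # unique ID the caller could use to poll later
    jobs[job_id] = {"status": "queued", "files": files}
    pending.put(job_id)
    return {"job_id": job_id, "status": "queued"}


def process_next() -> None:
    """Worker step: take one queued job and run it to completion."""
    job_id = pending.get_nowait()
    jobs[job_id]["status"] = "processing"
    # Splitting, chunking, embedding, and storage would happen here.
    jobs[job_id]["status"] = "done"
```

Because the handler never touches the file contents, its response time does not depend on document size; in a real deployment `process_next` would run in a background worker rather than being called by hand.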

System Benefits

Full Context Recovery

Retrieve complete section content even when searching small chunks, ensuring no context is lost.

Hybrid Search

Combine semantic vector search with keyword-based BM25 for more accurate results.

Optimal Performance

Vector search stays fast because it runs over small chunks; the matching full sections are then fetched directly from PostgreSQL.