DocuSense Explorer - Document Processing System

Document Upload

Users upload documents through the API endpoint:

                                    
POST /index-file/
{
  "list_path": [
    {
      "id": "uuid-file-1",
      "path": "extracted/document.md"
    }
  ]
}

The system creates a unique job ID for processing and returns immediately.
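The accept-and-return-immediately behaviour can be sketched as follows. This is a minimal illustration, not the actual service code: the function `submit_index_job`, the in-process `JOB_QUEUE`, and the `job_id`/`status` response fields are all hypothetical, since the source does not specify the response format or how jobs are queued.

```python
import queue
import uuid

# Hypothetical in-process queue standing in for the real background worker.
JOB_QUEUE: "queue.Queue[dict]" = queue.Queue()


def submit_index_job(payload: dict) -> dict:
    """Validate an /index-file/ style payload, enqueue it, and return at once."""
    files = payload.get("list_path")
    if not isinstance(files, list) or not files:
        raise ValueError("list_path must be a non-empty list")
    for entry in files:
        if "id" not in entry or "path" not in entry:
            raise ValueError("each entry needs 'id' and 'path'")

    job_id = str(uuid.uuid4())  # unique job ID (UUID format is an assumption)
    JOB_QUEUE.put({"job_id": job_id, "files": files})
    # No processing happens here; the caller gets the job ID immediately.
    return {"job_id": job_id, "status": "queued"}


response = submit_index_job(
    {"list_path": [{"id": "uuid-file-1", "path": "extracted/document.md"}]}
)
```

A worker would later take entries off the queue and run the splitting and chunking steps, so a slow document never blocks the upload call.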

Section Splitting

Documents are split into hierarchical sections:

# Main Title

## Section 1

### Subsection 1.1

## Section 2

Each section maintains its page information and document context.
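One way to build such a hierarchy is to walk the Markdown headings with a stack of open ancestors. The sketch below assumes ATX-style `#` headings and does not skip headings inside code fences; the `Section` class and `split_sections` function are illustrative names, and page information is omitted because plain Markdown carries none.

```python
import re
from dataclasses import dataclass, field

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Section:
    title: str
    level: int
    path: list  # ancestor titles plus this one: the document context
    lines: list = field(default_factory=list)


def split_sections(markdown: str) -> list:
    sections = []
    stack = []  # currently open headings, outermost first
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            # Close any open heading at the same or a deeper level.
            while stack and stack[-1].level >= level:
                stack.pop()
            section = Section(title, level, [s.title for s in stack] + [title])
            stack.append(section)
            sections.append(section)
        elif sections:
            # Body text belongs to the most recently opened section.
            sections[-1].lines.append(line)
    return sections


doc = "# Main Title\n\n## Section 1\n\n### Subsection 1.1\n\n## Section 2\n"
sections = split_sections(doc)
paths = [" > ".join(s.path) for s in sections]
```

For the example outline above, `paths` ends with `Main Title > Section 2`, showing that Section 2 is attached to the title and not to Subsection 1.1.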

Chunk Generation

Large sections are split into smaller chunks of bounded token count:

Section ID: 123e4567

Chunk 1 (Tokens: 512)

Chunk 2 (Tokens: 498)

Each chunk references its parent section while being small enough for efficient vector search.
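A minimal sketch of this step, assuming a 512-token limit and whitespace-separated words as a stand-in for real tokenizer output. The `Chunk` class and `chunk_section` function are hypothetical names; the actual tokenizer, limit, and any overlap between chunks are not stated in the source.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    section_id: str  # reference back to the parent section
    index: int
    text: str
    token_count: int


def chunk_section(section_id: str, text: str, max_tokens: int = 512) -> list:
    tokens = text.split()  # assumption: whitespace tokens, not model tokens
    chunks = []
    for start in range(0, len(tokens), max_tokens):
        window = tokens[start:start + max_tokens]
        chunks.append(
            Chunk(section_id, len(chunks) + 1, " ".join(window), len(window))
        )
    return chunks


# A 1010-token section yields one full chunk and one partial chunk.
chunks = chunk_section("123e4567", "word " * 1010)
counts = [c.token_count for c in chunks]
```

With a 1010-token input the split reproduces the 512 and 498 token counts shown in the example above.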

Storage

PostgreSQL

Stores complete section content with metadata

Milvus

Stores vector embeddings of chunks with section references
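The division of labour between the two stores can be pictured with two record shapes. This is only a sketch: the field names, the `store` helper, and the in-memory dictionary and list that stand in for PostgreSQL and Milvus are assumptions, not the real schema or client calls.

```python
from dataclasses import dataclass


@dataclass
class SectionRow:  # assumed shape of a row in the relational store
    section_id: str
    file_id: str
    title: str
    content: str  # complete section text
    page: int


@dataclass
class ChunkVector:  # assumed shape of an entity in the vector store
    chunk_id: str
    section_id: str  # back-reference used to recover the full section
    embedding: list


sections_table = {}    # stand-in for PostgreSQL
chunk_collection = []  # stand-in for Milvus


def store(section: SectionRow, chunk_embeddings: list) -> None:
    sections_table[section.section_id] = section
    for i, embedding in enumerate(chunk_embeddings, start=1):
        chunk_collection.append(
            ChunkVector(f"{section.section_id}-{i}", section.section_id, embedding)
        )


store(
    SectionRow("123e4567", "uuid-file-1", "Section 1", "full section text", 1),
    [[0.1, 0.2], [0.3, 0.4]],
)
```

Only the small vectors and a section reference go to the vector store; the full text lives once in the relational store and is looked up by `section_id`.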

Search Process

1. Query Processing

Convert search query to embeddings and search Milvus

2. Result Aggregation

Group chunks by their section_id and score

3. Context Retrieval

Fetch full section content from PostgreSQL
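The three steps above can be sketched end to end with in-memory stand-ins. Several things here are assumptions: the query is taken to be already embedded, brute-force cosine similarity replaces the Milvus search, each section is scored by its best-matching chunk, and a dictionary replaces the PostgreSQL lookup. The names `CHUNKS`, `SECTIONS`, and `search` are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Stand-in for the vector store: chunk embeddings with section references.
CHUNKS = [
    {"section_id": "s1", "embedding": [1.0, 0.0]},
    {"section_id": "s1", "embedding": [0.8, 0.6]},
    {"section_id": "s2", "embedding": [0.0, 1.0]},
]
# Stand-in for the relational store: full section content by ID.
SECTIONS = {"s1": "Full text of section 1", "s2": "Full text of section 2"}


def search(query_embedding, top_k=3):
    # Step 1: score every chunk against the query and keep the top hits.
    hits = sorted(
        ((cosine(query_embedding, c["embedding"]), c["section_id"]) for c in CHUNKS),
        reverse=True,
    )[:top_k]
    # Step 2: group hits by section, keeping the best chunk score per section.
    best = {}
    for score, section_id in hits:
        best[section_id] = max(best.get(section_id, -1.0), score)
    # Step 3: fetch the full section content for each ranked section.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(sid, score, SECTIONS[sid]) for sid, score in ranked]


results = search([1.0, 0.0])
```

Taking the maximum chunk score per section is one possible aggregation; summing or averaging scores would favour sections with many moderately relevant chunks instead.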

Document Processing with Context Recovery

System Architecture

Section-Based Processing

Context Recovery

Hybrid Search

Workflow Demonstration

Process Steps

Document Upload

Section Splitting

Chunk Generation

Storage

PostgreSQL

Milvus

Search Process

System Benefits

Full Context Recovery

Hybrid Search

Optimal Performance
