babylon.rag.pre_embeddings
Pre-embedding pipeline for RAG document processing.
This package handles the preprocessing stages before documents are embedded into the ChromaDB vector store:
- Modules:
cache_manager: Manages embedding cache to avoid recomputation. chunking: Splits documents into semantically coherent chunks
optimized for retrieval and context window limits.
manager: Orchestrates the pre-embedding pipeline workflow. preprocessor: Text cleaning, normalization, and preparation
for the embedding model.
- The pipeline flow is:
Preprocessor cleans raw text
Chunker splits into retrieval-optimized segments
Cache manager checks for existing embeddings
Manager coordinates the full workflow
Modules
Embedding cache management for the RAG system. |
|
Content chunking for the RAG system. |
|
Pre-embeddings management for the RAG system. |
|
Content preprocessing for the RAG system. |