babylon.rag.pre_embeddings.chunking
Content chunking for the RAG system.
This module provides functionality for dividing content into appropriate chunks before embedding generation.
Classes
| ChunkingConfig | Configuration for content chunking. |
| ChunkingStrategy | Divides content into appropriate chunks for embedding. |
- class babylon.rag.pre_embeddings.chunking.ChunkingConfig(**data)[source]
Bases: BaseModel
Configuration for content chunking.
- Parameters:
- strategy
Chunking strategy to use (“fixed” or “semantic”)
- chunk_size
Size of chunks in characters (for fixed strategy)
- overlap
Number of characters to overlap between chunks (for fixed strategy)
- delimiter
Delimiter to use for semantic chunking
- min_chunk_size
Minimum allowed chunk size
- max_chunk_size
Maximum allowed chunk size
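To illustrate how chunk_size and overlap interact under the "fixed" strategy, here is a minimal sketch. The helper `fixed_chunks` is hypothetical and is not part of this module; it assumes each window advances by `chunk_size - overlap` characters, which may differ from what the library actually does.

```python
def fixed_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Hypothetical helper (not the library's implementation).

    Splits text into windows of chunk_size characters, where each window
    shares `overlap` characters with the one before it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = fixed_chunks("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

With an overlap of 1, the last character of each chunk reappears as the first character of the next, so content that straddles a boundary is present in both chunks.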
- model_config: ClassVar[ConfigDict] = {'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
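Because model_config sets 'frozen' to True, instances are meant to be immutable after construction. The sketch below shows what that implies in practice, using a frozen stdlib dataclass as a rough analogue rather than pydantic itself. The class `FrozenConfigSketch` and its default values are invented for illustration; only the field names come from the parameter list above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenConfigSketch:
    """Hypothetical stand-in for a frozen config; defaults are assumptions."""

    strategy: str = "fixed"
    chunk_size: int = 512
    overlap: int = 50


base = FrozenConfigSketch()

try:
    base.chunk_size = 1024  # frozen: attribute assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

# Derive a modified copy instead of mutating in place.
larger = replace(base, chunk_size=1024)
print(mutated, base.chunk_size, larger.chunk_size)
```

With a pydantic v2 model, the corresponding way to obtain a changed copy is `model_copy(update=...)`, and assignment on a frozen model raises a validation error rather than `FrozenInstanceError`.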
- class babylon.rag.pre_embeddings.chunking.ChunkingStrategy(config=None)[source]
Bases: object
Divides content into appropriate chunks for embedding.
This class handles different chunking strategies including fixed-size chunking and semantic chunking based on content structure.
- Parameters:
config (ChunkingConfig | None)
- __init__(config=None)[source]
Initialize with configuration options.
- Parameters:
config (ChunkingConfig | None) – Configuration for chunking behavior
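The semantic strategy described for this class splits on content structure instead of at fixed offsets. As a rough sketch of one way delimiter, min_chunk_size, and max_chunk_size could work together, the hypothetical function below splits on the delimiter, merges short pieces until they reach the minimum, and hard-splits any piece that exceeds the maximum. It is an assumption for illustration only, not the method ChunkingStrategy uses.

```python
def semantic_chunks(
    text: str,
    delimiter: str,
    min_chunk_size: int,
    max_chunk_size: int,
) -> list[str]:
    """Hypothetical delimiter-based chunker (not the library's algorithm)."""
    chunks = []
    buffer = ""
    for piece in text.split(delimiter):
        piece = piece.strip()
        if not piece:
            continue  # skip empty segments between repeated delimiters

        candidate = buffer + delimiter + piece if buffer else piece
        if len(candidate) > max_chunk_size and buffer:
            # Flush before the buffer would overflow; this chunk may be
            # shorter than min_chunk_size.
            chunks.append(buffer)
            candidate = piece

        # A single piece longer than the maximum is cut at fixed offsets.
        while len(candidate) > max_chunk_size:
            chunks.append(candidate[:max_chunk_size])
            candidate = candidate[max_chunk_size:]

        buffer = candidate
        if len(buffer) >= min_chunk_size:
            chunks.append(buffer)
            buffer = ""

    if buffer:
        chunks.append(buffer)  # trailing remainder may be below the minimum
    return chunks


sample = "Intro.\n\nA longer second paragraph here.\n\nEnd."
result = semantic_chunks(sample, "\n\n", min_chunk_size=10, max_chunk_size=40)
print(result)
```

In this sketch the maximum is a hard limit while the minimum is best-effort: a short trailing piece, or a buffer flushed to avoid overflow, is still emitted as its own chunk.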