babylon.rag.pre_embeddings.manager

Pre-embeddings management for the RAG system.

This module provides PreEmbeddingsManager, which integrates preprocessing, chunking, and caching components to prepare content for embedding.

Classes

PreEmbeddingsConfig(**data)

Configuration for the pre-embeddings system.

PreEmbeddingsManager([config, preprocessor, ...])

Manages the pre-embeddings pipeline for the RAG system.

class babylon.rag.pre_embeddings.manager.PreEmbeddingsConfig(**data)

Bases: BaseModel

Configuration for the pre-embeddings system.

Parameters:

preprocessing_config (PreprocessingConfig | None) – Configuration for content preprocessing

chunking_config (ChunkingConfig | None) – Configuration for content chunking

cache_config (CacheConfig | None) – Configuration for embedding cache management

model_config: ClassVar[ConfigDict] = {'frozen': True}

Pydantic model configuration, a dictionary conforming to pydantic.config.ConfigDict. Because 'frozen' is True, instances cannot be modified after creation.
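
As a rough illustration of what a frozen configuration means in practice, the sketch below uses a frozen standard-library dataclass as a stand-in so that it runs without pydantic or babylon installed. The class names and the max_chunk_chars field are hypothetical; only the attribute name chunking_config comes from this page. With a real pydantic model the failed assignment is reported through pydantic's own error type, and a modified copy would be made with model_copy rather than dataclasses.replace.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ChunkingConfigSketch:
    # Hypothetical field; the real ChunkingConfig fields are not documented here.
    max_chunk_chars: int = 200


@dataclass(frozen=True)
class PreEmbeddingsConfigSketch:
    # Mirrors the documented attribute name; None stands for "not configured".
    chunking_config: Optional[ChunkingConfigSketch] = None


config = PreEmbeddingsConfigSketch(
    chunking_config=ChunkingConfigSketch(max_chunk_chars=100)
)

try:
    config.chunking_config = None  # rejected: the instance is frozen
    mutated = True
except FrozenInstanceError:
    mutated = False

# Changing a setting means building a new instance instead.
updated = replace(config, chunking_config=ChunkingConfigSketch(max_chunk_chars=50))
```

The original config object is left untouched, which makes it safe to share one configuration between several components.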

preprocessing_config: PreprocessingConfig | None

chunking_config: ChunkingConfig | None

cache_config: CacheConfig | None

class babylon.rag.pre_embeddings.manager.PreEmbeddingsManager(config=None, preprocessor=None, chunker=None, cache_manager=None, lifecycle_manager=None)

Bases: object

Manages the pre-embeddings pipeline for the RAG system.

This class integrates preprocessing, chunking, and caching components to prepare content for embedding generation.

__init__(config=None, preprocessor=None, chunker=None, cache_manager=None, lifecycle_manager=None)

Initialize with configuration and optional component instances.
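
The "optional component instances" pattern can be sketched as follows. This is an illustrative stand-in, not the real PreEmbeddingsManager: ManagerSketch, default_preprocessor, and the use of a plain callable as the preprocessor are all assumptions made for the example. It only shows the general idea of falling back to a default component when None is passed.

```python
from typing import Callable, Optional


def default_preprocessor(text: str) -> str:
    """Hypothetical default component: collapse runs of whitespace."""
    return " ".join(text.split())


class ManagerSketch:
    """Illustrative stand-in, not the real PreEmbeddingsManager."""

    def __init__(self, preprocessor: Optional[Callable[[str], str]] = None) -> None:
        # Use the injected component if one was given, else the default.
        self.preprocessor = (
            preprocessor if preprocessor is not None else default_preprocessor
        )


default_manager = ManagerSketch()
custom_manager = ManagerSketch(preprocessor=str.upper)
```

Accepting components as arguments makes it straightforward to substitute a test double or a custom implementation without changing the manager itself.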

Parameters:

config, preprocessor, chunker, cache_manager, lifecycle_manager – All optional; each defaults to None

process_content(content)

Process a single content item through the pre-embeddings pipeline.

Parameters:

content (str) – Raw content to process

Return type:

list[dict[str, Any]]

Returns:

List of processed chunks with metadata

Raises:

PreEmbeddingError – If processing fails at any stage
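
To make the return shape concrete, here is a minimal sketch of a preprocess, chunk, and annotate pipeline that returns list[dict[str, Any]]. Everything in it is an assumption for illustration: the whitespace normalisation, the fixed-size character chunking, the metadata keys ("content", "index", "hash"), and the PreEmbeddingErrorSketch class. The real method may behave differently at every step.

```python
import hashlib
from typing import Any


class PreEmbeddingErrorSketch(Exception):
    """Stand-in for the documented PreEmbeddingError."""


def process_content_sketch(content: str, chunk_size: int = 20) -> list[dict[str, Any]]:
    if not isinstance(content, str):
        raise PreEmbeddingErrorSketch("content must be a string")
    # 1. Preprocess: collapse whitespace (assumed behaviour).
    cleaned = " ".join(content.split())
    # 2. Chunk: fixed-size character windows (assumed strategy).
    pieces = [cleaned[i:i + chunk_size] for i in range(0, len(cleaned), chunk_size)]
    # 3. Attach metadata, including a hash that a cache could use as a key.
    return [
        {
            "content": piece,
            "index": index,
            "hash": hashlib.sha256(piece.encode("utf-8")).hexdigest(),
        }
        for index, piece in enumerate(pieces)
    ]


chunks = process_content_sketch("The   quick brown fox\njumps over the lazy dog.")
```

Joining the "content" values in index order reproduces the cleaned text, which is a convenient sanity check for any chunking strategy without overlap.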

process_batch(contents)

Process multiple content items efficiently.

Parameters:

contents (list[str]) – List of raw content items to process

Return type:

list[list[dict[str, Any]]]

Returns:

List of lists of processed chunks with metadata

Raises:

PreEmbeddingError – If batch processing fails
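
The nested return type, one inner list per input item, can be illustrated with the sketch below. It assumes a hypothetical per-item step (split_sentences) and a stand-in exception class. Whether the real method stops at the first failing item, as this sketch does, is not stated on this page.

```python
from typing import Any


class PreEmbeddingErrorSketch(Exception):
    """Stand-in for the documented PreEmbeddingError."""


def split_sentences(content: str) -> list[dict[str, Any]]:
    """Hypothetical per-item step: one chunk per non-empty sentence."""
    if not isinstance(content, str):
        raise TypeError("expected str")
    parts = [p.strip() for p in content.split(".") if p.strip()]
    return [{"content": part, "index": i} for i, part in enumerate(parts)]


def process_batch_sketch(contents: list[str]) -> list[list[dict[str, Any]]]:
    results = []
    for position, content in enumerate(contents):
        try:
            results.append(split_sentences(content))
        except Exception as exc:
            # Surface a single error type and say which item failed.
            raise PreEmbeddingErrorSketch(f"item {position} failed: {exc}") from exc
    return results


batch = process_batch_sketch(["One. Two.", "", "Three."])
```

The output keeps the input order, and an empty input item yields an empty inner list, so results can be matched back to their inputs by position.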

prepare_for_embedding(obj)

Prepare an object for embedding by processing its content.

This method is designed to work with objects that follow the Embeddable protocol from the embedding system.

Parameters:

obj (Any) – Object with content to prepare for embedding

Return type:

dict[str, Any]

Returns:

Dictionary with processed content and metadata

Raises:

PreEmbeddingError – If preparation fails
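
The Embeddable protocol itself is not described on this page, so the sketch below assumes the simplest plausible shape: an object exposing a string content attribute. EmbeddableSketch, Note, and the returned keys ("content", "metadata") are hypothetical names chosen for the example, and a plain ValueError stands in for PreEmbeddingError.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class EmbeddableSketch(Protocol):
    """Assumed shape of the Embeddable protocol: a `content` attribute."""

    content: str


@dataclass
class Note:
    id: str
    content: str


def prepare_for_embedding_sketch(obj: Any) -> dict[str, Any]:
    # isinstance on a runtime-checkable protocol checks attribute presence only.
    if not isinstance(obj, EmbeddableSketch):
        raise ValueError("object has no `content` attribute")
    cleaned = " ".join(obj.content.split())
    return {
        "content": cleaned,
        "metadata": {"source_type": type(obj).__name__, "length": len(cleaned)},
    }


prepared = prepare_for_embedding_sketch(Note(id="n1", content="  Hello,\n world  "))
```

Structural typing means Note does not need to inherit from anything; having the expected attribute is enough for it to be accepted.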

prepare_batch_for_embedding(objects)

Prepare multiple objects for embedding by processing their content.

Parameters:

objects (list[Any]) – List of objects with content to prepare

Return type:

list[dict[str, Any]]

Returns:

List of dictionaries with processed content and metadata

Raises:

PreEmbeddingError – If batch preparation fails

get_stats()

Get statistics about the pre-embeddings system.

Return type:

dict[str, Any]

Returns:

Dictionary of statistics
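
The keys of the statistics dictionary are not listed here. As a sketch of how such a dictionary might be assembled, the example below keeps two running counters and derives an average from them. StatsSketch and all three key names are hypothetical.

```python
from typing import Any


class StatsSketch:
    """Illustrative counters; the real statistic names are not documented here."""

    def __init__(self) -> None:
        self.items_processed = 0
        self.chunks_produced = 0

    def record(self, chunk_count: int) -> None:
        # Called once per processed content item.
        self.items_processed += 1
        self.chunks_produced += chunk_count

    def get_stats(self) -> dict[str, Any]:
        # Guard against division by zero before any item has been processed.
        average = (
            self.chunks_produced / self.items_processed
            if self.items_processed
            else 0.0
        )
        return {
            "items_processed": self.items_processed,
            "chunks_produced": self.chunks_produced,
            "average_chunks_per_item": average,
        }


tracker = StatsSketch()
tracker.record(3)
tracker.record(1)
stats = tracker.get_stats()
```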