babylon.rag.pre_embeddings.chunking

Content chunking for the RAG system.

This module splits content into chunks before embedding generation, using either a fixed-size strategy or a semantic, delimiter-based strategy.

Classes

ChunkingConfig(**data)

Configuration for content chunking.

ChunkingStrategy([config])

Divides content into appropriate chunks for embedding.

class babylon.rag.pre_embeddings.chunking.ChunkingConfig(**data)[source]

Bases: BaseModel

Configuration for content chunking.

Parameters:
  • strategy (str)
  • chunk_size (int)
  • overlap (int)
  • delimiter (str)
  • min_chunk_size (int)
  • max_chunk_size (int)

strategy

Chunking strategy to use (“fixed” or “semantic”)

chunk_size

Size of each chunk in characters (used by the “fixed” strategy)

overlap

Number of characters shared between consecutive chunks (used by the “fixed” strategy)

delimiter

Delimiter to use for semantic chunking

min_chunk_size

Minimum allowed chunk size

max_chunk_size

Maximum allowed chunk size
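
The interaction between chunk_size and overlap can be illustrated with a small standalone sketch. It is not the library's implementation: the function name fixed_chunks and its validation rules are assumptions made for illustration, and only the idea of character windows that share overlap characters comes from the attribute descriptions above.

```python
def fixed_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters.

    Hypothetical helper, not part of babylon.rag.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(fixed_chunks("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk starts chunk_size - overlap characters after the previous one, so the last overlap characters of one chunk reappear at the start of the next.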

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
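
Because the model is frozen, a configuration is not meant to be modified after it is created. The sketch below shows the general pattern with a standard-library frozen dataclass as a stand-in, so that it runs without any dependencies. ConfigSketch and its default values are invented for illustration and are not the real ChunkingConfig defaults.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for ChunkingConfig; the defaults are invented."""

    strategy: str = "fixed"
    chunk_size: int = 512
    overlap: int = 50


base = ConfigSketch()

try:
    base.chunk_size = 1024  # assignment is rejected on a frozen instance
    mutated = True
except FrozenInstanceError:
    mutated = False

# Derive a modified copy instead of mutating in place.
larger = replace(base, chunk_size=1024)
print(mutated, base.chunk_size, larger.chunk_size)
```

With a pydantic v2 model, the analogous copy is usually made with model_copy(update={...}) rather than dataclasses.replace.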

strategy: str
chunk_size: int
overlap: int
delimiter: str
min_chunk_size: int
max_chunk_size: int

class babylon.rag.pre_embeddings.chunking.ChunkingStrategy(config=None)[source]

Bases: object

Divides content into appropriate chunks for embedding.

This class handles different chunking strategies including fixed-size chunking and semantic chunking based on content structure.

Parameters:

config (ChunkingConfig | None)
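
How delimiter, min_chunk_size and max_chunk_size might work together in a semantic strategy can be sketched as follows. This is one plausible reading under stated assumptions, not the behavior of ChunkingStrategy: the function delimiter_chunks is hypothetical, sizes are assumed to be counted in characters, and min_size is treated as a merge target rather than a guarantee.

```python
def delimiter_chunks(text: str, delimiter: str, min_size: int, max_size: int) -> list[str]:
    """Split on `delimiter`, merging short pieces and hard-splitting long ones.

    Hypothetical helper, not part of babylon.rag.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    if not 0 < min_size <= max_size:
        raise ValueError("sizes must satisfy 0 < min_size <= max_size")

    pieces = [p for p in text.split(delimiter) if p.strip()]
    chunks: list[str] = []
    current = ""  # pieces accumulated so far that are still below min_size
    for piece in pieces:
        candidate = piece if not current else current + delimiter + piece
        if current and len(candidate) > max_size:
            chunks.append(current)  # flush early rather than exceed max_size
            candidate = piece
        while len(candidate) > max_size:
            chunks.append(candidate[:max_size])  # hard-split an oversized piece
            candidate = candidate[max_size:]
        if len(candidate) >= min_size:
            chunks.append(candidate)
            current = ""
        else:
            current = candidate  # keep accumulating
    if current:
        chunks.append(current)  # the final chunk may be shorter than min_size
    return chunks


sample = "aaaa\n\nbb\n\ncc\n\n" + "d" * 12
print(delimiter_chunks(sample, "\n\n", min_size=5, max_size=10))
# ['aaaa\n\nbb', 'cc', 'dddddddddd', 'dd']
```

In this sketch a chunk can still fall below min_size when merging it with its neighbor would exceed max_size, as the 'cc' chunk shows.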

__init__(config=None)[source]

Initialize with configuration options.

Parameters:

config (ChunkingConfig | None) – Configuration for chunking behavior

chunk(content)[source]

Divide content into chunks based on configured strategy.

Parameters:

content (str) – Content to chunk

Return type:

list[str]

Returns:

List of content chunks

Raises:

ChunkingError – If chunking fails or content is invalid

chunk_batch(contents)[source]

Process multiple content items efficiently.

Parameters:

contents (list[str]) – List of content items to chunk

Return type:

list[list[str]]

Returns:

List of lists of chunks, one list per content item
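
The shape of the return value, one list of chunks per input item, can be shown with a dependency-free sketch. The names chunk_batch_sketch and split_words are hypothetical, and the real method may process items differently (for example, to gain the efficiency mentioned above); only the list-of-lists result shape is taken from the documentation. Preserving input order is an assumption.

```python
from typing import Callable


def chunk_batch_sketch(
    contents: list[str], chunk: Callable[[str], list[str]]
) -> list[list[str]]:
    """Apply `chunk` to every item, keeping one result list per input, in order."""
    return [chunk(item) for item in contents]


def split_words(text: str) -> list[str]:
    """Toy chunker used only for this illustration: split on whitespace."""
    return text.split()


results = chunk_batch_sketch(["a b c", "", "d e"], split_words)
print(results)
# [['a', 'b', 'c'], [], ['d', 'e']]
```

Since results[i] corresponds to contents[i], chunks can be traced back to their source item by index.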