babylon.rag.embeddings

Embedding management for the RAG system.

Supports both local (Ollama) and cloud (OpenAI) embedding providers. Default: Ollama with embeddinggemma for fully offline operation.

Classes

Embeddable(*args, **kwargs)

Protocol for objects that can be embedded.

EmbeddingManager([embedding_dimension, ...])

Manages embeddings for RAG objects.

class babylon.rag.embeddings.Embeddable(*args, **kwargs)[source]

Bases: Protocol

Protocol for objects that can be embedded.

id: str
content: str
embedding: list[float] | None
__init__(*args, **kwargs)
class babylon.rag.embeddings.EmbeddingManager(embedding_dimension=None, batch_size=None, max_cache_size=1000, max_concurrent_requests=4)[source]

Bases: object

Manages embeddings for RAG objects.

The EmbeddingManager handles:
  • Generating embeddings via the Ollama (local) or OpenAI (cloud) API
  • Caching embeddings for reuse, with LRU eviction
  • Rate limiting and retry logic
  • Batch operations for efficiency
  • Error handling and recovery
  • Performance metrics collection
  • Concurrent operations

Default: Ollama with embeddinggemma for fully offline operation.

Parameters:
  • embedding_dimension (int | None)

  • batch_size (int | None)

  • max_cache_size (int)

  • max_concurrent_requests (int)

__init__(embedding_dimension=None, batch_size=None, max_cache_size=1000, max_concurrent_requests=4)[source]

Initialize the embedding manager.

Parameters:
  • embedding_dimension (int | None) – Size of embedding vectors (default: from LLMConfig)

  • batch_size (int | None) – Number of objects to embed in each batch (default: from LLMConfig)

  • max_cache_size (int) – Maximum number of embeddings to keep in cache (default: 1000)

  • max_concurrent_requests (int) – Maximum number of concurrent embedding requests (default: 4)

Raises:

ValueError – If embedding configuration is invalid

property cache_size: int

Get current number of embeddings in cache.

async aembed(obj)[source]

Asynchronously generate and attach embedding for a single object.

Parameters:

obj (TypeVar(E, bound=Embeddable)) – Object to embed

Return type:

TypeVar(E, bound=Embeddable)

Returns:

Object with embedding attached

Raises:

EmbeddingError – If embedding generation fails

embed(obj)[source]

Synchronously generate and attach embedding for a single object.

This is a convenience wrapper around aembed for synchronous code. For better performance in async contexts, use aembed directly.

Parameters:

obj (TypeVar(E, bound=Embeddable)) – Object to embed

Return type:

TypeVar(E, bound=Embeddable)

Returns:

Object with embedding attached

Raises:

EmbeddingError – If embedding generation fails

async aembed_batch(objects)[source]

Asynchronously generate embeddings for multiple objects efficiently.

Parameters:

objects (Sequence[TypeVar(E, bound=Embeddable)]) – List of objects to embed

Return type:

list[TypeVar(E, bound=Embeddable)]

Returns:

List of objects with embeddings attached

Raises:

EmbeddingError – If any object’s embedding generation fails

embed_batch(objects)[source]

Synchronously generate embeddings for multiple objects efficiently.

This is a convenience wrapper around aembed_batch for synchronous code. For better performance in async contexts, use aembed_batch directly.

Parameters:

objects (Sequence[TypeVar(E, bound=Embeddable)]) – List of objects to embed

Return type:

list[TypeVar(E, bound=Embeddable)]

Returns:

List of objects with embeddings attached

Raises:

EmbeddingError – If any object’s embedding generation fails
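The batch methods presumably partition their input into groups of at most batch_size objects before issuing requests. A minimal partitioning sketch (the `chunked` helper is hypothetical, not a library function):

```python
from typing import Sequence, TypeVar

E = TypeVar("E")


def chunked(objects: Sequence[E], batch_size: int) -> list[list[E]]:
    """Split a sequence into consecutive batches of at most batch_size items."""
    return [list(objects[i : i + batch_size])
            for i in range(0, len(objects), batch_size)]


print(chunked(["d1", "d2", "d3", "d4", "d5"], batch_size=2))
# [['d1', 'd2'], ['d3', 'd4'], ['d5']]
```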

debed(obj)[source]

Remove embedding from an object.

Parameters:

obj (TypeVar(E, bound=Embeddable)) – Object to remove embedding from

Return type:

TypeVar(E, bound=Embeddable)

Returns:

Object with embedding removed

debed_batch(objects)[source]

Remove embeddings from multiple objects.

Parameters:

objects (Sequence[TypeVar(E, bound=Embeddable)]) – List of objects to remove embeddings from

Return type:

list[TypeVar(E, bound=Embeddable)]

Returns:

List of objects with embeddings removed

async close()[source]

Close resources used by the embedding manager.

Return type:

None
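A max_concurrent_requests cap is commonly realized with an asyncio.Semaphore. The sketch below assumes that pattern and substitutes a dummy coroutine for the real embedding API call; it is not the manager's actual implementation:

```python
import asyncio


async def embed_all(texts: list[str],
                    max_concurrent_requests: int = 4) -> list[list[float]]:
    """Embed texts concurrently, capped at max_concurrent_requests in flight."""
    sem = asyncio.Semaphore(max_concurrent_requests)

    async def embed_one(text: str) -> list[float]:
        async with sem:  # at most max_concurrent_requests run at once
            await asyncio.sleep(0)  # stand-in for the provider API call
            return [float(len(text))]

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(embed_one(t) for t in texts))


print(asyncio.run(embed_all(["a", "bb", "ccc"])))  # [[1.0], [2.0], [3.0]]
```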