babylon.rag.context_window.manager

Context window manager for token-aware content prioritization.

This module implements dynamic context window management for RAG systems, ensuring AI model context limits are respected while maximizing information density. Key features:

  • Token counting and capacity tracking

  • Priority-based content eviction (relevance, recency, hybrid strategies)

  • Automatic optimization at configurable thresholds

  • Metrics integration for monitoring and tuning

The manager is designed for high-throughput scenarios where context must be continuously optimized as new content arrives and old content ages.
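The eviction strategies listed above (relevance, recency, hybrid) could look something like the following sketch. This is an illustrative re-implementation, not the library's actual code: the `Item` dataclass, `hybrid_score`, and the 0.7 relevance weight are all assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Item:
    content_id: str
    tokens: int
    importance: float  # relevance score in [0.0, 1.0]
    last_access: float = field(default_factory=time.monotonic)


def hybrid_score(item: Item, now: float, relevance_weight: float = 0.7) -> float:
    """Blend relevance with recency: recently accessed items decay less."""
    age = now - item.last_access
    recency = 1.0 / (1.0 + age)  # ~1.0 for just-touched items, -> 0 as they age
    return relevance_weight * item.importance + (1.0 - relevance_weight) * recency


def evict_to_budget(items: list[Item], budget: int) -> list[Item]:
    """Keep the highest-scoring items that fit within the token budget."""
    now = time.monotonic()
    ranked = sorted(items, key=lambda it: hybrid_score(it, now), reverse=True)
    kept, used = [], 0
    for it in ranked:
        if used + it.tokens <= budget:
            kept.append(it)
            used += it.tokens
    return kept
```

A pure-relevance or pure-recency strategy falls out of the same sketch by setting `relevance_weight` to 1.0 or 0.0 respectively.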

Example

>>> from babylon.rag.context_window import ContextWindowManager
>>> manager = ContextWindowManager()
>>> manager.add_content("key1", "Some text content", token_count=4, importance=0.8)
True
>>> manager.total_tokens
4

Classes

ContextWindowManager([config, ...])

Manages the token usage and content prioritization in the RAG context window.

class babylon.rag.context_window.manager.ContextWindowManager(config=None, metrics_collector=None, lifecycle_manager=None)[source]

Bases: object

Manages the token usage and content prioritization in the RAG context window.

The ContextWindowManager ensures that the total token usage stays within the limits of the AI model while prioritizing the most relevant content. It implements:

  1. Token counting and tracking

  2. Content prioritization based on relevance

  3. Automatic optimization when approaching limits

  4. Integration with metrics collection

Parameters:
config

Configuration for the context window

metrics_collector

Collector for performance metrics

lifecycle_manager

Optional manager for object lifecycles

__init__(config=None, metrics_collector=None, lifecycle_manager=None)[source]

Initialize the context window manager.

Parameters:
  • config (ContextWindowConfig | None) – Configuration for token limits and optimization thresholds

  • metrics_collector (MetricsCollector | None) – Collector for performance metrics

  • lifecycle_manager (Any) – Optional manager for object lifecycles

property total_tokens: int

Get the total number of tokens in the context window.

property capacity_percentage: float

Get the current percentage of capacity used.

property content_count: int

Get the number of content items in the context window.

add_content(content_id, content, token_count, importance=0.5)[source]

Add content to the context window.

Parameters:
  • content_id (str) – Unique identifier for the content

  • content (Any) – The content to add

  • token_count (int) – Number of tokens in the content

  • importance (float) – Importance score for the content (0.0 to 1.0)

Return type:

bool

Returns:

True if the content was added without triggering optimization, False if optimization had to run to make room

Raises:
  • ContentInsertionError – If content could not be added

  • CapacityExceededError – If capacity is exceeded and optimization fails
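The documented return value and exceptions suggest an add flow roughly like this sketch. It is illustrative only: `try_add`, the `threshold` semantics, and eviction by insertion order are assumptions (the real manager evicts by priority, per the strategies described above).

```python
class CapacityExceededError(Exception):
    """Mirrors the exception named in the docs (illustrative stand-in)."""


def try_add(window: dict[str, int], max_tokens: int, threshold: float,
            content_id: str, token_count: int) -> bool:
    """Add content; return True if it fit directly, False if optimization ran."""
    used = sum(window.values())
    if used + token_count <= threshold * max_tokens:
        window[content_id] = token_count
        return True
    # Optimization pass: evict oldest entries (dicts preserve insertion order)
    # until the new content fits within the hard limit.
    for old_id in list(window):
        if used + token_count <= max_tokens:
            break
        used -= window.pop(old_id)
    if used + token_count > max_tokens:
        raise CapacityExceededError(content_id)
    window[content_id] = token_count
    return False
```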

get_content(content_id)[source]

Get content from the context window and update its priority.

Parameters:

content_id (str) – Unique identifier for the content

Return type:

Any

Returns:

The requested content

Raises:

KeyError – If content_id is not found
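The "update its priority" behavior on access can be sketched with plain dict ordering, where re-inserting an entry marks it most recently used. This is a hypothetical illustration of the documented contract (hit bumps recency, miss raises `KeyError`), not the library's implementation.

```python
def get_content(window: dict[str, str], content_id: str) -> str:
    """Fetch content and refresh its recency by moving it to the dict's end."""
    value = window.pop(content_id)  # raises KeyError if missing, as documented
    window[content_id] = value      # re-insert at the end = most recently used
    return value
```

An LRU eviction pass can then simply drop entries from the front of the dict.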

remove_content(content_id)[source]

Remove content from the context window.

Parameters:

content_id (str) – Unique identifier for the content

Return type:

bool

Returns:

True if content was removed, False if not found

Raises:

ContentRemovalError – If an error occurs during removal

optimize(target_tokens=None)[source]

Optimize the context window to reduce token usage.

Parameters:

target_tokens (int | None) – Target token count to reduce usage to; defaults to the configured optimization threshold

Return type:

bool

Returns:

True if optimization was successful, False otherwise

Raises:

OptimizationFailedError – If optimization fails

get_stats()[source]

Get statistics about the context window.

Return type:

dict[str, Any]

count_tokens(content)[source]

Count the number of tokens in a string.

This is a simple implementation that could be enhanced with a proper tokenizer.

Parameters:

content (str) – String content to count tokens for

Return type:

int

Returns:

Estimated token count
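Since the docstring describes a simple estimator rather than a real tokenizer, a common rule of thumb is roughly four characters per token for English text. The heuristic below is an assumption for illustration; the module's actual formula is not shown here.

```python
def count_tokens(content: str) -> int:
    """Rough token estimate: ~4 characters per token, a common rule of thumb.

    Non-empty strings count as at least one token; empty strings count as zero.
    """
    return max(1, len(content) // 4) if content else 0
```

A production version would swap this out for a model-specific tokenizer so that capacity tracking matches the model's real context limit.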