Object Tracking & Performance

Theoretical limits, practical working sets, and optimization strategies for managing game objects within LLM context window constraints.

Context Window Capacity

Theoretical Limits

With a 200k token context window:

Object Type	Token Estimate	Max Objects
Simple Entity	~100 tokens	400-600
Complex Contradiction	~300-500 tokens	200-300
Relationship Network	~200-400 tokens/network	Variable
Event Chain	~200-300 tokens	Variable

Token Usage Breakdown

Component	Token Range
Object metadata	10-20 tokens
Core attributes	30-50 tokens
Relationships	20-40 tokens per connection
Historical data	50-100 tokens
State information	30-50 tokens

Practical Working Sets

Immediate Context (Active Memory)

Size: 20-30 objects
Update frequency: Every game tick
Access latency: <10ms
Memory footprint: ~5k tokens

Active Cache

Size: 100-200 objects
Update frequency: As needed
Access latency: <100ms
Memory footprint: ~30k tokens

Background Context

Size: 300-500 objects
Update frequency: Periodic
Access latency: <500ms
Memory footprint: ~60k tokens

Implementation

ContextWindowManager

Implements token counting and tracking
Manages content prioritization based on importance scores
Automatically optimizes context when approaching capacity threshold (default 75%)
Integrates with MetricsCollector for performance tracking
Provides configurable token limits (default 150k tokens)

Configuration Options

class ContextWindowConfig:
    max_token_limit: int = 150000
    capacity_threshold: float = 0.75
    prioritization_strategy: str = "hybrid"
    min_content_importance: float = 0.2

Content Management

Content is stored with metadata including token count and importance score
Priority queue maintains content ordered by importance
Automatic optimization removes least important content when threshold is reached
Token counting supports various content types (strings, lists, dictionaries, objects)

Error Handling

Dedicated error codes in 2100-2199 range
Handles capacity exceeded scenarios
Manages content insertion and removal errors
Provides detailed error messages with error codes

Performance Monitoring

Key Metrics

class ObjectMetrics:
    def __init__(self):
        self.access_count = 0
        self.cache_hits = 0
        self.cache_misses = 0
        self.token_usage = 0
        self.load_time = 0.0
        self.last_access = None
        self.relationship_count = 0

Monitoring Points

Object Access

Access frequency and patterns
Token usage per object
Cache performance (hits/misses)

Context Window

Current utilization percentage
Token distribution across content types
Garbage collection triggers
Context switches

Vector Database

Query latency
Embedding generation time
Storage utilization
Index performance

Optimization Strategies

Client-Side Processing

Local Computations

Relationship graph updates
Simple state changes
UI updates
Basic validation

Caching Strategy

Local object cache
Relationship cache
Embedding cache
State history

Batch Operations

Grouped updates
Bulk loading
Periodic synchronization
Deferred processing

Vector Database Integration

Query Optimization

Relevance thresholds
Query batching
Index optimization
Caching layers

Storage Strategy

Compression techniques
Incremental updates
Partial loading
Lazy evaluation

Object Lifecycle Management

class ObjectManager:
    def __init__(self):
        self.active_objects = LRUCache(max_size=30)
        self.cached_objects = LRUCache(max_size=200)
        self.metrics = MetricsCollector()

    def get_object(self, object_id):
        self.metrics.record_access(object_id)

        if object_id in self.active_objects:
            self.metrics.record_cache_hit('active')
            return self.active_objects[object_id]

        if object_id in self.cached_objects:
            self.metrics.record_cache_hit('secondary')
            return self._promote_to_active(object_id)

        self.metrics.record_cache_miss()
        return self._load_from_vector_db(object_id)

Performance Logging

class MetricsCollector:
    def __init__(self):
        self.logs = {
            'access_patterns': Counter(),
            'token_usage': deque(maxlen=1000),
            'cache_performance': {'hits': 0, 'misses': 0},
            'latency_metrics': {
                'db_queries': [],
                'context_switches': []
            }
        }

    def analyze_performance(self):
        return {
            'cache_hit_rate': self._calculate_hit_rate(),
            'avg_token_usage': self._calculate_avg_tokens(),
            'hot_objects': self._identify_hot_objects(),
            'optimization_suggestions': self._generate_suggestions()
        }

RAG + Vector Database Architecture

With RAG and vector database integration:

        flowchart TB
    A["Game Objects in Vector DB<br/>(50k total)"] --> B["Query for Relevant Objects<br/>(~1000 embeddings)"]
    B --> C["Load only needed objects into context<br/>(100-200 most relevant)"]
    C --> D["Keep frequently accessed objects<br/>(20-30 working memory)"]
    D --> E["Periodically flush less relevant<br/>back to Vector DB"]
    E -.-> A

%% Luxe Gothic styling
classDef storage fill:#8B0A1A,stroke:#DC143C,color:#F7F5F3
classDef process fill:#DC143C,stroke:#696969,color:#F7F5F3

class A,E storage
class B,C,D process

This architecture allows:

Theoretically unlimited total objects in the game
10,000s of objects in vector DB
Only relevant subset loaded into context
Example distribution:
- 50k total objects in vector DB
- ~1000 objects’ embeddings queried per turn
- Top 100-200 most relevant loaded into context
- 20-30 frequently accessed objects kept in “working memory”

Optimization Recommendations

Short-term

Implement basic metrics collection
Set up client-side caching
Monitor token usage
Track access patterns

Medium-term

Optimize query patterns
Implement smart prefetching
Enhance client-side processing
Refine caching strategies

Long-term

Develop advanced compression
Implement predictive loading
Create adaptive optimization
Build performance analytics

Practical Limitations

Query latency to vector DB
Cost of embedding generation
Need for coherent context management
Risk of context fragmentation
Processing overhead for relevance sorting

The key is not trying to load everything at once, but maintaining a dynamic “working set” of objects relevant to the current game state and player actions.