# Object Tracking & Performance

Theoretical limits, practical working sets, and optimization strategies for managing game objects within LLM context window constraints.
## Context Window Capacity

### Theoretical Limits

With a 200k token context window:
| Object Type | Token Estimate | Max Objects |
|---|---|---|
| Simple Entity | ~100 tokens | 400-600 |
| Complex Contradiction | ~300-500 tokens | 200-300 |
| Relationship Network | ~200-400 tokens/network | Variable |
| Event Chain | ~200-300 tokens | Variable |
### Token Usage Breakdown

| Component | Token Range |
|---|---|
| Object metadata | 10-20 tokens |
| Core attributes | 30-50 tokens |
| Relationships | 20-40 tokens per connection |
| Historical data | 50-100 tokens |
| State information | 30-50 tokens |
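The component ranges above can be combined into a rough per-object cost estimator. The sketch below uses the midpoint of each range; the constant names, the weights, and the function itself are illustrative assumptions, not part of the implementation.

```python
# Rough per-object token estimator built from the component table above.
# Midpoint values are illustrative assumptions, not measured figures.
COMPONENT_TOKENS = {
    "metadata": 15,          # 10-20 tokens
    "core_attributes": 40,   # 30-50 tokens
    "per_relationship": 30,  # 20-40 tokens per connection
    "history": 75,           # 50-100 tokens
    "state": 40,             # 30-50 tokens
}

def estimate_object_tokens(relationship_count: int) -> int:
    """Estimate the context cost of one object from its relationship count."""
    base = (COMPONENT_TOKENS["metadata"]
            + COMPONENT_TOKENS["core_attributes"]
            + COMPONENT_TOKENS["history"]
            + COMPONENT_TOKENS["state"])
    return base + relationship_count * COMPONENT_TOKENS["per_relationship"]
```

With these midpoints, an object with no relationships costs about 170 tokens, and each connection adds roughly 30 more — which is why relationship-heavy objects dominate the budget.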
## Practical Working Sets

### Immediate Context (Active Memory)

- Size: 20-30 objects
- Update frequency: every game tick
- Access latency: <10ms
- Memory footprint: ~5k tokens

### Active Cache

- Size: 100-200 objects
- Update frequency: as needed
- Access latency: <100ms
- Memory footprint: ~30k tokens

### Background Context

- Size: 300-500 objects
- Update frequency: periodic
- Access latency: <500ms
- Memory footprint: ~60k tokens
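Taken together, the three tiers budget roughly 95k tokens, which leaves headroom under the 150k default token limit. A quick sanity check (the tier names and numbers simply mirror the figures above):

```python
# Illustrative tier budgets taken from the working-set figures above.
WORKING_SET_TIERS = {
    "immediate": {"max_objects": 30, "token_budget": 5_000},
    "active_cache": {"max_objects": 200, "token_budget": 30_000},
    "background": {"max_objects": 500, "token_budget": 60_000},
}

def total_budget(tiers=WORKING_SET_TIERS) -> int:
    """Combined token footprint of all working-set tiers."""
    return sum(t["token_budget"] for t in tiers.values())
```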
## Implementation

### ContextWindowManager

- Implements token counting and tracking
- Manages content prioritization based on importance scores
- Automatically optimizes context when approaching the capacity threshold (default 75%)
- Integrates with MetricsCollector for performance tracking
- Provides configurable token limits (default 150k tokens)
Configuration Options
class ContextWindowConfig:
max_token_limit: int = 150000
capacity_threshold: float = 0.75
prioritization_strategy: str = "hybrid"
min_content_importance: float = 0.2
### Content Management

- Content is stored with metadata including token count and importance score
- A priority queue maintains content ordered by importance
- Automatic optimization removes the least important content when the threshold is reached
- Token counting supports various content types (strings, lists, dictionaries, objects)
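The behavior described above can be sketched as follows. This is a minimal illustration, not the real ContextWindowManager: the class name and the ~4-characters-per-token heuristic are assumptions, and a production version would use the model's tokenizer.

```python
import heapq

class ContextStore:
    """Sketch of the content-management behavior described above."""

    def __init__(self, max_tokens=150_000, threshold=0.75):
        self.max_tokens = max_tokens
        self.threshold = threshold
        self.total_tokens = 0
        self._heap = []   # (importance, seq, key): least important pops first
        self._items = {}  # key -> (importance, tokens, content)
        self._seq = 0

    def count_tokens(self, content) -> int:
        # Crude stand-in: ~4 characters per token for strings,
        # recursive sums for lists and dictionaries, repr() otherwise.
        if isinstance(content, str):
            return max(1, len(content) // 4)
        if isinstance(content, dict):
            return sum(self.count_tokens(v) for v in content.values())
        if isinstance(content, (list, tuple)):
            return sum(self.count_tokens(v) for v in content)
        return self.count_tokens(repr(content))

    def add(self, key, content, importance):
        tokens = self.count_tokens(content)
        self._items[key] = (importance, tokens, content)
        heapq.heappush(self._heap, (importance, self._seq, key))
        self._seq += 1
        self.total_tokens += tokens
        self._optimize()

    def _optimize(self):
        # Evict least-important content once usage crosses the threshold.
        while self.total_tokens > self.max_tokens * self.threshold and self._heap:
            _, _, key = heapq.heappop(self._heap)
            item = self._items.pop(key, None)
            if item is not None:
                self.total_tokens -= item[1]
```

The `seq` counter in the heap tuples breaks importance ties by insertion order, so keys of mixed types never get compared directly.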
### Error Handling

- Dedicated error codes in the 2100-2199 range
- Handles capacity-exceeded scenarios
- Manages content insertion and removal errors
- Provides detailed error messages with error codes
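A minimal shape for these errors might look like the following; the class names and the specific code 2101 are illustrative assumptions within the documented 2100-2199 range.

```python
class ContextWindowError(Exception):
    """Base error for the context window subsystem (codes 2100-2199)."""

    def __init__(self, code: int, message: str):
        # Embed the code in the message so logs stay self-describing.
        super().__init__(f"[{code}] {message}")
        self.code = code

class CapacityExceededError(ContextWindowError):
    """Raised when an insertion would exceed the token capacity."""

    def __init__(self, used: int, limit: int):
        # 2101 is an illustrative code, not the actual assignment.
        super().__init__(2101, f"capacity exceeded: {used}/{limit} tokens")
```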
## Performance Monitoring

### Key Metrics

```python
class ObjectMetrics:
    def __init__(self):
        self.access_count = 0
        self.cache_hits = 0
        self.cache_misses = 0
        self.token_usage = 0
        self.load_time = 0.0
        self.last_access = None
        self.relationship_count = 0
```
### Monitoring Points

- Object Access
  - Access frequency and patterns
  - Token usage per object
  - Cache performance (hits/misses)
- Context Window
  - Current utilization percentage
  - Token distribution across content types
  - Garbage collection triggers
  - Context switches
- Vector Database
  - Query latency
  - Embedding generation time
  - Storage utilization
  - Index performance
## Optimization Strategies

### Client-Side Processing

- Local Computations
  - Relationship graph updates
  - Simple state changes
  - UI updates
  - Basic validation
- Caching Strategy
  - Local object cache
  - Relationship cache
  - Embedding cache
  - State history
- Batch Operations
  - Grouped updates
  - Bulk loading
  - Periodic synchronization
  - Deferred processing
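The batch-operation items above can be sketched as a small deferred batcher: changes accumulate client-side and flush as one grouped operation. The class name, `flush_size`, and the `flushed` list (a stand-in for the actual synchronization call) are all illustrative.

```python
class UpdateBatcher:
    """Sketch: group object updates and flush them in bulk."""

    def __init__(self, flush_size=10):
        self.flush_size = flush_size
        self.pending = []
        self.flushed = []  # stand-in for the real sync/bulk-write call

    def queue(self, object_id, change):
        # Defer the update; flush automatically once the batch is full.
        self.pending.append((object_id, change))
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        # One grouped operation instead of many small ones.
        if self.pending:
            self.flushed.append(list(self.pending))
            self.pending.clear()
```

Periodic synchronization would simply call `flush()` on a timer, picking up whatever is still pending.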
### Vector Database Integration

- Query Optimization
  - Relevance thresholds
  - Query batching
  - Index optimization
  - Caching layers
- Storage Strategy
  - Compression techniques
  - Incremental updates
  - Partial loading
  - Lazy evaluation
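Two of the query optimizations above (relevance thresholds and a caching layer) combine naturally into one helper. The `db.search(embedding, top_k)` call below is a hypothetical client interface, not a specific vector database API.

```python
def query_relevant(db, embedding, cache, top_k=200, min_score=0.7):
    """Sketch: cached, relevance-thresholded vector query."""
    key = (tuple(embedding), top_k, min_score)
    if key in cache:                       # caching layer: skip repeat queries
        return cache[key]
    hits = db.search(embedding, top_k)     # assumed (object_id, score) pairs
    # Relevance threshold: drop weak matches before they cost context tokens.
    results = [(oid, score) for oid, score in hits if score >= min_score]
    cache[key] = results
    return results
```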
### Object Lifecycle Management

```python
# LRUCache is an assumed project helper; MetricsCollector is defined below.
class ObjectManager:
    def __init__(self):
        self.active_objects = LRUCache(max_size=30)   # immediate context tier
        self.cached_objects = LRUCache(max_size=200)  # active cache tier
        self.metrics = MetricsCollector()

    def get_object(self, object_id):
        self.metrics.record_access(object_id)
        if object_id in self.active_objects:
            self.metrics.record_cache_hit('active')
            return self.active_objects[object_id]
        if object_id in self.cached_objects:
            self.metrics.record_cache_hit('secondary')
            return self._promote_to_active(object_id)
        self.metrics.record_cache_miss()
        return self._load_from_vector_db(object_id)
```
### Performance Logging

```python
from collections import Counter, deque

class MetricsCollector:
    def __init__(self):
        self.logs = {
            'access_patterns': Counter(),
            'token_usage': deque(maxlen=1000),
            'cache_performance': {'hits': 0, 'misses': 0},
            'latency_metrics': {
                'db_queries': [],
                'context_switches': []
            }
        }

    def analyze_performance(self):
        return {
            'cache_hit_rate': self._calculate_hit_rate(),
            'avg_token_usage': self._calculate_avg_tokens(),
            'hot_objects': self._identify_hot_objects(),
            'optimization_suggestions': self._generate_suggestions()
        }
```
## RAG + Vector Database Architecture

With RAG and vector database integration:

```
Game Objects in Vector DB
          |
          v
Query for Relevant Objects
          |
          v
Load only needed objects into context
          |
          v
Keep frequently accessed objects in context
          |
          v
Periodically flush less relevant objects back to vector DB
```
This architecture allows:

- Theoretically unlimited total objects in the game
- 10,000s of objects in the vector DB
- Only the relevant subset loaded into context

Example distribution:

- 50k total objects in vector DB
- ~1000 objects’ embeddings queried per turn
- Top 100-200 most relevant loaded into context
- 20-30 frequently accessed objects kept in “working memory”
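The per-turn flow implied by this distribution can be sketched as one selection function. Everything here is illustrative: `db.search` is a hypothetical client call returning `(object_id, score)` pairs, and the defaults mirror the example numbers above.

```python
def refresh_context(db, turn_embedding, working_memory, top_k=1000, keep=200):
    """Sketch of per-turn object selection following the flow above."""
    # Query ~1000 candidate embeddings for this turn.
    candidates = db.search(turn_embedding, top_k)
    # Keep only the top 100-200 most relevant for the context window.
    loaded = sorted(candidates, key=lambda c: c[1], reverse=True)[:keep]
    # The 20-30 hot objects in working memory are always retained.
    return list(working_memory) + [oid for oid, _ in loaded
                                   if oid not in working_memory]
```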
## Optimization Recommendations

### Short-term

- Implement basic metrics collection
- Set up client-side caching
- Monitor token usage
- Track access patterns

### Medium-term

- Optimize query patterns
- Implement smart prefetching
- Enhance client-side processing
- Refine caching strategies

### Long-term

- Develop advanced compression
- Implement predictive loading
- Create adaptive optimization
- Build performance analytics
## Practical Limitations

- Query latency to the vector DB
- Cost of embedding generation
- Need for coherent context management
- Risk of context fragmentation
- Processing overhead for relevance sorting

The key is not trying to load everything at once, but maintaining a dynamic “working set” of objects relevant to the current game state and player actions.
## See Also

- Context Window Management - context window management details
- AI Integration - AI communications guide
- Context Window API Reference - complete API reference
- Error Codes - error code reference
- Configuration System - configuration system