Persistence Architecture
A simulation engine that cannot recover from crashes, isolate concurrent sessions, or answer analytical queries about past runs is a toy. The persistence layer exists to turn the simulation into a reliable system by recording every tick’s state in a form that survives process death and supports structured queries.
Why Two Backends?
The persistence layer supports two storage backends behind a single
RuntimePersistence protocol:
SQLite (RuntimeDatabase): Zero configuration, in-memory option,
no external dependencies. A developer clones the repository and runs
tests without installing PostgreSQL. CI pipelines use in-memory SQLite
for speed. The cost is limited functionality: SQLite implements only
the 5-method RuntimePersistence protocol — no community state, no
spatial queries, no trace partitioning.
PostgreSQL (PostgresRuntime): Concurrent session support, PostGIS
spatial queries, pgvector semantic search, JSONB analytical queries, and
native table partitioning. PostgreSQL implements RuntimePersistence
(5 methods) plus PostgresRuntimeExtensions (12 additional methods)
for subsystem state added by Features 002, 022, 029, 032, and 036.
The protocol boundary means the simulation engine never knows which
backend is active. PersistenceObserver receives a RuntimePersistence
handle at construction time. It calls the 5 base methods unconditionally,
then uses isinstance(persistence, PostgresRuntimeExtensions) to call
extended methods when the backend supports them.
This is not a leaky abstraction — SQLite genuinely cannot persist community hypergraph state or contradiction fields. The protocol boundary makes this explicit rather than hiding it behind silent no-ops or feature flags.
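The capability check described above can be sketched with two small protocols. This is a minimal illustration, not the project's actual code: only persist_tick() is named in this document, so persist_community_state(), the argument lists, and the SqliteLike and PostgresLike classes are hypothetical. One certain detail is that isinstance() against a typing.Protocol only works when the protocol is decorated with @runtime_checkable.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RuntimePersistence(Protocol):
    """Base contract. Only persist_tick is named in the text; its signature is assumed."""

    def persist_tick(self, session_id: str, tick: int, state: dict) -> None: ...


@runtime_checkable
class PostgresRuntimeExtensions(Protocol):
    """Extended contract. persist_community_state is a hypothetical method name."""

    def persist_community_state(self, session_id: str, tick: int, state: dict) -> None: ...


class SqliteLike:
    """Satisfies RuntimePersistence structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def persist_tick(self, session_id: str, tick: int, state: dict) -> None:
        self.calls.append(f"tick:{tick}")


class PostgresLike(SqliteLike):
    """Adds the hypothetical extended method on top of the base one."""

    def persist_community_state(self, session_id: str, tick: int, state: dict) -> None:
        self.calls.append(f"community:{tick}")


class PersistenceObserver:
    def __init__(self, persistence: RuntimePersistence) -> None:
        self.persistence = persistence

    def on_tick(self, session_id: str, tick: int, state: dict) -> None:
        # Base methods are called unconditionally.
        self.persistence.persist_tick(session_id, tick, state)
        # Extended methods run only when the backend actually provides them.
        if isinstance(self.persistence, PostgresRuntimeExtensions):
            self.persistence.persist_community_state(session_id, tick, state)
```

With this shape, a backend that lacks the extended methods is skipped visibly at the isinstance() check rather than receiving calls it silently ignores.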
The Protocol Boundary
Constitution II.6 mandates zero database I/O during tick computation.
All seven simulation systems read and write graph node attributes in
memory. Persistence happens after the tick completes, triggered by
the PersistenceObserver.
The sequence per tick:
1. SimulationEngine.run_tick() mutates the graph in memory.
2. WorldState.from_graph() validates the result.
3. Observer dispatch calls PersistenceObserver.on_tick().
4. on_tick() calls persist_tick() on the backend.
5. If the backend implements PostgresRuntimeExtensions, extended persist methods are called.
6. TraceRecorder.flush() writes buffered trace events.
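The ordering of that sequence can be made concrete with a toy walk-through. Everything here is a stand-in under stated assumptions: BaseBackend, ExtendedBackend, persist_extended(), and run_one_tick() are hypothetical names, the validation step is a placeholder for WorldState.from_graph(), and the sketch does not claim to show which component calls flush() in the real engine.

```python
class BaseBackend:
    """Hypothetical backend exposing only the base persist call."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def persist_tick(self, snapshot: dict) -> None:
        self.calls.append("persist_tick")


class ExtendedBackend(BaseBackend):
    """Hypothetical backend that also offers an extended persist call."""

    def persist_extended(self, snapshot: dict) -> None:
        self.calls.append("persist_extended")


def run_one_tick(graph: dict, backend: BaseBackend,
                 trace_buffer: list, trace_sink: list) -> list[str]:
    order = []
    # 1. Mutate the graph in memory; no database I/O during computation.
    graph["tick"] += 1
    trace_buffer.append(f"tick {graph['tick']} computed")
    order.append("run_tick")
    # 2. Validate the result (placeholder for WorldState.from_graph()).
    snapshot = dict(graph)
    if snapshot["tick"] < 1:
        raise ValueError("tick counter must be positive")
    order.append("validate")
    # 3-4. Observer dispatch ends in persist_tick() on the backend.
    backend.persist_tick(snapshot)
    order.append("persist_tick")
    # 5. Extended persistence only when the backend supports it.
    if isinstance(backend, ExtendedBackend):
        backend.persist_extended(snapshot)
        order.append("persist_extended")
    # 6. Write buffered trace events, then clear the buffer.
    trace_sink.extend(trace_buffer)
    trace_buffer.clear()
    order.append("flush")
    return order
```

The point of the sketch is that every write happens after the snapshot exists, so a failure in persistence cannot corrupt the tick computation itself.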
The protocol uses structural typing (typing.Protocol) rather than
abstract base classes. RuntimeDatabase satisfies RuntimePersistence
without inheriting from it — it simply has methods with matching
signatures. This follows the project’s established pattern of Protocol
plus default implementation, as used throughout the economics modules.
Session-Scoped Isolation
Every row in the PostgreSQL schema is keyed by (session_id, tick,
entity_id). Multiple concurrent simulations share one PostgreSQL
instance without interference. Each session gets its own UUID,
and all queries are scoped by it.
The SQLite backend ignores session_id parameters (they are accepted
for protocol compatibility but unused). SQLite databases are inherently
single-session: one file per simulation run.
Trace logging uses PostgreSQL native list partitioning on session_id.
Each traced session gets its own partition table. This enables instant
cleanup: DROP TABLE trace_log_{session_hex} removes all trace data
for a session with zero dead tuples and no VACUUM required.
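As an illustration of the per-session partition lifecycle, the helpers below build the two statements involved. They are a sketch under assumptions: the function names are hypothetical, and {session_hex} is assumed to be the 32-character hex form of the session UUID. The PARTITION OF ... FOR VALUES IN (...) clause is standard PostgreSQL list-partitioning syntax.

```python
import uuid


def trace_partition_name(session_id: uuid.UUID) -> str:
    # Assumption: {session_hex} is the UUID's hex form, which contains only
    # [0-9a-f] and is therefore safe to embed in an identifier.
    return f"trace_log_{session_id.hex}"


def create_partition_sql(session_id: uuid.UUID) -> str:
    # One list partition per session_id value.
    return (
        f"CREATE TABLE IF NOT EXISTS {trace_partition_name(session_id)} "
        f"PARTITION OF trace_log FOR VALUES IN ('{session_id}')"
    )


def drop_partition_sql(session_id: uuid.UUID) -> str:
    # Dropping the partition removes the session's trace rows in one step.
    return f"DROP TABLE IF EXISTS {trace_partition_name(session_id)}"


session = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(create_partition_sql(session))
print(drop_partition_sql(session))
```

The resulting partition name is 42 characters long, which stays under PostgreSQL's default 63-byte identifier limit.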
Three-Database Topology
The system uses three distinct database roles, each with different access patterns:
DuckDB (data/duckdb/marxist-data-3NF.duckdb): Empirical research
data — Census ACS, FRED economic indicators, BEA input-output tables,
QCEW employment data, HIFLD infrastructure, BTS freight flows. Read-only
during simulation. Feeds county-level parameter initialization.
SQLite or PostgreSQL: Runtime simulation state — graph snapshots, events, tick logs, community state, spatial hex data, contradiction fields. Read-write every tick. The active backend depends on deployment context (SQLite for dev/test, PostgreSQL for production).
Cloudflare R2 (planned): Archived Parquet files exported from
completed sessions. Write-once, read via DuckDB’s native Parquet reader
for cross-game analytics. The archival pipeline
(babylon.persistence.archival) is currently stubbed.
These three roles never overlap. DuckDB does not store simulation state. The runtime database does not store empirical research data. R2 stores only completed, exported sessions.
UPSERT Semantics and Idempotency
Every persist method uses ON CONFLICT DO UPDATE (PostgreSQL) or
INSERT OR REPLACE (SQLite). Persisting the same tick twice produces
the same result as persisting it once.
This matters for crash recovery. If the process dies between persisting node state and edge state for tick n, the restarted simulation can re-persist tick n from its in-memory graph, once that tick has been recomputed, without checking what was already written. The UPSERT overwrites any partial state left by the interrupted persist.
The alternative — checking which rows exist before inserting — would require read-before-write logic that contradicts the write-only nature of the persist path.
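The idempotency property can be demonstrated with in-memory SQLite. This is a minimal sketch: the node_state table, its wealth column, and persist_nodes() are hypothetical, chosen only to show a partial write being overwritten by a retry. On PostgreSQL the equivalent statement would take the form INSERT ... ON CONFLICT (session_id, tick, entity_id) DO UPDATE SET wealth = EXCLUDED.wealth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE node_state (
        session_id TEXT    NOT NULL,
        tick       INTEGER NOT NULL,
        entity_id  TEXT    NOT NULL,
        wealth     REAL    NOT NULL,
        PRIMARY KEY (session_id, tick, entity_id)
    )
    """
)


def persist_nodes(conn: sqlite3.Connection, session_id: str, tick: int, nodes: dict) -> None:
    rows = [(session_id, tick, entity_id, wealth) for entity_id, wealth in nodes.items()]
    # Write-only path: no SELECT is needed to decide between insert and update.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO node_state VALUES (?, ?, ?, ?)", rows)


nodes = {"C001": 10.0, "C002": 4.5}
persist_nodes(conn, "s1", 7, {"C001": 10.0})  # simulated interrupted persist
persist_nodes(conn, "s1", 7, nodes)           # retry after restart
persist_nodes(conn, "s1", 7, nodes)           # persisting again changes nothing

rows = conn.execute(
    "SELECT entity_id, wealth FROM node_state ORDER BY entity_id"
).fetchall()
```

After the three calls the table holds exactly one row per entity for tick 7, regardless of how many times the persist ran.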
Trace Logging and Observability
TraceRecorder buffers structured events in a Python list during tick
computation. No database I/O occurs during the tick. After the tick
completes, flush() writes the buffer to the trace_log table in
a single executemany call, then clears the buffer.
TraceLevel controls verbosity:
NONE (0): Tracing disabled. trace() is a no-op.
SUMMARY (1): High-level tick summaries.
DEBUG (2): Detailed system-level events.
TRACE (3): Full per-node event logging.
The trace_log table is UNLOGGED — PostgreSQL skips WAL writes
for it. This provides faster bulk inserts at the cost of durability:
trace data is lost on crash. This is an acceptable trade-off because
simulations are deterministically replayable from their RNG seed. Trace
data is ephemeral debugging output, not source of truth.
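The buffer-then-flush pattern might look roughly like the following. It is a sketch, not the real TraceRecorder: the constructor arguments, the trace() signature, and the four-column trace_log layout are assumptions, and in-memory SQLite stands in for the PostgreSQL table so the example runs without a server.

```python
import sqlite3
from enum import IntEnum


class TraceLevel(IntEnum):
    NONE = 0
    SUMMARY = 1
    DEBUG = 2
    TRACE = 3


class TraceRecorder:
    def __init__(self, conn: sqlite3.Connection, session_id: str, level: TraceLevel) -> None:
        self.conn = conn
        self.session_id = session_id
        self.level = level
        self.buffer: list[tuple] = []

    def trace(self, level: TraceLevel, tick: int, message: str) -> None:
        # Pure in-memory append; events above the configured verbosity are dropped.
        if self.level == TraceLevel.NONE or level > self.level:
            return
        self.buffer.append((self.session_id, tick, int(level), message))

    def flush(self) -> int:
        # One executemany call per tick, then the buffer is cleared.
        if not self.buffer:
            return 0
        with self.conn:
            self.conn.executemany("INSERT INTO trace_log VALUES (?, ?, ?, ?)", self.buffer)
        written = len(self.buffer)
        self.buffer.clear()
        return written


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trace_log (session_id TEXT, tick INTEGER, level INTEGER, message TEXT)"
)
recorder = TraceRecorder(conn, "s1", TraceLevel.DEBUG)
recorder.trace(TraceLevel.SUMMARY, 1, "tick complete")
recorder.trace(TraceLevel.TRACE, 1, "node C001 updated")  # dropped at DEBUG verbosity
written = recorder.flush()
```

Because trace() never touches the connection, the zero-I/O rule for tick computation holds no matter how verbose the configured level is.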
Vector Search Migration
The existing VectorStore wraps ChromaDB as a concrete class with no
protocol interface. Feature 037 introduces VectorStoreProtocol as a
formal contract and PgVectorStore as a PostgreSQL-native
implementation using the pgvector extension.
PgVectorStore stores document embeddings in the document_chunk
table with an HNSW index using cosine distance (the <=> operator).
The schema defines vector(768) columns matching the default Ollama
embeddinggemma model dimension.
Both the ChromaDB-backed store and PgVectorStore implement the same
4-method protocol (add_chunks, query_similar, delete_chunks,
get_collection_count). The Retriever is backend-agnostic: it
interacts only through VectorStoreProtocol.
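A sketch of what such a contract could look like follows. The four method names come from this document, but their parameters and return types are assumptions, and InMemoryVectorStore and retrieve() are hypothetical stand-ins rather than either real backend. The distance function mirrors what pgvector's <=> operator computes: one minus cosine similarity.

```python
import math
from typing import Protocol, Sequence


class VectorStoreProtocol(Protocol):
    # Method names are from the text; signatures are assumed for illustration.
    def add_chunks(self, ids: Sequence[str], embeddings: Sequence[Sequence[float]]) -> None: ...
    def query_similar(self, embedding: Sequence[float], k: int) -> list[str]: ...
    def delete_chunks(self, ids: Sequence[str]) -> None: ...
    def get_collection_count(self) -> int: ...


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class InMemoryVectorStore:
    """Toy backend that satisfies the protocol structurally."""

    def __init__(self) -> None:
        self._chunks: dict[str, list[float]] = {}

    def add_chunks(self, ids, embeddings) -> None:
        for chunk_id, embedding in zip(ids, embeddings):
            self._chunks[chunk_id] = list(embedding)

    def query_similar(self, embedding, k: int) -> list[str]:
        # Brute-force ranking by cosine distance, nearest first.
        ranked = sorted(
            self._chunks,
            key=lambda chunk_id: cosine_distance(embedding, self._chunks[chunk_id]),
        )
        return ranked[:k]

    def delete_chunks(self, ids) -> None:
        for chunk_id in ids:
            self._chunks.pop(chunk_id, None)

    def get_collection_count(self) -> int:
        return len(self._chunks)


def retrieve(store: VectorStoreProtocol, embedding: Sequence[float], k: int = 1) -> list[str]:
    # The caller depends only on the protocol, never on a concrete backend.
    return store.query_similar(embedding, k)
```

Any object with these four methods can be passed to retrieve(), which is the property that lets a retriever swap backends without code changes.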
The motivation for pgvector over ChromaDB: colocation with simulation data in the same PostgreSQL instance eliminates a separate persistence system, reduces operational complexity, and enables SQL joins between vector search results and simulation state.
See Also
Persistence Layer — Persistence API reference
Architecture: The Embedded Trinity — Embedded Trinity architecture overview