Part II: Empirical Evidence from Babylon

The Numbers

Babylon is a geopolitical simulation engine modeling the collapse of American hegemony through Marxist-Leninist-Maoist Third Worldist theory. It’s a complex technical project with mathematical foundations, graph-based architecture, and AI narrative integration. Here’s what the git history reveals:

Commit Statistics

Metric	Value
Total commits	531
Time span	November 30, 2024 to December 11, 2025
AI-assisted commits	151 (28.4%)
Human commits	380 (71.6%)

Codebase Size

Metric	Value
Production code	16,154 lines
Test code	28,231 lines
Test:code ratio	1.7:1
Test functions	1,444 across 73 files

Architecture Documentation

Metric	Value
Architecture Decision Records	20+
YAML specification files	25+
Design documents	28 markdown files

Development Tools Used

Claude Code (primary)
Aider (secondary)
Devin AI (experimental)
GitHub Copilot (legacy)

What the Commits Reveal

The git history tells a story of structured chaos. Development happens in intense bursts—140 commits in 4 days (December 7-11, 2025)—followed by periods of dormancy. This is not the steady drumbeat of traditional software development. It’s the rhythm of creative flow: inspiration, execution, rest.

The commit messages follow conventional commit format (feat:, fix:, docs:, refactor:), enforced by pre-commit hooks. Even in the intensity of a 58-commit day, every commit is categorized, every change is traceable. The discipline doesn’t disappear under pressure—it’s what enables the pressure.

Here’s a sample of recent commits:

feat(engine): add Carceral Geography to TerritorySystem (Sprint 3.7)
feat(observer): add TopologyMonitor for condensation detection (Sprint 3.1)
refactor(models): replace IdeologicalComponent with George Jackson Model
docs(ai-docs): add observer-layer.yaml with Bondi Algorithm aesthetic
fix(engine): calculate wages from tribute flow, not accumulated wealth

Notice the sprint numbers, the specific component references, the mix of features, fixes, and documentation. This is not chaos. This is vibe coding with discipline.

The AI-Assisted vs Human Commit Breakdown

AI-assisted commits cluster around specific activity types:

High AI assistance (>50% of commits in category)

Documentation generation
Test boilerplate
Infrastructure/tooling
Type annotations
Formatting/linting fixes

Low AI assistance (<20% of commits in category)

Core algorithm design
Architecture decisions
Bug fixes in game logic
Mathematical formula implementation

The pattern is clear: AI handles the scaffolding, humans handle the soul. The division of labor isn’t random—it’s rational. AI excels at mechanical tasks with clear patterns. Humans excel at judgment calls with unclear tradeoffs.

Code Quality Metrics

The codebase enforces quality through tooling:

# From pyproject.toml
[tool.mypy]
strict = true
disallow_untyped_defs = true
warn_return_any = true

[tool.ruff.lint]
select = ["E", "W", "F", "I", "B", "C4", "UP", "ARG", "SIM"]

MyPy strict mode means every function has type annotations, every variable has a declared type. Ruff catches style violations, potential bugs, unnecessary complexity. These aren’t aspirational—they’re enforced. Every commit passes through pre-commit hooks that verify compliance.

The result: you can read any function in the codebase and know exactly what types it accepts and returns. You can refactor with confidence because the type checker will catch mistakes. You can onboard new contributors (human or AI) because the code is self-documenting.

This is what vibe coding produces when paired with discipline.