Part II: Empirical Evidence from Babylon

The Numbers

Babylon is a geopolitical simulation engine modeling the collapse of American hegemony through Marxist-Leninist-Maoist Third Worldist theory. It’s a complex technical project with mathematical foundations, graph-based architecture, and AI narrative integration. Here’s what the git history reveals:

Commit Statistics

Metric               Value
Total commits        531
Time span            November 30, 2024 to December 11, 2025
AI-assisted commits  151 (28.4%)
Human commits        380 (71.6%)
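The percentages follow directly from the counts. The sketch below reproduces them and shows one way such a split might be measured. The text does not say how AI-assisted commits were identified, so the trailer markers and the helper names (`AI_MARKERS`, `is_ai_assisted`, `share`, `ai_share`) are assumptions for illustration only.

```python
from collections.abc import Iterable

# Hypothetical markers: the source does not say how AI-assisted commits were
# tagged. A "Co-Authored-By" trailer naming a tool is one common convention.
AI_MARKERS = (
    "co-authored-by: claude",
    "co-authored-by: aider",
    "co-authored-by: devin",
)


def is_ai_assisted(message: str) -> bool:
    """Return True if any line of the commit message starts with an AI marker."""
    return any(
        line.strip().lower().startswith(AI_MARKERS)
        for line in message.splitlines()
    )


def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


def ai_share(messages: Iterable[str]) -> float:
    """Percentage of commit messages classified as AI-assisted."""
    msgs = list(messages)
    return share(sum(is_ai_assisted(m) for m in msgs), len(msgs))


# Reproducing the table from its own counts.
print(share(151, 531))  # 28.4
print(share(380, 531))  # 71.6
```

A classifier like this only sees what the commit message records, so any AI help that left no trailer would be counted as a human commit.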

Codebase Size

Metric           Value
Production code  16,154 lines
Test code        28,231 lines
Test:code ratio  1.7:1
Test functions   1,444 across 73 files

Architecture Documentation

Metric                         Value
Architecture Decision Records  20+
YAML specification files       25+
Design documents               28 markdown files

Development Tools Used

  • Claude Code (primary)

  • Aider (secondary)

  • Devin AI (experimental)

  • GitHub Copilot (legacy)

What the Commits Reveal

The git history tells a story of structured chaos. Development happens in intense bursts, such as the 140 commits made between December 7 and 11, 2025 (about 26% of all 531 commits), followed by periods of dormancy. This is not the steady drumbeat of traditional software development. It’s the rhythm of creative flow: inspiration, execution, rest.
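A burst pattern like this can be read off the history by counting commits per calendar day. The sketch below assumes the dates arrive as ISO strings, for example from `git log --format=%ad --date=short`. The sample dates and the threshold are made up for illustration and are not Babylon's data.

```python
from collections import Counter
from datetime import date


def commits_per_day(iso_dates: list[str]) -> Counter[date]:
    """Tally commits by calendar day from YYYY-MM-DD strings."""
    return Counter(date.fromisoformat(d) for d in iso_dates)


def burst_days(per_day: Counter[date], threshold: int) -> list[date]:
    """Days with at least `threshold` commits, in chronological order."""
    return sorted(day for day, n in per_day.items() if n >= threshold)


# Hypothetical input, not taken from the project history.
sample_dates = ["2025-12-07"] * 3 + ["2025-12-08"] + ["2025-12-10"] * 2
per_day = commits_per_day(sample_dates)
print([d.isoformat() for d in burst_days(per_day, threshold=2)])
# ['2025-12-07', '2025-12-10']
```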

The commit messages follow conventional commit format (feat:, fix:, docs:, refactor:), enforced by pre-commit hooks. Even in the intensity of a 58-commit day, every commit is categorized, every change is traceable. The discipline doesn’t disappear under pressure—it’s what enables the pressure.

Here’s a sample of recent commits:

feat(engine): add Carceral Geography to TerritorySystem (Sprint 3.7)
feat(observer): add TopologyMonitor for condensation detection (Sprint 3.1)
refactor(models): replace IdeologicalComponent with George Jackson Model
docs(ai-docs): add observer-layer.yaml with Bondi Algorithm aesthetic
fix(engine): calculate wages from tribute flow, not accumulated wealth

Notice the sprint numbers, the specific component references, the mix of features, fixes, and documentation. This is not chaos. This is vibe coding with discipline.
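The structure of these subject lines is regular enough to check mechanically. The sketch below is a simplified validator in the spirit of the conventional commit format, not the project's actual hook. It accepts only the four types named above, and the `ParsedCommit` type and sprint-suffix pattern are assumptions inferred from the sample.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: type, optional "(scope)", optional "!", then ": description".
# Only the four types named in the text are allowed; a real hook may accept more.
COMMIT_RE = re.compile(
    r"^(?P<kind>feat|fix|docs|refactor)"
    r"(?:\((?P<scope>[^()]+)\))?"
    r"!?: (?P<description>\S.*)$"
)
# Assumed convention, inferred from the sample: a trailing "(Sprint X.Y)".
SPRINT_RE = re.compile(r"\(Sprint (?P<sprint>\d+(?:\.\d+)*)\)\s*$")


class ParsedCommit(NamedTuple):
    kind: str
    scope: Optional[str]
    description: str
    sprint: Optional[str]


def parse_commit_subject(subject: str) -> Optional[ParsedCommit]:
    """Parse a commit subject line; return None if it does not conform."""
    match = COMMIT_RE.match(subject)
    if match is None:
        return None
    sprint = SPRINT_RE.search(match["description"])
    return ParsedCommit(
        kind=match["kind"],
        scope=match["scope"],
        description=match["description"],
        sprint=sprint["sprint"] if sprint else None,
    )


parsed = parse_commit_subject(
    "feat(engine): add Carceral Geography to TerritorySystem (Sprint 3.7)"
)
print(parsed.kind, parsed.scope, parsed.sprint)  # feat engine 3.7
print(parse_commit_subject("update stuff"))      # None
```

A check of this kind would normally run against the commit message itself, rejecting the commit when the parser returns None.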

The AI-Assisted vs Human Commit Breakdown

AI-assisted commits cluster around specific activity types:

High AI assistance (>50% of commits in category)

  • Documentation generation

  • Test boilerplate

  • Infrastructure/tooling

  • Type annotations

  • Formatting/linting fixes

Low AI assistance (<20% of commits in category)

  • Core algorithm design

  • Architecture decisions

  • Bug fixes in game logic

  • Mathematical formula implementation

The pattern is clear: AI handles the scaffolding, humans handle the soul. The division of labor isn’t random—it’s rational. AI excels at mechanical tasks with clear patterns. Humans excel at judgment calls with unclear tradeoffs.

Code Quality Metrics

The codebase enforces quality through tooling:

# From pyproject.toml
[tool.mypy]
strict = true
disallow_untyped_defs = true
warn_return_any = true

[tool.ruff.lint]
select = ["E", "W", "F", "I", "B", "C4", "UP", "ARG", "SIM"]

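Under mypy’s strict mode, every function definition must carry complete type annotations. Local variables usually need no declared type, because mypy infers them from the assigned value. The two flags listed after strict = true are already implied by it, so spelling them out documents intent rather than adding checks. Ruff’s selected rule sets cover style (E, W), Pyflakes errors (F), import sorting (I), likely bugs (B), comprehension cleanups (C4), syntax modernization (UP), unused arguments (ARG), and simplifications (SIM). These aren’t aspirational; they’re enforced. Every commit passes through pre-commit hooks that verify compliance.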

The result: you can read any function in the codebase and know exactly what types it accepts and returns. You can refactor with confidence because the type checker will catch the type mismatches a change introduces. You can onboard new contributors (human or AI) because the signatures document themselves.
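To make that concrete, here is a minimal sketch of code written to pass a strict type check. The `Region` type and both functions are hypothetical and do not come from Babylon. They only illustrate fully annotated signatures alongside an inferred local variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:  # hypothetical type, for illustration only
    name: str
    population: int
    wealth: float


def wealth_per_capita(region: Region) -> float:
    """Fully annotated, so every call site can be checked against this signature."""
    if region.population <= 0:
        raise ValueError("population must be positive")
    return region.wealth / region.population


def total_population(regions: list[Region]) -> int:
    # `total` needs no annotation: the checker infers `int` from the literal 0.
    total = 0
    for region in regions:
        total += region.population
    return total


regions = [Region("north", 100, 2500.0), Region("south", 300, 1500.0)]
print(wealth_per_capita(regions[0]))  # 25.0
print(total_population(regions))      # 400

# A call such as wealth_per_capita("north") would run until the attribute
# access fails, but a strict type check flags it before the code ever runs.
```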

This is what vibe coding produces when paired with discipline.