Part VI: The Discipline Stack
Why Vibe Coding Without Discipline Fails
The critics have a point: naive AI-assisted development produces garbage. Paste whatever ChatGPT outputs, don’t verify, don’t test, don’t review—and you’ll ship bugs, security holes, and unmaintainable messes.
The answer isn’t to reject AI assistance. It’s to pair it with discipline.
The Babylon project demonstrates what disciplined vibe coding looks like. Here’s the stack:
1. Pre-Commit Hooks as Guardrails
Every commit passes through automated checks:
# From .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: ruff-check
        name: ruff-check
        entry: poetry run ruff check --fix
        language: system
        types: [python]
      - id: ruff-format
        name: ruff-format
        entry: poetry run ruff format
        language: system
        types: [python]
      - id: mypy
        name: mypy
        entry: poetry run mypy
        language: system
        types: [python]
      - id: pytest
        name: pytest
        entry: poetry run pytest -x -q
        language: system
        types: [python]
      - id: commitizen
        name: commitizen
        entry: cz check --commit-msg-file
        language: python
You cannot commit code that:
Fails linting (ruff)
Has type errors (mypy)
Fails tests (pytest)
Has a malformed commit message (commitizen)
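The AI can generate whatever it wants. The hooks ensure that only code passing these checks reaches the repository. The gate runs automatically on every commit and applies equally to human and AI-generated code; the only way around it is a deliberate bypass such as git commit --no-verify.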
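Conceptually, the hooks form a gate: each tool runs as a command, and any non-zero exit status blocks the commit. The following is a minimal sketch of that idea, not pre-commit's actual implementation. The function names run_gate and commit_allowed are hypothetical, and the two stand-in commands simply exit with a fixed status so the example runs without ruff, mypy, or pytest installed.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(checks: Sequence[tuple[str, list[str]]]) -> list[str]:
    """Run every check command and return the names of the ones that failed."""
    failed = []
    for name, command in checks:
        # Each tool signals a problem through a non-zero exit status.
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


def commit_allowed(checks: Sequence[tuple[str, list[str]]]) -> bool:
    """A commit goes through only when no check failed."""
    return not run_gate(checks)


# Stand-in commands (assumptions for illustration): one passes, one fails.
passing_check = ("lint", [sys.executable, "-c", "raise SystemExit(0)"])
failing_check = ("tests", [sys.executable, "-c", "raise SystemExit(1)"])
```

In a real repository the command lists would be the hook entries themselves, for example ["poetry", "run", "mypy"].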
2. TDD as the Verification Loop
Test-Driven Development isn’t just a testing strategy. It’s a verification loop that ensures AI output matches intent.
The cycle:
Red: Write a test that describes what you want
Green: Ask AI to implement code that passes the test
Refactor: Clean up while keeping tests green
The test is the specification. The AI generates an implementation. If the tests pass, the implementation matches the specification as far as the tests express it. If they fail, iterate.
TDD transforms AI assistance from “trust the output” to “verify the output.” You don’t need to read every line of generated code (though you can). You need to write tests that would fail if the code were wrong.
# From tests/unit/engine/systems/test_solidarity_system.py
def test_consciousness_transmission_when_source_active():
    """Consciousness flows through SOLIDARITY edges when source is active."""
    # Arrange: revolutionary periphery, passive core, solidarity connection
    graph = create_graph_with_solidarity_edge(
        source_consciousness=0.9,  # Revolutionary
        target_consciousness=0.1,  # Passive
        solidarity_strength=0.8,  # Strong connection
    )
    # Act: run solidarity system
    # (services and context are not defined in this excerpt; they are
    # presumably supplied by fixtures or module-level setup)
    SolidaritySystem().step(graph, services, context)
    # Assert: target consciousness increased
    target = get_node(graph, "C_w")
    assert target.consciousness > 0.1  # Was raised
    assert target.consciousness < 0.9  # Didn't fully equalize
This test specifies behavior. If AI-generated code passes it, the code exhibits that behavior for this scenario; other scenarios need their own tests. The test doesn’t care who wrote the implementation.
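How much a passing test proves depends on how tightly the tests pin the behavior down. The sketch below is self-contained and does not use the Babylon code: transmit, its rate parameter, and constant_stub are hypothetical names, and the update rule is an assumption made up for illustration. It shows a lazy implementation that satisfies the bounds test alone and is only ruled out once a second test is added.

```python
def transmit(source: float, target: float, strength: float, rate: float = 0.5) -> float:
    """Return the target's new consciousness after one step (hypothetical rule)."""
    # Move the target toward the source by a fraction of the gap,
    # scaled by the strength of the solidarity edge.
    gap = max(source - target, 0.0)
    return target + rate * strength * gap


def constant_stub(source: float, target: float, strength: float) -> float:
    """A lazy implementation that ignores its inputs entirely."""
    return 0.5


def test_transmission_raises_target_without_equalizing(impl=transmit):
    new_target = impl(source=0.9, target=0.1, strength=0.8)
    assert new_target > 0.1  # Was raised
    assert new_target < 0.9  # Didn't fully equalize


def test_no_transmission_without_solidarity(impl=transmit):
    # With zero strength the target must be left unchanged.
    assert impl(source=0.9, target=0.1, strength=0.0) == 0.1
```

The stub passes the first test and fails the second. Each additional test narrows the set of wrong implementations that can slip through.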
3. Type Systems as Contracts
Strict typing isn’t just about catching bugs. It’s about defining contracts that both humans and AIs must respect.
# From src/babylon/models/types.py
from typing import Annotated

from pydantic import Field

# Constrained types that prevent invalid states
Probability = Annotated[float, Field(ge=0.0, le=1.0)]
Currency = Annotated[float, Field(ge=0.0)]
Ideology = Annotated[float, Field(ge=-1.0, le=1.0)]
Intensity = Annotated[float, Field(ge=0.0, le=1.0)]
Coefficient = Annotated[float, Field(ge=0.0, le=1.0)]
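The two kinds of check happen at different times. The range constraints are enforced at runtime: when a Probability field on a Pydantic model receives -5, validation raises an error. MyPy treats these aliases as plain float, so it does not check the range, but it does catch structural mistakes such as returning None from a function typed to return Currency.
Types are contracts that constrain what the AI can produce. They replace trust with verification, partly at type-check time and partly at runtime.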
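A short sketch of the runtime side, assuming the Field in the excerpt above is Pydantic's. The Event model and the is_valid helper are hypothetical and exist only to exercise the constrained types. Note that the range check fires when a model is constructed, not when a bare variable is assigned.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

Probability = Annotated[float, Field(ge=0.0, le=1.0)]
Currency = Annotated[float, Field(ge=0.0)]


class Event(BaseModel):
    """Hypothetical model used only to exercise the constrained types."""

    likelihood: Probability
    cost: Currency


def is_valid(likelihood: float, cost: float) -> bool:
    """Return True when both values satisfy their declared ranges."""
    try:
        Event(likelihood=likelihood, cost=cost)
    except ValidationError:
        return False
    return True
```

Here is_valid(0.5, 10.0) is accepted, while is_valid(-5, 10.0) and is_valid(0.5, -1.0) are rejected by validation.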
4. ADRs as Decision Memory
Architecture Decision Records document why decisions were made:
# From ai-docs/decisions.yaml
ADR016_fascist_bifurcation:
  status: "accepted"
  date: "2025-12-09"
  title: "The Fascist Bifurcation: Solidarity as Infrastructure"
  context: |
    Sprint 3.4.2 implemented Proletarian Internationalism, but a critical
    design question emerged: should solidarity_strength be auto-calculated
    from source organization, or stored as a persistent edge attribute?
  decision: |
    Store solidarity_strength as a PERSISTENT EDGE ATTRIBUTE, not auto-calculated.
  rationale:
    - "Encodes anti-accelerationist theory into mechanics"
    - "Enables emergent Fascist vs Revolutionary outcomes"
    - "Player agency: must choose to BUILD solidarity before crisis"
  mantra: "Agitation without solidarity produces fascism, not revolution."
When future sessions encounter this code, they can read why it was designed this way. The AI doesn’t need to rediscover the reasoning—it’s documented. Decisions compound; knowledge accumulates.
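Decision records only compound if each one is complete enough to be read cold. As a rough sketch, a small check can flag records that are missing fields. The required key set below is an assumption inferred from the single example above, not a schema the project is known to enforce, and the record is assumed to have already been loaded from YAML into a plain dict.

```python
# Assumed set of required fields, inferred from the ADR016 example.
REQUIRED_KEYS = {"status", "date", "title", "context", "decision", "rationale"}


def missing_keys(record: dict) -> set[str]:
    """Return the required fields that a decision record does not define."""
    return REQUIRED_KEYS - set(record)


# The ADR016 record as it would look after parsing (long fields shortened).
adr016 = {
    "status": "accepted",
    "date": "2025-12-09",
    "title": "The Fascist Bifurcation: Solidarity as Infrastructure",
    "context": "Should solidarity_strength be auto-calculated or stored?",
    "decision": "Store solidarity_strength as a persistent edge attribute.",
    "rationale": ["Encodes anti-accelerationist theory into mechanics"],
    "mantra": "Agitation without solidarity produces fascism, not revolution.",
}

incomplete = {"status": "proposed", "title": "Untitled draft"}
```

A check like this could run alongside the other hooks so that an incomplete record is caught at commit time.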
5. Session Memory as Context Continuity
The claude-mem system provides continuity between sessions. Recent observations, decisions, and learnings are available at session start:
**Legend:** 🔴 bugfix | 🟣 feature | 🔄 refactor | ✅ change | 🔵 discovery | ⚖️ decision
| ID | Time | T | Title |
|----|------|---|-------|
| #6270 | 11:58 AM | 🟣 | Created Comprehensive Architecture Documentation |
| #6271 | 11:58 AM | 🟣 | Created George Jackson Bifurcation Model Documentation |
| #6279 | 12:03 PM | ✅ | Committed Comprehensive RST Documentation to Git |
This is organizational memory externalized. The AI’s context window is ephemeral; the memory system is persistent. Knowledge survives session boundaries.
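The underlying pattern is simple: append observations to durable storage during a session and read the most recent ones back at the start of the next. The sketch below illustrates that pattern with a JSON Lines file. It is not how claude-mem works internally; the remember and recall functions and the file layout are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def remember(store: Path, kind: str, title: str) -> None:
    """Append one observation to the store as a single JSON line."""
    with store.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"kind": kind, "title": title}) + "\n")


def recall(store: Path, limit: int = 3) -> list[dict]:
    """Return up to `limit` of the most recent observations, oldest first."""
    if not store.exists():
        return []
    lines = store.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-limit:]]


# One "session" writes; a later one can read the same file back.
store = Path(tempfile.mkdtemp()) / "observations.jsonl"
remember(store, "feature", "Created Comprehensive Architecture Documentation")
remember(store, "change", "Committed Comprehensive RST Documentation to Git")
```

Because the file outlives the process, a new session calling recall(store) sees what earlier sessions recorded.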
Why Discipline Enables Freedom
Paradoxically, these constraints enable freedom. When you know the pre-commit hooks will catch errors, you can experiment more freely. When you know TDD will verify correctness, you can accept AI suggestions more confidently. When you know types will enforce contracts, you can refactor more aggressively.
Discipline isn’t the opposite of vibe coding. It’s what makes vibe coding sustainable.
Without discipline, vibe coding is a sugar rush: productive in the moment, disastrous over time. With discipline, vibe coding is a power tool: amplified capability with maintained quality.