Tensor Hierarchy Architecture

Babylon’s economic computation uses a three-level tensor hierarchy built on top of the Marxian value tensor. Each level provides progressively higher abstractions: raw county-year primitives at Level 0, federal data sources at Level 1, and derived computations at Level 2.

This document explains why the hierarchy exists and why each design choice was made. For the complete data dictionary, see Tensor Hierarchy Reference.

The Three Levels 

Level 0: The Primitive 

The ValueTensor4x3 (Feature 011) is the simulation’s foundational type. It represents the 4×3 Marxian reproduction schema for a single county-year: four departments, each decomposed into constant capital (c), variable capital (v), and surplus value (s). Level 0 is the hierarchy’s output: the type the simulation engine consumes.
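To make the 4×3 shape concrete, here is a minimal sketch of a department-by-component schema. The names `DEPARTMENTS`, `ValueRow`, and `ValueTensorSketch` are hypothetical and are not the actual `ValueTensor4x3` API; the figures are made up for illustration.

```python
from dataclasses import dataclass

# Department labels as used in this document; the row ordering is an assumption.
DEPARTMENTS = ("I", "IIa", "IIb", "III")


@dataclass(frozen=True)
class ValueRow:
    """One department's value decomposition for a single county-year."""
    c: float  # constant capital
    v: float  # variable capital
    s: float  # surplus value

    @property
    def total(self) -> float:
        return self.c + self.v + self.s


@dataclass(frozen=True)
class ValueTensorSketch:
    """Hypothetical stand-in for a 4x3 schema: one ValueRow per department."""
    rows: tuple  # four ValueRow instances, ordered as in DEPARTMENTS

    def as_matrix(self) -> list:
        # 4 rows (departments) x 3 columns (c, v, s)
        return [[r.c, r.v, r.s] for r in self.rows]

    def total_value(self) -> float:
        return sum(r.total for r in self.rows)


# Made-up figures, for illustration only.
tensor = ValueTensorSketch(rows=(
    ValueRow(40.0, 10.0, 10.0),  # I
    ValueRow(20.0, 10.0, 10.0),  # IIa
    ValueRow(10.0, 5.0, 5.0),    # IIb
    ValueRow(5.0, 10.0, 5.0),    # III
))
matrix = tensor.as_matrix()
```

Freezing the dataclasses mirrors the idea that a county-year primitive is a value, not a mutable record; whether the real type is immutable is not stated here.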

Level 1: Federal Data Sources 

Level 1 tensors are extracted directly from federal datasets. They represent empirical economic structure at the national scale, loaded once and shared across all county-year computations:

InterIndustryFlow: The BEA Use table direct requirements matrix A, where A[i,j] is the dollar value of industry i’s output required per dollar of industry j’s output. Covers ~70 BEA Summary-level industries. Source: Bureau of Economic Analysis I-O accounts (annual, 1997–2024).
VisibilityMetric: The diagonal visibility tensor G = diag(g₁₁, g₂₂a, g₂₂b, g₃₃), measuring what fraction of each Marxian department’s labor is commodified (visible to the price system). Source: ATUS time-use data via the Feature 015 gamma module.
GeographicFlow: The BTS FAF5 origin-destination commodity flow matrix F, where F[a,b] is the value, in millions of USD, of freight shipped from CFS Area a to CFS Area b (~130 areas). Source: Bureau of Transportation Statistics Freight Analysis Framework (FAF5); 2022 scale: $18.7T, 2.49M records.
ReproductionRequirements: Consumption bundles and reproductive labor time by social class across the four Marxian departments. Source: CEX consumer expenditure survey + ATUS time-use data. Production loader deferred; see US4 below.
ClassTransitionMatrix: The class mobility stochastic matrix P, where P[i,j] is the probability of someone in class i transitioning to class j over a specified period. Source: PSID Panel Study of Income Dynamics. Production loader deferred; see US5 below.

Level 2: Derived Computations 

Level 2 tensors are derived from Level 1 inputs by mathematical operations. They have no independent data sources—they are pure functions of Level 1 data:

LeontiefInverse: L = (I − A)⁻¹, the total requirements matrix. Captures both direct and all indirect supply-chain dependencies embodied in final demand. See Input-Output Economics and the Leontief Inverse for the mathematical theory.
ImperialRentField: φ[a] = inflow[a] − outflow[a], the net value extraction per CFS area. Positive values identify core accumulation zones; negative values identify periphery extraction zones. For a closed system, Σφ ≈ 0. See The Imperial Rent Field: Spatial Value Extraction for the mathematical theory.
ShadowSubsidyTensor: Department III value × (1 − g₃₃): the unpaid reproductive labor appropriated as surplus by capital. Quantifies the “invisible” care-work contribution that standard national accounts systematically exclude.
StationaryDistribution: The long-run class distribution π satisfying πP = π, normalized to sum to 1. Computed as the dominant left eigenvector of P (the one with eigenvalue 1). Represents the gravitational pull of class structure toward a long-run equilibrium. See Class Mobility: Markov Chains and Stationary Distributions for the mathematical theory.
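The formulas above are small enough to sketch as pure functions. The function names below are hypothetical, the inputs are made up, and the stationary distribution is approximated by power iteration rather than the eigendecomposition used by the actual computer; power iteration converges only when the chain is irreducible and aperiodic.

```python
def imperial_rent_field(flow):
    """phi[a] = inflow[a] - outflow[a], where flow[a][b] is value shipped a -> b."""
    n = len(flow)
    inflow = [sum(flow[b][a] for b in range(n)) for a in range(n)]
    outflow = [sum(flow[a][b] for b in range(n)) for a in range(n)]
    return [inflow[a] - outflow[a] for a in range(n)]


def shadow_subsidy(dept3_value, g33):
    """Department III value x (1 - g33): the share not visible to the price system."""
    return dept3_value * (1.0 - g33)


def stationary_distribution(p, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi P = pi for a row-stochastic P, by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # renormalize so the entries sum to 1
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Synthetic inputs, for illustration only.
flow = [
    [0.0, 5.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
phi = imperial_rent_field(flow)          # area 0 is a net sender, area 1 a net receiver
subsidy = shadow_subsidy(100.0, 0.25)    # 75% of Department III value is unwaged
pi = stationary_distribution([[0.9, 0.1], [0.5, 0.5]])
```

In this toy flow matrix every shipment is both one area's outflow and another's inflow, so the entries of `phi` sum to zero, matching the closed-system property noted above.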

Why Three Levels?

The hierarchy serves three functions:

Separation of concerns. Federal data sources (Level 1) are isolated from derived computations (Level 2). Adding a new computation requires only a new computer protocol, not changes to data loading or schema.

Incremental buildout. Each Level 1 tensor can be present or absent independently. US3 (geographic flows, requiring FAF data download) can be absent without blocking US1 (inter-industry flows, from XLSX already present). The NoDataSentinel makes absence explicit.

Testability. Level 2 computations are unit-tested with synthetic Level 1 tensors, without requiring real federal data to be present. The stub pattern (see deferred loaders below) ensures the composition structure is correct even before production data is available.

Marxian Departments: Theory and Contested Boundaries 

The four-department scheme comes from a combination of sources:

Marx, Capital Volume II (1885): two departments (means of production / means of consumption)
Shaikh & Tonak, Measuring the Wealth of Nations (1994): split Department II into necessary and luxury consumption
Fortunati, The Arcane of Reproduction (1981): Department III for social reproduction, the unwaged labor that produces labor power itself

The Four Departments 

Marxian Department Classification
Dept	Name	Theoretical Role	BEA Summary Examples
I	Means of Production	Capital goods consumed productively by other industries	Mining, Machinery, Construction, Chemicals, Finance, Transport
IIa	Necessary Consumption	Wage goods that reproduce labor power	Food & beverage, Textiles, Retail trade, Basic lodging
IIb	Luxury Consumption	Surplus value sink; bourgeois and labor-aristocracy consumption	Consumer electronics, Furniture, Gambling, Fine dining, Luxury retail
III	Social Reproduction	Labor power produced outside or at the margin of the wage relation	Health care, Education, Social assistance, Private households

The core theoretical point is that Department III is not optional or peripheral: it is necessary for capital accumulation. Every worker who shows up for work was reproduced by Department III labor. The visibility scalar g₃₃ measures how much of this labor is commodified (visible) versus performed as unwaged domestic work (invisible to the price system).

Contested Industry Boundaries 

Several industries are genuinely ambiguous. The mapping file captures the judgment calls with explicit rationale:

Motor vehicles (BEA 3360A0) → Department I. Commercial trucks, buses, and vehicle components dominate by output value. Consumer automobiles are a boundary case with Department IIb. A commodity-by-industry bridge could split this, but dominant-use classification puts the sector in I.

Retail trade (BEA 4400) → Department IIa. Food and basic household goods dominate by transaction volume. Luxury retail overlaps with IIb; the mapping addresses this with separate IIb entries for clothing stores (4481) and department/general merchandise stores (4521, 4529, 4530).

Owner-occupied housing (FIRE0) → Department I. The BEA treats imputed rental income as capital output, consistent with Shaikh & Tonak’s treatment of housing as a capital asset rather than consumer expenditure.

Financial intermediation (521CI, 523, 524) → Department I. Finance enables productive capital circulation; it is a producer service, not a consumer good.

The mapping lives in src/babylon/economics/tensor_hierarchy/mappings/bea_to_department.toml. The full industry-by-industry table with rationale notes is in BEA Industry to Marxian Department Mapping.
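As an illustration of how a dominant-use mapping might be applied, the sketch below hard-codes only the industry codes named in this section and aggregates output by department. The dictionary layout and the function names are hypothetical; they do not reflect the actual schema of bea_to_department.toml, and the output figures are made up.

```python
# Hypothetical in-memory form of the entries discussed above.
# This is NOT the schema of bea_to_department.toml.
DEPARTMENT_BY_BEA_CODE = {
    "3360A0": "I",    # motor vehicles (dominant-use classification)
    "4400": "IIa",    # retail trade
    "4481": "IIb",    # clothing stores
    "4521": "IIb",    # department / general merchandise stores
    "4529": "IIb",
    "4530": "IIb",
    "FIRE0": "I",     # owner-occupied housing
    "521CI": "I",     # financial intermediation
    "523": "I",
    "524": "I",
}


def department_for(code):
    """Return the department for a BEA code, failing loudly on unmapped codes."""
    try:
        return DEPARTMENT_BY_BEA_CODE[code]
    except KeyError:
        raise KeyError(f"No department mapping for BEA code {code!r}") from None


def aggregate_by_department(output_by_code):
    """Sum industry output into the four departments."""
    totals = {"I": 0.0, "IIa": 0.0, "IIb": 0.0, "III": 0.0}
    for code, value in output_by_code.items():
        totals[department_for(code)] += value
    return totals


# Made-up output figures, for illustration only.
totals = aggregate_by_department(
    {"3360A0": 10.0, "4400": 5.0, "4481": 2.0, "524": 3.0}
)
```

Because each industry maps to exactly one department, a boundary case such as consumer automobiles is assigned wholesale to Department I; splitting it would need per-industry weights rather than a single label.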

Protocol-Based Dependency Injection 

Every tensor source and computation in the hierarchy is a Python Protocol. No concrete class is hard-coded into computation pipelines. This enables testing without real data and swapping implementations without changing callers.

The Source Pattern 

# Protocol (in protocols.py)
from typing import Protocol

class InterIndustryFlowSource(Protocol):
    def get_direct_requirements(self, year: int) -> InterIndustryFlow | NoDataSentinel: ...

# Default implementation (backed by SQLite)
source = DefaultInterIndustryFlowSource(session_factory)

# Test stub (backed by synthetic data). It satisfies the protocol structurally,
# so it does not need to inherit from InterIndustryFlowSource.
class StubSource:
    def get_direct_requirements(self, year: int) -> InterIndustryFlow:
        return synthetic_flow_tensor  # a pre-built synthetic InterIndustryFlow

The Computation Pattern 

# Protocol (in protocols.py)
class LeontiefComputer(Protocol):
    def compute_inverse(self, flow: InterIndustryFlow) -> LeontiefInverse: ...

# Use the default
computer = DefaultLeontiefComputer()
inverse = computer.compute_inverse(flow)

All five source protocols follow the source pattern, and all three computation protocols follow the computation pattern. The full pipeline can therefore be instantiated with dependency-injected components, each independently substitutable.
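The two patterns compose as in the following self-contained sketch. Everything in it is an assumption made for illustration: plain nested lists stand in for `InterIndustryFlow` and `LeontiefInverse`, the class names are hypothetical, the sentinel branch is omitted for brevity, and Gauss-Jordan elimination is used only so the example needs no third-party library.

```python
from __future__ import annotations

from typing import Protocol

Matrix = list[list[float]]  # stand-in for the real tensor types


class FlowSource(Protocol):
    def get_direct_requirements(self, year: int) -> Matrix: ...


class InverseComputer(Protocol):
    def compute_inverse(self, flow: Matrix) -> Matrix: ...


class SyntheticFlowSource:
    """Test stub: returns the same 2x2 direct requirements matrix for any year."""

    def get_direct_requirements(self, year: int) -> Matrix:
        return [[0.2, 0.3], [0.4, 0.1]]


class GaussJordanLeontief:
    """Computes L = (I - A)^-1 by Gauss-Jordan elimination with partial pivoting."""

    def compute_inverse(self, flow: Matrix) -> Matrix:
        n = len(flow)
        # Build the augmented matrix [I - A | I].
        aug = [
            [(1.0 if i == j else 0.0) - flow[i][j] for j in range(n)]
            + [1.0 if i == j else 0.0 for j in range(n)]
            for i in range(n)
        ]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
            if abs(aug[pivot][col]) < 1e-12:
                raise ValueError("I - A is singular")
            aug[col], aug[pivot] = aug[pivot], aug[col]
            scale = aug[col][col]
            aug[col] = [x / scale for x in aug[col]]
            for r in range(n):
                if r != col:
                    factor = aug[r][col]
                    aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
        # The right half of the augmented matrix now holds the inverse.
        return [row[n:] for row in aug]


def total_requirements(source: FlowSource, computer: InverseComputer, year: int) -> Matrix:
    """The pipeline depends only on the two protocols, never on concrete classes."""
    return computer.compute_inverse(source.get_direct_requirements(year))


inverse = total_requirements(SyntheticFlowSource(), GaussJordanLeontief(), year=2022)
```

Neither concrete class inherits from its protocol; any object with a matching method signature can be passed to `total_requirements`, which is what makes each component independently substitutable.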

The NoDataSentinel Pattern 

When a data source cannot provide data (missing file, deferred loader, or absent year), it returns a NoDataSentinel rather than raising an exception. The sentinel is falsy and carries a human-readable reason:

result = source.get_direct_requirements(year=1990)
if not result:
    # Returns reason: "No I-O data for 1990 (BEA tables start 1997)"
    logger.warning("Missing data: %s", result.reason)
else:
    inverse = computer.compute_inverse(result)

This makes data absence explicit at each call site, prevents silent propagation of missing-data errors through the pipeline, and avoids exception-based control flow for expected conditions.
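A falsy sentinel of this kind can be sketched in a few lines. This is a hypothetical minimal version, not the project's actual `NoDataSentinel`; the `lookup` helper, the `TABLE` contents, and the reason text are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoDataSentinel:
    """Hypothetical minimal sentinel: always falsy, carries a reason string."""
    reason: str

    def __bool__(self) -> bool:
        return False


# Made-up data keyed by year, for illustration only.
TABLE = {1997: [1.0, 2.0], 1998: [1.5, 2.5]}


def lookup(year):
    """Return the row for a year, or a sentinel explaining why it is missing."""
    if year not in TABLE:
        return NoDataSentinel(reason=f"No data for {year}")
    return TABLE[year]


missing = lookup(1990)
present = lookup(1997)

if not missing:
    message = f"Missing data: {missing.reason}"
```

One caveat of truthiness checks: a legitimately empty result (such as an empty list) is also falsy, so `isinstance(result, NoDataSentinel)` is the stricter test when empty values are possible.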

Deferred Loaders: US4 and US5 

Two Level 1 data sources have production loaders explicitly deferred pending data governance arrangements:

US4 — ReproductionRequirements 

Production data would come from the Consumer Expenditure Survey (CEX) linked to ATUS time-use records by class category. This requires BLS microdata agreements. The DefaultReproductionSource stub always returns a NoDataSentinel with reason "CEX data source pending (US4 deferred)".

The DefaultReproductionRequirementsComputer is fully implemented and tested with synthetic data, verifying that the computation structure is correct.

US5 — ClassTransitionMatrix 

Production data would come from the Panel Study of Income Dynamics (PSID), which requires a restricted-use data agreement with the University of Michigan. The DefaultClassTransitionSource stub returns NoDataSentinel for all queries, with reason "PSID data source pending constitutional amendment (US5 deferred loader)".

The DefaultClassTransitionComputer (eigendecomposition, class aggregation) is fully implemented and passes all tests with synthetic matrices.

The deferred pattern means that:

The computation code is complete and correct.
Tests verify the math without requiring real data.
When production data becomes available, only the source protocol implementation needs to change—no downstream code changes required.
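The swap described above can be sketched as follows. All names here are hypothetical stand-ins (including the sentinel, the two source classes, and the reason text); the point is only that the downstream function `describe` is written once against the protocol and does not change when the deferred stub is replaced by a source that returns data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol

Matrix = list[list[float]]  # stand-in for the real ClassTransitionMatrix type


@dataclass(frozen=True)
class NoDataSentinel:
    """Hypothetical minimal sentinel: always falsy, carries a reason string."""
    reason: str

    def __bool__(self) -> bool:
        return False


class TransitionSource(Protocol):
    def get_transition_matrix(self, year: int) -> Matrix | NoDataSentinel: ...


class DeferredTransitionSource:
    """Stub for a deferred loader: reports absence for every query."""

    def get_transition_matrix(self, year: int) -> Matrix | NoDataSentinel:
        return NoDataSentinel("transition data source pending (deferred loader)")


class SyntheticTransitionSource:
    """Test source backed by a made-up 2x2 row-stochastic matrix."""

    def get_transition_matrix(self, year: int) -> Matrix | NoDataSentinel:
        return [[0.9, 0.1], [0.5, 0.5]]


def is_row_stochastic(p: Matrix, tol: float = 1e-9) -> bool:
    """Every entry is non-negative and every row sums to 1."""
    return all(
        all(x >= 0.0 for x in row) and abs(sum(row) - 1.0) < tol for row in p
    )


def describe(source: TransitionSource, year: int) -> str:
    """Downstream code: identical regardless of which source is injected."""
    result = source.get_transition_matrix(year)
    if isinstance(result, NoDataSentinel):
        return f"skipped: {result.reason}"
    n = len(result)
    return f"loaded {n}x{n} matrix, row-stochastic={is_row_stochastic(result)}"


deferred_report = describe(DeferredTransitionSource(), 2020)
synthetic_report = describe(SyntheticTransitionSource(), 2020)
```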