BEA Input-Output Tables

Reference for the Bureau of Economic Analysis (BEA) Input-Output data files used in Feature 025, including file inventory, XLSX structure, the coefficient computation formula, and the BEAIOLoader API.

For the theoretical background, see Input-Output Economics and the Leontief Inverse. For the schema tables these populate, see Tensor Hierarchy Database Schema.


Data Overview

The BEA publishes annual Input-Output accounts at three levels of industry detail: Detail (~400 industries), Summary (~70 industries), and Sector (~15). Babylon uses the Summary level because it aligns with the four-Marxian- department aggregation (see BEA Industry to Marxian Department Mapping).

Files are downloaded from the BEA I-O Accounts data page and stored in data/input-output/ relative to the project root. The files are large multi-sheet XLSX workbooks, one sheet per year.


File Inventory

BEA I-O Data Files in data/input-output/

File path

Contents

make-use/IOUse_Before_Redefinitions_PRO_Summary.xlsx

Primary source for BEAIOLoader. Use table at Producers’ prices, Summary level. Multi-sheet XLSX, one sheet per year (1997–2024). Contains commodity-by-industry intermediate use data.

make-use/IOUse_Before_Redefinitions_PUR_Summary.xlsx

Use table at Purchasers’ prices, Summary level. Not used by default.

make-use/IOMake_Before_Redefinitions_PRO_Summary.xlsx

Make table, Summary level (industry-by-commodity output structure).

make-use/IOUse_Before_Redefinitions_PRO_Sector.xlsx

Use table at Producers’ prices, Sector level (~15 industries).

make-use/IOMake_Before_Redefinitions_PRO_Sector.xlsx

Make table, Sector level.

supply-use/Supply_Summary.xlsx

Supply table, Summary level.

supply-use/Use_Summary.xlsx

Use table (supply-use framework), Summary level.

total-domestic-requirements/IxI_TR_Summary.xlsx

Industry-by-industry total requirements (Leontief inverse as published by BEA). Useful for validation against DefaultLeontiefComputer.

total-domestic-requirements/CxC_TR_Summary.xlsx

Commodity-by-commodity total requirements, Summary level.

total-domestic-requirements/IxI_Domestic_Summary.xlsx

Domestic-only total requirements (excludes imports), Summary level.

Loader default: BEAIOLoader reads only make-use/IOUse_Before_Redefinitions_PRO_Summary.xlsx.


XLSX Row Structure

Each sheet in the Use table XLSX (one sheet per year) follows this fixed row layout (0-indexed):

BEA I-O XLSX Row Structure

Row

Content

Example

0

Title string

"Use of Commodities by Industries Before Redefinitions..."

1

Units

"(Millions of dollars)"

2

Source

"Bureau of Economic Analysis"

3

Year

"2021"

4

Empty

(blank row)

5

Column codes

None, "Commodities/Industries", "1100A1", "1130A1", ..., "T001", ...

6

Column names

"IOCode", "Description", "Farms", "Forestry...", ...

7+

Data rows

"1100A1", "Farms", 12.3, 0.0, ..., 156.7, ...

Column extraction: Industry columns start at column index 2 of row 5. Final-demand columns (prefixed F) are excluded. The parser identifies industry columns as those whose row-5 code does not start with F.

Row codes: Each data row’s first cell (column 0) is the IOCode. Special rows are excluded from the industry list:

  • T001T020: Total/subtotal rows (intermediate totals, final demand totals, gross output total T019)

  • V001V006: Value-added rows (compensation, gross operating surplus, etc.)

Only rows whose IOCode is not in the _SKIP_CODES set are treated as industry rows.

Missing Data

BEA uses the string '...' to indicate suppressed or unavailable data. The loader converts '...' to 0.0. This understates inter-industry linkages in sectors with incomplete survey coverage but is the conservative default.


Coefficient Computation

The Use table contains raw dollar values: Use[i,j] = millions USD of commodity i used by industry j as intermediate input.

The direct requirements coefficient is:

\[A[i, j] = \frac{\text{Use}[i, j]}{\text{GrossOutput}[j]}\]

Gross output source: Row T019 ("T019" IOCode) contains gross output by industry, if present. The loader reads this row into a gross_output dict mapping industry code → gross output value (minimum 1.0 to avoid division by zero).

Fallback when T019 is absent: If T019 is not found, gross output is approximated as column sum: total intermediate use plus value-added rows:

\[\text{GrossOutput}[j] \approx \sum_i \text{Use}[i, j] + \text{ValueAdded}[j]\]

This fallback is less accurate but produces valid (if slightly off) coefficients.


BEAIOLoader API

class BEAIOLoader

Loader for BEA I-O coefficients into the 3NF SQLite schema.

Reads IOUse_Before_Redefinitions_PRO_Summary.xlsx from data/input-output/make-use/. Each sheet (named by year, e.g., '1997', '2021') is parsed and loaded independently.

Prerequisite: dim_bea_industry must be pre-populated by BEANationalLoader before BEAIOLoader can run. The industry lookup maps BEA codes to bea_industry_id values.

__init__(config=None, data_dir=None)
Parameters:
  • configLoaderConfig for batch size and operational settings. Default: LoaderConfig() with batch_size=1000.

  • data_dir – Base data directory. Default: Path("data") relative to project working directory.

load(session, reset=True, verbose=True, **kwargs) LoadStats

Load BEA I-O coefficients into 3NF schema.

Parameters:
  • session – SQLAlchemy Session connected to the normalized DB.

  • reset – If True, delete existing records in fact_bea_io_coefficient and dim_bea_io_table_type before loading. Default: True.

  • verbose – If True, log progress per sheet. Default: True.

  • kwargs – Ignored (present for interface compatibility).

Returns:

LoadStats with fields:

  • dimensions_loaded["dim_bea_io_table_type"]: 1 (USE record)

  • facts_loaded["fact_bea_io_coefficient"]: total coefficient rows

  • files_processed: number of year sheets successfully parsed

  • errors: list of error strings (empty on success)

Raises:

Does not raise; errors are appended to LoadStats.errors.

Tables populated:

  • dim_bea_io_table_type: Creates/retrieves the USE record.

  • fact_bea_io_coefficient: Inserts non-zero coefficient rows for each year via upsert (conflict key: uq_bea_io_coeff).

Year range: Sheets named with valid years (1990–2030) are processed. Sheet names that do not parse as integers in this range are skipped.

Example:

from babylon.data.bea.io_loader import BEAIOLoader
from babylon.data.reference.database import get_session_factory

loader = BEAIOLoader()
session_factory = get_session_factory()
with session_factory() as session:
    stats = loader.load(session)
print(f"Loaded {stats.facts_loaded['fact_bea_io_coefficient']} coefficients")
print(f"Sheets processed: {stats.files_processed}")

IOMatrixParser API

class IOMatrixParser

Internal parser for BEA I-O XLSX row data. Used by BEAIOLoader; can be used independently for custom XLSX processing.

parse_rows(rows) tuple[list[str], np.ndarray]

Parse a list of row tuples (from openpyxl iter_rows(values_only=True)) into industry codes and coefficient matrix.

Parameters:

rows – List of row tuples. Expected format: 7 header rows then data.

Returns:

Tuple of (industries, matrix) where:

  • industries: Ordered list of BEA industry codes (columns).

  • matrix: Direct requirements matrix A, shape (n, n), float64. Zero if insufficient data.

How to Obtain the Data

BEA I-O data files are already included in the repository under data/input-output/. No manual download is required.

If refreshing to a newer BEA release:

  1. Go to https://www.bea.gov/industry/input-output-accounts-data

  2. Download “Use Tables/Before Redefinitions/Producers’ Prices/Summary” XLSX

  3. Replace data/input-output/make-use/IOUse_Before_Redefinitions_PRO_Summary.xlsx

  4. Re-run BEAIOLoader.load()