BTS FAF5 Freight Analysis Data

Reference for the Bureau of Transportation Statistics (BTS) Freight Analysis Framework (FAF5) data files, including CFS area geography, file inventory, CSV column schema, and the FAFLoader API.

For the theoretical background, see The Imperial Rent Field: Spatial Value Extraction. For the schema tables these populate, see Tensor Hierarchy Database Schema.


What CFS Areas Are

The Census Bureau’s Commodity Flow Survey (CFS) aggregates US counties into Commodity Flow Survey (CFS) Areas—approximately 130 geographic zones designed for freight flow analysis. CFS areas represent:

  • Single large counties (e.g., Los Angeles County alone is one CFS area)

  • Multi-county metropolitan regions (e.g., Boston MA-NH CFS Area)

  • Remainder-of-state areas (e.g., “Rest of California”)

FAF5 uses approximately 132 CFS areas (the exact count varies by version). These areas are intermediate between individual counties (~3,100) and states (51), providing sufficient geographic resolution to identify regional value flow patterns while maintaining manageable matrix dimensions (~130×130).

CFS areas are not the same as Census metropolitan statistical areas (MSAs) or BEA economic areas. They are specific to the CFS/FAF geography.

CFS area codes in FAF5 are integer strings (e.g., "11", "119", "509"). The region-level CSV uses zero-padded 3-digit codes ("011", "119", "509"); the loader normalizes these by stripping leading zeros to produce bare integers.


File Inventory

FAF5 data is not included in the repository due to file size. Files must be downloaded manually from the BTS website (see How to Obtain the Data).

FAF5 Data Files in data/freight/faf/

File path

Contents

region/FAF5.7.1_2018-2024.csv

Primary source for FAFLoader. Zone-level (CFS Area) O-D flows, years 2018–2024. ~2.49M records. 600+ MB.

FAF5.7.1_State_2018-2024.csv

State-level aggregated flows (fallback if region/ absent). Lower resolution; not used when zone-level file is present.

FAF5_metadata.xlsx

Zone names, zone-to-state mapping, SCTG code descriptions, mode codes.

region/FAF5_metadata.xlsx

Metadata for the zone-level dataset.

truck_origin_factors.csv

County-level disaggregation factors for truck freight (origin).

truck_destination_factors.csv

County-level disaggregation factors for truck freight (destination).

rail_origin_factors.csv

County-level disaggregation factors for rail freight (origin).

rail_destination_factors.csv

County-level disaggregation factors for rail freight (destination).

water_origin_factors.csv

County-level disaggregation factors for water freight (origin).

water_destination_factors.csv

County-level disaggregation factors for water freight (destination).

pipeline_origin_factors.csv

County-level disaggregation factors for pipeline freight (origin).

pipeline_destination_factors.csv

County-level disaggregation factors for pipeline freight (destination).

county/01_Alabama.zip

County-level estimate downloads (partial; see download_county.sh).

download_county.sh

Script to bulk-download all 51 state county ZIP files.

File search priority in FAFLoader: The loader searches for FAF5 CSV in this order:

  1. data/freight/faf/FAF5.csv (exact name)

  2. data/freight/faf/region/FAF5*.csv (zone-level, preferred)

  3. data/freight/faf/FAF5*.csv (any FAF5 CSV at top level, fallback)

The zone-level file is preferred because it has higher geographic resolution (~130 CFS areas vs. 51 states).


CSV Column Schema

FAF5 CSV files use a long-wide format: each row is one (origin, destination, commodity, mode) combination, with separate columns for each year’s flow values.

FAF5 CSV Column Definitions

Column

Type

Description

dms_orig

string

Origin CFS Area code (integer string, may be zero-padded in region CSV). Normalized by FAFLoader to bare integer string.

dms_dest

string

Destination CFS Area code. Same normalization as origin.

sctg2

integer

SCTG 2-digit commodity code (1–43; code 40 not used by FAF5).

dms_mode

integer

Transport mode code (see mode codes below).

tons_YYYY

float

Estimated shipment weight for year YYYY, in thousands of short tons. Multiple year columns present (e.g., tons_2018, tons_2019, …, tons_2024).

value_YYYY

float

Estimated shipment value for year YYYY, in millions of USD.

tmiles_YYYY

float

Estimated ton-miles for year YYYY, in millions.

The loader extracts year-specific columns by pattern matching (value_{year}, tons_{year}, tmiles_{year}). If the target year has no corresponding column, loading returns 0 records with a warning.

Transport Mode Codes

FAF5 Transport Mode Codes

Code

Description

1

Truck (all truck types: single-unit, combination)

2

Rail (class I railroads and short lines)

3

Water (domestic waterways, Great Lakes, coastal)

4

Air (domestic air cargo)

5

Pipeline (crude oil, natural gas, refined products)

Mode codes are stored as strings in fact_faf_commodity_flow.mode_code ('1', '2', etc.).

SCTG Commodity Code Ranges

The Standard Classification of Transported Goods (SCTG) groups commodities into 2-digit codes 01–43 (code 40 unused by FAF5). The broad categories used in dim_sctg_commodity.category are:

  • agriculture: Farm products, fish, forest products (early codes)

  • mining: Minerals, coal, petroleum (middle codes)

  • manufacturing: Manufactured goods (later codes)

Specific SCTG-to-name mappings are available in the FAF5 metadata XLSX (FAF5_metadata.xlsx, SCTG2 worksheet).


FAFLoader API

class FAFLoader

Loader for BTS FAF5 commodity flow data into the 3NF SQLite schema.

Reads FAF5 CSV data and populates:

  • dim_cfs_area: CFS Area dimension records (on-demand)

  • dim_sctg_commodity: SCTG commodity records (on-demand)

  • fact_faf_commodity_flow: O-D flow records by year

__init__(config=None, data_dir=None)
Parameters:
  • configLoaderConfig for batch size and operational settings. Default: LoaderConfig() with batch_size=1000.

  • data_dir – Base data directory. Default: Path("data") relative to project working directory.

load(session, reset=True, verbose=True, **kwargs) LoadStats

Load FAF5 commodity flow data for a single year.

Parameters:
  • session – SQLAlchemy Session connected to the normalized DB.

  • reset – If True, delete existing records in all three tables before loading. Default: True.

  • verbose – If True, log progress. Default: True.

  • kwargs – Accepts year (int or str). Target year for extraction from the multi-year CSV. Default: 2022.

Returns:

LoadStats with fields:

  • facts_loaded["fact_faf_commodity_flow"]: rows inserted

  • files_processed: 1 if successful, 0 if CSV not found

  • errors: error strings (non-empty if CSV absent)

Raises:

Does not raise; missing CSV produces an error in LoadStats.

Table population details:

  • dim_cfs_area: Records created on-demand as CFS codes appear in CSV. If no metadata file is available, names default to "CFS Area {code}".

  • dim_sctg_commodity: Records created on-demand as SCTG codes appear. Names default to "SCTG {code}" without metadata.

  • fact_faf_commodity_flow: Batch-inserted in chunks of batch_size rows. Zero values stored as NULL (sparse).

In-memory caches: The loader maintains area_cache and sctg_cache dicts to avoid repeated database lookups during CSV processing. These are local to each load() call.

Data source registration: Creates or retrieves a dim_data_source record with source_code="BTS_FAF5" and source_agency="Bureau of Transportation Statistics".

Example:

from babylon.data.bts.faf_loader import FAFLoader
from babylon.data.reference.database import get_session_factory

loader = FAFLoader()
session_factory = get_session_factory()
with session_factory() as session:
    stats = loader.load(session, year=2022)
if stats.errors:
    print("FAF CSV not found:", stats.errors[0])
else:
    print(f"Loaded {stats.facts_loaded['fact_faf_commodity_flow']} flows")

FAFCSVParser API

class FAFCSVParser

Internal CSV parser. Used by FAFLoader; can be used independently.

extract_year_columns(headers, year) dict[str, str]

Find year-specific column names in the CSV header.

Parameters:
  • headers – List of CSV header strings.

  • year – Target year integer.

Returns:

Dict with keys 'value', 'tons', 'tmiles' mapping to actual column names. Keys absent if column not found in headers.

parse_row(row, year_cols) tuple | None

Parse a single CSV row into a flow record.

Parameters:
  • row – Dict of column name → value string (from csv.DictReader).

  • year_cols – Year column mapping from extract_year_columns.

Returns:

Tuple (origin, dest, sctg2, mode, tons, value, tmiles) or None if required fields are missing or invalid.

CFS code normalization: Origin and destination codes are parsed as integers and converted back to strings, stripping leading zeros: "011""11". Invalid codes (non-numeric) return None.

How to Obtain the Data

FAF5 data must be downloaded manually from BTS:

  1. Go to https://www.bts.gov/faf

  2. Under “Data Download”, select “FAF5 Data” → “Origin-Destination Data”

  3. Download the zone-level CSV (FAF5.7.1_2018-2024.csv or equivalent)

  4. Place in data/freight/faf/region/

  5. Run FAFLoader.load(session, year=2022)

For county-level disaggregation factors (mode-specific):

cd data/freight/faf
bash download_county.sh   # downloads all 51 state ZIP files to county/

The metadata XLSX (already present at data/freight/faf/FAF5_metadata.xlsx) contains zone names and zone-to-state mapping.