BTS FAF5 Freight Analysis Data
Reference for the Bureau of Transportation Statistics (BTS) Freight Analysis
Framework (FAF5) data files, including CFS area geography, file inventory,
CSV column schema, and the FAFLoader API.
For the theoretical background, see The Imperial Rent Field: Spatial Value Extraction. For the schema tables these populate, see Tensor Hierarchy Database Schema.
What CFS Areas Are
The Census Bureau’s Commodity Flow Survey (CFS) aggregates US counties into Commodity Flow Survey (CFS) Areas—approximately 130 geographic zones designed for freight flow analysis. CFS areas represent:
Single large counties (e.g., Los Angeles County alone is one CFS area)
Multi-county metropolitan regions (e.g., Boston MA-NH CFS Area)
Remainder-of-state areas (e.g., “Rest of California”)
FAF5 uses approximately 132 CFS areas (the exact count varies by version). These areas are intermediate between individual counties (~3,100) and states (51), providing sufficient geographic resolution to identify regional value flow patterns while maintaining manageable matrix dimensions (~130×130).
CFS areas are not the same as Census metropolitan statistical areas (MSAs) or BEA economic areas. They are specific to the CFS/FAF geography.
CFS area codes in FAF5 are integer strings (e.g., "11", "119",
"509"). The region-level CSV uses zero-padded 3-digit codes ("011",
"119", "509"); the loader normalizes these by stripping leading zeros
to produce bare integers.
File Inventory
FAF5 data is not included in the repository due to file size. Files must be downloaded manually from the BTS website (see How to Obtain the Data).
File path |
Contents |
|---|---|
|
Primary source for FAFLoader. Zone-level (CFS Area) O-D flows, years 2018–2024. ~2.49M records. 600+ MB. |
|
State-level aggregated flows (fallback if region/ absent). Lower resolution; not used when zone-level file is present. |
|
Zone names, zone-to-state mapping, SCTG code descriptions, mode codes. |
|
Metadata for the zone-level dataset. |
|
County-level disaggregation factors for truck freight (origin). |
|
County-level disaggregation factors for truck freight (destination). |
|
County-level disaggregation factors for rail freight (origin). |
|
County-level disaggregation factors for rail freight (destination). |
|
County-level disaggregation factors for water freight (origin). |
|
County-level disaggregation factors for water freight (destination). |
|
County-level disaggregation factors for pipeline freight (origin). |
|
County-level disaggregation factors for pipeline freight (destination). |
|
County-level estimate downloads (partial; see |
|
Script to bulk-download all 51 state county ZIP files. |
File search priority in FAFLoader: The loader searches for FAF5 CSV in this order:
data/freight/faf/FAF5.csv(exact name)data/freight/faf/region/FAF5*.csv(zone-level, preferred)data/freight/faf/FAF5*.csv(any FAF5 CSV at top level, fallback)
The zone-level file is preferred because it has higher geographic resolution (~130 CFS areas vs. 51 states).
CSV Column Schema
FAF5 CSV files use a long-wide format: each row is one (origin, destination, commodity, mode) combination, with separate columns for each year’s flow values.
Column |
Type |
Description |
|---|---|---|
|
string |
Origin CFS Area code (integer string, may be zero-padded in region CSV). Normalized by FAFLoader to bare integer string. |
|
string |
Destination CFS Area code. Same normalization as origin. |
|
integer |
SCTG 2-digit commodity code (1–43; code 40 not used by FAF5). |
|
integer |
Transport mode code (see mode codes below). |
|
float |
Estimated shipment weight for year YYYY, in thousands of short tons.
Multiple year columns present (e.g., |
|
float |
Estimated shipment value for year YYYY, in millions of USD. |
|
float |
Estimated ton-miles for year YYYY, in millions. |
The loader extracts year-specific columns by pattern matching
(value_{year}, tons_{year}, tmiles_{year}). If the target year
has no corresponding column, loading returns 0 records with a warning.
Transport Mode Codes
Code |
Description |
|---|---|
|
Truck (all truck types: single-unit, combination) |
|
Rail (class I railroads and short lines) |
|
Water (domestic waterways, Great Lakes, coastal) |
|
Air (domestic air cargo) |
|
Pipeline (crude oil, natural gas, refined products) |
Mode codes are stored as strings in fact_faf_commodity_flow.mode_code
('1', '2', etc.).
SCTG Commodity Code Ranges
The Standard Classification of Transported Goods (SCTG) groups commodities
into 2-digit codes 01–43 (code 40 unused by FAF5). The broad categories used
in dim_sctg_commodity.category are:
agriculture: Farm products, fish, forest products (early codes)
mining: Minerals, coal, petroleum (middle codes)
manufacturing: Manufactured goods (later codes)
Specific SCTG-to-name mappings are available in the FAF5 metadata XLSX
(FAF5_metadata.xlsx, SCTG2 worksheet).
FAFLoader API
- class FAFLoader
Loader for BTS FAF5 commodity flow data into the 3NF SQLite schema.
Reads FAF5 CSV data and populates:
dim_cfs_area: CFS Area dimension records (on-demand)dim_sctg_commodity: SCTG commodity records (on-demand)fact_faf_commodity_flow: O-D flow records by year
- __init__(config=None, data_dir=None)
- Parameters:
config –
LoaderConfigfor batch size and operational settings. Default:LoaderConfig()withbatch_size=1000.data_dir – Base data directory. Default:
Path("data")relative to project working directory.
- load(session, reset=True, verbose=True, **kwargs) LoadStats
Load FAF5 commodity flow data for a single year.
- Parameters:
session – SQLAlchemy
Sessionconnected to the normalized DB.reset – If
True, delete existing records in all three tables before loading. Default:True.verbose – If
True, log progress. Default:True.kwargs – Accepts
year(intorstr). Target year for extraction from the multi-year CSV. Default:2022.
- Returns:
LoadStatswith fields:facts_loaded["fact_faf_commodity_flow"]: rows insertedfiles_processed: 1 if successful, 0 if CSV not founderrors: error strings (non-empty if CSV absent)
- Raises:
Does not raise; missing CSV produces an error in
LoadStats.
Table population details:
dim_cfs_area: Records created on-demand as CFS codes appear in CSV. If no metadata file is available, names default to"CFS Area {code}".dim_sctg_commodity: Records created on-demand as SCTG codes appear. Names default to"SCTG {code}"without metadata.fact_faf_commodity_flow: Batch-inserted in chunks ofbatch_sizerows. Zero values stored asNULL(sparse).
In-memory caches: The loader maintains
area_cacheandsctg_cachedicts to avoid repeated database lookups during CSV processing. These are local to eachload()call.Data source registration: Creates or retrieves a
dim_data_sourcerecord withsource_code="BTS_FAF5"andsource_agency="Bureau of Transportation Statistics".Example:
from babylon.data.bts.faf_loader import FAFLoader from babylon.data.reference.database import get_session_factory loader = FAFLoader() session_factory = get_session_factory() with session_factory() as session: stats = loader.load(session, year=2022) if stats.errors: print("FAF CSV not found:", stats.errors[0]) else: print(f"Loaded {stats.facts_loaded['fact_faf_commodity_flow']} flows")
FAFCSVParser API
- class FAFCSVParser
Internal CSV parser. Used by
FAFLoader; can be used independently.- extract_year_columns(headers, year) dict[str, str]
Find year-specific column names in the CSV header.
- Parameters:
headers – List of CSV header strings.
year – Target year integer.
- Returns:
Dict with keys
'value','tons','tmiles'mapping to actual column names. Keys absent if column not found in headers.
- parse_row(row, year_cols) tuple | None
Parse a single CSV row into a flow record.
- Parameters:
row – Dict of column name → value string (from
csv.DictReader).year_cols – Year column mapping from
extract_year_columns.
- Returns:
Tuple
(origin, dest, sctg2, mode, tons, value, tmiles)orNoneif required fields are missing or invalid.
CFS code normalization: Origin and destination codes are parsed as integers and converted back to strings, stripping leading zeros:
"011"→"11". Invalid codes (non-numeric) returnNone.
How to Obtain the Data
FAF5 data must be downloaded manually from BTS:
Go to https://www.bts.gov/faf
Under “Data Download”, select “FAF5 Data” → “Origin-Destination Data”
Download the zone-level CSV (
FAF5.7.1_2018-2024.csvor equivalent)Place in
data/freight/faf/region/Run
FAFLoader.load(session, year=2022)
For county-level disaggregation factors (mode-specific):
cd data/freight/faf
bash download_county.sh # downloads all 51 state ZIP files to county/
The metadata XLSX (already present at data/freight/faf/FAF5_metadata.xlsx)
contains zone names and zone-to-state mapping.