QCEW Employment Data
====================

The QCEW (Quarterly Census of Employment and Wages) module provides access to
employment and wage data from the Bureau of Labor Statistics for Babylon's
simulation systems.

Overview
--------

QCEW is integrated into Babylon to provide:

- **County-level employment**: Detailed establishment and worker counts by NAICS industry
- **State-level aggregates**: Broader geographic patterns for macro analysis
- **Metro area data**: MSA, Micropolitan, and CSA aggregates for urban economic analysis
- **Location quotients**: Industry concentration metrics for comparative analysis
- **Class composition**: NAICS industry mapping to Marxian class categories

All data is stored in ``data/sqlite/marxist-data-3NF.sqlite`` using a normalized
star schema.

Setup
-----

**No setup required for API loading.** QCEW uses a hybrid loading strategy:

- **API (2021-2025)**: Fetches directly from BLS QCEW Open Data API
- **Files (2013-2020)**: Reads from local CSV files (optional, for historical data)

For file-based loading of historical years, download CSV files from:
https://www.bls.gov/cew/downloadable-data-files.htm

Loading Data
------------

Command Line
^^^^^^^^^^^^

.. code-block:: bash

    # Default hybrid loading (API for 2021+, files for historical)
    mise run data:qcew

    # Specific year range
    mise run data:qcew -- --years 2020-2023

    # Force API for all years (may fail for old years not in API)
    mise run data:qcew -- --force-api

    # Force file-based loading (requires downloaded CSVs)
    mise run data:qcew -- --force-files --data-path data/qcew

    # Reset tables before loading
    mise run data:qcew -- --reset

Python API
^^^^^^^^^^

.. code-block:: python

    from babylon.data.qcew import QcewLoader, QcewAPIClient
    from babylon.data.loader_base import LoaderConfig
    from babylon.data.reference.database import get_normalized_session

    # Hybrid loading (default)
    config = LoaderConfig(qcew_years=list(range(2013, 2026)))
    loader = QcewLoader(config)

    with get_normalized_session() as session:
        stats = loader.load(session, reset=True)
        print(f"County: {stats.facts_loaded.get('qcew_county', 0):,}")
        print(f"State: {stats.facts_loaded.get('qcew_state', 0):,}")
        print(f"Metro: {stats.facts_loaded.get('qcew_metro', 0):,}")

    # Direct API access
    with QcewAPIClient() as client:
        # Fetch California state-level data for 2023
        for record in client.get_area_annual_data(2023, "06000"):
            print(f"{record.industry_code}: {record.annual_avg_emplvl:,} workers")

Geographic Levels
-----------------

QCEW data is loaded at three geographic levels based on aggregation level codes:

.. list-table::
   :header-rows: 1
   :widths: 20 20 30 30

   * - Level
     - agglvl_code
     - Fact Table
     - Description
   * - County
     - 70-78
     - ``fact_qcew_annual``
     - 3,200+ US counties
   * - State
     - 20-28
     - ``fact_qcew_state_annual``
     - 52 states/territories
   * - MSA
     - 30-38
     - ``fact_qcew_metro_annual``
     - Metropolitan Statistical Areas
   * - Micropolitan
     - 40-48
     - ``fact_qcew_metro_annual``
     - Micropolitan Statistical Areas
   * - CSA
     - 50-58
     - ``fact_qcew_metro_annual``
     - Combined Statistical Areas

Database Schema
---------------

Fact Tables
^^^^^^^^^^^

**fact_qcew_annual** (County Level)

.. list-table::
   :header-rows: 1
   :widths: 25 25 50

   * - Column
     - Type
     - Description
   * - county_id
     - FK
     - Reference to dim_county
   * - industry_id
     - FK
     - Reference to dim_industry
   * - ownership_id
     - FK
     - Reference to dim_ownership
   * - time_id
     - FK
     - Reference to dim_time
   * - establishments
     - INT
     - Number of business establishments
   * - employment
     - INT
     - Annual average employment
   * - total_wages_usd
     - DECIMAL
     - Total annual wages paid
   * - avg_weekly_wage_usd
     - INT
     - Average weekly wage per worker
   * - avg_annual_pay_usd
     - INT
     - Average annual pay per worker
   * - lq_employment
     - DECIMAL
     - Location quotient for employment
   * - lq_annual_pay
     - DECIMAL
     - Location quotient for pay
   * - disclosure_code
     - VARCHAR
     - Data suppression indicator

**fact_qcew_state_annual** and **fact_qcew_metro_annual** follow the same
structure, with FK references to ``dim_state`` and ``dim_metro_area`` respectively.

NAICS Hierarchy Levels
^^^^^^^^^^^^^^^^^^^^^^

The ``dim_industry`` table includes a ``naics_level`` column indicating
each code's position in the NAICS hierarchy:

.. list-table::
   :header-rows: 1
   :widths: 15 40 45

   * - naics_level
     - Meaning
     - Example
   * - 0
     - Grand total (all industries)
     - ``10``
   * - 2
     - Sector (includes compound codes)
     - ``31-33``, ``44-45``, ``48-49``
   * - 3
     - Subsector
     - ``336``
   * - 4
     - Industry group
     - ``3361``
   * - 5
     - NAICS industry
     - ``33611``
   * - 6
     - National industry (leaf)
     - ``336111``
   * - 98
     - BLS ownership variants
     - (special codes)
   * - 99
     - BLS supersectors
     - ``1011``--``1029``

.. warning::

   QCEW reports wages at **every** hierarchy level. Parent-level wages
   **include all children**. Summing across levels without filtering
   produces catastrophic double-counting (empirically 10.4x in Wayne
   County). Always filter to ``naics_level = 6`` when aggregating
   wages. See :doc:`/concepts/naics-hierarchy` for full explanation.

National Wages Cache Table
^^^^^^^^^^^^^^^^^^^^^^^^^^

``InterpolatingBEASource`` maintains a materialized cache table
``_cache_national_wages_bea`` to avoid repeated aggregation of the
43M-row ``fact_qcew_annual`` table. The cache stores pre-aggregated
national wages per BEA industry per year.

.. list-table::
   :header-rows: 1
   :widths: 25 25 50

   * - Column
     - Type
     - Description
   * - bea_industry_id
     - INT
     - FK to ``dim_bea_industry``
   * - year
     - INT
     - Data year
   * - national_wages_usd
     - REAL
     - Sum of ``total_wages_usd`` for all leaf NAICS in this BEA industry
   * - cache_version
     - INT
     - Schema version (current: 2). Stale caches are auto-rebuilt.

The cache is built on first use and persists across sessions. When the
``cache_version`` column is missing or does not match the current version,
the table is dropped and rebuilt with leaf-only NAICS filtering.

SQL Queries
-----------

**Employment by Class Composition**

.. code-block:: sql

    SELECT
        sec.class_composition,
        SUM(q.employment) as total_employment,
        ROUND(SUM(q.total_wages_usd) / 1e9, 2) as wages_billions
    FROM fact_qcew_annual q
    JOIN dim_industry i ON q.industry_id = i.industry_id
    JOIN dim_sector sec ON i.sector_code = sec.sector_code
    JOIN dim_time t ON q.time_id = t.time_id
    WHERE t.year = 2023 AND sec.class_composition IS NOT NULL
    GROUP BY sec.class_composition
    ORDER BY total_employment DESC;

**State-Level Industry Concentration**

.. code-block:: sql

    SELECT
        s.state_name,
        i.naics_title,
        q.lq_employment as concentration,
        q.employment
    FROM fact_qcew_state_annual q
    JOIN dim_state s ON q.state_id = s.state_id
    JOIN dim_industry i ON q.industry_id = i.industry_id
    JOIN dim_time t ON q.time_id = t.time_id
    WHERE t.year = 2023
      AND q.lq_employment > 2.0  -- Highly concentrated
    ORDER BY q.lq_employment DESC
    LIMIT 20;

**Metro Area Manufacturing Employment**

.. code-block:: sql

    SELECT
        m.metro_name,
        q.area_type,
        SUM(q.employment) as manufacturing_jobs,
        ROUND(AVG(q.avg_annual_pay_usd), 0) as avg_pay
    FROM fact_qcew_metro_annual q
    JOIN dim_metro_area m ON q.metro_area_id = m.metro_area_id
    JOIN dim_industry i ON q.industry_id = i.industry_id
    JOIN dim_time t ON q.time_id = t.time_id
    WHERE i.naics_code LIKE '31%'
       OR i.naics_code LIKE '32%'
       OR i.naics_code LIKE '33%'
    AND t.year = 2023
    GROUP BY m.metro_name, q.area_type
    ORDER BY manufacturing_jobs DESC
    LIMIT 20;

API Client
----------

The ``QcewAPIClient`` class provides direct access to the BLS QCEW Open Data API:

.. code-block:: python

    from babylon.data.qcew import QcewAPIClient, QcewAPIError

    with QcewAPIClient() as client:
        try:
            # Fetch by area (state/county FIPS code)
            records = list(client.get_area_annual_data(2023, "01001"))  # Autauga County, AL

            # Fetch by industry (NAICS code)
            records = list(client.get_industry_annual_data(2023, "31-33"))  # Manufacturing

        except QcewAPIError as e:
            print(f"API error {e.status_code}: {e.message}")

API Features:

- **Rate limiting**: 0.5s delay between requests (polite to BLS servers)
- **Retry logic**: Exponential backoff on 429/5xx errors
- **CSV parsing**: Automatic parsing of BLS CSV responses
- **Error handling**: Distinguishes 404 (missing data) from server errors

Marxian Analysis
----------------

QCEW data supports Marxian class analysis through:

**Class Composition by Sector**

Industries are mapped to class composition categories in ``dim_sector``:

- ``goods_producing`` - Manufacturing, construction (productive labor)
- ``service_producing`` - Services, retail (mixed productive/unproductive)
- ``circulation`` - Finance, real estate (unproductive, surplus redistribution)
- ``government`` - Public administration (social reproduction)
- ``extraction`` - Mining, resources (ground rent extraction)

**Location Quotients for Labor Market Analysis**

Location quotients (LQ) reveal industry concentration:

- LQ > 1.25: Industry concentrated (potential labor aristocracy formation)
- LQ < 0.75: Industry underrepresented (labor mobility constraints)

**Wage Differentials**

Cross-geographic wage comparisons reveal unequal exchange dynamics:

- Core metro areas: Higher wages, higher productivity
- Periphery counties: Lower wages, extraction patterns

See Also
--------

- :doc:`/concepts/naics-hierarchy` - Why NAICS hierarchy causes double-counting
  and the leaf-only filtering solution
- :doc:`fred-data` - FRED macroeconomic data
- :doc:`census-analysis` - Census ACS demographics
- :doc:`bea-department-mapping` - BEA industry to Marxian department mapping
- :mod:`babylon.data.qcew` - API reference
- `BLS QCEW Open Data <https://www.bls.gov/cew/additional-resources/open-data/>`_