2025 Medical Expenditure Panel Survey, Household Component (MEPS-HC)
Reproducing the 2025 MEPS Workshop in Python (svy)
In this document, we use Python and the svy library to reproduce the 2025 MEPS Workshop (originally conducted in R, see GitHub Repository).
To follow along and run the code locally, download the following 2023 MEPS public-use files:
- 2023 Full-Year Consolidated (HC-251)
- 2023 Office-Based Medical Provider Visits (HC-248G)
- 2023 Medical Conditions (HC-249)
- 2023 CLNK: Condition–Event Link (HC-248I)
uv
The steps below give you a fast, reproducible setup for running the MEPS workshop with svy.
- If you do not have uv, install it. See instructions at https://docs.astral.sh/uv/getting-started/installation/
- Restart your shell so uv is on PATH, and from your root project run: uv venv -p 3.13
- Initialize the environment with uv init
- Add the requirements: uv add svy[report]
Since it’s a simple analysis, you can store the datasets and code in the root project folder.
Imports used throughout the workshop and some general settings.
Part I — Estimates for National Health Care for the U.S. Civilian Non-Institutionalized Population, 2023
Exploration of the Relevant Data from the 2023 MEPS-HC
First, we read the 2023 Full-Year Consolidated file from local storage using svy.read_stata. Then we subset the variables needed for this tutorial, derive two helper variables, and run quick QC checks.
Read the Stata file via svy
shape: (18_919, 1_374)
┌────────────┬───────┬────────────┬───────┬───┬──────────────┬──────────────┬────────┬────────┐
│ DUID ┆ PID ┆ DUPERSID ┆ PANEL ┆ … ┆ FAMWT23C ┆ SAQWT23F ┆ VARSTR ┆ VARPSU │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞════════════╪═══════╪════════════╪═══════╪═══╪══════════════╪══════════════╪════════╪════════╡
│ 2.790002e6 ┆ 101.0 ┆ 2790002101 ┆ 27.0 ┆ … ┆ 11158.817826 ┆ 13221.315673 ┆ 2019.0 ┆ 1.0 │
│ 2.790002e6 ┆ 102.0 ┆ 2790002102 ┆ 27.0 ┆ … ┆ 11158.817826 ┆ 0.0 ┆ 2019.0 ┆ 1.0 │
│ 2.790004e6 ┆ 101.0 ┆ 2790004101 ┆ 27.0 ┆ … ┆ 28540.745942 ┆ 29999.277476 ┆ 2084.0 ┆ 1.0 │
│ 2.790006e6 ┆ 101.0 ┆ 2790006101 ┆ 27.0 ┆ … ┆ 10821.040689 ┆ 11144.513916 ┆ 2113.0 ┆ 1.0 │
│ 2.790006e6 ┆ 102.0 ┆ 2790006102 ┆ 27.0 ┆ … ┆ 10821.040689 ┆ 0.0 ┆ 2113.0 ┆ 1.0 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 2.819784e6 ┆ 105.0 ┆ 2819784105 ┆ 28.0 ┆ … ┆ 5126.030033 ┆ 0.0 ┆ 2015.0 ┆ 1.0 │
│ 2.819788e6 ┆ 101.0 ┆ 2819788101 ┆ 28.0 ┆ … ┆ 3608.933864 ┆ 4902.751776 ┆ 2005.0 ┆ 1.0 │
│ 2.819792e6 ┆ 101.0 ┆ 2819792101 ┆ 28.0 ┆ … ┆ 26238.599825 ┆ 0.0 ┆ 2012.0 ┆ 3.0 │
│ 2.819793e6 ┆ 101.0 ┆ 2819793101 ┆ 28.0 ┆ … ┆ 15780.213332 ┆ 0.0 ┆ 2004.0 ┆ 1.0 │
│ 2.819793e6 ┆ 102.0 ┆ 2819793102 ┆ 28.0 ┆ … ┆ 15780.213332 ┆ 0.0 ┆ 2004.0 ┆ 1.0 │
└────────────┴───────┴────────────┴───────┴───┴──────────────┴──────────────┴────────┴────────┘
Subset to columns used in the tutorial using polars.
shape: (5, 6)
┌─────────┬──────────┬────────────┬────────┬────────┬──────────────┐
│ AGELAST ┆ TOTEXP23 ┆ DUPERSID ┆ VARSTR ┆ VARPSU ┆ PERWT23F │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str ┆ f64 ┆ f64 ┆ f64 │
╞═════════╪══════════╪════════════╪════════╪════════╪══════════════╡
│ 58.0 ┆ 646.0 ┆ 2790002101 ┆ 2019.0 ┆ 1.0 ┆ 11664.426815 │
│ 27.0 ┆ 1894.0 ┆ 2790002102 ┆ 2019.0 ┆ 1.0 ┆ 32212.113596 │
│ 49.0 ┆ 986.0 ┆ 2790004101 ┆ 2084.0 ┆ 1.0 ┆ 21944.142826 │
│ 75.0 ┆ 1312.0 ┆ 2790006101 ┆ 2113.0 ┆ 1.0 ┆ 10328.00953 │
│ 23.0 ┆ 0.0 ┆ 2790006102 ┆ 2113.0 ┆ 1.0 ┆ 17430.521357 │
└─────────┴──────────┴────────────┴────────┴────────┴──────────────┘
Derive helper variables:
- has_exp: indicator for any total expenditure
- age_cat: <65 vs 65+
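A minimal sketch of the two derivations, written in plain Python rather than the polars expressions the workshop uses. The `derive` helper and the row dictionaries are illustrative assumptions; the values are copied from the subset output shown above.

```python
def derive(row):
    """Return a copy of `row` with the two helper variables added."""
    out = dict(row)
    out["has_exp"] = row["TOTEXP23"] > 0                       # any total expenditure
    out["age_cat"] = "65+" if row["AGELAST"] >= 65 else "<65"  # two age groups
    return out

rows = [
    {"DUPERSID": "2790002101", "AGELAST": 58.0, "TOTEXP23": 646.0},
    {"DUPERSID": "2790006101", "AGELAST": 75.0, "TOTEXP23": 1312.0},
    {"DUPERSID": "2790006102", "AGELAST": 23.0, "TOTEXP23": 0.0},
]
derived = [derive(r) for r in rows]
for r in derived:
    print(r["DUPERSID"], r["has_exp"], r["age_cat"])
```

The cut-off at 65 and the strict `> 0` test are what the QC checks further down are designed to confirm.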
shape: (5, 8)
┌─────────┬──────────┬────────────┬────────┬────────┬──────────────┬─────────┬─────────┐
│ AGELAST ┆ TOTEXP23 ┆ DUPERSID ┆ VARSTR ┆ VARPSU ┆ PERWT23F ┆ has_exp ┆ age_cat │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str ┆ f64 ┆ f64 ┆ f64 ┆ bool ┆ str │
╞═════════╪══════════╪════════════╪════════╪════════╪══════════════╪═════════╪═════════╡
│ 58.0 ┆ 646.0 ┆ 2790002101 ┆ 2019.0 ┆ 1.0 ┆ 11664.426815 ┆ true ┆ <65 │
│ 27.0 ┆ 1894.0 ┆ 2790002102 ┆ 2019.0 ┆ 1.0 ┆ 32212.113596 ┆ true ┆ <65 │
│ 49.0 ┆ 986.0 ┆ 2790004101 ┆ 2084.0 ┆ 1.0 ┆ 21944.142826 ┆ true ┆ <65 │
│ 75.0 ┆ 1312.0 ┆ 2790006101 ┆ 2113.0 ┆ 1.0 ┆ 10328.00953 ┆ true ┆ 65+ │
│ 23.0 ┆ 0.0 ┆ 2790006102 ┆ 2113.0 ┆ 1.0 ┆ 17430.521357 ┆ false ┆ <65 │
└─────────┴──────────┴────────────┴────────┴────────┴──────────────┴─────────┴─────────┘
QC 1: Two-way counts of derived variables
shape: (4, 3)
┌─────────┬─────────┬───────┐
│ has_exp ┆ age_cat ┆ len │
│ --- ┆ --- ┆ --- │
│ bool ┆ str ┆ u32 │
╞═════════╪═════════╪═══════╡
│ false ┆ <65 ┆ 2497 │
│ false ┆ 65+ ┆ 158 │
│ true ┆ <65 ┆ 11778 │
│ true ┆ 65+ ┆ 4486 │
└─────────┴─────────┴───────┘
QC 2: Expenditure ranges by has_exp
shape: (2, 3)
┌─────────┬─────┬──────────┐
│ has_exp ┆ min ┆ max │
│ --- ┆ --- ┆ --- │
│ bool ┆ f64 ┆ f64 │
╞═════════╪═════╪══════════╡
│ false ┆ 0.0 ┆ 0.0 │
│ true ┆ 1.0 ┆ 574675.0 │
└─────────┴─────┴──────────┘
QC 3: Age ranges by age_cat
shape: (2, 3)
┌─────────┬──────┬──────┐
│ age_cat ┆ min ┆ max │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 │
╞═════════╪══════╪══════╡
│ <65 ┆ 0.0 ┆ 64.0 │
│ 65+ ┆ 65.0 ┆ 85.0 │
└─────────┴──────┴──────┘
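The three QC checks above can be sketched without any library. This is an illustration in plain Python, not the workshop's polars code; the `ranges` helper is a hypothetical name, and the five rows are the ones displayed after the derivation step.

```python
from collections import Counter

people = [
    {"AGELAST": 58.0, "TOTEXP23": 646.0, "has_exp": True, "age_cat": "<65"},
    {"AGELAST": 27.0, "TOTEXP23": 1894.0, "has_exp": True, "age_cat": "<65"},
    {"AGELAST": 49.0, "TOTEXP23": 986.0, "has_exp": True, "age_cat": "<65"},
    {"AGELAST": 75.0, "TOTEXP23": 1312.0, "has_exp": True, "age_cat": "65+"},
    {"AGELAST": 23.0, "TOTEXP23": 0.0, "has_exp": False, "age_cat": "<65"},
]

# QC 1: two-way counts of the derived variables
counts = Counter((p["has_exp"], p["age_cat"]) for p in people)

def ranges(rows, group, value):
    """Minimum and maximum of `value` within each level of `group`."""
    out = {}
    for r in rows:
        lo, hi = out.get(r[group], (r[value], r[value]))
        out[r[group]] = (min(lo, r[value]), max(hi, r[value]))
    return out

# QC 2: people flagged has_exp == False must have exactly zero expenditure
exp_ranges = ranges(people, "has_exp", "TOTEXP23")
assert exp_ranges[False] == (0.0, 0.0)

# QC 3: the age groups must not overlap at the cut-off of 65
age_ranges = ranges(people, "age_cat", "AGELAST")
assert age_ranges["<65"][1] < 65 <= age_ranges["65+"][0]
```

Writing the expectations as assertions turns a visual check of the printed tables into one that fails loudly if a derivation is wrong.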
Estimation of Expenses
Sample design
First, we define the sample design: VARSTR identifies the strata, VARPSU the PSUs, and PERWT23F is the person-level weight.
╭─────────────────────────── Sample ────────────────────────────╮
│ Survey Data:                                                  │
│   Number of rows: 18919                                       │
│   Number of columns: 11                                       │
│   Number of strata: 105                                       │
│   Number of PSUs: 262                                         │
│                                                               │
│ Survey Design:                                                │
│                                                               │
│     Field               Value                                 │
│     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                         │
│     Row index           svy_row_index                         │
│     Stratum             VARSTR                                │
│     PSU                 VARPSU                                │
│     SSU                 None                                  │
│     Weight              PERWT23F                              │
│     With replacement    False                                 │
│     Prob                None                                  │
│     Hit                 None                                  │
│     MOS                 None                                  │
│     Population size     None                                  │
│     Replicate weights   None                                  │
│                                                               │
╰───────────────────────────────────────────────────────────────╯
Checking for singletons
If a stratum contains only one PSU, the between-PSU variance within that stratum cannot be computed and variance estimation will fail. We can check for singletons (strata with a single PSU) as follows:
The sample does not have singletons.
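To make the check concrete, here is a minimal sketch in plain Python that collects the distinct PSUs in each stratum and reports the strata with only one. The `find_singletons` function and the toy rows are assumptions for illustration; svy has its own way of reporting singletons, which is not reproduced here.

```python
from collections import defaultdict

def find_singletons(rows, stratum="VARSTR", psu="VARPSU"):
    """Return the strata that contain exactly one distinct PSU."""
    psus = defaultdict(set)
    for r in rows:
        psus[r[stratum]].add(r[psu])
    return sorted(s for s, members in psus.items() if len(members) == 1)

toy = [
    {"VARSTR": 2019, "VARPSU": 1},
    {"VARSTR": 2019, "VARPSU": 2},
    {"VARSTR": 2084, "VARPSU": 1},  # stratum 2084 only ever sees PSU 1
    {"VARSTR": 2084, "VARPSU": 1},
]
print(find_singletons(toy))  # [2084]
```

An empty list corresponds to the "no singletons" result reported for the MEPS sample.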
Overall expenses (national totals)
╭────────────────────────────── Estimate: TOTAL (TAYLOR) ───────────────────────────────╮
│                                                                                       │
│               est                 se               lci                uci     cv (%)  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
│  2,504,715,663,9…   4,769,269,754,8…   -6,915,494,271…   11,924,925,599,…   190.4116  │
│                                                                                       │
╰───────────────────────────────────────────────────────────────────────────────────────╯
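The TAYLOR label refers to Taylor-series linearization. For a total, this reduces to the textbook between-PSU variance within each stratum. The sketch below implements that textbook estimator in plain Python under a with-replacement approximation; it is not svy's implementation, and the function name and toy data are assumptions.

```python
from collections import defaultdict
from math import sqrt

def total_with_se(rows, y, w="PERWT23F", stratum="VARSTR", psu="VARPSU"):
    """Weighted total of `y` and its standard error from PSU-level totals."""
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        psu_totals[r[stratum]][r[psu]] += r[w] * r[y]
    estimate = sum(sum(p.values()) for p in psu_totals.values())
    variance = 0.0
    for p in psu_totals.values():
        n = len(p)  # number of PSUs in this stratum
        if n < 2:
            raise ValueError("singleton stratum: variance is undefined")
        centre = sum(p.values()) / n
        variance += n / (n - 1) * sum((z - centre) ** 2 for z in p.values())
    return estimate, sqrt(variance)

toy = [
    {"VARSTR": 1, "VARPSU": 1, "PERWT23F": 10.0, "TOTEXP23": 100.0},
    {"VARSTR": 1, "VARPSU": 2, "PERWT23F": 10.0, "TOTEXP23": 300.0},
    {"VARSTR": 2, "VARPSU": 1, "PERWT23F": 20.0, "TOTEXP23": 0.0},
    {"VARSTR": 2, "VARPSU": 2, "PERWT23F": 20.0, "TOTEXP23": 50.0},
]
est, se = total_with_se(toy, "TOTEXP23")
print(est, se)
```

The `n < 2` guard is the reason the singleton check matters: with one PSU in a stratum the `n / (n - 1)` factor is undefined.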
Percentage of persons with an expense
╭──────────────── Estimate: PROP (TAYLOR) ────────────────╮
│                                                         │
│  has_exp      est       se      lci      uci    cv (%)  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
│  false     0.1445   0.0764   0.0475   0.3641   52.8735  │
│  true      0.8555   0.0764   0.6359   0.9525    8.9301  │
│                                                         │
╰─────────────────────────────────────────────────────────╯
Mean expense per person
╭─────────────────── Estimate: MEAN (TAYLOR) ────────────────────╮
│                                                                │
│         est           se          lci           uci    cv (%)  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
│  7,487.2616   1,620.5194   4,286.4292   10,688.0940   21.6437  │
│                                                                │
╰────────────────────────────────────────────────────────────────╯
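A weighted mean is a ratio of two totals, so its linearized variance is computed from the residuals `w * (y - mean) / sum(w)` rather than from `w * y`. The sketch below shows that step in plain Python, again under a with-replacement approximation and with hypothetical names; a proportion such as the share with an expense is the same calculation applied to a 0/1 indicator.

```python
from collections import defaultdict
from math import sqrt

def mean_with_se(rows, y, w="PERWT23F", stratum="VARSTR", psu="VARPSU"):
    """Weighted mean of `y` with a Taylor-linearized standard error."""
    sum_w = sum(r[w] for r in rows)
    mean = sum(r[w] * r[y] for r in rows) / sum_w
    # Sum the linearized values within each PSU of each stratum.
    z = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z[r[stratum]][r[psu]] += r[w] * (r[y] - mean) / sum_w
    variance = 0.0
    for p in z.values():
        n = len(p)
        if n < 2:
            raise ValueError("singleton stratum: variance is undefined")
        centre = sum(p.values()) / n
        variance += n / (n - 1) * sum((v - centre) ** 2 for v in p.values())
    return mean, sqrt(variance)

toy = [
    {"VARSTR": 1, "VARPSU": 1, "PERWT23F": 10.0, "TOTEXP23": 100.0},
    {"VARSTR": 1, "VARPSU": 2, "PERWT23F": 10.0, "TOTEXP23": 300.0},
    {"VARSTR": 2, "VARPSU": 1, "PERWT23F": 20.0, "TOTEXP23": 0.0},
    {"VARSTR": 2, "VARPSU": 2, "PERWT23F": 20.0, "TOTEXP23": 50.0},
]
mean, se = mean_with_se(toy, "TOTEXP23")
print(round(mean, 4), round(se, 4))
```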
Mean expense per person, for people with expenditures
╭─────────────────── Estimate: MEAN (TAYLOR) ────────────────────╮
│                                                                │
│         est           se          lci           uci    cv (%)  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
│  7,487.2616   1,620.5194   4,286.4292   10,688.0940   21.6437  │
│                                                                │
╰────────────────────────────────────────────────────────────────╯
Subset to people with an expense
╭─────────────────────────── Sample ────────────────────────────╮
│ Survey Data:                                                  │
│   Number of rows: 16264                                       │
│   Number of columns: 11                                       │
│   Number of strata: 105                                       │
│   Number of PSUs: 262                                         │
│                                                               │
│ Survey Design:                                                │
│                                                               │
│     Field               Value                                 │
│     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                         │
│     Row index           svy_row_index                         │
│     Stratum             VARSTR                                │
│     PSU                 VARPSU                                │
│     SSU                 None                                  │
│     Weight              PERWT23F                              │
│     With replacement    False                                 │
│     Prob                None                                  │
│     Hit                 None                                  │
│     MOS                 None                                  │
│     Population size     None                                  │
│     Replicate weights   None                                  │
│                                                               │
╰───────────────────────────────────────────────────────────────╯
Mean expense per person with an expense
╭─────────────────── Estimate: MEAN (TAYLOR) ────────────────────╮
│                                                                │
│         est           se          lci           uci    cv (%)  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
│  8,751.8209   1,130.1951   6,519.4719   10,984.1700   12.9138  │
│                                                                │
╰────────────────────────────────────────────────────────────────╯
Mean expense per person with an expense, by age category
╭───────────────────────── Estimate: MEAN (TAYLOR) ─────────────────────────╮
│                                                                           │
│  age_cat           est           se          lci           uci    cv (%)  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  │
│  65+       16,000.1324   4,706.0289   6,704.8351   25,295.4298   29.4124  │
│  <65        6,853.3919     818.1920   5,237.3079    8,469.4759   11.9385  │
│                                                                           │
╰───────────────────────────────────────────────────────────────────────────╯
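Estimates for a subgroup are domain estimates: the point estimate uses only the rows in the domain, but the variance should still account for every PSU in the design, including PSUs that happen to contain no domain members. The subset summary above still reports 105 strata and 262 PSUs, which is consistent with that approach, though how svy handles it internally is not shown here. A minimal sketch of the usual technique, with hypothetical names and toy data:

```python
from collections import defaultdict
from math import sqrt

def domain_mean_with_se(rows, y, in_domain, w="PERWT23F",
                        stratum="VARSTR", psu="VARPSU"):
    """Domain mean whose variance keeps every PSU of the full design."""
    inside = [r for r in rows if in_domain(r)]
    sum_w = sum(r[w] for r in inside)
    mean = sum(r[w] * r[y] for r in inside) / sum_w
    z = defaultdict(lambda: defaultdict(float))
    for r in rows:
        # Rows outside the domain add 0.0 but still register their PSU.
        z[r[stratum]][r[psu]] += r[w] * (r[y] - mean) / sum_w if in_domain(r) else 0.0
    variance = 0.0
    for p in z.values():
        n = len(p)
        centre = sum(p.values()) / n
        variance += n / (n - 1) * sum((v - centre) ** 2 for v in p.values())
    return mean, sqrt(variance)

toy = [
    {"VARSTR": 1, "VARPSU": 1, "PERWT23F": 10.0, "TOTEXP23": 100.0, "age_cat": "65+"},
    {"VARSTR": 1, "VARPSU": 2, "PERWT23F": 10.0, "TOTEXP23": 300.0, "age_cat": "65+"},
    {"VARSTR": 2, "VARPSU": 1, "PERWT23F": 20.0, "TOTEXP23": 50.0, "age_cat": "65+"},
    {"VARSTR": 2, "VARPSU": 2, "PERWT23F": 20.0, "TOTEXP23": 0.0, "age_cat": "<65"},
]
mean65, se65 = domain_mean_with_se(toy, "TOTEXP23", lambda r: r["age_cat"] == "65+")
print(mean65, se65)  # 125.0 62.5
```

Dropping the out-of-domain row before building the design would turn stratum 2 into a singleton in this toy example, which is why filtering the raw data first is generally discouraged for survey estimates.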
Median expense per person with an expense, by age category
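For reference, a weighted median can be defined as the smallest value at which the cumulative share of the weights reaches 50%. The sketch below uses that definition in plain Python; svy may use a different quantile rule (for example, interpolation), and the standard error of a median is not covered here. The function name and toy values are assumptions.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    half = sum(weights) / 2
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("empty input")

expenses = [0.0, 100.0, 300.0, 50.0]
weights = [20.0, 10.0, 10.0, 20.0]
print(weighted_median(expenses, weights))  # 50.0
```

Computing it by age category amounts to calling the function once per group on that group's values and weights.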
Part II - Link the MEPS-HC Medical Conditions File to the Office-Based Medical Visits File for Estimation
Merging Relevant Files
Load Stata data files using svy.read_dta
Keep only needed variables
Prepare data for estimation
shape: (229, 5)
┌────────┬────────┬────────┬────────┬─────┐
│ CCSR1X ┆ CCSR2X ┆ CCSR3X ┆ CCSR4X ┆ len │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ u32 │
╞════════╪════════╪════════╪════════╪═════╡
│ -15 ┆ -1 ┆ -1 ┆ -1 ┆ 396 │
│ BLD000 ┆ -1 ┆ -1 ┆ -1 ┆ 127 │
│ BLD000 ┆ CIR000 ┆ -1 ┆ -1 ┆ 1 │
│ BLD000 ┆ NEO000 ┆ -1 ┆ -1 ┆ 5 │
│ BLD001 ┆ -1 ┆ -1 ┆ -1 ┆ 71 │
│ … ┆ … ┆ … ┆ … ┆ … │
│ SYM013 ┆ -1 ┆ -1 ┆ -1 ┆ 561 │
│ SYM014 ┆ -1 ┆ -1 ┆ -1 ┆ 313 │
│ SYM015 ┆ -1 ┆ -1 ┆ -1 ┆ 223 │
│ SYM016 ┆ -1 ┆ -1 ┆ -1 ┆ 681 │
│ SYM017 ┆ -1 ┆ -1 ┆ -1 ┆ 460 │
└────────┴────────┴────────┴────────┴─────┘
shape: (16, 6)
┌──────────┬────────┬────────┬────────┬────────┬─────┐
│ ICD10CDX ┆ CCSR1X ┆ CCSR2X ┆ CCSR3X ┆ CCSR4X ┆ len │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ u32 │
╞══════════╪════════╪════════╪════════╪════════╪═════╡
│ -15 ┆ BLD000 ┆ NEO000 ┆ -1 ┆ -1 ┆ 5 │
│ -15 ┆ NEO000 ┆ -1 ┆ -1 ┆ -1 ┆ 273 │
│ C18 ┆ NEO015 ┆ -1 ┆ -1 ┆ -1 ┆ 44 │
│ C34 ┆ NEO022 ┆ -1 ┆ -1 ┆ -1 ┆ 42 │
│ C43 ┆ NEO025 ┆ -1 ┆ -1 ┆ -1 ┆ 119 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ C73 ┆ NEO050 ┆ -1 ┆ -1 ┆ -1 ┆ 49 │
│ C95 ┆ NEO064 ┆ -1 ┆ -1 ┆ -1 ┆ 46 │
│ D04 ┆ NEO028 ┆ -1 ┆ -1 ┆ -1 ┆ 97 │
│ D48 ┆ NEO072 ┆ -1 ┆ -1 ┆ -1 ┆ 58 │
│ D49 ┆ NEO072 ┆ -1 ┆ -1 ┆ -1 ┆ 78 │
└──────────┴────────┴────────┴────────┴────────┴─────┘
Many-to-many joins via CLNK crosswalk, then OB events
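The linkage is many-to-many because one condition can be treated at several events and one event can treat several conditions. A minimal sketch of the two inner joins in plain Python follows. The condition rows mirror the workshop's example person (DUPERSID 2790405102), while the EVNTIDX values `E1` to `E3` and the `inner_join` helper are hypothetical.

```python
from collections import defaultdict

def inner_join(left, right, key):
    """Inner join of two lists of dicts; every matching pair yields one row."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

conditions = [
    {"DUPERSID": "2790405102", "CONDIDX": "2790405102007", "ICD10CDX": "C61"},
    {"DUPERSID": "2790405102", "CONDIDX": "2790405102005", "ICD10CDX": "D04"},
    {"DUPERSID": "2790405102", "CONDIDX": "2790405102006", "ICD10CDX": "D48"},
]
clnk = [  # condition-event crosswalk; EVNTIDX values are made up
    {"CONDIDX": "2790405102007", "EVNTIDX": "E1"},
    {"CONDIDX": "2790405102005", "EVNTIDX": "E2"},
    {"CONDIDX": "2790405102006", "EVNTIDX": "E2"},
    {"CONDIDX": "2790405102006", "EVNTIDX": "E3"},  # not an office-based event
]
ob_events = [
    {"EVNTIDX": "E1", "OBXP23X": 184.59},
    {"EVNTIDX": "E2", "OBXP23X": 715.29},
]

cond_events = inner_join(conditions, clnk, "CONDIDX")   # conditions -> events
ob_rows = inner_join(cond_events, ob_events, "EVNTIDX")  # keep office-based only
for r in ob_rows:
    print(r["CONDIDX"], r["EVNTIDX"], r["OBXP23X"])
```

Event `E2` appears twice in the result, once per linked condition, so its expenditure would be double counted without a later de-duplication step.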
QC: Check EVENTYPE distribution
shape: (6, 2)
┌──────────┬────────┐
│ EVENTYPE ┆ n │
│ --- ┆ --- │
│ f64 ┆ u32 │
╞══════════╪════════╡
│ 1.0 ┆ 145548 │
│ 2.0 ┆ 22177 │
│ 3.0 ┆ 4294 │
│ 4.0 ┆ 2153 │
│ 7.0 ┆ 9045 │
│ 8.0 ┆ 97941 │
└──────────┴────────┘
shape: (1, 2)
┌──────────┬──────┐
│ EVENTYPE ┆ n │
│ --- ┆ --- │
│ f64 ┆ u32 │
╞══════════╪══════╡
│ 1.0 ┆ 4362 │
└──────────┴──────┘
Example: Same event treating multiple cancers for same person
shape: (3, 13)
┌────────────┬───────────────┬──────────┬────────┬───┬──────────┬───────┬─────────┬──────────┐
│ DUPERSID ┆ CONDIDX ┆ ICD10CDX ┆ CCSR1X ┆ … ┆ EVENTYPE ┆ PANEL ┆ OBXP23X ┆ ob_visit │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ ┆ f64 ┆ f64 ┆ f64 ┆ i32 │
╞════════════╪═══════════════╪══════════╪════════╪═══╪══════════╪═══════╪═════════╪══════════╡
│ 2790405102 ┆ 2790405102007 ┆ C61 ┆ NEO039 ┆ … ┆ 1.0 ┆ 27.0 ┆ 184.59 ┆ 1 │
│ 2790405102 ┆ 2790405102005 ┆ D04 ┆ NEO028 ┆ … ┆ 1.0 ┆ 27.0 ┆ 715.29 ┆ 1 │
│ 2790405102 ┆ 2790405102006 ┆ D48 ┆ NEO072 ┆ … ┆ 1.0 ┆ 27.0 ┆ 715.29 ┆ 1 │
└────────────┴───────────────┴──────────┴────────┴───┴──────────┴───────┴─────────┴──────────┘
De-Duplicate by EVNTIDX Per Person
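De-duplication keeps one row per person and event, so an event linked to several cancer conditions contributes its expenditure only once. A minimal sketch in plain Python, using the three rows of the example person; the `dedupe` helper and the EVNTIDX values are hypothetical, and keeping the first occurrence is an assumption about the tie-breaking rule.

```python
def dedupe(rows, keys=("DUPERSID", "EVNTIDX")):
    """Keep the first row seen for each combination of `keys`."""
    seen = set()
    kept = []
    for r in rows:
        k = tuple(r[c] for c in keys)
        if k not in seen:
            seen.add(k)
            kept.append(r)
    return kept

linked = [
    {"DUPERSID": "2790405102", "ICD10CDX": "C61", "EVNTIDX": "E1", "OBXP23X": 184.59},
    {"DUPERSID": "2790405102", "ICD10CDX": "D04", "EVNTIDX": "E2", "OBXP23X": 715.29},
    {"DUPERSID": "2790405102", "ICD10CDX": "D48", "EVNTIDX": "E2", "OBXP23X": 715.29},
]
unique_events = dedupe(linked)
print([r["ICD10CDX"] for r in unique_events])  # ['C61', 'D04']
```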
Check the example person after de-duplication
shape: (2, 13)
┌────────────┬───────────────┬──────────┬────────┬───┬──────────┬───────┬─────────┬──────────┐
│ DUPERSID ┆ CONDIDX ┆ ICD10CDX ┆ CCSR1X ┆ … ┆ EVENTYPE ┆ PANEL ┆ OBXP23X ┆ ob_visit │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ ┆ f64 ┆ f64 ┆ f64 ┆ i32 │
╞════════════╪═══════════════╪══════════╪════════╪═══╪══════════╪═══════╪═════════╪══════════╡
│ 2790405102 ┆ 2790405102007 ┆ C61 ┆ NEO039 ┆ … ┆ 1.0 ┆ 27.0 ┆ 184.59 ┆ 1 │
│ 2790405102 ┆ 2790405102005 ┆ D04 ┆ NEO028 ┆ … ┆ 1.0 ┆ 27.0 ┆ 715.29 ┆ 1 │
└────────────┴───────────────┴──────────┴────────┴───┴──────────┴───────┴─────────┴──────────┘
Aggregate to Person Level
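The aggregation sums expenditures and visit counts per person and flags everyone who appears as having at least one office-based visit. A minimal sketch in plain Python, using the two de-duplicated rows of the example person; the `person_level` helper is a hypothetical name.

```python
from collections import defaultdict

def person_level(rows):
    """Sum expenditures and visits per DUPERSID and flag any_OB = 1."""
    agg = defaultdict(lambda: {"pers_ob_exp": 0.0, "ob_visits": 0})
    for r in rows:
        agg[r["DUPERSID"]]["pers_ob_exp"] += r["OBXP23X"]
        agg[r["DUPERSID"]]["ob_visits"] += r["ob_visit"]
    return {pid: {**a, "any_OB": 1} for pid, a in agg.items()}

events = [
    {"DUPERSID": "2790405102", "OBXP23X": 184.59, "ob_visit": 1},
    {"DUPERSID": "2790405102", "OBXP23X": 715.29, "ob_visit": 1},
]
stats = person_level(events)
print(stats["2790405102"])
```

For this person the result is two visits and 184.59 + 715.29 = 899.88 in office-based expenditure.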
Revisit the example person
shape: (1, 4)
┌────────────┬─────────────┬───────────┬────────┐
│ DUPERSID ┆ pers_ob_exp ┆ ob_visits ┆ any_OB │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ i32 ┆ i32 │
╞════════════╪═════════════╪═══════════╪════════╡
│ 2790405102 ┆ 899.88 ┆ 2 ┆ 1 │
└────────────┴─────────────┴───────────┴────────┘
Left-join person-level cancer statistics onto FYC
shape: (18_919, 11)
┌─────────┬──────────┬────────────┬────────┬───┬─────────┬─────────────┬───────────┬────────┐
│ AGELAST ┆ TOTEXP23 ┆ DUPERSID ┆ VARSTR ┆ … ┆ age_cat ┆ pers_ob_exp ┆ ob_visits ┆ any_OB │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str ┆ f64 ┆ ┆ str ┆ f64 ┆ i32 ┆ i32 │
╞═════════╪══════════╪════════════╪════════╪═══╪═════════╪═════════════╪═══════════╪════════╡
│ 58.0 ┆ 646.0 ┆ 2790002101 ┆ 2019.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ 27.0 ┆ 1894.0 ┆ 2790002102 ┆ 2019.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ 49.0 ┆ 986.0 ┆ 2790004101 ┆ 2084.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ 75.0 ┆ 1312.0 ┆ 2790006101 ┆ 2113.0 ┆ … ┆ 65+ ┆ 283.28 ┆ 3 ┆ 1 │
│ 23.0 ┆ 0.0 ┆ 2790006102 ┆ 2113.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 11.0 ┆ 4585.0 ┆ 2819784105 ┆ 2015.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ 37.0 ┆ 6820.0 ┆ 2819788101 ┆ 2005.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ 24.0 ┆ 2549.0 ┆ 2819792101 ┆ 2012.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ 22.0 ┆ 1680.0 ┆ 2819793101 ┆ 2004.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
│ 50.0 ┆ 7802.0 ┆ 2819793102 ┆ 2004.0 ┆ … ┆ <65 ┆ 0.0 ┆ 0 ┆ 0 │
└─────────┴──────────┴────────────┴────────┴───┴─────────┴─────────────┴───────────┴────────┘
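The left join keeps all 18,919 persons from the Full-Year Consolidated file; people with no linked office-based cancer visit get zeros instead of missing values, as the output above shows. A minimal sketch of that join-and-fill step in plain Python, with a hypothetical helper name and two rows taken from the output:

```python
def left_join_fill(fyc, stats, fill):
    """Attach person-level stats to every FYC row, using `fill` when absent."""
    return [{**p, **stats.get(p["DUPERSID"], fill)} for p in fyc]

fyc = [
    {"DUPERSID": "2790002101", "AGELAST": 58.0, "TOTEXP23": 646.0},
    {"DUPERSID": "2790006101", "AGELAST": 75.0, "TOTEXP23": 1312.0},
]
person_stats = {
    "2790006101": {"pers_ob_exp": 283.28, "ob_visits": 3, "any_OB": 1},
}
zeros = {"pers_ob_exp": 0.0, "ob_visits": 0, "any_OB": 0}

merged = left_join_fill(fyc, person_stats, zeros)
assert len(merged) == len(fyc)  # a left join must not add or drop persons
for row in merged:
    print(row["DUPERSID"], row["pers_ob_exp"], row["ob_visits"], row["any_OB"])
```

Keeping every person, including those with zeros, preserves the full design (all strata and PSUs) for the estimation steps.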