2025 Medical Expenditure Panel Survey, Household Component (MEPS-HC)

Reproducing the 2025 MEPS Workshop in Python (svy)

Modified: January 10, 2026

In this document, we use Python and the svy library to reproduce the 2025 MEPS Workshop, which was originally conducted in R (see the workshop's GitHub repository).

To follow along and run the code locally, download the 2023 MEPS public-use files used below, including the 2023 Full-Year Consolidated file (h251.dta) that is read in Part I.

Tip: Create a Python environment using uv

The steps below give you a fast, reproducible setup for running the MEPS workshop with svy.

  • If you do not have uv, install it. See the instructions at https://docs.astral.sh/uv/getting-started/installation/
  • Restart your shell so that uv is on your PATH, then, from your project root, run: uv venv -p 3.13
  • Initialize the project with uv init
  • Add the requirements: uv add "svy[report]" (the quotes keep shells such as zsh from interpreting the square brackets)

Since it’s a simple analysis, you can store the datasets and code in the root project folder.

The following imports and general settings are used throughout the workshop.

import polars as pl
import svy

svy.Estimate.PRINT_WIDTH = 89
svy.Estimate.DECIMALS = 4

Part I — Estimates for National Health Care for the U.S. Civilian Non-Institutionalized Population, 2023

Exploration of the Relevant Data from the 2023 MEPS-HC

First, we read the 2023 Full-Year Consolidated file from local storage using svy.read_stata. Then we subset the variables needed for this tutorial, derive two helper variables, and run quick QC checks.

Read the Stata file via svy

fyc23 = svy.read_stata(path="./h251.dta")

print(fyc23)
shape: (18_919, 1_374)
┌────────────┬───────┬────────────┬───────┬───┬──────────────┬──────────────┬────────┬────────┐
│ DUID       ┆ PID   ┆ DUPERSID   ┆ PANEL ┆ … ┆ FAMWT23C     ┆ SAQWT23F     ┆ VARSTR ┆ VARPSU │
│ ---        ┆ ---   ┆ ---        ┆ ---   ┆   ┆ ---          ┆ ---          ┆ ---    ┆ ---    │
│ f64        ┆ f64   ┆ str        ┆ f64   ┆   ┆ f64          ┆ f64          ┆ f64    ┆ f64    │
╞════════════╪═══════╪════════════╪═══════╪═══╪══════════════╪══════════════╪════════╪════════╡
│ 2.790002e6 ┆ 101.0 ┆ 2790002101 ┆ 27.0  ┆ … ┆ 11158.817826 ┆ 13221.315673 ┆ 2019.0 ┆ 1.0    │
│ 2.790002e6 ┆ 102.0 ┆ 2790002102 ┆ 27.0  ┆ … ┆ 11158.817826 ┆ 0.0          ┆ 2019.0 ┆ 1.0    │
│ 2.790004e6 ┆ 101.0 ┆ 2790004101 ┆ 27.0  ┆ … ┆ 28540.745942 ┆ 29999.277476 ┆ 2084.0 ┆ 1.0    │
│ 2.790006e6 ┆ 101.0 ┆ 2790006101 ┆ 27.0  ┆ … ┆ 10821.040689 ┆ 11144.513916 ┆ 2113.0 ┆ 1.0    │
│ 2.790006e6 ┆ 102.0 ┆ 2790006102 ┆ 27.0  ┆ … ┆ 10821.040689 ┆ 0.0          ┆ 2113.0 ┆ 1.0    │
│ …          ┆ …     ┆ …          ┆ …     ┆ … ┆ …            ┆ …            ┆ …      ┆ …      │
│ 2.819784e6 ┆ 105.0 ┆ 2819784105 ┆ 28.0  ┆ … ┆ 5126.030033  ┆ 0.0          ┆ 2015.0 ┆ 1.0    │
│ 2.819788e6 ┆ 101.0 ┆ 2819788101 ┆ 28.0  ┆ … ┆ 3608.933864  ┆ 4902.751776  ┆ 2005.0 ┆ 1.0    │
│ 2.819792e6 ┆ 101.0 ┆ 2819792101 ┆ 28.0  ┆ … ┆ 26238.599825 ┆ 0.0          ┆ 2012.0 ┆ 3.0    │
│ 2.819793e6 ┆ 101.0 ┆ 2819793101 ┆ 28.0  ┆ … ┆ 15780.213332 ┆ 0.0          ┆ 2004.0 ┆ 1.0    │
│ 2.819793e6 ┆ 102.0 ┆ 2819793102 ┆ 28.0  ┆ … ┆ 15780.213332 ┆ 0.0          ┆ 2004.0 ┆ 1.0    │
└────────────┴───────┴────────────┴───────┴───┴──────────────┴──────────────┴────────┴────────┘

Subset to columns used in the tutorial using polars.

fyc23_sub = fyc23.select(
    ["AGELAST", "TOTEXP23", "DUPERSID", "VARSTR", "VARPSU", "PERWT23F"]
)

print(fyc23_sub.head())
shape: (5, 6)
┌─────────┬──────────┬────────────┬────────┬────────┬──────────────┐
│ AGELAST ┆ TOTEXP23 ┆ DUPERSID   ┆ VARSTR ┆ VARPSU ┆ PERWT23F     │
│ ---     ┆ ---      ┆ ---        ┆ ---    ┆ ---    ┆ ---          │
│ f64     ┆ f64      ┆ str        ┆ f64    ┆ f64    ┆ f64          │
╞═════════╪══════════╪════════════╪════════╪════════╪══════════════╡
│ 58.0    ┆ 646.0    ┆ 2790002101 ┆ 2019.0 ┆ 1.0    ┆ 11664.426815 │
│ 27.0    ┆ 1894.0   ┆ 2790002102 ┆ 2019.0 ┆ 1.0    ┆ 32212.113596 │
│ 49.0    ┆ 986.0    ┆ 2790004101 ┆ 2084.0 ┆ 1.0    ┆ 21944.142826 │
│ 75.0    ┆ 1312.0   ┆ 2790006101 ┆ 2113.0 ┆ 1.0    ┆ 10328.00953  │
│ 23.0    ┆ 0.0      ┆ 2790006102 ┆ 2113.0 ┆ 1.0    ┆ 17430.521357 │
└─────────┴──────────┴────────────┴────────┴────────┴──────────────┘

Derive helper variables:

  • has_exp: indicator for any total expenditure
  • age_cat: <65 vs 65+

fyc23x = fyc23_sub.with_columns(
    has_exp=pl.col("TOTEXP23").gt(pl.lit(0)),
    age_cat=pl.when(pl.col("AGELAST") < 65)
    .then(pl.lit("<65"))
    .otherwise(pl.lit("65+")),
)

print(fyc23x.head())
shape: (5, 8)
┌─────────┬──────────┬────────────┬────────┬────────┬──────────────┬─────────┬─────────┐
│ AGELAST ┆ TOTEXP23 ┆ DUPERSID   ┆ VARSTR ┆ VARPSU ┆ PERWT23F     ┆ has_exp ┆ age_cat │
│ ---     ┆ ---      ┆ ---        ┆ ---    ┆ ---    ┆ ---          ┆ ---     ┆ ---     │
│ f64     ┆ f64      ┆ str        ┆ f64    ┆ f64    ┆ f64          ┆ bool    ┆ str     │
╞═════════╪══════════╪════════════╪════════╪════════╪══════════════╪═════════╪═════════╡
│ 58.0    ┆ 646.0    ┆ 2790002101 ┆ 2019.0 ┆ 1.0    ┆ 11664.426815 ┆ true    ┆ <65     │
│ 27.0    ┆ 1894.0   ┆ 2790002102 ┆ 2019.0 ┆ 1.0    ┆ 32212.113596 ┆ true    ┆ <65     │
│ 49.0    ┆ 986.0    ┆ 2790004101 ┆ 2084.0 ┆ 1.0    ┆ 21944.142826 ┆ true    ┆ <65     │
│ 75.0    ┆ 1312.0   ┆ 2790006101 ┆ 2113.0 ┆ 1.0    ┆ 10328.00953  ┆ true    ┆ 65+     │
│ 23.0    ┆ 0.0      ┆ 2790006102 ┆ 2113.0 ┆ 1.0    ┆ 17430.521357 ┆ false   ┆ <65     │
└─────────┴──────────┴────────────┴────────┴────────┴──────────────┴─────────┴─────────┘

QC 1: Two-way counts of derived variables

qc_exp_by_age = fyc23x.group_by(["has_exp", "age_cat"]).agg(pl.len())

print(qc_exp_by_age)
shape: (4, 3)
┌─────────┬─────────┬───────┐
│ has_exp ┆ age_cat ┆ len   │
│ ---     ┆ ---     ┆ ---   │
│ bool    ┆ str     ┆ u32   │
╞═════════╪═════════╪═══════╡
│ false   ┆ <65     ┆ 2497  │
│ false   ┆ 65+     ┆ 158   │
│ true    ┆ <65     ┆ 11778 │
│ true    ┆ 65+     ┆ 4486  │
└─────────┴─────────┴───────┘

QC 2: Expenditure ranges by has_exp

qc_exp = fyc23x.group_by(["has_exp"]).agg(
    pl.col("TOTEXP23").min().alias("min"),
    pl.col("TOTEXP23").max().alias("max"),
)

print(qc_exp)
shape: (2, 3)
┌─────────┬─────┬──────────┐
│ has_exp ┆ min ┆ max      │
│ ---     ┆ --- ┆ ---      │
│ bool    ┆ f64 ┆ f64      │
╞═════════╪═════╪══════════╡
│ false   ┆ 0.0 ┆ 0.0      │
│ true    ┆ 1.0 ┆ 574675.0 │
└─────────┴─────┴──────────┘

QC 3: Age ranges by age_cat

qc_age = fyc23x.group_by(["age_cat"]).agg(
    pl.col("AGELAST").min().alias("min"),
    pl.col("AGELAST").max().alias("max"),
)

print(qc_age)
shape: (2, 3)
┌─────────┬──────┬──────┐
│ age_cat ┆ min  ┆ max  │
│ ---     ┆ ---  ┆ ---  │
│ str     ┆ f64  ┆ f64  │
╞═════════╪══════╪══════╡
│ <65     ┆ 0.0  ┆ 64.0 │
│ 65+     ┆ 65.0 ┆ 85.0 │
└─────────┴──────┴──────┘

Estimation of Expenses

Sample design

First, we define the survey design (strata, PSUs, and weights) and attach it to the data to create the sample.

from svy import Sample, Design

fyc23_design = Design(stratum="VARSTR", psu="VARPSU", wgt="PERWT23F")

fyc23_sample = Sample(data=fyc23x, design=fyc23_design)

print(fyc23_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 18919                                       
   Number of columns: 11                                       
   Number of strata: 105                                       
   Number of PSUs: 262                                         
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                          
    Row index           svy_row_index                          
    Stratum             VARSTR                                 
    PSU                 VARPSU                                 
    SSU                 None                                   
    Weight              PERWT23F                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   None                                   
                                                               
╰───────────────────────────────────────────────────────────────╯

Checking for singletons

If a stratum contains only one PSU, there is no between-PSU variation to measure in that stratum, and the estimation will fail. We can check for singletons (strata with a single PSU) as follows:

# List of strata with only one PSU
fyc23_sample.singleton.detected()
[]

The sample does not have singletons.
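To make the idea of a singleton concrete, here is a minimal standard-library sketch that counts distinct PSUs per stratum and reports the strata with exactly one. The function singleton_strata and the toy identifiers are hypothetical; this illustrates the concept and is not how svy implements its check.

```python
from collections import defaultdict


def singleton_strata(strata, psus):
    """Return the sorted strata that contain exactly one distinct PSU."""
    psus_by_stratum = defaultdict(set)
    for h, p in zip(strata, psus):
        psus_by_stratum[h].add(p)
    return sorted(h for h, ids in psus_by_stratum.items() if len(ids) == 1)


# Toy identifiers (made up): stratum 2019 has two PSUs, the others only one.
varstr = [2019, 2019, 2084, 2084, 2113]
varpsu = [1, 2, 1, 1, 1]

singletons = singleton_strata(varstr, varpsu)
print(singletons)
```

An empty list, as returned for the MEPS sample above, means that every stratum has at least two PSUs.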

Overall expenses (national totals)

tot_exp = fyc23_sample.estimation.total(y="TOTEXP23")

print(tot_exp)
╭────────────────────────────── Estimate: TOTAL (TAYLOR) ───────────────────────────────╮
                                                                                       
               est                 se               lci                uci     cv (%)  
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
  2,504,715,663,9…   4,769,269,754,8…   -6,915,494,271…   11,924,925,599,…   190.4116  
                                                                                       
╰───────────────────────────────────────────────────────────────────────────────────────╯
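The header of the table indicates a Taylor (linearization) variance estimate. As a rough guide to what such an estimator does for a total, the sketch below computes the weighted total and the textbook with-replacement variance from PSU-level totals within strata. It is a standard-library illustration on made-up numbers; the function taylor_total is hypothetical, and details of svy's own computation (for example, any finite population correction) are not confirmed here.

```python
from collections import defaultdict
from math import sqrt


def taylor_total(y, w, strata, psus):
    """Weighted total and its with-replacement linearization standard error."""
    # PSU ids repeat across strata, so a PSU is keyed by (stratum, psu).
    psu_totals = defaultdict(float)
    for yi, wi, h, p in zip(y, w, strata, psus):
        psu_totals[(h, p)] += wi * yi

    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    total = sum(psu_totals.values())
    variance = 0.0
    for z in by_stratum.values():
        n = len(z)
        if n < 2:
            raise ValueError("singleton stratum: variance is undefined")
        zbar = sum(z) / n
        # Between-PSU variability of the weighted totals within the stratum.
        variance += n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return total, sqrt(variance)


# Toy data (made up): two strata with two PSUs each.
y = [10, 0, 30, 5, 20, 10]
w = [2, 3, 1, 4, 2, 1]
strata = [1, 1, 1, 2, 2, 2]
psus = [1, 1, 2, 1, 2, 2]

total, se = taylor_total(y, w, strata, psus)
print(total, se)
```

The ValueError branch shows why the singleton check matters: with a single PSU in a stratum, the factor n / (n - 1) is undefined.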

Percentage of persons with an expense

has_exp = fyc23_sample.estimation.prop(y="has_exp")

# has_exp.set_decimals(6)
print(has_exp)
╭──────────────── Estimate: PROP (TAYLOR) ────────────────╮
                                                         
  has_exp      est       se      lci      uci    cv (%)  
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
  false     0.1445   0.0764   0.0475   0.3641   52.8735  
  true      0.8555   0.0764   0.6359   0.9525    8.9301  
                                                         
╰─────────────────────────────────────────────────────────╯
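The cv (%) column can be checked by hand: it is the standard error expressed as a percentage of the estimate. The short sketch below recomputes it from the rounded est and se values printed above; the helper cv_percent is hypothetical. Because the inputs are rounded to four decimals, the results agree with the printed column only approximately.

```python
def cv_percent(est, se):
    """Coefficient of variation: standard error as a percentage of the estimate."""
    return 100.0 * se / est


# (est, se) pairs copied from the printed table, rounded to four decimals.
rows = {"false": (0.1445, 0.0764), "true": (0.8555, 0.0764)}

cvs = {level: cv_percent(est, se) for level, (est, se) in rows.items()}
for level, cv in cvs.items():
    print(level, round(cv, 2))
```

The two levels share the same standard error because their proportions sum to one, so the smaller proportion has the larger coefficient of variation.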

Mean expense per person

avg_exp = fyc23_sample.estimation.mean(y="TOTEXP23")

print(avg_exp)
╭─────────────────── Estimate: MEAN (TAYLOR) ────────────────────╮
                                                                
         est           se          lci           uci    cv (%)  
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
  7,487.2616   1,620.5194   4,286.4292   10,688.0940   21.6437  
                                                                
╰────────────────────────────────────────────────────────────────╯
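A weighted mean is a ratio of two estimated totals (weighted sum of y over sum of weights), so its Taylor variance is usually obtained by linearizing that ratio. The sketch below shows the textbook version on made-up numbers, using only the standard library. The function taylor_mean is hypothetical, and the with-replacement formula is an assumption for illustration, not a statement about svy's internals.

```python
from collections import defaultdict
from math import sqrt


def taylor_mean(y, w, strata, psus):
    """Weighted mean and its linearization standard error (ratio estimator)."""
    w_sum = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / w_sum

    # Linearized variable for the ratio sum(w * y) / sum(w).
    u = [wi * (yi - mean) / w_sum for wi, yi in zip(w, y)]

    # Sum the linearized values within each PSU, keyed by (stratum, psu).
    psu_totals = defaultdict(float)
    for ui, h, p in zip(u, strata, psus):
        psu_totals[(h, p)] += ui

    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    variance = 0.0
    for z in by_stratum.values():
        n = len(z)
        if n < 2:
            raise ValueError("singleton stratum: variance is undefined")
        zbar = sum(z) / n
        variance += n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return mean, sqrt(variance)


# Toy data (made up): two strata with two PSUs each.
y = [10, 0, 30, 5, 20, 10]
w = [2, 3, 1, 4, 2, 1]
strata = [1, 1, 1, 2, 2, 2]
psus = [1, 1, 2, 1, 2, 2]

mean, se = taylor_mean(y, w, strata, psus)
print(mean, se)
```

Restricting the sums to a subgroup, while keeping every PSU and stratum in the variance step, gives the corresponding estimate for a subpopulation such as people with an expense.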

Mean expense per person, for people with an expense

Subset to people with an expense

has_exp_sample = fyc23_sample.wrangling.filter_records(svy.col("has_exp"))

print(has_exp_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 16264                                       
   Number of columns: 11                                       
   Number of strata: 105                                       
   Number of PSUs: 262                                         
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                          
    Row index           svy_row_index                          
    Stratum             VARSTR                                 
    PSU                 VARPSU                                 
    SSU                 None                                   
    Weight              PERWT23F                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   None                                   
                                                               
╰───────────────────────────────────────────────────────────────╯

Mean expense per person with an expense

avg_exp1 = has_exp_sample.estimation.mean(y="TOTEXP23")

print(avg_exp1)
╭─────────────────── Estimate: MEAN (TAYLOR) ────────────────────╮
                                                                
         est           se          lci           uci    cv (%)  
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
  8,751.8209   1,130.1951   6,519.4719   10,984.1700   12.9138  
                                                                
╰────────────────────────────────────────────────────────────────╯

Mean expense per person with an expense, by age category

avg_exp1_by_age_cat = has_exp_sample.estimation.mean(
    y="TOTEXP23", by="age_cat"
)

print(avg_exp1_by_age_cat)
╭───────────────────────── Estimate: MEAN (TAYLOR) ─────────────────────────╮
                                                                           
  age_cat           est           se          lci           uci    cv (%)  
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
  65+       16,000.1324   4,706.0289   6,704.8351   25,295.4298   29.4124  
  <65        6,853.3919     818.1920   5,237.3079    8,469.4759   11.9385  
                                                                           
╰───────────────────────────────────────────────────────────────────────────╯

Median expense per person with an expense, by age category

from svy import QuantileMethod

median_exp1_by_age_cat = has_exp_sample.estimation.median(
    y="TOTEXP23", by="age_cat", q_method=QuantileMethod.HIGHER
)

print(median_exp1_by_age_cat)
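For intuition, a survey-weighted median is the point at which the cumulative weight reaches half of the total weight. The sketch below implements one common definition (the smallest value whose cumulative weight is at least half the total) with the standard library. The function weighted_median is hypothetical, and whether this definition coincides with QuantileMethod.HIGHER in svy is an assumption that is not verified here.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


# Toy expenditures (made up). The last person carries most of the weight,
# so the weighted median sits at that person's value.
expenses = [100, 200, 300, 400]
weights = [1, 1, 1, 5]

med_weighted = weighted_median(expenses, weights)
med_equal = weighted_median(expenses, [1, 1, 1, 1])
print(med_weighted, med_equal)
```

Comparing the two results shows how the weights shift the median away from its unweighted position.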