Validation of Design-Based Survey Estimators in Python: A Comparison of svy and R survey

Validation
Survey Methods
Python
R
Side-by-side validation showing Python’s svy produces identical results to R’s survey package for complex survey analysis including Taylor linearization, BRR, jackknife, and bootstrap methods.
Author

Mamadou S. Diallo, Ph.D.

Published

January 10, 2026

Modified

January 17, 2026

Keywords

Python survey analysis, svy package, R survey comparison, complex survey statistics, Taylor linearization, BRR variance estimation, jackknife variance estimation, bootstrap survey, design-based inference, survey sampling Python

Note: Work in Progress

This validation note is under active development.

All results shown are final and reproducible; remaining sections will be added incrementally.

Summary

Tip: TL;DR

svy produces numerically identical results to R’s survey package when equivalent survey designs and variance estimators are specified.

Both libraries implement the same design-based inferential framework, including:

  • Complex designs (stratification, clustering, unequal weights)
  • Taylor linearization and replication-based variance estimation
  • Logit-transformed confidence intervals for proportions
  • Categorical data analysis and regression
Estimator          Design / Method                        Match¹
Mean               Stratified                             Yes
Mean               One-stage cluster                      Yes
Mean               Two-stage cluster (ultimate cluster)   Yes
Mean               Stratified + clustered                 Yes
Proportion         Logit-transformed CIs                  Yes
Total              Stratified + clustered                 Yes
Ratio              Stratified + clustered                 Yes
Domain estimation  Mean, ratio                            Yes
BRR                Ratio                                  Yes
Jackknife          Ratio                                  Yes
Bootstrap          Mean                                   Yes

These results validate svy as a statistically equivalent alternative to R’s survey package (Lumley 2010) for complex survey analysis.

Introduction

For decades, survey statisticians have relied on specialized software for design-based inference—tools like SAS, SPSS, Stata, and R’s survey package. These have long been the trusted workhorses for analyzing complex survey data with proper variance estimation. Now Python joins that group. With the svy library, Python users can perform rigorous, design-based analysis while staying within the modern data science ecosystem.

But can svy be trusted?

This note answers that question by comparing results from Python’s svy and R’s survey package. Using the same dataset and survey design specifications, we show that svy produces identical estimates and standard errors—validating its statistical rigor and establishing it as a reliable choice for complex survey analysis.

Validation Scope

This comparison focuses on design-based estimation, including:

  • Means, totals, proportions, and ratios
  • Domain (subpopulation) estimation
  • Taylor linearization variance estimation
  • Variance estimation for multi-stage designs
  • Replication-based variance estimators (BRR, jackknife, bootstrap)
  • Categorical data analysis (cross-tabulation, t-tests)
  • Regression analysis (linear, logistic, Poisson)

Loading the Datasets

Setting up R environment

library(survey)
library(srvyr)
library(gt)
library(dplyr)
library(readr)

data(api)

packageVersion("survey")
[1] '4.4.8'
nhanes2brr = readr::read_csv("data/nhanes2brr.csv")
nhanes2fay = readr::read_csv("data/nhanes2fay.csv")
nhanes2jknife = readr::read_csv("data/nhanes2jknife.csv")
nmihs_bs = readr::read_csv("data/nmihs_bs.csv")
acs_hak = readr::read_csv("data/psam_h02.csv")
wb_synth_smp = readr::read_csv("data/WLD_2023_SYNTH-SVY-HLD-EN_v01_M.csv")

Setting up Python environment

import svy
import polars as pl

from great_tables import GT

# Set global display precision to 6 decimals
pl.Config.set_float_precision(6)
<class 'polars.config.Config'>

apistrat = svy.io.read_csv("data/apistrat.csv")
apiclus1 = svy.io.read_csv("data/apiclus1.csv")
apiclus2 = svy.io.read_csv("data/apiclus2.csv")

nhanes2brr = svy.io.read_csv("data/nhanes2brr.csv")
nhanes2fay = svy.io.read_csv("data/nhanes2fay.csv")
nhanes2jknife = svy.io.read_csv("data/nhanes2jknife.csv")
nmihs_bs = svy.io.read_csv("data/nmihs_bs.csv")
acs_hak = svy.io.read_csv("data/psam_h02.csv")
wb_synth_smp = svy.io.read_csv("data/WLD_2023_SYNTH-SVY-HLD-EN_v01_M.csv")

Taylor-Based Estimation

Estimating a Mean

Stratified sample

svy Results

design_str = svy.Design(stratum="stype", wgt="pw")
sample_str = svy.Sample(data=apistrat, design=design_str)

api00_mean_str = sample_str.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_str.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
662.287363 9.536132 643.481357 681.093370

R Results

design_str <- apistrat |>
  srvyr::as_survey_design(strata = stype, weights = pw)

design_str |>
  summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  )|>
  gt() |>
  fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
662.287363 9.536132 643.481357 681.093370
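The linearization behind these matching standard errors can be sketched in a few lines of plain Python. This is the textbook with-replacement formula for a stratified weighted mean, not svy’s internal code, and the toy data in the comment are made up:

```python
import math
from collections import defaultdict

def stratified_mean_se(y, w, strata):
    """Taylor-linearized SE of a weighted mean under stratified,
    with-replacement sampling (textbook sketch)."""
    W = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / W
    # Linearized scores for the mean: z_i = w_i * (y_i - ybar) / W
    z_by_h = defaultdict(list)
    for yi, wi, h in zip(y, w, strata):
        z_by_h[h].append(wi * (yi - ybar) / W)
    var = 0.0
    for zs in z_by_h.values():
        n_h = len(zs)
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return ybar, math.sqrt(var)

# Toy check: one stratum, equal weights -> reduces to s / sqrt(n)
ybar, se = stratified_mean_se([1, 2, 3, 4, 5], [1] * 5, ["a"] * 5)
```

With a single stratum and equal weights the formula collapses to the familiar s/√n, which is a handy sanity check on any implementation.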

One-stage sample

svy Results

design_clus1 = svy.Design(psu="dnum", wgt="pw")
sample_clus1 = svy.Sample(data=apiclus1, design=design_clus1)

api00_mean_clus1 = sample_clus1.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_clus1.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
644.169399 23.779011 593.168493 695.170305

R Results

design_clus1 <- apiclus1 |>
  srvyr::as_survey_design(id = dnum, weights = pw)

design_clus1 |>
  dplyr::summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
644.169399 23.779011 593.168493 695.170305

Two-stage sample

svy Results

# Two-stage design: districts (dnum) → schools (snum)
# Note: the SSU stage need not be specified here, since variance
# estimation uses the ultimate cluster (first-stage) approximation either way
design_clus2 = svy.Design(psu="dnum", wgt="pw")
sample_clus2 = svy.Sample(data=apiclus2, design=design_clus2)

api00_mean_clus2 = sample_clus2.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_clus2.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
670.811808 30.711576 608.691782 732.931835

R Results

design_clus2 <- apiclus2 |>
  srvyr::as_survey_design(id = c(dnum, snum), weights = pw)

design_clus2 |>
  dplyr::summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
670.811808 30.711576 608.691782 732.931835
Tip: Two-stage variance estimation

For multi-stage designs, svy uses the ultimate cluster variance estimator, which approximates total variance using first-stage (PSU) variability only. This approach is standard in survey software (including R’s survey package) because it:

  1. Produces conservative variance estimates
  2. Avoids requiring population sizes at lower stages
  3. Reflects the dominant source of variability in most designs

Accordingly, specifying Design(psu="dnum", ssu="snum") yields the same variance estimates as Design(psu="dnum").
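The points above can be made concrete with a small sketch: aggregate the weighted values to PSU totals, then measure only the between-PSU variation within each stratum. Toy data again; this is the standard formula, not svy’s implementation:

```python
import math
from collections import defaultdict

def ultimate_cluster_se_total(y, w, strata, psu):
    """First-stage (ultimate cluster) SE for a weighted total."""
    totals = defaultdict(float)            # (stratum, psu) -> sum of w*y
    for yi, wi, h, c in zip(y, w, strata, psu):
        totals[(h, c)] += wi * yi
    by_h = defaultdict(list)               # stratum -> list of PSU totals
    for (h, _), t in totals.items():
        by_h[h].append(t)
    var = 0.0
    for ts in by_h.values():
        n_h = len(ts)                      # number of sampled PSUs
        tbar = sum(ts) / n_h
        var += n_h / (n_h - 1) * sum((t - tbar) ** 2 for t in ts)
    return sum(totals.values()), math.sqrt(var)

# Toy check: one stratum, two PSUs with totals 3 and 7
tot, se = ultimate_cluster_se_total([1, 2, 3, 4], [1] * 4,
                                    ["s1"] * 4, [1, 1, 2, 2])
```

Because only the PSU totals enter the variance, any second-stage structure below the PSU is invisible to the estimator, which is exactly why Design(psu="dnum", ssu="snum") and Design(psu="dnum") agree.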

Stratified clustered sample

For this example, we will use the World Bank synthetic survey data (World Bank 2023).

svy Results

design_str_clus = svy.Design(stratum=("geo1", "urbrur"), psu="ea", wgt="hhweight")
sample_str_clus = svy.Sample(data=wb_synth_smp, design=design_str_clus)

tot_exp = sample_str_clus.estimation.mean("tot_exp")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
12,048.963780 229.986492 11,596.378760 12,501.548800

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::summarize(
    est = srvyr::survey_mean(tot_exp, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
12,048.963780 229.986492 11,596.378760 12,501.548800

Other Population Parameters

Proportion

svy Results

electricity = sample_str_clus.estimation.prop("electricity")

cols = ["est", "se", "lci", "uci"]
(
    GT(electricity.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
0.170550 0.011873 0.148438 0.195201
0.829450 0.011873 0.804799 0.851562

R Results

design_str_clus |>
  dplyr::group_by(electricity) |>
  dplyr::summarize(
    est = srvyr::survey_prop(vartype = c("se", "ci"), proportion = TRUE)
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
electricity est est_se est_low est_upp
No 0.170550 0.011873 0.148438 0.195201
Yes 0.829450 0.011873 0.804799 0.851562
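The logit-transformed interval both tables report can be reproduced from the point estimate and SE alone: build the interval on the log-odds scale, then back-transform. A sketch, with the t critical value for 300 design df hardcoded (≈1.968) rather than computed:

```python
import math

def logit_ci(p, se, crit):
    """Logit-transformed CI for a proportion (sketch).
    crit is the t critical value for the design df."""
    logit = math.log(p / (1 - p))
    se_logit = se / (p * (1 - p))          # delta-method SE on the logit scale
    lo = logit - crit * se_logit
    hi = logit + crit * se_logit
    expit = lambda x: 1 / (1 + math.exp(-x))
    return expit(lo), expit(hi)

# Electricity = "No" row from the tables above: p = 0.170550, se = 0.011873
lci, uci = logit_ci(0.170550, 0.011873, 1.968)
```

Note the asymmetry: the back-transformed interval is wider above the estimate than below it, and it can never escape (0, 1), unlike a plain Wald interval.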

Total

svy Results

sample_str_clus = sample_str_clus.wrangling.recode(
    cols="electricity",
    recodes={1: ["No"], 0: ["Yes"]},
    into="no_electricity"
)
electricity = sample_str_clus.estimation.total("no_electricity")

cols = ["est", "se", "lci", "uci"]
(
    GT(electricity.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
426,675.251960 30,622.991644 366,412.985386 486,937.518534

R Results

design_str_clus |>
  dplyr::mutate(no_electricity = electricity != "Yes") |>
  dplyr::summarize(
    est = srvyr::survey_total(no_electricity, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
426,675.251960 30,622.991644 366,412.985386 486,937.518534

Ratio

svy Results

tot_exp = sample_str_clus.estimation.ratio(y="tot_exp", x="hhsize")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
2,992.110041 71.224260 2,851.949491 3,132.270590

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::summarize(
    est = srvyr::survey_ratio(tot_exp, hhsize, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
2,992.110041 71.224260 2,851.949491 3,132.270590
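Under the hood, the ratio SE comes from linearizing R̂ = Ŷ/X̂ and plugging the residuals e_i = (y_i − R̂·x_i)/X̂ into the variance formula for a total. A single-stratum, with-replacement sketch on toy data (not the svy code path, which also handles strata and clusters):

```python
import math

def ratio_and_se_srs(y, x, w):
    """Taylor-linearized ratio estimate and SE, single stratum (sketch)."""
    Y = sum(wi * yi for wi, yi in zip(w, y))
    X = sum(wi * xi for wi, xi in zip(w, x))
    rhat = Y / X
    # Linearized scores: z_i = w_i * (y_i - rhat * x_i) / X
    z = [wi * (yi - rhat * xi) for wi, yi, xi in zip(w, y, x)]
    z = [zi / X for zi in z]
    n = len(z)
    zbar = sum(z) / n
    var = n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return rhat, math.sqrt(var)

# Toy check: y exactly proportional to x -> zero residuals, zero SE
rhat, se = ratio_and_se_srs([2, 4, 6], [1, 2, 3], [1, 1, 1])
```

The zero-SE case is a useful invariant: when y = c·x for every unit, any correct linearized ratio variance must vanish.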

Domain estimation

Average expenditure by urban and rural areas

svy Results

tot_exp = sample_str_clus.estimation.mean(y="tot_exp", by="urbrur")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(["by_level"]+cols))
    .fmt_number(columns=cols, decimals=6)
)
by_level est se lci uci
['Urban'] 14,437.918429 326.402120 13,795.599356 15,080.237501
['Rural'] 9,116.629337 305.519957 8,515.403784 9,717.854889

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::group_by(urbrur) |>
  dplyr::summarize(
    est = srvyr::survey_mean(tot_exp, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
urbrur est est_se est_low est_upp
Rural 9,116.629337 305.519957 8,515.403784 9,717.854889
Urban 14,437.918429 326.402120 13,795.599356 15,080.237501

Ratio of expenditure over household size by banking status

svy Results

tot_exp = sample_str_clus.estimation.ratio(y="tot_exp", x="hhsize", by="bank")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(["by_level"]+cols))
    .fmt_number(columns=cols, decimals=6)
)
by_level est se lci uci
['No'] 1,784.529865 41.958370 1,701.960973 1,867.098757
['Yes'] 3,960.323902 89.247824 3,784.695203 4,135.952601

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::group_by(bank) |>
  dplyr::summarize(
    est = srvyr::survey_ratio(tot_exp, hhsize, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
bank est est_se est_low est_upp
No 1,784.529865 41.958370 1,701.960973 1,867.098757
Yes 3,960.323902 89.247824 3,784.695203 4,135.952601

Replication-Based Estimation

Tip: Degrees of freedom for replicate weight designs

When only replicate weights are provided (without strata/PSU identifiers), the true design df is unknown:

  • svy defaults to df = n_reps - 1
  • R defaults to the rank of the replicate weight matrix minus 1

Both approaches are heuristics. The rank-based method can detect when post-stratification or calibration has reduced the effective df, but is computationally expensive and numerically sensitive.

Both packages allow user override: RepWeights(df=...) in svy, degf= in R’s svrepdesign().

In practice, data providers typically document the correct degrees of freedom for their replicate weights (e.g., NHANES, ACS). Always consult the survey documentation and specify df explicitly when known.
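To see why the two heuristics can disagree, consider a tiny hypothetical replicate-weight matrix in which the fourth replicate is a linear combination of the first three, as calibration can induce. The rank sketch below uses plain Gaussian elimination, not R’s actual QR-based computation:

```python
def matrix_rank(rows, tol=1e-10):
    """Rank via Gaussian elimination; fine for small matrices."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if abs(m[i][col]) > tol),
                   None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        pivot_row = m[rank]
        for i in range(len(m)):
            if i != rank and abs(m[i][col]) > tol:
                f = m[i][col] / pivot_row[col]
                m[i] = [a - f * b for a, b in zip(m[i], pivot_row)]
        rank += 1
        if rank == len(m):
            break
    return rank

reps = [[1, 2, 3, 2],
        [2, 1, 3, 2],
        [3, 3, 0, 2]]        # rows = observations, columns = 4 replicates;
                             # column 4 is the average of columns 1-3
r = matrix_rank(reps)
df_simple = 4 - 1            # n_reps - 1 heuristic
df_rank = r - 1              # rank-based heuristic
```

Here the simple heuristic gives df = 3 while the rank-based one detects the dependency and gives df = 2, illustrating why the two defaults can differ after calibration.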

Balanced Repeated Replication (BRR)

svy Results

rep_weights = svy.RepWeights(method=svy.EstMethod.BRR, prefix="brr_", n_reps=32)
design_brr = svy.Design(wgt="finalwgt", rep_wgts=rep_weights)
sample_brr = svy.Sample(data=nhanes2brr, design=design_brr)

ratio_wgt_hgt = sample_brr.estimation.ratio(y="weight", x="height")

cols = ["est", "se", "lci", "uci"]
(
    GT(ratio_wgt_hgt.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
0.426812 0.000890 0.424996 0.428628

R Results

design_brr <- svrepdesign(
  data = nhanes2brr,
  weights = ~finalwgt,
  repweights = "brr_",
  type = "BRR",
  combined.weights = TRUE
)
ratio_wgt_hgt <- svyratio(~weight, ~height, design = design_brr)

# Extract results into a data frame
est <- coef(ratio_wgt_hgt)
se <- SE(ratio_wgt_hgt)
ci <- confint(ratio_wgt_hgt, df = degf(design_brr))

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
0.426812 0.000890 0.424996 0.428628
Note: Confidence intervals for replicate designs

R’s confint() uses normal (z) approximation by default for replicate weight designs.

To match svy’s t-based CIs, use confint(..., df = degf(design)).
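The note above is easy to check numerically with the BRR ratio just shown. The estimate and SE are copied from the table, and both critical values are hardcoded (t for 31 df ≈ 2.0395, normal ≈ 1.9600) rather than computed:

```python
# t-based vs normal CI for the BRR ratio of weight to height
est, se, reps = 0.426812, 0.000890, 32
t_crit, z_crit = 2.0395, 1.9600        # df = reps - 1 = 31
ci_t = (est - t_crit * se, est + t_crit * se)
ci_z = (est - z_crit * se, est + z_crit * se)
```

The t-based bounds reproduce the tables above, while the normal interval is slightly narrower; with only 31 df the difference is visible in the fifth decimal.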

Jackknife

svy Results

rep_weights = svy.RepWeights(
    method=svy.EstMethod.JACKKNIFE, prefix="jkw_", n_reps=62, df=61
)
design_jkn = svy.Design(wgt="finalwgt", rep_wgts=rep_weights)
sample_jkn = svy.Sample(data=nhanes2jknife, design=design_jkn)

ratio_wgt_hgt = sample_jkn.estimation.ratio(y="weight", x="height")

cols = ["est", "se", "lci", "uci"]
(
    GT(ratio_wgt_hgt.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
0.426812 0.001247 0.424319 0.429304

R Results

design_jkn <- svrepdesign(
  data = nhanes2jknife,
  weights = ~finalwgt,
  repweights = "jkw_",
  type = "JKn",
  combined.weights = TRUE,
  rscales = rep((62 - 1) / 62, 62)
)
ratio_wgt_hgt <- svyratio(~weight, ~height, design = design_jkn)

# Extract results into a data frame
est <- coef(ratio_wgt_hgt)
se <- SE(ratio_wgt_hgt)
ci <- confint(ratio_wgt_hgt, df = 61)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
0.426812 0.001247 0.424319 0.429304

Bootstrap

svy Results

rep_weights = svy.RepWeights(method=svy.EstMethod.BOOTSTRAP, prefix="bsrw", n_reps=1000)
design_bs = svy.Design(wgt="finwgt", rep_wgts=rep_weights)
sample_bs = svy.Sample(data=nmihs_bs, design=design_bs)

mean_birth_weight = sample_bs.estimation.mean(y="birthwgt", drop_nulls=True)

cols = ["est", "se", "lci", "uci"]
(
    GT(mean_birth_weight.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
3,355.452419 6.520638 3,342.656702 3,368.248137

R Results

design_bs <- svrepdesign(
  data = nmihs_bs,
  weights = ~finwgt,
  repweights = "bsrw",
  type = "bootstrap",
  replicates=1000,
  combined.weights = TRUE,
  rscales = rep((1000 - 1) / 1000, 1000)
)

mean_birth_weight <- svymean(~birthwgt, design = design_bs, na.rm = TRUE)

est <- coef(mean_birth_weight)
se <- SE(mean_birth_weight)
ci <- confint(mean_birth_weight, df = 999)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
3,355.452419 6.520638 3,342.656702 3,368.248137

Successive Difference Replication (SDR)

The American Community Survey (ACS) provides replicate weights constructed using successive difference replication (SDR). To illustrate SDR, we will use data from the 2024 American Community Survey (ACS) 1-Year Public Use Microdata Sample².

The file carries 80 SDR replicate weights (WGTP1–WGTP80) alongside the main weight WGTP. The ACS documentation describes the SDR replicate-weight construction and recommended variance estimation practice.

svy Results

rep_weights_acs = svy.RepWeights(method=svy.EstMethod.SDR, prefix="WGTP", n_reps=80)
design_acs = svy.Design(wgt="WGTP", rep_wgts=rep_weights_acs)
sample_acs = svy.Sample(data=acs_hak, design=design_acs)


mean_hincp = sample_acs.estimation.mean(y="HINCP", drop_nulls=True)

cols = ["est", "se", "lci", "uci"]
(
    GT(mean_hincp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
111,770.504058 2,517.264331 106,760.014741 116,780.993375

R Results

R’s survey supports SDR directly via type="successive-difference". It also includes a dedicated type="ACS" shortcut that applies ACS-specific defaults. In practice, both should agree when equivalent settings are used.

design_sdr <- svrepdesign(
  data = acs_hak,
  weights = ~WGTP,
  repweights = "^WGTP[0-9]+",
  type = "successive-difference",
  scale = 4 / 80,
  combined.weights = TRUE,
  rscales = 1
)

mean_hincp_sdr   <- svymean(~HINCP, design = design_sdr, na.rm = TRUE)

est <- coef(mean_hincp_sdr)
se <- SE(mean_hincp_sdr)
ci <- confint(mean_hincp_sdr, df = 79)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
111,770.504058 2,517.264331 106,760.014741 116,780.993375

Equivalently:

design_acs <- svrepdesign(
  data = acs_hak,
  weights = ~WGTP,
  repweights = "^WGTP[0-9]+",
  type = "ACS",
  combined.weights = TRUE
)

mean_hincp_acs <- svymean(~HINCP, design = design_acs, na.rm = TRUE)

est <- coef(mean_hincp_acs)
se <- SE(mean_hincp_acs)
ci <- confint(mean_hincp_acs, df = 79)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
111,770.504058 2,517.264331 106,760.014741 116,780.993375
Tip: Replicate Variance Calculation

By default, both svy and R survey center the replicate variance on the average of the replicate estimates.

If you want to center it on the full-sample estimate instead:

  • Use rep_center = "estimate" with svy
  • Use mse = TRUE with R survey
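All of the replicate estimators above reduce to one formula, which R’s svrepdesign exposes directly through scale and rscales: v = scale · Σ_r rscales_r · (θ̂_r − center)². A textbook sketch; the per-method settings in the comment mirror the ones used in this note:

```python
import math

def replicate_se(theta_reps, theta_full, scale, rscales, center="mean"):
    """Generic replicate SE (sketch).
    BRR: scale = 1/R, rscales = 1; JKn above: rscales = (R-1)/R;
    SDR: scale = 4/R, rscales = 1."""
    if center == "mean":
        c = sum(theta_reps) / len(theta_reps)
    else:                                  # "estimate": full-sample center
        c = theta_full
    v = scale * sum(r * (t - c) ** 2 for r, t in zip(rscales, theta_reps))
    return math.sqrt(v)

# Two replicates at 1 and 3, full-sample estimate 2
se_mean = replicate_se([1.0, 3.0], 2.0, 0.5, [1, 1])
se_full = replicate_se([1.0, 3.0], 0.0, 0.5, [1, 1], center="estimate")
```

The center argument mirrors the rep_center / mse = TRUE switch above: with a full-sample center far from the replicate mean, the variance (and hence the SE) is strictly larger.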

Categorical Data Analysis

Let’s use the World Bank dataset to demonstrate categorical data analysis.

Cross-tabulation

Below, we compute the cross-tabulation of urban/rural and electricity access and show the Rao-Scott χ² test.

svy Results

crosstab = sample_str_clus.categorical.tabulate(
    rowvar="urbrur",
    colvar="electricity",
    units=svy.TableUnits.PERCENT,
)

cols = ["est", "se", "lci", "uci"]
(
    GT(crosstab.to_polars().select(["rowvar", "colvar"] + cols))
    .fmt_number(columns=cols, decimals=6)
)
rowvar colvar est se lci uci
Rural No 15.198691 1.165349 12.905428 17.491953
Rural Yes 29.695594 1.309476 27.118707 32.272480
Urban No 1.856347 0.374548 1.119282 2.593412
Urban Yes 53.249369 0.831273 51.613526 54.885211
test_stat = crosstab.stats.pearson_adj

# Create a formatted dataframe
test_df = pl.DataFrame({
    "statistic": ["Pearson χ² (adjusted)"],
    "F_value": [test_stat.value],
    "df_num": [test_stat.df_num],
    "df_den": [test_stat.df_den],
    "p_value": [test_stat.p_value]
})

cols = ["F_value", "df_num", "df_den", "p_value"]
GT(test_df).fmt_number(columns=cols, decimals=6)
statistic F_value df_num df_den p_value
Pearson χ² (adjusted) 193.172687 1.000000 301.000000 0.000000

R Results

crosstab <- survey::svytable(~urbrur + electricity, design_str_clus, Ntotal = 100)

test_result <- summary(crosstab)

test_result$table
       electricity
urbrur         No       Yes
  Rural 15.198691 29.695594
  Urban  1.856347 53.249369
test_result$statistic

    Pearson's X^2: Rao & Scott adjustment

data:  NextMethod()
F = 193.17, ndf = 1, ddf = 301, p-value < 2.2e-16
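The cell estimates above are just weighted shares; only the standard errors and the Rao–Scott adjustment need the design. A point-estimate sketch on toy data:

```python
from collections import defaultdict

def weighted_crosstab_pct(rows, cols, w):
    """Weighted cell percentages: sum of weights per (row, col) cell
    over total weight, times 100 (point estimates only)."""
    cells, total = defaultdict(float), 0.0
    for r, c, wi in zip(rows, cols, w):
        cells[(r, c)] += wi
        total += wi
    return {k: 100 * v / total for k, v in cells.items()}

# Toy table: two "a" units with weight 1, one "b" unit with weight 2
pct = weighted_crosstab_pct(["a", "a", "b"], ["x", "y", "x"], [1, 1, 2])
```

The percentages always sum to 100 across cells, matching the Ntotal = 100 convention used in the svytable call above.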

T-tests

One group

svy Results

tot_exp_test1 = sample_str_clus.categorical.ttest(
    y="tot_exp",
    mean_h0=12500,
)

print(tot_exp_test1.to_polars().drop("y"))
shape: (1, 7)
┌─────────────┬────────────┬─────────────┬──────────┬───────────┬────────────┬──────────┐
│ diff        ┆ se         ┆ lci         ┆ uci      ┆ t         ┆ df         ┆ p_value  │
│ ---         ┆ ---        ┆ ---         ┆ ---      ┆ ---       ┆ ---        ┆ ---      │
│ f64         ┆ f64        ┆ f64         ┆ f64      ┆ f64       ┆ f64        ┆ f64      │
╞═════════════╪════════════╪═════════════╪══════════╪═══════════╪════════════╪══════════╡
│ -451.036220 ┆ 229.986492 ┆ -903.621240 ┆ 1.548800 ┆ -1.961142 ┆ 300.000000 ┆ 0.050787 │
└─────────────┴────────────┴─────────────┴──────────┴───────────┴────────────┴──────────┘
R Results

tot_exp_test1 <- svyttest((tot_exp - 12500) ~ 0, design_str_clus)

test1_df <- data.frame(
  test = "One-sample t-test",
  statistic = tot_exp_test1$statistic,
  df = tot_exp_test1$parameter,
  p_value = tot_exp_test1$p.value,
  mean_diff = tot_exp_test1$estimate,
  ci_lower = tot_exp_test1$conf.int[1],
  ci_upper = tot_exp_test1$conf.int[2]
)

test1_df |>
  gt::gt() |>
  gt::fmt_number(
    columns = c(statistic, df, mean_diff, ci_lower, ci_upper, p_value),
    decimals = 6
  )
test statistic df p_value mean_diff ci_lower ci_upper
One-sample t-test −1.961142 300.000000 0.050787 −451.036220 −903.627330 1.554890

Two groups

svy Results

tot_exp_test2 = sample_str_clus.categorical.ttest(
    y="tot_exp",
    group="urbrur",
)

print(tot_exp_test2.to_polars().drop(["y", "group_var", "paired"]))
shape: (1, 7)
┌─────────────┬────────────┬─────────────┬─────────────┬───────────┬────────────┬──────────┐
│ diff        ┆ se         ┆ lci         ┆ uci         ┆ t         ┆ df         ┆ p_value  │
│ ---         ┆ ---        ┆ ---         ┆ ---         ┆ ---       ┆ ---        ┆ ---      │
│ f64         ┆ f64        ┆ f64         ┆ f64         ┆ f64       ┆ f64        ┆ f64      │
╞═════════════╪════════════╪═════════════╪═════════════╪═══════════╪════════════╪══════════╡
│ 5321.289092 ┆ 447.080293 ┆ 4441.478438 ┆ 6201.099746 ┆ 11.902312 ┆ 300.000000 ┆ 0.000000 │
└─────────────┴────────────┴─────────────┴─────────────┴───────────┴────────────┴──────────┘

R Results

tot_exp_test2 <- svyttest(tot_exp ~ urbrur, design_str_clus)

test2_df <- data.frame(
  test = "Two-sample t-test",
  statistic = tot_exp_test2$statistic,
  df = tot_exp_test2$parameter,
  p_value = tot_exp_test2$p.value,
  mean_diff = tot_exp_test2$estimate,
  ci_lower = tot_exp_test2$conf.int[1],
  ci_upper = tot_exp_test2$conf.int[2]
)

test2_df |>
  gt::gt() |>
  gt::fmt_number(
    columns = c(statistic, df, mean_diff, p_value, ci_lower, ci_upper),
    decimals = 6
  )
test statistic df p_value mean_diff ci_lower ci_upper
Two-sample t-test 11.902312 300.000000 0.000000 5,321.289092 4,441.478438 6,201.099746
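The t statistic in both tables is simply the mean difference over its design-based SE; only the df comes from the design. A quick check using the urban/rural numbers copied from the results above (the normal tail here is only a stand-in for the t(300) p-value, which both packages compute exactly):

```python
from statistics import NormalDist

# diff and se copied from the two-sample results above; df = 300
diff, se, df = 5321.289092, 447.080293, 300
t = diff / se
# Large-df approximation to the two-sided p-value via the normal tail
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
```

With |t| near 12 the two-sided p-value underflows to zero at display precision, which is why both tables report 0.000000.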

Generalized Linear Models (GLMs)

Caution: Coming soon!

Linear Regression

Logistic Regression

Poisson Regression

Comparison Summary

Category     Estimator   Design / Method          Match   Notes
Taylor       Mean        Stratified               Yes
Taylor       Mean        One-stage cluster        Yes
Taylor       Mean        Two-stage cluster        Yes     Ultimate cluster variance
Taylor       Mean        Stratified + clustered   Yes
Taylor       Proportion  Stratified + clustered   Yes     Logit-transformed CIs
Taylor       Total       Stratified + clustered   Yes
Taylor       Ratio       Stratified + clustered   Yes
Domain       Mean        By subgroup              Yes
Domain       Ratio       By subgroup              Yes
Replication  BRR         32 replicates            Yes
Replication  Jackknife   62 replicates            Yes     Requires df specification
Replication  Bootstrap   1000 replicates          Yes     Requires rscales in R
Replication  SDR         80 replicates            Yes

Conclusion

This validation study demonstrates that svy reproduces the results of R’s survey package for a wide range of design-based estimators when equivalent survey designs are specified.

The agreement observed across all tested cases confirms that svy implements standard survey-sampling methodology correctly, including Taylor linearization, ultimate cluster variance estimation, and replication-based variance estimators.

These results support the use of svy for production survey analysis workflows and provide a basis for further validation of advanced features.


TipCommunity signal

Help make svy the standard for survey analysis in Python

If rigorous, design-based survey inference in Python matters to you, starring the repository helps signal demand and prioritize validation and stability work.

Star svy on GitHub

Notes on methodology

  1. Two-stage variance: Both packages use ultimate cluster estimation
  2. Proportion CIs: Both use logit-transformed confidence intervals
  3. Stratification: svy accepts tuples for multiple variables; R requires interaction() or paste()

References

Lumley, Thomas. 2010. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley.
World Bank. 2023. “Synthetic Data for an Imaginary Country, Sample, 2023.” World Bank, Development Data Group. https://doi.org/10.48529/MC1F-QH23.

Footnotes

  1. All numerical comparisons were conducted using identical survey designs and variance estimators.↩︎

  2. U.S. Census Bureau. (2023). American Community Survey 1-Year Public Use Microdata Sample [Data set]. Retrieved from https://www.census.gov/programs-surveys/acs/microdata.html↩︎

Citation

BibTeX citation:
@online{s._diallo2026,
  author = {S. Diallo, Mamadou},
  title = {Validation of {Design-Based} {Survey} {Estimators} in
    {Python:} {A} {Comparison} of Svy and {R} Survey},
  date = {2026-01-10},
  url = {https://svylab.com/learn/notes/posts/svy-vs-r-comparison/},
  langid = {en}
}
For attribution, please cite this work as:
S. Diallo, Mamadou. 2026. “Validation of Design-Based Survey Estimators in Python: A Comparison of Svy and R Survey.” January 10, 2026. https://svylab.com/learn/notes/posts/svy-vs-r-comparison/.