Validation of Design-Based Survey Estimators in Python: A Comparison of svy and R survey

Validation
Survey Methods
Python
R
Side-by-side validation showing Python’s svy produces identical results to R’s survey package for complex survey analysis including Taylor linearization, BRR, jackknife, and bootstrap methods.
Author

Mamadou S. Diallo, Ph.D.

Published

January 10, 2026

Modified

January 17, 2026

Keywords

Python survey analysis, svy package, R survey comparison, complex survey statistics, Taylor linearization, BRR variance estimation, jackknife variance estimation, bootstrap survey, design-based inference, survey sampling Python

Note: Work in Progress

This validation note is under active development.

All results shown are final and reproducible; remaining sections will be added incrementally.

Summary

Tip: TL;DR

svy produces numerically identical results to R’s survey package when equivalent survey designs and variance estimators are specified.

Both libraries implement the same design-based inferential framework, including:

  • Complex designs (stratification, clustering, unequal weights)
  • Taylor linearization and replication-based variance estimation
  • Logit-transformed confidence intervals for proportions
  • Categorical data analysis and regression
Estimator          Design / Method                        Match¹
Mean               Stratified                             Yes
Mean               One-stage cluster                      Yes
Mean               Two-stage cluster (ultimate cluster)   Yes
Mean               Stratified + clustered                 Yes
Proportion         Logit-transformed CIs                  Yes
Total              Stratified + clustered                 Yes
Ratio              Stratified + clustered                 Yes
Domain estimation  Mean, ratio                            Yes
BRR                Ratio                                  Yes
Jackknife          Ratio                                  Yes
Bootstrap          Mean                                   Yes

These results validate svy as a statistically equivalent alternative to R’s survey package (Lumley 2010) for complex survey analysis.

Introduction

For decades, survey statisticians have relied on specialized software for design-based inference—tools like SAS, SPSS, Stata, and R’s survey package. These have long been the trusted workhorses for analyzing complex survey data with proper variance estimation. Now Python joins that group. With the svy library, Python users can perform rigorous, design-based analysis while staying within the modern data science ecosystem.

But can svy be trusted?

This note answers that question by comparing results from Python’s svy and R’s survey package. Using the same dataset and survey design specifications, we show that svy produces identical estimates and standard errors—validating its statistical rigor and establishing it as a reliable choice for complex survey analysis.

Validation Scope

This comparison focuses on design-based estimation, including:

  • Means, totals, proportions, and ratios
  • Domain (subpopulation) estimation
  • Taylor linearization variance estimation
  • Variance estimation for multi-stage designs
  • Replication-based variance estimators (BRR, jackknife, bootstrap)
  • Categorical data analysis (cross-tabulation, t-tests)
  • Regression analysis (linear, logistic, Poisson)

Loading the Datasets

Setting up R environment

library(survey)
library(srvyr)
library(gt)
library(dplyr)
library(readr)

data(api)

packageVersion("survey")
[1] '4.4.8'
nhanes2brr = readr::read_csv("data/nhanes2brr.csv")
nhanes2fay = readr::read_csv("data/nhanes2fay.csv")
nhanes2jknife = readr::read_csv("data/nhanes2jknife.csv")
nmihs_bs = readr::read_csv("data/nmihs_bs.csv")
acs_hak = readr::read_csv("data/psam_h02.csv")
wb_synth_smp = readr::read_csv("data/WLD_2023_SYNTH-SVY-HLD-EN_v01_M.csv")

Setting up Python environment

import svy
import polars as pl

from great_tables import GT

# Set global display precision to 6 decimals
pl.Config.set_float_precision(6)
<class 'polars.config.Config'>

apistrat = svy.io.read_csv("data/apistrat.csv")
apiclus1 = svy.io.read_csv("data/apiclus1.csv")
apiclus2 = svy.io.read_csv("data/apiclus2.csv")

nhanes2brr = svy.io.read_csv("data/nhanes2brr.csv")
nhanes2fay = svy.io.read_csv("data/nhanes2fay.csv")
nhanes2jknife = svy.io.read_csv("data/nhanes2jknife.csv")
nmihs_bs = svy.io.read_csv("data/nmihs_bs.csv")
acs_hak = svy.io.read_csv("data/psam_h02.csv")
wb_synth_smp = svy.io.read_csv("data/WLD_2023_SYNTH-SVY-HLD-EN_v01_M.csv")

Taylor-Based Estimation

Estimating a Mean

Stratified sample

svy Results

design_str = svy.Design(stratum="stype", wgt="pw")
sample_str = svy.Sample(data=apistrat, design=design_str)

api00_mean_str = sample_str.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_str.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
662.287363 9.536132 643.481357 681.093370

R Results

design_str <- apistrat |>
  srvyr::as_survey_design(strata = stype, weights = pw)

design_str |>
  summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  )|>
  gt() |>
  fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
662.287363 9.536132 643.481357 681.093370
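The linearization behind these matching standard errors can be sketched in a few lines of plain Python. This is the textbook with-replacement formula for a stratified weighted mean, not svy’s internal code, and the toy data in the comment are made up:

```python
import math
from collections import defaultdict

def stratified_mean_se(y, w, strata):
    """Taylor-linearized SE of a weighted mean under stratified,
    with-replacement sampling (textbook sketch)."""
    W = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / W
    # Linearized scores for the mean: z_i = w_i * (y_i - ybar) / W
    z_by_h = defaultdict(list)
    for yi, wi, h in zip(y, w, strata):
        z_by_h[h].append(wi * (yi - ybar) / W)
    var = 0.0
    for zs in z_by_h.values():
        n_h = len(zs)
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return ybar, math.sqrt(var)

# Toy check: one stratum, equal weights -> reduces to s / sqrt(n)
ybar, se = stratified_mean_se([1, 2, 3, 4, 5], [1] * 5, ["a"] * 5)
```

With a single stratum and equal weights the formula collapses to the familiar s/√n, which is a handy sanity check on any implementation.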

One-stage sample

svy Results

design_clus1 = svy.Design(psu="dnum", wgt="pw")
sample_clus1 = svy.Sample(data=apiclus1, design=design_clus1)

api00_mean_clus1 = sample_clus1.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_clus1.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
644.169399 23.779011 593.168493 695.170305

R Results

design_clus1 <- apiclus1 |>
  srvyr::as_survey_design(id = dnum, weights = pw)

design_clus1 |>
  dplyr::summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
644.169399 23.779011 593.168493 695.170305

Two-stage sample

svy Results

# Two-stage design: districts (dnum) → schools (snum)
# Note: the SSU stage need not be specified here, since variance
# estimation uses the ultimate cluster (first-stage) approximation either way
design_clus2 = svy.Design(psu="dnum", wgt="pw")
sample_clus2 = svy.Sample(data=apiclus2, design=design_clus2)

api00_mean_clus2 = sample_clus2.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_clus2.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
670.811808 30.711576 608.691782 732.931835

R Results

design_clus2 <- apiclus2 |>
  srvyr::as_survey_design(id = c(dnum, snum), weights = pw)

design_clus2 |>
  dplyr::summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
670.811808 30.711576 608.691782 732.931835
Tip: Two-stage variance estimation

For multi-stage designs, svy uses the ultimate cluster variance estimator, which approximates total variance using first-stage (PSU) variability only. This approach is standard in survey software (including R’s survey package) because it:

  1. Produces conservative variance estimates
  2. Avoids requiring population sizes at lower stages
  3. Reflects the dominant source of variability in most designs

Accordingly, specifying Design(psu="dnum", ssu="snum") yields the same variance estimates as Design(psu="dnum").
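The points above can be made concrete with a small sketch: aggregate the weighted values to PSU totals, then measure only the between-PSU variation within each stratum. Toy data again; this is the standard formula, not svy’s implementation:

```python
import math
from collections import defaultdict

def ultimate_cluster_se_total(y, w, strata, psu):
    """First-stage (ultimate cluster) SE for a weighted total."""
    totals = defaultdict(float)            # (stratum, psu) -> sum of w*y
    for yi, wi, h, c in zip(y, w, strata, psu):
        totals[(h, c)] += wi * yi
    by_h = defaultdict(list)               # stratum -> list of PSU totals
    for (h, _), t in totals.items():
        by_h[h].append(t)
    var = 0.0
    for ts in by_h.values():
        n_h = len(ts)                      # number of sampled PSUs
        tbar = sum(ts) / n_h
        var += n_h / (n_h - 1) * sum((t - tbar) ** 2 for t in ts)
    return sum(totals.values()), math.sqrt(var)

# Toy check: one stratum, two PSUs with totals 3 and 7
tot, se = ultimate_cluster_se_total([1, 2, 3, 4], [1] * 4,
                                    ["s1"] * 4, [1, 1, 2, 2])
```

Because only the PSU totals enter the variance, any second-stage structure below the PSU is invisible to the estimator, which is exactly why Design(psu="dnum", ssu="snum") and Design(psu="dnum") agree.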

Stratified clustered sample

For this example, we will use the World Bank synthetic survey data (World Bank 2023).

svy Results

design_str_clus = svy.Design(stratum=("geo1", "urbrur"), psu="ea", wgt="hhweight")
sample_str_clus = svy.Sample(data=wb_synth_smp, design=design_str_clus)

tot_exp = sample_str_clus.estimation.mean("tot_exp")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
12,048.963780 229.986492 11,596.378760 12,501.548800

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::summarize(
    est = srvyr::survey_mean(tot_exp, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
12,048.963780 229.986492 11,596.378760 12,501.548800

Other Population Parameters

Proportion

svy Results

electricity = sample_str_clus.estimation.prop("electricity")

cols = ["est", "se", "lci", "uci"]
(
    GT(electricity.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
0.170550 0.011873 0.148438 0.195201
0.829450 0.011873 0.804799 0.851562

R Results

design_str_clus |>
  dplyr::group_by(electricity) |>
  dplyr::summarize(
    est = srvyr::survey_prop(vartype = c("se", "ci"), proportion = TRUE)
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
electricity est est_se est_low est_upp
No 0.170550 0.011873 0.148438 0.195201
Yes 0.829450 0.011873 0.804799 0.851562
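The logit-transformed interval both tables report can be reproduced from the point estimate and SE alone: build the interval on the log-odds scale, then back-transform. A sketch, with the t critical value for 300 design df hardcoded (≈1.968) rather than computed:

```python
import math

def logit_ci(p, se, crit):
    """Logit-transformed CI for a proportion (sketch).
    crit is the t critical value for the design df."""
    logit = math.log(p / (1 - p))
    se_logit = se / (p * (1 - p))          # delta-method SE on the logit scale
    lo = logit - crit * se_logit
    hi = logit + crit * se_logit
    expit = lambda x: 1 / (1 + math.exp(-x))
    return expit(lo), expit(hi)

# Electricity = "No" row from the tables above: p = 0.170550, se = 0.011873
lci, uci = logit_ci(0.170550, 0.011873, 1.968)
```

Note the asymmetry: the back-transformed interval is wider above the estimate than below it, and it can never escape (0, 1), unlike a plain Wald interval.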

Total

svy Results

sample_str_clus = sample_str_clus.wrangling.recode(
    cols="electricity",
    recodes={1: ["No"], 0: ["Yes"]},
    into="no_electricity"
)
electricity = sample_str_clus.estimation.total("no_electricity")

cols = ["est", "se", "lci", "uci"]
(
    GT(electricity.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
426,675.251960 30,622.991644 366,412.985386 486,937.518534

R Results

design_str_clus |>
  dplyr::mutate(no_electricity = electricity != "Yes") |>
  dplyr::summarize(
    est = srvyr::survey_total(no_electricity, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
426,675.251960 30,622.991644 366,412.985386 486,937.518534

Ratio

svy Results

tot_exp = sample_str_clus.estimation.ratio(y="tot_exp", x="hhsize")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
2,992.110041 71.224260 2,851.949491 3,132.270590

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::summarize(
    est = srvyr::survey_ratio(tot_exp, hhsize, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
est est_se est_low est_upp
2,992.110041 71.224260 2,851.949491 3,132.270590
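Under the hood, the ratio SE comes from linearizing R̂ = Ŷ/X̂ and plugging the residuals e_i = (y_i − R̂·x_i)/X̂ into the variance formula for a total. A single-stratum, with-replacement sketch on toy data (not the svy code path, which also handles strata and clusters):

```python
import math

def ratio_and_se_srs(y, x, w):
    """Taylor-linearized ratio estimate and SE, single stratum (sketch)."""
    Y = sum(wi * yi for wi, yi in zip(w, y))
    X = sum(wi * xi for wi, xi in zip(w, x))
    rhat = Y / X
    # Linearized scores: z_i = w_i * (y_i - rhat * x_i) / X
    z = [wi * (yi - rhat * xi) for wi, yi, xi in zip(w, y, x)]
    z = [zi / X for zi in z]
    n = len(z)
    zbar = sum(z) / n
    var = n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return rhat, math.sqrt(var)

# Toy check: y exactly proportional to x -> zero residuals, zero SE
rhat, se = ratio_and_se_srs([2, 4, 6], [1, 2, 3], [1, 1, 1])
```

The zero-SE case is a useful invariant: when y = c·x for every unit, any correct linearized ratio variance must vanish.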

Domain estimation

Average expenditure by urban and rural areas

svy Results

tot_exp = sample_str_clus.estimation.mean(y="tot_exp", by="urbrur")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(["by_level"]+cols))
    .fmt_number(columns=cols, decimals=6)
)
by_level est se lci uci
['Urban'] 14,437.918429 326.402120 13,795.599356 15,080.237501
['Rural'] 9,116.629337 305.519957 8,515.403784 9,717.854889

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::group_by(urbrur) |>
  dplyr::summarize(
    est = srvyr::survey_mean(tot_exp, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
urbrur est est_se est_low est_upp
Rural 9,116.629337 305.519957 8,515.403784 9,717.854889
Urban 14,437.918429 326.402120 13,795.599356 15,080.237501

Ratio of expenditure over household size by banking status

svy Results

tot_exp = sample_str_clus.estimation.ratio(y="tot_exp", x="hhsize", by="bank")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(["by_level"]+cols))
    .fmt_number(columns=cols, decimals=6)
)
by_level est se lci uci
['No'] 1,784.529865 41.958370 1,701.960973 1,867.098757
['Yes'] 3,960.323902 89.247824 3,784.695203 4,135.952601

R Results

design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::group_by(bank) |>
  dplyr::summarize(
    est = srvyr::survey_ratio(tot_exp, hhsize, vartype = c("se", "ci"))
  )|>
  gt::gt() |>
  gt::fmt_number(
      columns = where(is.numeric),
      decimals = 6
    )
bank est est_se est_low est_upp
No 1,784.529865 41.958370 1,701.960973 1,867.098757
Yes 3,960.323902 89.247824 3,784.695203 4,135.952601

Replication-Based Estimation

Tip: Degrees of freedom for replicate weight designs

When only replicate weights are provided (without strata/PSU identifiers), the true design df is unknown:

  • svy defaults to df = n_reps - 1
  • R defaults to the rank of the replicate weight matrix minus 1

Both approaches are heuristics. The rank-based method can detect when post-stratification or calibration has reduced the effective df, but is computationally expensive and numerically sensitive.

Both packages allow user override: RepWeights(df=...) in svy, degf= in R’s svrepdesign().

In practice, data providers typically document the correct degrees of freedom for their replicate weights (e.g., NHANES, ACS). Always consult the survey documentation and specify df explicitly when known.
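To see why the two heuristics can disagree, consider a tiny hypothetical replicate-weight matrix in which the fourth replicate is a linear combination of the first three, as calibration can induce. The rank sketch below uses plain Gaussian elimination, not R’s actual QR-based computation:

```python
def matrix_rank(rows, tol=1e-10):
    """Rank via Gaussian elimination; fine for small matrices."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if abs(m[i][col]) > tol),
                   None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        pivot_row = m[rank]
        for i in range(len(m)):
            if i != rank and abs(m[i][col]) > tol:
                f = m[i][col] / pivot_row[col]
                m[i] = [a - f * b for a, b in zip(m[i], pivot_row)]
        rank += 1
        if rank == len(m):
            break
    return rank

reps = [[1, 2, 3, 2],
        [2, 1, 3, 2],
        [3, 3, 0, 2]]        # rows = observations, columns = 4 replicates;
                             # column 4 is the average of columns 1-3
r = matrix_rank(reps)
df_simple = 4 - 1            # n_reps - 1 heuristic
df_rank = r - 1              # rank-based heuristic
```

Here the simple heuristic gives df = 3 while the rank-based one detects the dependency and gives df = 2, illustrating why the two defaults can differ after calibration.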

Balanced Repeated Replication (BRR)

svy Results

rep_weights = svy.RepWeights(method=svy.EstMethod.BRR, prefix="brr_", n_reps=32)
design_brr = svy.Design(wgt="finalwgt", rep_wgts=rep_weights)
sample_brr = svy.Sample(data=nhanes2brr, design=design_brr)

ratio_wgt_hgt = sample_brr.estimation.ratio(y="weight", x="height")

cols = ["est", "se", "lci", "uci"]
(
    GT(ratio_wgt_hgt.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
0.426812 0.000890 0.424996 0.428628

R Results

design_brr <- svrepdesign(
  data = nhanes2brr,
  weights = ~finalwgt,
  repweights = "brr_",
  type = "BRR",
  combined.weights = TRUE
)
ratio_wgt_hgt <- svyratio(~weight, ~height, design = design_brr)

# Extract results into a data frame
est <- coef(ratio_wgt_hgt)
se <- SE(ratio_wgt_hgt)
ci <- confint(ratio_wgt_hgt, df = degf(design_brr))

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
0.426812 0.000890 0.424996 0.428628
Note: Confidence intervals for replicate designs

R’s confint() uses normal (z) approximation by default for replicate weight designs.

To match svy’s t-based CIs, use confint(..., df = degf(design)).
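The note above is easy to check numerically with the BRR ratio just shown. The estimate and SE are copied from the table, and both critical values are hardcoded (t for 31 df ≈ 2.0395, normal ≈ 1.9600) rather than computed:

```python
# t-based vs normal CI for the BRR ratio of weight to height
est, se, reps = 0.426812, 0.000890, 32
t_crit, z_crit = 2.0395, 1.9600        # df = reps - 1 = 31
ci_t = (est - t_crit * se, est + t_crit * se)
ci_z = (est - z_crit * se, est + z_crit * se)
```

The t-based bounds reproduce the tables above, while the normal interval is slightly narrower; with only 31 df the difference is visible in the fifth decimal.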

Jackknife

svy Results

rep_weights = svy.RepWeights(
    method=svy.EstMethod.JACKKNIFE, prefix="jkw_", n_reps=62, df=61
)
design_jkn = svy.Design(wgt="finalwgt", rep_wgts=rep_weights)
sample_jkn = svy.Sample(data=nhanes2jknife, design=design_jkn)

ratio_wgt_hgt = sample_jkn.estimation.ratio(y="weight", x="height")

cols = ["est", "se", "lci", "uci"]
(
    GT(ratio_wgt_hgt.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
0.426812 0.001247 0.424319 0.429304

R Results

design_jkn <- svrepdesign(
  data = nhanes2jknife,
  weights = ~finalwgt,
  repweights = "jkw_",
  type = "JKn",
  combined.weights = TRUE,
  rscales = rep((62 - 1) / 62, 62)
)
ratio_wgt_hgt <- svyratio(~weight, ~height, design = design_jkn)

# Extract results into a data frame
est <- coef(ratio_wgt_hgt)
se <- SE(ratio_wgt_hgt)
ci <- confint(ratio_wgt_hgt, df = 61)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
0.426812 0.001247 0.424319 0.429304

Bootstrap

svy Results

rep_weights = svy.RepWeights(method=svy.EstMethod.BOOTSTRAP, prefix="bsrw", n_reps=1000)
design_bs = svy.Design(wgt="finwgt", rep_wgts=rep_weights)
sample_bs = svy.Sample(data=nmihs_bs, design=design_bs)

mean_birth_weight = sample_bs.estimation.mean(y="birthwgt", drop_nulls=True)

cols = ["est", "se", "lci", "uci"]
(
    GT(mean_birth_weight.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
3,355.452419 6.520638 3,342.656702 3,368.248137

R Results

design_bs <- svrepdesign(
  data = nmihs_bs,
  weights = ~finwgt,
  repweights = "bsrw",
  type = "bootstrap",
  replicates=1000,
  combined.weights = TRUE,
  rscales = rep((1000 - 1) / 1000, 1000)
)

mean_birth_weight <- svymean(~birthwgt, design = design_bs, na.rm = TRUE)

est <- coef(mean_birth_weight)
se <- SE(mean_birth_weight)
ci <- confint(mean_birth_weight, df = 999)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
3,355.452419 6.520638 3,342.656702 3,368.248137

Successive Difference Replication (SDR)

The American Community Survey (ACS) provides replicate weights constructed using successive difference replication (SDR). To illustrate SDR, we will use data from the 2024 American Community Survey (ACS) 1-Year Public Use Microdata Sample².

The file carries 80 SDR replicate weights (WGTP1–WGTP80) alongside the main weight WGTP. The ACS documentation describes the SDR replicate-weight construction and recommended variance estimation practice.

svy Results

rep_weights_acs = svy.RepWeights(method=svy.EstMethod.SDR, prefix="WGTP", n_reps=80)
design_acs = svy.Design(wgt="WGTP", rep_wgts=rep_weights_acs)
sample_acs = svy.Sample(data=acs_hak, design=design_acs)


mean_hincp = sample_acs.estimation.mean(y="HINCP", drop_nulls=True)

cols = ["est", "se", "lci", "uci"]
(
    GT(mean_hincp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
est se lci uci
111,770.504058 2,517.264331 106,760.014741 116,780.993375

R Results

R’s survey supports SDR directly via type="successive-difference". It also includes a dedicated type="ACS" shortcut that applies ACS-specific defaults. In practice, both should agree when equivalent settings are used.

design_sdr <- svrepdesign(
  data = acs_hak,
  weights = ~WGTP,
  repweights = "^WGTP[0-9]+",
  type = "successive-difference",
  scale = 4 / 80,
  combined.weights = TRUE,
  rscales = 1
)

mean_hincp_sdr   <- svymean(~HINCP, design = design_sdr, na.rm = TRUE)

est <- coef(mean_hincp_sdr)
se <- SE(mean_hincp_sdr)
ci <- confint(mean_hincp_sdr, df = 79)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
111,770.504058 2,517.264331 106,760.014741 116,780.993375

Equivalently:

design_acs <- svrepdesign(
  data = acs_hak,
  weights = ~WGTP,
  repweights = "^WGTP[0-9]+",
  type = "ACS",
  combined.weights = TRUE
)

mean_hincp_acs <- svymean(~HINCP, design = design_acs, na.rm = TRUE)

est <- coef(mean_hincp_acs)
se <- SE(mean_hincp_acs)
ci <- confint(mean_hincp_acs, df = 79)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
est se lci uci
111,770.504058 2,517.264331 106,760.014741 116,780.993375
Tip: Replicate Variance Calculation

By default, both svy and R survey center the replicate variance on the average of the replicate estimates.

If you want to center it on the full-sample estimate instead:

  • Use rep_center = "estimate" with svy
  • Use mse = TRUE with R survey
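All of the replicate estimators above reduce to one formula, which R’s svrepdesign exposes directly through scale and rscales: v = scale · Σ_r rscales_r · (θ̂_r − center)². A textbook sketch; the per-method settings in the comment mirror the ones used in this note:

```python
import math

def replicate_se(theta_reps, theta_full, scale, rscales, center="mean"):
    """Generic replicate SE (sketch).
    BRR: scale = 1/R, rscales = 1; JKn above: rscales = (R-1)/R;
    SDR: scale = 4/R, rscales = 1."""
    if center == "mean":
        c = sum(theta_reps) / len(theta_reps)
    else:                                  # "estimate": full-sample center
        c = theta_full
    v = scale * sum(r * (t - c) ** 2 for r, t in zip(rscales, theta_reps))
    return math.sqrt(v)

# Two replicates at 1 and 3, full-sample estimate 2
se_mean = replicate_se([1.0, 3.0], 2.0, 0.5, [1, 1])
se_full = replicate_se([1.0, 3.0], 0.0, 0.5, [1, 1], center="estimate")
```

The center argument mirrors the rep_center / mse = TRUE switch above: with a full-sample center far from the replicate mean, the variance (and hence the SE) is strictly larger.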

Categorical Data Analysis

Let’s use the World Bank dataset to demonstrate categorical data analysis.

Cross-tabulation

Below, we compute the cross-tabulation of urban/rural and electricity access and show the Rao-Scott χ² test.

svy Results

crosstab = sample_str_clus.categorical.tabulate(
    rowvar="urbrur",
    colvar="electricity",
    units=svy.TableUnits.PERCENT,
)

cols = ["est", "se", "lci", "uci"]
(
    GT(crosstab.to_polars().select(["rowvar", "colvar"] + cols))
    .fmt_number(columns=cols, decimals=6)
)
rowvar colvar est se lci uci
Rural No 15.198691 1.165349 12.905428 17.491953
Rural Yes 29.695594 1.309476 27.118707 32.272480
Urban No 1.856347 0.374548 1.119282 2.593412
Urban Yes 53.249369 0.831273 51.613526 54.885211
test_stat = crosstab.stats.pearson_adj

# Create a formatted dataframe
test_df = pl.DataFrame({
    "statistic": ["Pearson χ² (adjusted)"],
    "F_value": [test_stat.value],
    "df_num": [test_stat.df_num],
    "df_den": [test_stat.df_den],
    "p_value": [test_stat.p_value]
})

cols = ["F_value", "df_num", "df_den", "p_value"]
GT(test_df).fmt_number(columns=cols, decimals=6)
statistic F_value df_num df_den p_value
Pearson χ² (adjusted) 193.172687 1.000000 301.000000 0.000000

R Results

crosstab <- survey::svytable(~urbrur + electricity, design_str_clus, Ntotal = 100)

test_result <- summary(crosstab)

test_result$table
       electricity
urbrur         No       Yes
  Rural 15.198691 29.695594
  Urban  1.856347 53.249369
test_result$statistic

    Pearson's X^2: Rao & Scott adjustment

data:  NextMethod()
F = 193.17, ndf = 1, ddf = 301, p-value < 2.2e-16
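The cell estimates above are just weighted shares; only the standard errors and the Rao–Scott adjustment need the design. A point-estimate sketch on toy data:

```python
from collections import defaultdict

def weighted_crosstab_pct(rows, cols, w):
    """Weighted cell percentages: sum of weights per (row, col) cell
    over total weight, times 100 (point estimates only)."""
    cells, total = defaultdict(float), 0.0
    for r, c, wi in zip(rows, cols, w):
        cells[(r, c)] += wi
        total += wi
    return {k: 100 * v / total for k, v in cells.items()}

# Toy table: two "a" units with weight 1, one "b" unit with weight 2
pct = weighted_crosstab_pct(["a", "a", "b"], ["x", "y", "x"], [1, 1, 2])
```

The percentages always sum to 100 across cells, matching the Ntotal = 100 convention used in the svytable call above.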

T-tests

One group

svy Results

tot_exp_test1 = sample_str_clus.categorical.ttest(
    y="tot_exp",
    mean_h0=12500,
)

print(tot_exp_test1.to_polars().drop("y"))
shape: (1, 7)
┌─────────────┬────────────┬─────────────┬──────────┬───────────┬────────────┬──────────┐
│ diff        ┆ se         ┆ lci         ┆ uci      ┆ t         ┆ df         ┆ p_value  │
│ ---         ┆ ---        ┆ ---         ┆ ---      ┆ ---       ┆ ---        ┆ ---      │
│ f64         ┆ f64        ┆ f64         ┆ f64      ┆ f64       ┆ f64        ┆ f64      │
╞═════════════╪════════════╪═════════════╪══════════╪═══════════╪════════════╪══════════╡
│ -451.036220 ┆ 229.986492 ┆ -903.621240 ┆ 1.548800 ┆ -1.961142 ┆ 300.000000 ┆ 0.050787 │
└─────────────┴────────────┴─────────────┴──────────┴───────────┴────────────┴──────────┘
R Results

tot_exp_test1 <- svyttest((tot_exp - 12500) ~ 0, design_str_clus)

test1_df <- data.frame(
  test = "One-sample t-test",
  statistic = tot_exp_test1$statistic,
  df = tot_exp_test1$parameter,
  p_value = tot_exp_test1$p.value,
  mean_diff = tot_exp_test1$estimate,
  ci_lower = tot_exp_test1$conf.int[1],
  ci_upper = tot_exp_test1$conf.int[2]
)

test1_df |>
  gt::gt() |>
  gt::fmt_number(
    columns = c(statistic, df, mean_diff, ci_lower, ci_upper, p_value),
    decimals = 6
  )
test statistic df p_value mean_diff ci_lower ci_upper
One-sample t-test −1.961142 300.000000 0.050787 −451.036220 −903.627330 1.554890

Two groups

svy Results

tot_exp_test2 = sample_str_clus.categorical.ttest(
    y="tot_exp",
    group="urbrur",
)

print(tot_exp_test2.to_polars().drop(["y", "group_var", "paired"]))
shape: (1, 7)
┌─────────────┬────────────┬─────────────┬─────────────┬───────────┬────────────┬──────────┐
│ diff        ┆ se         ┆ lci         ┆ uci         ┆ t         ┆ df         ┆ p_value  │
│ ---         ┆ ---        ┆ ---         ┆ ---         ┆ ---       ┆ ---        ┆ ---      │
│ f64         ┆ f64        ┆ f64         ┆ f64         ┆ f64       ┆ f64        ┆ f64      │
╞═════════════╪════════════╪═════════════╪═════════════╪═══════════╪════════════╪══════════╡
│ 5321.289092 ┆ 447.080293 ┆ 4441.478438 ┆ 6201.099746 ┆ 11.902312 ┆ 300.000000 ┆ 0.000000 │
└─────────────┴────────────┴─────────────┴─────────────┴───────────┴────────────┴──────────┘

R Results

tot_exp_test2 <- svyttest(tot_exp ~ urbrur, design_str_clus)

test2_df <- data.frame(
  test = "Two-sample t-test",
  statistic = tot_exp_test2$statistic,
  df = tot_exp_test2$parameter,
  p_value = tot_exp_test2$p.value,
  mean_diff = tot_exp_test2$estimate,
  ci_lower = tot_exp_test2$conf.int[1],
  ci_upper = tot_exp_test2$conf.int[2]
)

test2_df |>
  gt::gt() |>
  gt::fmt_number(
    columns = c(statistic, df, mean_diff, p_value, ci_lower, ci_upper),
    decimals = 6
  )
test statistic df p_value mean_diff ci_lower ci_upper
Two-sample t-test 11.902312 300.000000 0.000000 5,321.289092 4,441.478438 6,201.099746
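The t statistic in both tables is simply the mean difference over its design-based SE; only the df comes from the design. A quick check using the urban/rural numbers copied from the results above (the normal tail here is only a stand-in for the t(300) p-value, which both packages compute exactly):

```python
from statistics import NormalDist

# diff and se copied from the two-sample results above; df = 300
diff, se, df = 5321.289092, 447.080293, 300
t = diff / se
# Large-df approximation to the two-sided p-value via the normal tail
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
```

With |t| near 12 the two-sided p-value underflows to zero at display precision, which is why both tables report 0.000000.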

Generalized Linear Models (GLMs)

Caution: Coming soon!

Linear Regression

Logistic Regression

Poisson Regression

Comparison Summary

Category     Estimator   Design / Method          Match   Notes
Taylor       Mean        Stratified               Yes
Taylor       Mean        One-stage cluster        Yes
Taylor       Mean        Two-stage cluster        Yes     Ultimate cluster variance
Taylor       Mean        Stratified + clustered   Yes
Taylor       Proportion  Stratified + clustered   Yes     Logit-transformed CIs
Taylor       Total       Stratified + clustered   Yes
Taylor       Ratio       Stratified + clustered   Yes
Domain       Mean        By subgroup              Yes
Domain       Ratio       By subgroup              Yes
Replication  BRR         32 replicates            Yes
Replication  Jackknife   62 replicates            Yes     Requires df specification
Replication  Bootstrap   1000 replicates          Yes     Requires rscales in R
Replication  SDR         80 replicates            Yes

Conclusion

This validation study demonstrates that svy reproduces the results of R’s survey package for a wide range of design-based estimators when equivalent survey designs are specified.

The agreement observed across all tested cases confirms that svy implements standard survey-sampling methodology correctly, including Taylor linearization, ultimate cluster variance estimation, and replication-based variance estimators.

These results support the use of svy for production survey analysis workflows and provide a basis for further validation of advanced features.


TipCommunity signal

Help make svy the standard for survey analysis in Python

If rigorous, design-based survey inference in Python matters to you, starring the repository helps signal demand and prioritize validation and stability work.

Star svy on GitHub

Notes on methodology

  1. Two-stage variance: Both packages use ultimate cluster estimation
  2. Proportion CIs: Both use logit-transformed confidence intervals
  3. Stratification: svy accepts tuples for multiple variables; R requires interaction() or paste()

References

Lumley, Thomas. 2010. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley.
World Bank. 2023. “Synthetic Data for an Imaginary Country, Sample, 2023.” World Bank, Development Data Group. https://doi.org/10.48529/MC1F-QH23.

Footnotes

  1. All numerical comparisons were conducted using identical survey designs and variance estimators.↩︎

  2. U.S. Census Bureau. (2023). American Community Survey 1-Year Public Use Microdata Sample [Data set]. Retrieved from https://www.census.gov/programs-surveys/acs/microdata.html↩︎

Citation

BibTeX citation:
@online{s._diallo2026,
  author = {S. Diallo, Mamadou},
  title = {Validation of {Design-Based} {Survey} {Estimators} in
    {Python:} {A} {Comparison} of Svy and {R} Survey},
  date = {2026-01-10},
  url = {https://svylab.com/learn/notes/posts/svy-vs-r-comparison/},
  langid = {en}
}
For attribution, please cite this work as:
S. Diallo, Mamadou. 2026. “Validation of Design-Based Survey Estimators in Python: A Comparison of Svy and R Survey.” January 10, 2026. https://svylab.com/learn/notes/posts/svy-vs-r-comparison/.