This validation note is under active development.
All results shown are final and reproducible; remaining sections will be added incrementally.
Summary
svy produces numerically identical results to R’s survey package when equivalent survey designs and variance estimators are specified.
Both libraries implement the same design-based inferential framework, including:
Complex designs (stratification, clustering, unequal weights)
Taylor linearization and replication-based variance estimation
Logit-transformed confidence intervals for proportions
Categorical data analysis and regression
| Estimator | Design / Feature | Match |
|---|---|---|
| Mean | Stratified | ✅ |
| Mean | One-stage cluster | ✅ |
| Mean | Two-stage cluster (ultimate cluster) | ✅ |
| Mean | Stratified + clustered | ✅ |
| Proportion | Logit-transformed CIs | ✅ |
| Total | Stratified + clustered | ✅ |
| Ratio | Stratified + clustered | ✅ |
| Domain estimation | Mean, ratio | ✅ |
| BRR | Ratio | ✅ |
| Jackknife | Ratio | ✅ |
| Bootstrap | Mean | ✅ |
These results validate svy as a statistically equivalent alternative to R’s survey package (Lumley 2010) for complex survey analysis.
Introduction
For decades, survey statisticians have relied on specialized software for design-based inference—tools like SAS, SPSS, Stata, and R’s survey package. These have long been the trusted workhorses for analyzing complex survey data with proper variance estimation. Now Python joins that group. With the svy library, Python users can perform rigorous, design-based analysis while staying within the modern data science ecosystem.
But can svy be trusted?
This note answers that question by comparing results from Python’s svy and R’s survey package. Using the same dataset and survey design specifications, we show that svy produces identical estimates and standard errors—validating its statistical rigor and establishing it as a reliable choice for complex survey analysis.
Validation Scope
This comparison focuses on design-based estimation, including:
Means, totals, proportions, and ratios
Domain (subpopulation) estimation
Taylor linearization variance estimation
Variance estimation for multi-stage designs
Replication-based variance estimators (BRR, jackknife, bootstrap)
Categorical data analysis (cross-tabulation, t-tests)
Regression analysis (linear, logistic, Poisson)
Loading the Datasets
Setting up R environment
library(survey)
library(srvyr)
library(gt)
library(dplyr)
library(readr)

data(api)
packageVersion("survey")

nhanes2brr = readr::read_csv("data/nhanes2brr.csv")
nhanes2fay = readr::read_csv("data/nhanes2fay.csv")
nhanes2jknife = readr::read_csv("data/nhanes2jknife.csv")
nmihs_bs = readr::read_csv("data/nmihs_bs.csv")
acs_hak = readr::read_csv("data/psam_h02.csv")
wb_synth_smp = readr::read_csv("data/WLD_2023_SYNTH-SVY-HLD-EN_v01_M.csv")
Setting up Python environment
import svy
import polars as pl
from great_tables import GT

# Set global display precision to 6 decimals
pl.Config.set_float_precision(6)

apistrat = svy.io.read_csv("data/apistrat.csv")
apiclus1 = svy.io.read_csv("data/apiclus1.csv")
apiclus2 = svy.io.read_csv("data/apiclus2.csv")
nhanes2brr = svy.io.read_csv("data/nhanes2brr.csv")
nhanes2fay = svy.io.read_csv("data/nhanes2fay.csv")
nhanes2jknife = svy.io.read_csv("data/nhanes2jknife.csv")
nmihs_bs = svy.io.read_csv("data/nmihs_bs.csv")
acs_hak = svy.io.read_csv("data/psam_h02.csv")
wb_synth_smp = svy.io.read_csv("data/WLD_2023_SYNTH-SVY-HLD-EN_v01_M.csv")
Taylor-Based Estimation
Estimating a Mean
Stratified sample
svy Results
design_str = svy.Design(stratum="stype", wgt="pw")
sample_str = svy.Sample(data=apistrat, design=design_str)
api00_mean_str = sample_str.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_str.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 662.287363 | 9.536132 | 643.481357 | 681.093370 |
R Results
design_str <- apistrat |>
  srvyr::as_survey_design(strata = stype, weights = pw)

design_str |>
  summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  ) |>
  gt() |>
  fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| est | se | lci | uci |
|---|---|---|---|
| 662.287363 | 9.536132 | 643.481357 | 681.093370 |
One-stage sample
svy Results
design_clus1 = svy.Design(psu="dnum", wgt="pw")
sample_clus1 = svy.Sample(data=apiclus1, design=design_clus1)
api00_mean_clus1 = sample_clus1.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_clus1.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 644.169399 | 23.779011 | 593.168493 | 695.170305 |
R Results
design_clus1 <- apiclus1 |>
  srvyr::as_survey_design(id = dnum, weights = pw)

design_clus1 |>
  dplyr::summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| est | se | lci | uci |
|---|---|---|---|
| 644.169399 | 23.779011 | 593.168493 | 695.170305 |
Two-stage sample
svy Results
# Two-stage design: districts (dnum) → schools (snum)
# Note: only the PSU is specified; variance uses the ultimate cluster
# approximation, so the second stage does not need to be declared
design_clus2 = svy.Design(psu="dnum", wgt="pw")
sample_clus2 = svy.Sample(data=apiclus2, design=design_clus2)
api00_mean_clus2 = sample_clus2.estimation.mean("api00")

cols = ["est", "se", "lci", "uci"]
(
    GT(api00_mean_clus2.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 670.811808 | 30.711576 | 608.691782 | 732.931835 |
R Results
design_clus2 <- apiclus2 |>
  srvyr::as_survey_design(id = c(dnum, snum), weights = pw)

design_clus2 |>
  dplyr::summarize(
    est = srvyr::survey_mean(api00, vartype = c("se", "ci"))
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| est | se | lci | uci |
|---|---|---|---|
| 670.811808 | 30.711576 | 608.691782 | 732.931835 |
For multi-stage designs, svy uses the ultimate cluster variance estimator, which approximates the total variance using first-stage (PSU) variability only. This approach is standard in survey software (including R’s survey package) because it:
Produces conservative variance estimates
Avoids requiring population sizes at lower stages
Reflects the dominant source of variability in most designs
Accordingly, specifying Design(psu="dnum", ssu="snum") yields the same variance estimates as Design(psu="dnum").
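The idea behind the ultimate cluster estimator can be sketched with a few lines of numpy on made-up numbers (the data below are purely illustrative, not from the apiclus files): for a weighted total, the variance depends only on the weighted PSU totals, so any second-stage structure drops out of the formula.

```python
import numpy as np

# Toy data: 3 PSUs with 2 units each (hypothetical weights and values)
psu = np.array([1, 1, 2, 2, 3, 3])          # first-stage cluster ids
w = np.array([10.0, 10.0, 12.0, 12.0, 8.0, 8.0])
y = np.array([5.0, 7.0, 6.0, 4.0, 9.0, 8.0])

# Weighted PSU totals: the only inputs the ultimate cluster estimator uses
t_i = np.array([np.sum(w[psu == g] * y[psu == g]) for g in np.unique(psu)])
n = len(t_i)                                 # number of PSUs
var_total = n / (n - 1) * np.sum((t_i - t_i.mean()) ** 2)
# var_total = 256.0 for this toy data (t_i = [120, 120, 136])
```

Because only `t_i` enters the computation, splitting or merging units within a PSU leaves the estimate unchanged, which is exactly why `ssu` does not affect the variance.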
Stratified clustered sample
For this example, we will use the World Bank synthetic survey data (World Bank 2023).
svy Results
design_str_clus = svy.Design(stratum=("geo1", "urbrur"), psu="ea", wgt="hhweight")
sample_str_clus = svy.Sample(data=wb_synth_smp, design=design_str_clus)
tot_exp = sample_str_clus.estimation.mean("tot_exp")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 12,048.963780 | 229.986492 | 11,596.378760 | 12,501.548800 |
R Results
design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::summarize(
    est = srvyr::survey_mean(tot_exp, vartype = c("se", "ci"))
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| est | se | lci | uci |
|---|---|---|---|
| 12,048.963780 | 229.986492 | 11,596.378760 | 12,501.548800 |
Other Population Parameters
Proportion
svy Results
electricity = sample_str_clus.estimation.prop("electricity")

cols = ["est", "se", "lci", "uci"]
(
    GT(electricity.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 0.170550 | 0.011873 | 0.148438 | 0.195201 |
| 0.829450 | 0.011873 | 0.804799 | 0.851562 |
R Results
design_str_clus |>
  dplyr::group_by(electricity) |>
  dplyr::summarize(
    est = srvyr::survey_prop(vartype = c("se", "ci"), proportion = TRUE)
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| electricity | est | se | lci | uci |
|---|---|---|---|---|
| No | 0.170550 | 0.011873 | 0.148438 | 0.195201 |
| Yes | 0.829450 | 0.011873 | 0.804799 | 0.851562 |
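The logit-transformed interval for the "No" row can be reproduced by hand from its estimate and standard error. The sketch below uses only the stdlib; the t quantile for 300 design degrees of freedom is hardcoded as an assumption rather than computed.

```python
import math

p, se = 0.170550, 0.011873   # estimate and SE from the tables above
t = 1.96793                  # approx. t_{0.975, 300} (hardcoded assumption)

logit_p = math.log(p / (1 - p))
se_logit = se / (p * (1 - p))          # delta-method SE on the logit scale
lci = 1 / (1 + math.exp(-(logit_p - t * se_logit)))
uci = 1 / (1 + math.exp(-(logit_p + t * se_logit)))
# lci ≈ 0.148438, uci ≈ 0.195201, matching both packages
```

The asymmetry of the interval around 0.170550 is the signature of the logit transform: the CI is symmetric on the logit scale and respects the (0, 1) bounds after back-transformation.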
Total
svy Results
sample_str_clus = sample_str_clus.wrangling.recode(
    cols="electricity",
    recodes={1: ["No"], 0: ["Yes"]},
    into="no_electricity"
)
electricity = sample_str_clus.estimation.total("no_electricity")

cols = ["est", "se", "lci", "uci"]
(
    GT(electricity.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 426,675.251960 | 30,622.991644 | 366,412.985386 | 486,937.518534 |
R Results
design_str_clus |>
  dplyr::mutate(no_electricity = electricity != "Yes") |>
  dplyr::summarize(
    est = srvyr::survey_total(no_electricity, vartype = c("se", "ci"))
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| est | se | lci | uci |
|---|---|---|---|
| 426,675.251960 | 30,622.991644 | 366,412.985386 | 486,937.518534 |
Ratio
svy Results
tot_exp = sample_str_clus.estimation.ratio(y="tot_exp", x="hhsize")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 2,992.110041 | 71.224260 | 2,851.949491 | 3,132.270590 |
R Results
design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::summarize(
    est = srvyr::survey_ratio(tot_exp, hhsize, vartype = c("se", "ci"))
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| est | se | lci | uci |
|---|---|---|---|
| 2,992.110041 | 71.224260 | 2,851.949491 | 3,132.270590 |
Domain estimation
Average expenditure by urban and rural areas
svy Results
tot_exp = sample_str_clus.estimation.mean(y="tot_exp", by="urbrur")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(["by_level"] + cols))
    .fmt_number(columns=cols, decimals=6)
)
| by_level | est | se | lci | uci |
|---|---|---|---|---|
| ['Urban'] | 14,437.918429 | 326.402120 | 13,795.599356 | 15,080.237501 |
| ['Rural'] | 9,116.629337 | 305.519957 | 8,515.403784 | 9,717.854889 |
R Results
design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::group_by(urbrur) |>
  dplyr::summarize(
    est = srvyr::survey_mean(tot_exp, vartype = c("se", "ci"))
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| urbrur | est | se | lci | uci |
|---|---|---|---|---|
| Rural | 9,116.629337 | 305.519957 | 8,515.403784 | 9,717.854889 |
| Urban | 14,437.918429 | 326.402120 | 13,795.599356 | 15,080.237501 |
Ratio of expenditure over household size by banking status
svy Results
tot_exp = sample_str_clus.estimation.ratio(y="tot_exp", x="hhsize", by="bank")

cols = ["est", "se", "lci", "uci"]
(
    GT(tot_exp.to_polars().select(["by_level"] + cols))
    .fmt_number(columns=cols, decimals=6)
)
| by_level | est | se | lci | uci |
|---|---|---|---|---|
| ['No'] | 1,784.529865 | 41.958370 | 1,701.960973 | 1,867.098757 |
| ['Yes'] | 3,960.323902 | 89.247824 | 3,784.695203 | 4,135.952601 |
R Results
design_str_clus <- wb_synth_smp |>
  dplyr::mutate(stratum = paste(geo1, urbrur, sep = "_")) |>
  srvyr::as_survey_design(id = ea, strata = stratum, weights = hhweight)

design_str_clus |>
  dplyr::group_by(bank) |>
  dplyr::summarize(
    est = srvyr::survey_ratio(tot_exp, hhsize, vartype = c("se", "ci"))
  ) |>
  gt::gt() |>
  gt::fmt_number(
    columns = where(is.numeric),
    decimals = 6
  )
| bank | est | se | lci | uci |
|---|---|---|---|---|
| No | 1,784.529865 | 41.958370 | 1,701.960973 | 1,867.098757 |
| Yes | 3,960.323902 | 89.247824 | 3,784.695203 | 4,135.952601 |
Replication-Based Estimation
When only replicate weights are provided (without strata/PSU identifiers), the true design df is unknown:
svy defaults to df = n_reps - 1
R defaults to the rank of the replicate weight matrix minus 1
Both approaches are heuristics. The rank-based method can detect when post-stratification or calibration has reduced the effective df, but is computationally expensive and numerically sensitive.
Both packages allow user override: RepWeights(df=...) in svy, degf= in R’s svrepdesign().
In practice, data providers typically document the correct degrees of freedom for their replicate weights (e.g., NHANES, ACS). Always consult the survey documentation and specify df explicitly when known.
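The two heuristics can be compared on a small synthetic example. The sketch below builds a made-up set of delete-one-group jackknife replicate weights (all numbers are hypothetical) and shows that the rank-based rule agrees with `n_reps - 1` when the replicates are linearly independent, but drops when one replicate is a linear combination of the others, as calibration can induce.

```python
import numpy as np

# Hypothetical delete-one-group jackknife weights: n = 48 units, R = 16 groups
n, R = 48, 16
base = np.linspace(1.0, 3.0, n)      # made-up full-sample weights
group = np.arange(n) % R             # group membership
rep_w = np.column_stack(
    [np.where(group == j, 0.0, base * R / (R - 1)) for j in range(R)]
)

df_reps = R - 1                             # svy-style default: n_reps - 1
df_rank = np.linalg.matrix_rank(rep_w) - 1  # rank-based heuristic (also 15 here)

# A linearly dependent replicate lowers the rank-based df but not n_reps - 1:
rep_w[:, -1] = rep_w[:, 0]
df_rank_dep = np.linalg.matrix_rank(rep_w) - 1  # drops to 14
```

This is why the rank-based rule can detect calibration-induced df loss, at the cost of an SVD on the full replicate-weight matrix.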
Balanced Repeated Replication (BRR)
svy Results
rep_weights = svy.RepWeights(method=svy.EstMethod.BRR, prefix="brr_", n_reps=32)
design_brr = svy.Design(wgt="finalwgt", rep_wgts=rep_weights)
sample_brr = svy.Sample(data=nhanes2brr, design=design_brr)
ratio_wgt_hgt = sample_brr.estimation.ratio(y="weight", x="height")

cols = ["est", "se", "lci", "uci"]
(
    GT(ratio_wgt_hgt.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 0.426812 | 0.000890 | 0.424996 | 0.428628 |
R Results
design_brr <- svrepdesign(
  data = nhanes2brr,
  weights = ~finalwgt,
  repweights = "brr_",
  type = "BRR",
  combined.weights = TRUE
)

ratio_wgt_hgt <- svyratio(~weight, ~height, design = design_brr)

# Extract results into a data frame
est <- coef(ratio_wgt_hgt)
se <- SE(ratio_wgt_hgt)
ci <- confint(ratio_wgt_hgt, df = degf(design_brr))

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
| est | se | lci | uci |
|---|---|---|---|
| 0.426812 | 0.000890 | 0.424996 | 0.428628 |
R’s confint() uses normal (z) approximation by default for replicate weight designs.
To match svy’s t-based CIs, use confint(..., df = degf(design)).
Jackknife
svy Results
rep_weights = svy.RepWeights(
    method=svy.EstMethod.JACKKNIFE, prefix="jkw_", n_reps=62, df=61
)
design_jkn = svy.Design(wgt="finalwgt", rep_wgts=rep_weights)
sample_jkn = svy.Sample(data=nhanes2jknife, design=design_jkn)
ratio_wgt_hgt = sample_jkn.estimation.ratio(y="weight", x="height")

cols = ["est", "se", "lci", "uci"]
(
    GT(ratio_wgt_hgt.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 0.426812 | 0.001247 | 0.424319 | 0.429304 |
R Results
design_jkn <- svrepdesign(
  data = nhanes2jknife,
  weights = ~finalwgt,
  repweights = "jkw_",
  type = "JKn",
  combined.weights = TRUE,
  rscales = rep((62 - 1) / 62, 62)
)

ratio_wgt_hgt <- svyratio(~weight, ~height, design = design_jkn)

# Extract results into a data frame
est <- coef(ratio_wgt_hgt)
se <- SE(ratio_wgt_hgt)
ci <- confint(ratio_wgt_hgt, df = 61)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
| est | se | lci | uci |
|---|---|---|---|
| 0.426812 | 0.001247 | 0.424319 | 0.429304 |
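The scaling that `rscales = rep((62 - 1) / 62, 62)` encodes can be sketched directly: the jackknife variance sums squared deviations of replicate estimates from their center, scaled by (R − 1)/R per replicate. The replicate estimates below are simulated around the ratio point estimate purely for illustration; they are not the actual NHANES II replicates.

```python
import numpy as np

R = 62
rng = np.random.default_rng(42)
# Hypothetical replicate estimates scattered around the point estimate
theta_reps = 0.426812 + rng.normal(0.0, 0.00016, size=R)

center = theta_reps.mean()   # default centering in both packages
v_jk = (R - 1) / R * np.sum((theta_reps - center) ** 2)
se_jk = np.sqrt(v_jk)        # on the order of the 0.001247 reported above
```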
Bootstrap
svy Results
rep_weights = svy.RepWeights(method=svy.EstMethod.BOOTSTRAP, prefix="bsrw", n_reps=1000)
design_bs = svy.Design(wgt="finwgt", rep_wgts=rep_weights)
sample_bs = svy.Sample(data=nmihs_bs, design=design_bs)
mean_birth_weight = sample_bs.estimation.mean(y="birthwgt", drop_nulls=True)

cols = ["est", "se", "lci", "uci"]
(
    GT(mean_birth_weight.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 3,355.452419 | 6.520638 | 3,342.656702 | 3,368.248137 |
R Results
design_bs <- svrepdesign(
  data = nmihs_bs,
  weights = ~finwgt,
  repweights = "bsrw",
  type = "bootstrap",
  replicates = 1000,
  combined.weights = TRUE,
  rscales = rep((1000 - 1) / 1000, 1000)
)

mean_birth_weight <- svymean(~birthwgt, design = design_bs, na.rm = TRUE)

est <- coef(mean_birth_weight)
se <- SE(mean_birth_weight)
ci <- confint(mean_birth_weight, df = 999)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
| est | se | lci | uci |
|---|---|---|---|
| 3,355.452419 | 6.520638 | 3,342.656702 | 3,368.248137 |
Successive Difference Replication (SDR)
The American Community Survey (ACS) provides 80 replicate weights (WGTP1–WGTP80) alongside the main weight WGTP, constructed using successive difference replication (SDR). To illustrate, we use data from the 2024 ACS 1-Year Public Use Microdata Sample. The ACS documentation describes the SDR replicate-weight construction and recommended variance estimation practice.
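The SDR variance formula the ACS documentation prescribes is simple enough to sketch by hand: var = (4/80) × Σ(θᵣ − θ)², where θᵣ are the 80 replicate estimates and θ is the full-sample estimate. The replicate estimates below are simulated for illustration only; they are not the actual ACS replicates.

```python
import numpy as np

R = 80
theta_full = 111770.50                    # hypothetical full-sample estimate
rng = np.random.default_rng(7)
# Made-up replicate estimates scattered around the full-sample estimate
theta_reps = theta_full + rng.normal(0.0, 1250.0, size=R)

var_sdr = (4 / R) * np.sum((theta_reps - theta_full) ** 2)
se_sdr = np.sqrt(var_sdr)
```

The 4/80 factor is the same `scale = 4 / 80` passed to `svrepdesign()` in the R code below.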
svy Results
rep_weights_acs = svy.RepWeights(method=svy.EstMethod.SDR, prefix="WGTP", n_reps=80)
design_acs = svy.Design(wgt="WGTP", rep_wgts=rep_weights_acs)
sample_acs = svy.Sample(data=acs_hak, design=design_acs)
mean_hincp = sample_acs.estimation.mean(y="HINCP", drop_nulls=True)

cols = ["est", "se", "lci", "uci"]
(
    GT(mean_hincp.to_polars().select(cols))
    .fmt_number(columns=cols, decimals=6)
)
| est | se | lci | uci |
|---|---|---|---|
| 111,770.504058 | 2,517.264331 | 106,760.014741 | 116,780.993375 |
R Results
R’s survey supports SDR directly via type="successive-difference". It also includes a dedicated type="ACS" shortcut that applies ACS-specific defaults. In practice, both should agree when equivalent settings are used.
design_sdr <- svrepdesign(
  data = acs_hak,
  weights = ~WGTP,
  repweights = "^WGTP[0-9]+",
  type = "successive-difference",
  scale = 4 / 80,
  combined.weights = TRUE,
  rscales = 1
)

mean_hincp_sdr <- svymean(~HINCP, design = design_sdr, na.rm = TRUE)

est <- coef(mean_hincp_sdr)
se <- SE(mean_hincp_sdr)
ci <- confint(mean_hincp_sdr, df = 79)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
| est | se | lci | uci |
|---|---|---|---|
| 111,770.504058 | 2,517.264331 | 106,760.014741 | 116,780.993375 |
Equivalently, using the ACS shortcut:
design_acs <- svrepdesign(
  data = acs_hak,
  weights = ~WGTP,
  repweights = "^WGTP[0-9]+",
  type = "ACS",
  combined.weights = TRUE
)

mean_hincp_acs <- svymean(~HINCP, design = design_acs, na.rm = TRUE)

est <- coef(mean_hincp_acs)
se <- SE(mean_hincp_acs)
ci <- confint(mean_hincp_acs, df = 79)

data.frame(
  est = est,
  se = se,
  lci = ci[1],
  uci = ci[2]
) |>
  gt::gt() |>
  gt::fmt_number(columns = everything(), decimals = 6)
| est | se | lci | uci |
|---|---|---|---|
| 111,770.504058 | 2,517.264331 | 106,760.014741 | 116,780.993375 |
By default, both svy and R’s survey center the replicate-variance calculation on the mean of the replicate estimates.
To center on the full-sample estimate instead:
Use rep_center = "estimate" with svy
Use mse = TRUE with R survey
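The difference between the two centerings is easiest to see on toy numbers (all values below are hypothetical, with the per-replicate scale factor omitted for clarity):

```python
import numpy as np

theta_full = 10.0                              # full-sample estimate
theta_reps = np.array([9.8, 10.1, 10.3, 9.9])  # made-up replicate estimates

# Default: center on the mean of the replicate estimates
v_centered = np.mean((theta_reps - theta_reps.mean()) ** 2)
# rep_center="estimate" / mse = TRUE: center on the full-sample estimate
v_mse = np.mean((theta_reps - theta_full) ** 2)
# v_mse >= v_centered always, since the mean minimizes squared deviations
```

The MSE-style variant is never smaller, and the two coincide exactly when the replicate mean equals the full-sample estimate.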
Categorical Data Analysis
Let’s use the World Bank dataset to demonstrate categorical data analysis.
Below, we compute the cross-tabulation of urban/rural and electricity access and show the Rao-Scott χ² test.
svy Results
crosstab = sample_str_clus.categorical.tabulate(
    rowvar="urbrur",
    colvar="electricity",
    units=svy.TableUnits.PERCENT,
)

cols = ["est", "se", "lci", "uci"]
(
    GT(crosstab.to_polars().select(["rowvar", "colvar"] + cols))
    .fmt_number(columns=cols, decimals=6)
)
| urbrur | electricity | est | se | lci | uci |
|---|---|---|---|---|---|
| Rural | No | 15.198691 | 1.165349 | 12.905428 | 17.491953 |
| Rural | Yes | 29.695594 | 1.309476 | 27.118707 | 32.272480 |
| Urban | No | 1.856347 | 0.374548 | 1.119282 | 2.593412 |
| Urban | Yes | 53.249369 | 0.831273 | 51.613526 | 54.885211 |
test_stat = crosstab.stats.pearson_adj

# Create a formatted dataframe
test_df = pl.DataFrame({
    "statistic": ["Pearson χ² (adjusted)"],
    "F_value": [test_stat.value],
    "df_num": [test_stat.df_num],
    "df_den": [test_stat.df_den],
    "p_value": [test_stat.p_value]
})
cols = ["F_value", "df_num", "df_den", "p_value"]
GT(test_df).fmt_number(columns=cols, decimals=6)
| statistic | F_value | df_num | df_den | p_value |
|---|---|---|---|---|
| Pearson χ² (adjusted) | 193.172687 | 1.000000 | 301.000000 | 0.000000 |
R Results
crosstab <- survey::svytable(~urbrur + electricity, design_str_clus, Ntotal = 100)
test_result <- summary(crosstab)
test_result$table
electricity
urbrur No Yes
Rural 15.198691 29.695594
Urban 1.856347 53.249369
Pearson's X^2: Rao & Scott adjustment
data: NextMethod()
F = 193.17, ndf = 1, ddf = 301, p-value < 2.2e-16
T-tests
One group
svy Results
tot_exp_test1 = sample_str_clus.categorical.ttest(
    y="tot_exp",
    mean_h0=12500,
)
print(tot_exp_test1.to_polars().drop("y"))
shape: (1, 7)
┌─────────────┬────────────┬─────────────┬──────────┬───────────┬────────────┬──────────┐
│ diff ┆ se ┆ lci ┆ uci ┆ t ┆ df ┆ p_value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════════╪════════════╪═════════════╪══════════╪═══════════╪════════════╪══════════╡
│ -451.036220 ┆ 229.986492 ┆ -903.621240 ┆ 1.548800 ┆ -1.961142 ┆ 300.000000 ┆ 0.050787 │
└─────────────┴────────────┴─────────────┴──────────┴───────────┴────────────┴──────────┘
R Results
tot_exp_test1 <- svyttest((tot_exp - 12500) ~ 0, design_str_clus)

test1_df <- data.frame(
  test = "One-sample t-test",
  statistic = tot_exp_test1$statistic,
  df = tot_exp_test1$parameter,
  p_value = tot_exp_test1$p.value,
  mean_diff = tot_exp_test1$estimate,
  ci_lower = tot_exp_test1$conf.int[1],
  ci_upper = tot_exp_test1$conf.int[2]
)
test1_df |>
  gt::gt() |>
  gt::fmt_number(
    columns = c(statistic, df, mean_diff, ci_lower, ci_upper, p_value),
    decimals = 6
  )
| test | statistic | df | p_value | mean_diff | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| One-sample t-test | −1.961142 | 300.000000 | 0.050787 | −451.036220 | −903.627330 | 1.554890 |
Two groups
svy Results
tot_exp_test2 = sample_str_clus.categorical.ttest(
    y="tot_exp",
    group="urbrur",
)
print(tot_exp_test2.to_polars().drop(["y", "group_var", "paired"]))
shape: (1, 7)
┌─────────────┬────────────┬─────────────┬─────────────┬───────────┬────────────┬──────────┐
│ diff ┆ se ┆ lci ┆ uci ┆ t ┆ df ┆ p_value │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════════╪════════════╪═════════════╪═════════════╪═══════════╪════════════╪══════════╡
│ 5321.289092 ┆ 447.080293 ┆ 4441.478438 ┆ 6201.099746 ┆ 11.902312 ┆ 300.000000 ┆ 0.000000 │
└─────────────┴────────────┴─────────────┴─────────────┴───────────┴────────────┴──────────┘
R Results
tot_exp_test2 <- svyttest(tot_exp ~ urbrur, design_str_clus)

test2_df <- data.frame(
  test = "Two-sample t-test",
  statistic = tot_exp_test2$statistic,
  df = tot_exp_test2$parameter,
  p_value = tot_exp_test2$p.value,
  mean_diff = tot_exp_test2$estimate,
  ci_lower = tot_exp_test2$conf.int[1],
  ci_upper = tot_exp_test2$conf.int[2]
)
test2_df |>
  gt::gt() |>
  gt::fmt_number(
    columns = c(statistic, df, mean_diff, p_value, ci_lower, ci_upper),
    decimals = 6
  )
| test | statistic | df | p_value | mean_diff | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| Two-sample t-test | 11.902312 | 300.000000 | 0.000000 | 5,321.289092 | 4,441.478438 | 6,201.099746 |
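As a quick arithmetic check, the two-sample t statistic is just the estimated difference divided by its standard error, both of which appear in the tables above:

```python
diff, se = 5321.289092, 447.080293  # reported difference and SE from above
t_stat = diff / se
# t_stat ≈ 11.902312, matching the value both packages report
```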
Generalized Linear Models (GLMs)
Comparison Summary
Taylor
Mean
Stratified
✅
Mean
One-stage cluster
✅
Mean
Two-stage cluster
✅
Ultimate cluster variance
Mean
Stratified + clustered
✅
Proportion
Stratified + clustered
✅
Logit-transformed CIs
Total
Stratified + clustered
✅
Ratio
Stratified + clustered
✅
Domain
Mean
By subgroup
✅
Ratio
By subgroup
✅
Replication
BRR
32 replicates
✅
Jackknife
62 replicates
✅
Requires df specification
Bootstrap
1000 replicates
✅
Requires rscales in R
SDR
80 replicates
✅
Conclusion
This validation study demonstrates that svy reproduces the results of R’s survey package for a wide range of design-based estimators when equivalent survey designs are specified.
The agreement observed across all tested cases confirms that svy implements standard survey-sampling methodology correctly, including Taylor linearization, ultimate cluster variance estimation, and replication-based variance estimators.
These results support the use of svy for production survey analysis workflows and provide a basis for further validation of advanced features.
Help make svy the standard for survey analysis in Python
If rigorous, design-based survey inference in Python matters to you, starring the repository helps signal demand and prioritize validation and stability work.
→ Star svy on GitHub
Notes on methodology
Two-stage variance: Both packages use ultimate cluster estimation
Proportion CIs: Both use logit-transformed confidence intervals
Stratification: svy accepts tuples for multiple variables; R requires interaction() or paste()
References
Lumley, Thomas. 2010. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley.
World Bank. 2023. “Synthetic Data for an Imaginary Country, Sample, 2023.” World Bank, Development Data Group. https://doi.org/10.48529/MC1F-QH23.
Citation
BibTeX citation:
@online{s._diallo2026,
author = {S. Diallo, Mamadou and S. Diallo, Ph.D., Mamadou},
title = {Validation of {Design-Based} {Survey} {Estimators} in
{Python:} {A} {Comparison} of Svy and {R} Survey},
date = {2026-01-10},
url = {https://svylab.com/learn/notes/posts/svy-vs-r-comparison/},
langid = {en}
}
For attribution, please cite this work as:
S. Diallo, Mamadou, and Mamadou S. Diallo, Ph.D. 2026.
“Validation
of Design-Based Survey Estimators in Python: A Comparison of Svy and R
Survey.” January 10, 2026.
https://svylab.com/learn/notes/posts/svy-vs-r-comparison/ .