svy: Python Package for Complex Survey Design and Analysis

Professional Python toolkit for designing complex surveys, calculating sample sizes, weighting survey data, and performing statistical analysis with stratified, cluster, and multi-stage sampling designs.
Author

svyLab

Published

January 8, 2026

Keywords

complex survey analysis, survey design Python, stratified sampling, cluster sampling, PPS sampling, survey weighting, variance estimation, Python survey package, survey statistics, sampling design

Development Status

svy is under active development with rapid improvements.

Core functionality for survey design, weighting, and variance estimation is stable and production-ready. APIs and documentation continue to mature based on user feedback.

📧 Feedback welcome: info@svylab.com
🐛 Report issues: GitHub Issues

What is svy?

svy is a comprehensive Python package for complex survey design and analysis. When a survey uses stratification, clustering, or unequal-probability selection, statistical routines that assume simple random sampling produce incorrect standard errors and confidence intervals. svy implements design-based inference, so population estimates come with variance estimates that reflect how the sample was actually drawn.

Why Complex Survey Analysis Matters

Large-scale surveys (health surveys, labor force studies, demographic censuses) use complex sampling for practical and statistical reasons:

  • Stratified sampling increases precision by grouping similar units
  • Cluster sampling reduces field costs through geographic grouping
  • Multi-stage designs enable nationwide coverage efficiently
  • Probability proportional to size (PPS) improves efficiency for heterogeneous populations

Standard analysis methods assume simple random sampling. Applied to data from a complex design, they produce:

  • ❌ Incorrect standard errors (usually underestimated)
  • ❌ Invalid confidence intervals
  • ❌ Wrong hypothesis test results
  • ❌ Biased population inferences

svy corrects these problems by accounting for the actual survey design in all calculations.
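
To give a sense of scale, the sketch below applies the Kish approximation for the design effect of a cluster sample, deff = 1 + (m - 1) * rho, to a made-up survey. It uses only the Python standard library and does not call svy. The function names and all input values are hypothetical, and the approximation assumes equal-sized clusters.

```python
import math


def kish_design_effect(cluster_size: float, icc: float) -> float:
    """Kish approximation for equal-sized clusters: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def srs_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical survey: 2,000 respondents in clusters of 20, an intraclass
# correlation of 0.05, and an estimated proportion of 30%.
n, cluster_size, icc, p = 2000, 20, 0.05, 0.30

deff = kish_design_effect(cluster_size, icc)   # 1 + 19 * 0.05 = 1.95
naive_se = srs_standard_error(p, n)            # what SRS formulas report
design_se = naive_se * math.sqrt(deff)         # SE inflated by the design
effective_n = n / deff                         # equivalent SRS sample size

print(f"design effect:    {deff:.2f}")
print(f"naive SE:         {naive_se:.4f}")
print(f"design-based SE:  {design_se:.4f}")
print(f"effective sample: {effective_n:.0f}")
```

With these inputs the design-based standard error is about 40 percent larger than the naive one, and the 2,000 interviews carry roughly the information of 1,026 simple random draws.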

Core Capabilities

1. Survey Design & Planning

Design surveys with statistical rigor:

  • Sample size calculation - Determine required sample sizes for target precision levels
  • Power analysis - Calculate detection probability for effects of interest
  • Optimal allocation - Distribute samples across strata to minimize variance or cost
  • Cost-variance tradeoffs - Balance statistical precision against field expenses
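
As a rough illustration of the first and third items, the sketch below computes a sample size for a proportion and a Neyman allocation across strata using textbook formulas. It is not svy's API: the function names, parameters, and the order in which the adjustments are applied are assumptions made for this example.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p, margin, confidence=0.95, deff=1.0,
                           population=None, response_rate=1.0):
    """Sample size needed to estimate a proportion within +/- margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # two-sided critical value
    n = z ** 2 * p * (1.0 - p) / margin ** 2           # SRS, infinite population
    if population is not None:
        n = n / (1.0 + (n - 1.0) / population)         # finite population correction
    n *= deff                                          # inflate for the complex design
    n /= response_rate                                 # inflate for expected nonresponse
    return math.ceil(n)


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split n across strata proportionally to N_h * S_h (unrounded)."""
    products = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * prod / total for prod in products]


print(sample_size_proportion(0.5, 0.05))                     # 385
print(sample_size_proportion(0.5, 0.05, population=10_000))  # 370
print(neyman_allocation(100, [1000, 1000], [1.0, 3.0]))      # [25.0, 75.0]
```

The allocation is returned unrounded. A production routine would also need to round to integers and enforce a minimum number of units per stratum.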

2. Sample Selection

Draw probability samples using proven methods:

  • Simple Random Sampling (SRS) - Equal probability, with or without replacement
  • Systematic Sampling (SYS) - Ordered selection with random start point
  • Probability Proportional to Size (PPS) - Selection probability tied to auxiliary variable
  • Stratified sampling - Independent selection within predefined groups
  • Multi-stage sampling - Hierarchical selection (PSUs → SSUs → respondents)
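
To make PPS selection concrete, here is a minimal standard-library sketch of systematic PPS sampling: a random start, then a fixed step through the cumulated size measures. It is an illustration only and does not show how svy selects samples. The function name `pps_systematic` and the size measures are hypothetical, and certainty units (units large enough to be selected with probability 1) are rejected rather than handled.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Systematic PPS selection without replacement.

    Returns {unit index: design weight}, where the design weight is the
    inverse of the inclusion probability n * size / total size.
    """
    if any(size <= 0 for size in sizes):
        raise ValueError("size measures must be positive")
    total = sum(sizes)
    if any(n * size >= total for size in sizes):
        raise ValueError("certainty units present; select them separately first")
    step = total / n
    start = rng.random() * step              # random start in [0, step)
    cumulative = list(accumulate(sizes))     # upper bound of each unit's interval
    weights = {}
    unit = 0
    for k in range(n):
        point = start + k * step
        # Advance to the unit whose size interval contains the selection point
        while cumulative[unit] <= point and unit < len(sizes) - 1:
            unit += 1
        weights[unit] = total / (n * sizes[unit])
    return weights


# Hypothetical frame of 7 PSUs with household counts as the size measure
sizes = [120, 80, 300, 150, 50, 200, 100]
sample = pps_systematic(sizes, n=3, rng=random.Random(2026))

# The weighted sum of the size measure reproduces the frame total
ht_total = sum(sizes[i] * weight for i, weight in sample.items())
print(sample)
print(round(ht_total, 6), sum(sizes))
```

Because each weight is the inverse of an inclusion probability proportional to size, the weighted sum of the size measure over any selected sample equals the frame total. This makes a convenient sanity check on design weights.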

3. Survey Weighting

Create and calibrate survey weights:

  • Design weights - Base weights from known selection probabilities
  • Nonresponse adjustment - Compensate for unit and item nonresponse
  • Post-stratification - Align sample margins to population control totals
  • Raking (iterative proportional fitting) - Calibrate to multiple population margins simultaneously
  • GREG calibration - Generalized regression estimation for efficient estimation
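
Raking is the easiest of these adjustments to show in a few lines. The sketch below is a plain-Python version of iterative proportional fitting on two margins. It is meant to convey the idea, not to reproduce svy's implementation. The function name `rake`, the respondent data, and the control totals are all made up.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting of weights to several margins.

    weights:    base weight per respondent
    categories: {variable: [category of each respondent]}
    targets:    {variable: {category: population control total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, labels in categories.items():
            # Current weighted total in each category of this variable
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Scale each category so it matches its control total
            factors = {label: targets[var][label] / totals[label] for label in totals}
            w = [wi * factors[label] for wi, label in zip(w, labels)]
            max_change = max(max_change, max(abs(f - 1.0) for f in factors.values()))
        if max_change < tol:   # a full pass changed nothing: all margins match
            return w
    raise RuntimeError("raking did not converge")


# Six hypothetical respondents with equal base weights
base = [100.0] * 6
cats = {
    "sex": ["M", "M", "F", "F", "F", "M"],
    "age": ["young", "old", "young", "old", "old", "young"],
}
controls = {
    "sex": {"M": 320.0, "F": 280.0},
    "age": {"young": 350.0, "old": 250.0},
}

raked = rake(base, cats, controls)
print([round(x, 2) for x in raked])
```

The control totals for each variable must sum to the same population size (600 here), and every category with a control total needs at least one respondent. Otherwise the margins cannot all be matched.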

4. Statistical Estimation

Produce design-consistent population estimates:

  • Descriptive statistics - Means, totals, proportions, quantiles with design-based variance
  • Domain (subpopulation) estimation - Analyze subgroups correctly
  • Ratio estimation - Ratios and their standard errors
  • Regression analysis - Linear, logistic, Poisson, and other GLMs with survey weights
  • Categorical data analysis - Design-adjusted chi-square tests and cross-tabulations
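
Domain estimation is a frequent source of mistakes, so a small sketch may help. One standard approach treats a domain mean as a ratio of two full-sample totals and keeps every PSU in the variance calculation, instead of subsetting the data first. The code below is a standard-library illustration of that idea with a linearization variance. The function names and toy data are hypothetical, PSUs are treated as sampled with replacement, and none of this is svy's API.

```python
from collections import defaultdict


def ratio_estimate(num, den, w, strata, psus):
    """Estimate R = sum(w * num) / sum(w * den) with its linearization variance.

    PSUs are treated as sampled with replacement within strata, a common
    approximation; no finite population correction is applied.
    """
    total_num = sum(wi * a for wi, a in zip(w, num))
    total_den = sum(wi * b for wi, b in zip(w, den))
    ratio = total_num / total_den

    # Sum linearized scores to PSU level. Every PSU gets an entry, including
    # PSUs whose members all fall outside the domain (their score is zero).
    scores = defaultdict(lambda: defaultdict(float))
    for a, b, wi, h, j in zip(num, den, w, strata, psus):
        scores[h][j] += wi * (a - ratio * b) / total_den

    variance = 0.0
    for h, totals in scores.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return ratio, variance


def domain_mean(y, in_domain, w, strata, psus):
    """Domain mean as a ratio of two full-sample totals (no rows are dropped)."""
    num = [yi if d else 0.0 for yi, d in zip(y, in_domain)]
    den = [1.0 if d else 0.0 for d in in_domain]
    return ratio_estimate(num, den, w, strata, psus)


# Toy data (made up): 2 strata, 3 PSUs each, 2 respondents per PSU
strata = [1] * 6 + [2] * 6
psus = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
w = [10.0] * 6 + [20.0] * 6
y = [12, 15, 11, 14, 18, 20, 25, 22, 30, 28, 24, 27]
in_domain = [True, False, True, True, False, False,
             True, True, False, True, False, False]

estimate, variance = domain_mean(y, in_domain, w, strata, psus)
print(f"domain mean = {estimate:.3f}, SE = {variance ** 0.5:.3f}")
```

PSUs 3 and 6 contain no domain members, yet they still count toward the number of PSUs in their strata. Filtering the rows first would drop them and change the variance estimate.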

5. Variance Estimation

Calculate standard errors that reflect the survey design:

  • Taylor linearization - Analytic variance for smooth statistics
  • Bootstrap - General resampling-based variance estimation
  • Balanced Repeated Replication (BRR) - Efficient replication for paired PSU designs
  • Jackknife - Delete-one-group-at-a-time replication
  • Domain variance - Correct standard errors for subpopulation analyses
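
To show how two of these methods relate, the sketch below estimates the variance of a weighted mean twice: once by Taylor linearization and once by a stratified delete-one-PSU jackknife. It uses only the standard library. The function names and the toy data are hypothetical, PSUs are treated as sampled with replacement within strata, and the code is not taken from svy.

```python
from collections import defaultdict


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def taylor_variance_mean(y, w, strata, psus):
    """Linearization variance of the weighted mean (with-replacement PSUs)."""
    ybar, w_sum = weighted_mean(y, w), sum(w)
    psu_scores = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, strata, psus):
        psu_scores[h][j] += wi * (yi - ybar) / w_sum   # linearized score
    variance = 0.0
    for totals in psu_scores.values():
        n_h = len(totals)                              # assumes n_h >= 2
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return variance


def jackknife_variance_mean(y, w, strata, psus):
    """Stratified delete-one-PSU jackknife variance of the weighted mean."""
    theta = weighted_mean(y, w)
    psu_ids = defaultdict(set)
    for h, j in zip(strata, psus):
        psu_ids[h].add(j)
    variance = 0.0
    for h, ids in psu_ids.items():
        n_h = len(ids)                                 # assumes n_h >= 2
        scale = n_h / (n_h - 1)
        for dropped in ids:
            # Zero out the dropped PSU and reweight the rest of its stratum
            rep_w = [
                0.0 if (hi == h and ji == dropped)
                else wi * scale if hi == h
                else wi
                for wi, hi, ji in zip(w, strata, psus)
            ]
            variance += (n_h - 1) / n_h * (weighted_mean(y, rep_w) - theta) ** 2
    return variance


# Toy data (made up): 2 strata, 3 PSUs each, 2 respondents per PSU
strata = [1] * 6 + [2] * 6
psus = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
w = [10.0] * 6 + [20.0] * 6
y = [12, 15, 11, 14, 18, 20, 25, 22, 30, 28, 24, 27]

taylor_var = taylor_variance_mean(y, w, strata, psus)
jk_var = jackknife_variance_mean(y, w, strata, psus)
print(f"Taylor SE:    {taylor_var ** 0.5:.4f}")
print(f"Jackknife SE: {jk_var ** 0.5:.4f}")
```

In this balanced toy example every PSU within a stratum carries the same total weight, so the two estimates coincide up to rounding. With unequal PSU weights they are usually close but not identical.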

Quick Start Example

import svy

# Load survey data with design variables
smp_data = svy.io.read_csv("survey_data.csv")

# Specify survey design
smp_design = svy.Design(
    stratum=("region_id", "urban_rural"),  # Stratification variables
    psu="psu_id",                          # Primary sampling units
    wgt="weight",                          # Survey weights
)

# Create survey sample object
sample = svy.Sample(
    data=smp_data,
    design=smp_design
)

# Population mean with design-based standard error
mean_income = sample.estimation.mean("income")
print(mean_income)
# Output includes: estimate, SE, confidence interval, design effect

# Regression accounting for complex design
linear_model = sample.glm.fit(
    y="income",
    x=["age", svy.Cat("education")],
    family="gaussian"
)
print(linear_model)
# Output: coefficients, design-based SEs, t-tests

Who Uses svy?

svy serves diverse survey research communities:

  • 📊 Survey methodologists developing sampling strategies
  • 📈 Biostatisticians analyzing health survey data (NHANES, BRFSS, DHS)
  • 🎓 Social scientists studying populations through sample surveys
  • 🏛️ Government statisticians producing official statistics
  • 🔬 Epidemiologists estimating disease prevalence and risk factors
  • 🏢 Market researchers analyzing customer surveys and panels
  • 👨‍🏫 Educators teaching survey sampling and analysis methods

Documentation Structure

📖 Getting Started

Five-minute quickstart: installation, first analysis, key concepts.

🎓 Tutorials

Hands-on walkthroughs with real data:

  • Designing surveys and selecting samples
  • Computing and adjusting survey weights
  • Producing population estimates with proper variance
  • Fitting regression models to survey data
  • Analyzing cross-tabulations and categorical data

📚 User Guides

Coming soon - In-depth conceptual explanations and best practices

🔧 API Reference

Coming soon - Complete technical documentation of all classes and methods

Community & Support

Get help and connect with other users by email at info@svylab.com or through GitHub Issues.

Academic Citation

If you use svy in published research, please cite:

@software{svy2025,
  title = {svy: Python Package for Complex Survey Analysis},
  author = {Diallo, Mamadou S.},
  year = {2025},
  url = {https://github.com/samplics-org/svy},
  doi = {10.5281/zenodo.XXXXXXX},
  version = {0.2.0}
}

License

svy is open source software released under the MIT License. See LICENSE for full terms.


Ready to analyze complex survey data correctly?
Get Started →