import svy

# Load survey data with design variables
smp_data = svy.io.read_csv("survey_data.csv")

# Specify survey design
smp_design = svy.Design(
    stratum=("region_id", "urban_rural"),  # Stratification variables
    psu="psu_id",  # Primary sampling units
    wgt="weight",  # Survey weights
)

# Create survey sample object
sample = svy.Sample(
    data=smp_data,
    design=smp_design,
)

# Population mean with design-based standard error
mean_income = sample.estimation.mean("income")
print(mean_income)
# Output includes: estimate, SE, confidence interval, design effect

# Regression accounting for complex design
linear_model = sample.glm.fit(
    y="income",
    x=["age", svy.Cat("education")],
    family="gaussian",
)
print(linear_model)
# Output: coefficients, design-based SEs, t-tests

svy: Python Package for Complex Survey Design and Analysis
Development Status
svy is under active development with rapid improvements.
Core functionality for survey design, weighting, and variance estimation is stable and production-ready. APIs and documentation continue to mature based on user feedback.
Feedback welcome: info@svylab.com
Report issues: GitHub Issues
What is svy?
svy is a comprehensive Python package for complex survey design and analysis. When surveys use sophisticated sampling methods (stratification, clustering, or unequal probability selection), standard statistical software produces incorrect standard errors and confidence intervals. svy implements design-based inference, ensuring accurate population estimates with proper variance estimation.
Why Complex Survey Analysis Matters
Large-scale surveys (health surveys, labor force studies, demographic censuses) use complex sampling for practical and statistical reasons:
- Stratified sampling increases precision by grouping similar units
- Cluster sampling reduces field costs through geographic grouping
- Multi-stage designs enable nationwide coverage efficiently
- Probability proportional to size (PPS) improves efficiency for heterogeneous populations
Standard analysis methods assume simple random sampling and produce:
- Incorrect standard errors (usually underestimated)
- Invalid confidence intervals
- Wrong hypothesis test results
- Biased population inferences
svy corrects these problems by accounting for the actual survey design in all calculations.
Core Capabilities
1. Survey Design & Planning
Design surveys with statistical rigor:
- Sample size calculation - Determine required sample sizes for target precision levels
- Power analysis - Calculate detection probability for effects of interest
- Optimal allocation - Distribute samples across strata to minimize variance or cost
- Cost-variance tradeoffs - Balance statistical precision against field expenses
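As a rough illustration of the sample size calculation described above, the sketch below uses the textbook formula for estimating a proportion, inflated by an assumed design effect. It is plain Python and does not use the svy API; the function name `sample_size_for_proportion` and its parameters are hypothetical, and applying the design effect before the finite population correction is one convention among several.

```python
import math


def sample_size_for_proportion(p, margin, deff=1.0, z=1.96, population=None):
    """Sample size needed to estimate a proportion p within +/- margin.

    deff is an assumed design effect (1.0 means simple random sampling).
    population, if given, applies a finite population correction.
    """
    n = (z ** 2) * p * (1 - p) / margin ** 2  # SRS, very large population
    n *= deff  # inflate for clustering and unequal weights
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite population correction
    return math.ceil(n)


n_srs = sample_size_for_proportion(0.5, 0.05)
n_clustered = sample_size_for_proportion(0.5, 0.05, deff=2.0)
n_small_pop = sample_size_for_proportion(0.5, 0.05, population=10_000)
print(n_srs, n_clustered, n_small_pop)
```

A design effect of 2 roughly doubles the required sample, which is why the expected clustering of a design matters at the planning stage.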
2. Sample Selection
Draw probability samples using proven methods:
- Simple Random Sampling (SRS) - Equal probability, with or without replacement
- Systematic Sampling (SYS) - Ordered selection with random start point
- Probability Proportional to Size (PPS) - Selection probability tied to auxiliary variable
- Stratified sampling - Independent selection within predefined groups
- Multi-stage sampling - Hierarchical selection (PSUs → SSUs → respondents)
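To make the PPS and systematic selection ideas above concrete, here is a minimal sketch of systematic PPS selection in plain Python. It is not the svy implementation; `pps_systematic` and the example size measures are made up for illustration, and the sketch assumes no unit is large enough to be selected with certainty.

```python
import random


def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size.

    Walks the cumulative size measure and picks the unit containing each of
    n equally spaced points placed after a random start. Returns indices.
    """
    total = sum(sizes)
    step = total / n
    if max(sizes) >= step:
        raise ValueError("a unit is large enough to be selected with certainty")
    start = rng.random() * step  # random start in [0, step)
    selected = []
    cumulative = 0.0
    k = 0  # index of the next selection point
    for index, size in enumerate(sizes):
        cumulative += size
        while k < n and start + k * step < cumulative:
            selected.append(index)
            k += 1
    return selected


sizes = [120, 80, 300, 50, 150, 200, 100]  # hypothetical size measures per PSU
n = 3
selected = pps_systematic(sizes, n, random.Random(42))

# Inclusion probability of unit i is n * size_i / total; the design weight
# is its inverse.
inclusion_prob = [n * s / sum(sizes) for s in sizes]
design_weights = {i: 1 / inclusion_prob[i] for i in selected}
print(selected, design_weights)
```

Larger units are more likely to be drawn and therefore carry smaller design weights, which keeps weighted totals unbiased.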
3. Survey Weighting
Create and calibrate survey weights:
- Design weights - Base weights from known selection probabilities
- Nonresponse adjustment - Compensate for unit and item nonresponse
- Post-stratification - Align sample margins to population control totals
- Raking (iterative proportional fitting) - Calibrate to multiple population margins simultaneously
- GREG calibration - Generalized regression estimation for efficient estimation
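Raking, listed above, can be sketched in a few lines of plain Python. This is an illustrative version of iterative proportional fitting, not the svy API: the function `rake`, its arguments, and the toy margins are assumptions, and it presumes every category has at least one respondent and that the margins share the same grand total.

```python
def rake(weights, categories, margins, max_iter=100, tol=1e-8):
    """Adjust weights so weighted category totals match population margins.

    categories: dict mapping variable name -> list of labels, one per unit
    margins:    dict mapping variable name -> {label: population total}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, labels in categories.items():
            # Current weighted total in each category of this variable
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Scale each unit by target / current for its category
            for i, label in enumerate(labels):
                factor = margins[var][label] / totals[label]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all adjustment factors are essentially 1
            break
    return w


categories = {
    "sex": ["M", "M", "F", "F", "F", "M"],
    "age": ["young", "old", "young", "old", "young", "old"],
}
margins = {
    "sex": {"M": 48.0, "F": 52.0},
    "age": {"young": 55.0, "old": 45.0},
}
raked = rake([10.0] * 6, categories, margins)
print([round(x, 3) for x in raked])
```

Post-stratification is the special case with a single variable, where one pass already reproduces the control totals.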
4. Statistical Estimation
Produce design-consistent population estimates:
- Descriptive statistics - Means, totals, proportions, quantiles with design-based variance
- Domain (subpopulation) estimation - Analyze subgroups correctly
- Ratio estimation - Ratios and their standard errors
- Regression analysis - Linear, logistic, Poisson, and other GLMs with survey weights
- Categorical data analysis - Design-adjusted chi-square tests and cross-tabulations
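The point estimates behind the list above are weighted sums. The following sketch, in plain Python rather than the svy API, shows a weighted total, a weighted mean, and a domain mean; the function names and the five-row dataset are invented for illustration.

```python
def weighted_total(values, weights):
    """Estimated population total: sum of weight * value."""
    return sum(w * y for y, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimated population mean: weighted total divided by sum of weights."""
    return weighted_total(values, weights) / sum(weights)


def domain_mean(values, weights, in_domain):
    """Mean within a subpopulation, computed without dropping any rows.

    Units outside the domain contribute zero to both numerator and
    denominator, so the full design stays available for variance estimation.
    """
    numerator = sum(w * y for y, w, d in zip(values, weights, in_domain) if d)
    denominator = sum(w for w, d in zip(weights, in_domain) if d)
    return numerator / denominator


income = [30.0, 45.0, 50.0, 20.0, 60.0]  # hypothetical responses
weight = [100.0, 200.0, 150.0, 250.0, 300.0]  # hypothetical survey weights
female = [True, False, True, True, False]

total_income = weighted_total(income, weight)
mean_income = weighted_mean(income, weight)
mean_income_female = domain_mean(income, weight, female)
print(total_income, mean_income, mean_income_female)
```

The point estimate for a domain is the same whether or not rows are dropped; it is the standard error that goes wrong when the sample is subset before analysis.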
5. Variance Estimation
Calculate standard errors that reflect the survey design:
- Taylor linearization - Analytic variance for smooth statistics
- Bootstrap - General resampling-based variance estimation
- Balanced Repeated Replication (BRR) - Efficient replication for paired PSU designs
- Jackknife - Delete-one-group-at-a-time replication
- Domain variance - Correct standard errors for subpopulation analyses
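As a sketch of the Taylor linearization idea above, the code below estimates the standard error of a weighted mean for a stratified cluster design and compares it with the naive formula that assumes simple random sampling. It is plain Python, not the svy implementation; `linearized_mean_se` is a hypothetical name, and the sketch assumes PSUs are sampled with replacement within strata (no finite population correction) with at least two PSUs per stratum.

```python
import math
import statistics
from collections import defaultdict


def linearized_mean_se(y, w, stratum, psu):
    """Weighted mean and its Taylor-linearized standard error."""
    total_weight = sum(w)
    mean = sum(wi * yi for yi, wi in zip(y, w)) / total_weight

    # Sum the linearized values w_i * (y_i - mean) / W within each PSU
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, stratum, psu):
        psu_totals[h][j] += wi * (yi - mean) / total_weight

    # Between-PSU variance within each stratum, summed over strata
    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return mean, math.sqrt(variance)


# Toy data: values are similar within a PSU and different between PSUs
y = [10, 11, 12, 40, 41, 42, 20, 21, 22, 30, 31, 32]
w = [1.0] * 12
stratum = ["A"] * 6 + ["B"] * 6
psu = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

mean, design_se = linearized_mean_se(y, w, stratum, psu)
naive_se = math.sqrt(statistics.variance(y) / len(y))  # pretends the data are SRS
print(mean, design_se, naive_se)
```

Because respondents in the same PSU resemble each other, the design-based standard error here is more than twice the naive one, which is the underestimation that ignoring the design typically produces.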
Quick Start Example
Who Uses svy?
svy serves diverse survey research communities:
- Survey methodologists developing sampling strategies
- Biostatisticians analyzing health survey data (NHANES, BRFSS, DHS)
- Social scientists studying populations through sample surveys
- Government statisticians producing official statistics
- Epidemiologists estimating disease prevalence and risk factors
- Market researchers analyzing customer surveys and panels
- Educators teaching survey sampling and analysis methods
Documentation Structure
Getting Started
Five-minute quickstart: installation, first analysis, key concepts.
Tutorials
Hands-on walkthroughs with real data:
- Designing surveys and selecting samples
- Computing and adjusting survey weights
- Producing population estimates with proper variance
- Fitting regression models to survey data
- Analyzing cross-tabulations and categorical data
User Guides
Coming soon - In-depth conceptual explanations and best practices
API Reference
Coming soon - Complete technical documentation of all classes and methods
Community & Support
Get help and connect with other users:
- Questions & Discussion: GitHub Discussions
- Bug Reports & Features: GitHub Issues
- Direct Contact: info@svylab.com
- Professional Network: LinkedIn (svylab?)
- Source Code: GitHub samplics-org/svy
Academic Citation
If you use svy in published research, please cite:
@software{svy2025,
  title = {svy: Python Package for Complex Survey Analysis},
  author = {Diallo, Mamadou S.},
  year = {2025},
  url = {https://github.com/samplics-org/svy},
  doi = {10.5281/zenodo.XXXXXXX},
  version = {0.2.0}
}

License
svy is open source software released under the MIT License. See LICENSE for full terms.
Ready to analyze complex survey data correctly?
Get Started →