import svy
# Load survey data
data = svy.io.read_csv("survey_data.csv")
# Preview
print(data.head())
Getting Started with svy
Get up and running with svy in under 10 minutes. This guide walks you through installation, your first analysis, and key concepts.
Installation
Install svy from PyPI using pip:
pip install svy
For enhanced output formatting and tables:
pip install "svy[report]"
For the fastest installation experience, use uv:
uv add "svy[report]"
System Requirements:
- Python 3.11, 3.12, or 3.13
- Works on Linux, macOS, and Windows
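Before installing, it can help to confirm that the interpreter meets the minimum version listed above. The following is a minimal sketch using only the standard library; the helper name `meets_requirement` is illustrative and not part of svy, and it checks only the lower bound (3.11), not whether a newer Python release is supported.

```python
import sys

# Lowest supported version, taken from the requirements listed above.
MIN_VERSION = (3, 11)


def meets_requirement(version=None, minimum=MIN_VERSION):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= minimum


print("Python version OK" if meets_requirement() else "Python 3.11 or newer is required")
```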
Your First Analysis
Let’s analyze a simple survey drawn with a stratified, clustered sample design.
Step 1: Load Data
The snippet at the top of this guide loads survey_data.csv. The file is expected to contain these columns:
- income - Survey outcome variable
- age, education - Covariates
- stratum - Stratification variable
- psu - Primary sampling unit (cluster)
- weight - Survey weight
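If you do not have a file with these columns yet, a small synthetic one is enough to follow the remaining steps. The sketch below uses only the standard library. The values, the layout of 2 strata with 3 PSUs each, the education codes, and the function names are all made up for illustration and carry no statistical meaning.

```python
import csv
import random

COLUMNS = ["income", "age", "education", "stratum", "psu", "weight"]


def make_rows(n_strata=2, psus_per_stratum=3, units_per_psu=5, seed=42):
    """Build synthetic rows: each stratum holds several PSUs, each PSU several respondents."""
    rng = random.Random(seed)
    rows = []
    for s in range(1, n_strata + 1):
        for p in range(1, psus_per_stratum + 1):
            for _ in range(units_per_psu):
                rows.append({
                    "income": round(rng.uniform(15_000, 120_000), 2),
                    "age": rng.randint(18, 85),
                    "education": rng.choice(["HS", "BA", "BS"]),
                    "stratum": s,
                    "psu": f"{s}-{p}",  # PSU ids are unique across strata
                    "weight": round(rng.uniform(50, 500), 1),
                })
    return rows


def write_csv(path, rows):
    """Write the rows to `path` with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_csv("survey_data.csv", make_rows())
```

Every stratum in the synthetic file contains more than one PSU, which variance estimation for a stratified cluster design generally requires.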
Step 2: Define Survey Design
Tell svy about your sampling design:
# Specify design variables
design = svy.Design(
    stratum="stratum",  # Stratification
    psu="psu",          # Clustering
    wgt="weight"        # Survey weights
)
# Create survey sample object
sample = svy.Sample(data=data, design=design)
print(sample)
Step 3: Calculate Population Estimates
Produce design-consistent estimates:
# Population mean with proper standard error
mean_income = sample.estimation.mean("income")
print(mean_income)
# Output includes:
# - Estimate
# - Standard error
# - 95% confidence interval
# - Design effect (DEFF)
Step 4: Analyze Subpopulations
Estimate for domains (subgroups):
# Mean income by education level
by_education = sample.estimation.mean("income", by="education")
print(by_education)
Step 5: Fit Regression Models
Survey-weighted regression:
# Linear regression accounting for design
model = sample.glm.fit(
    y="income",
    x=["age", svy.Cat("education")],
    family="gaussian"
)
print(model)
That’s it! You’ve completed your first survey analysis with proper design-based inference.
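To make "design-based inference" concrete, the sketch below computes a weighted mean and its Taylor-linearized standard error for a stratified cluster design in plain Python. It is a textbook illustration, not svy's implementation: it assumes PSUs are treated as sampled with replacement within strata, applies no finite population correction, and uses hypothetical function names.

```python
import math
from collections import defaultdict


def weighted_mean(y, w):
    """Weighted (ratio-type) mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def linearized_se(y, w, stratum, psu):
    """Taylor-linearized standard error of the weighted mean.

    Linearized values w_i * (y_i - ybar) / sum(w) are summed within each PSU,
    and the between-PSU variance of those totals is accumulated stratum by stratum.
    """
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, stratum, psu):
        psu_totals[h][j] += wi * (yi - ybar) / total_w
    variance = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU; variance is undefined")
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return math.sqrt(variance)


# Toy data: 2 strata, 2 PSUs per stratum, 2 respondents per PSU.
y = [10.0, 12.0, 20.0, 22.0, 30.0, 28.0, 40.0, 44.0]
w = [1.0, 1.0, 2.0, 2.0, 1.5, 1.5, 1.0, 1.0]
stratum = ["A", "A", "A", "A", "B", "B", "B", "B"]
psu = ["A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2"]

print(weighted_mean(y, w), linearized_se(y, w, stratum, psu))
```

With equal weights, a single stratum, and every respondent in its own PSU, this formula reduces to the familiar s / sqrt(n), which is a quick sanity check on the sketch.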
Core Concepts
Survey Design
The Design object specifies how your sample was selected:
# Simple random sample
design_srs = svy.Design(wgt="weight")
# Stratified sample
design_strat = svy.Design(
    stratum="region",
    wgt="weight"
)
# Clustered sample
design_cluster = svy.Design(
    psu="school_id",
    wgt="weight"
)
# Complex multi-stage design
design_complex = svy.Design(
    stratum=("region", "urban_rural"),  # Multiple strata
    psu="psu_id",                       # Primary units
    wgt="final_weight"                  # Combined weight
)
Sample Object
The Sample combines your data with design information:
sample = svy.Sample(data=data, design=design)
# Access components
sample.data # Survey data (Polars DataFrame)
sample.design # Design specification
sample.estimation # Estimation methods
sample.glm # Regression models
sample.wrangling # Data transformation utilities
Data Wrangling
Clean and transform survey data efficiently:
from svy import CaseStyle, LetterCase
from svy.core.expr import col
# Standardize column names
sample = sample.wrangling.clean_names(
    case_style=CaseStyle.SNAKE,
    letter_case=LetterCase.LOWER
)
# Recode categorical variables
sample = sample.wrangling.recode(
    "education",
    {"High School": ["HS", "high_school"],
     "College": ["BA", "BS", "college"]}
)
# Bin continuous variables into categories
sample = sample.wrangling.categorize(
    "age",
    bins=[0, 18, 35, 65, 100],
    labels=["0-17", "18-34", "35-64", "65+"]
)
# Cap extreme values (top/bottom coding)
sample = sample.wrangling.bottom_and_top_code(
    {"income": (0, 200000)}  # Cap at 0 and 200k
)
# Create new variables
sample = sample.wrangling.mutate({
    "income_thousands": col("income") / 1000,
    "age_squared": col("age") ** 2
})
Common tasks:
- clean_names() - Standardize column names (snake_case, camelCase, etc.)
- recode() - Map values to new categories
- categorize() - Bin continuous variables into labeled ranges
- top_code() / bottom_code() - Cap extreme values
- mutate() - Create or transform variables with expressions
See: Wrangling Tutorial for comprehensive examples.
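For readers new to the terms, the sketch below shows in plain Python what binning and top/bottom coding do to a column of values. It is a conceptual illustration only, not svy's implementation. The helper names `clamp` and `bin_values` are hypothetical, and the left-closed interval convention (18 falls in "18-34") is an assumption inferred from the labels in the example above.

```python
from bisect import bisect_right


def clamp(values, lower, upper):
    """Bottom- and top-code: pull every value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def bin_values(values, bins, labels):
    """Assign each value to the left-closed interval [bins[k], bins[k + 1]).

    Values outside the outermost edges get None.
    """
    if len(labels) != len(bins) - 1:
        raise ValueError("need exactly one label per interval")
    out = []
    for v in values:
        k = bisect_right(bins, v) - 1  # index of the last edge <= v
        out.append(labels[k] if 0 <= k < len(labels) else None)
    return out


incomes = [-500, 42_000, 350_000]
ages = [17, 18, 64, 65]

print(clamp(incomes, 0, 200_000))
print(bin_values(ages, [0, 18, 35, 65, 100], ["0-17", "18-34", "35-64", "65+"]))
```

Capping rather than dropping extreme values keeps every respondent, and therefore every survey weight, in the analysis.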
Estimation Methods
All estimation methods produce design-consistent results:
# Means
sample.estimation.mean("income")
# Totals
sample.estimation.total("population")
# Proportions
sample.estimation.prop("employed")
# Median
sample.estimation.median("income")
# Ratios
sample.estimation.ratio(y="expenditure", x="hh_size") # x is the denominator
Working with Real Survey Data
Loading Common Formats
svy integrates with svy-io for reading survey data files:
# SPSS files
data = svy.io.read_spss("survey.sav")
# Stata files
data = svy.io.read_stata("survey.dta")
# SAS files
data = svy.io.read_sas("survey.sas7bdat")
# CSV with metadata
data = svy.io.read_csv("survey.csv", metadata="survey_metadata.json")
Handling Missing Data
svy respects survey-specific missing values:
sample.estimation.mean("income", drop_nulls=True)
Complete Example: NHANES Analysis
Analyze real public health survey data:
import svy
import polars as pl
# Load NHANES data
nhanes = pl.read_csv("nhanes_demo.csv")
# Define complex design
design = svy.Design(
    stratum="sdmvstra",  # Stratification variable
    psu="sdmvpsu",       # Primary sampling unit
    wgt="wtmec2yr"       # 2-year weights
)
# Create sample
sample = svy.Sample(data=nhanes, design=design)
# Population mean BMI
mean_bmi = sample.estimation.mean("bmxbmi")
print(mean_bmi)
# BMI by gender
by_gender = sample.estimation.mean("bmxbmi", by="riagendr")
print(by_gender)
# Regression: BMI ~ age + gender
model = sample.glm.fit(
    y="bmxbmi",
    x=["ridageyr", svy.Cat("riagendr")],
    family="gaussian"
)
print(model)
Key Resources
- 📘 Documentation: svylab.com/docs/svy
- 💬 Community: GitHub Discussions
- 🐛 Issues: GitHub Issues
- 📧 Support: info@svylab.com
Dependencies
svy builds on Python’s scientific computing ecosystem:
Core Dependencies (installed automatically):
- NumPy - Numerical computing
- Polars - Fast DataFrames
- PyArrow - Columnar data
- SciPy - Scientific algorithms
- msgspec - Fast serialization
Optional Dependencies:
- rich - Enhanced console output
- great-tables - Publication tables
- svy-sae - Small area estimation
Install with optional dependencies:
pip install "svy[report]" # Adds rich + great-tables
pip install "svy[sae]" # Adds svy-sae
pip install "svy[all]" # Everything
Contributing
svy is open source and welcomes contributions!
Ways to contribute:
- 🐛 Report bugs or request features
- 📖 Improve documentation
- 🧪 Add examples or tutorials
- 💻 Submit code improvements
See our GitHub repository to get started.
Support the Project
Help svy grow:
- ⭐ Star the GitHub repo
- 📣 Share with colleagues and on social media
- 📝 Cite in your publications (see Citation)
- 💼 Connect on LinkedIn
License
svy is released under the MIT License - free for academic and commercial use.
Ready to Explore?
Continue with hands-on tutorials covering real survey analysis workflows.