Quick Tour: The svy Sample Object in Python

5-minute introduction to data exploration, filtering, and summaries

Learn the svy Sample object - your central interface for survey data exploration, filtering, summaries, and analysis. Master data inspection, weighted summaries, and immutable transformations.
Author

Mamadou S. Diallo, Ph.D.

Published

January 18, 2026

Modified

January 18, 2026

Keywords

svy Sample object, survey data exploration Python, svy tutorial quickstart, inspect survey data Python, survey data filtering, weighted summary statistics, Polars DataFrame survey, svy Design object, survey data wrangling, immutable data objects, show_data svy, describe survey data

A 5-minute introduction to Sample, the core object you’ll use throughout these tutorials.

What is Sample?

Sample wraps your survey data (a Polars DataFrame) with design information, providing a unified interface for data exploration, wrangling, weighting, and estimation.

Think of Sample as:

  • Your survey dataset + design metadata
  • A gateway to all svy functionality
  • Immutable by default (transformations return new Sample objects)
import polars as pl
import svy

# Create sample data
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "region": ["North", "South", "North", "East", "South"],
    "age": [22, 47, 35, 61, 29],
    "income": [45000, 62000, 51000, 78000, 43000],
    "weight": [1.0, 1.2, 0.9, 1.1, 0.8],
})

# Define survey design
design = svy.Design(wgt="weight", stratum="region")

# Create Sample object
sample = svy.Sample(df, design=design)

print(sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 5                                           
   Number of columns: 7                                        
   Number of strata: 3                                         
   Number of PSUs: None                                        
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                          
    Row index           svy_row_index                          
    Stratum             region                                 
    PSU                 None                                   
    SSU                 None                                   
    Weight              weight                                 
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   None                                   
                                                               
╰───────────────────────────────────────────────────────────────╯

Quick Data Inspection

Preview Data

# First 3 rows
sample.show_data(how="head", n=3)

# Specific columns only
sample.show_data(columns=["id", "region", "age"], how="head", n=3)

# Last 2 rows, sorted by age
sample.show_data(how="tail", n=2, sort_by="age", descending=True)

# Random sample (reproducible with seed)
sample.show_data(how="sample", n=3, rstate=42)
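shape: (3, 6)
┌───────────────┬─────┬────────┬─────┬────────┬────────┐
│ svy_row_index ┆ id  ┆ region ┆ age ┆ income ┆ weight │
│ ---           ┆ --- ┆ ---    ┆ --- ┆ ---    ┆ ---    │
│ u32           ┆ i64 ┆ str    ┆ i64 ┆ i64    ┆ f64    │
╞═══════════════╪═════╪════════╪═════╪════════╪════════╡
│ 2             ┆ 3   ┆ North  ┆ 35  ┆ 51000  ┆ 0.9    │
│ 1             ┆ 2   ┆ South  ┆ 47  ┆ 62000  ┆ 1.2    │
│ 4             ┆ 5   ┆ South  ┆ 29  ┆ 43000  ┆ 0.8    │
└───────────────┴─────┴────────┴─────┴────────┴────────┘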
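
The idea behind a reproducible random preview can be illustrated outside svy. The sketch below uses only Python's standard library and is not svy's own sampler; the helper name draw_rows is hypothetical, and it is an assumption that rstate plays the same role as the seed used here.

```python
import random

# Row ids from the tutorial data (illustrative stand-in for the Sample rows)
row_ids = [1, 2, 3, 4, 5]

def draw_rows(ids, n, seed):
    """Draw n ids without replacement using a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator: global random state is untouched
    return rng.sample(ids, n)

first = draw_rows(row_ids, n=3, seed=42)
second = draw_rows(row_ids, n=3, seed=42)
other = draw_rows(row_ids, n=3, seed=7)

print(first == second)  # True: same seed, same draw
print(first, other)     # a different seed may select different rows
```

Fixing the seed is what makes a random preview repeatable across runs, which is useful when a tutorial or report needs to show the same rows every time.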

Filter Records

# Filter by values (dictionary syntax)
sample.show_records(
    where={"region": ["North", "East"]},
    columns=["id", "region", "age"]
)

# Filter with expressions (conditions in the list are combined with AND)
from svy.core.expr import col

sample.show_records(
    where=[col("age") > 30, col("region") == "South"],
    order_by="income",
    descending=True
)
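shape: (1, 6)
┌───────────────┬─────┬────────┬─────┬────────┬────────┐
│ svy_row_index ┆ id  ┆ region ┆ age ┆ income ┆ weight │
│ ---           ┆ --- ┆ ---    ┆ --- ┆ ---    ┆ ---    │
│ u32           ┆ i64 ┆ str    ┆ i64 ┆ i64    ┆ f64    │
╞═══════════════╪═════╪════════╪═════╪════════╪════════╡
│ 1             ┆ 2   ┆ South  ┆ 47  ┆ 62000  ┆ 1.2    │
└───────────────┴─────┴────────┴─────┴────────┴────────┘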
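
The two filter styles can be mimicked in plain Python to make their semantics concrete. This is a minimal sketch, not svy's implementation: the helpers filter_by_values and filter_by_conditions are hypothetical, the dictionary form is assumed to mean "value is in the listed set", and the list form is treated as a logical AND, which matches the single-row result shown above.

```python
# The five tutorial rows as plain dictionaries (stand-in for the Sample data)
rows = [
    {"id": 1, "region": "North", "age": 22, "income": 45000},
    {"id": 2, "region": "South", "age": 47, "income": 62000},
    {"id": 3, "region": "North", "age": 35, "income": 51000},
    {"id": 4, "region": "East", "age": 61, "income": 78000},
    {"id": 5, "region": "South", "age": 29, "income": 43000},
]

def filter_by_values(records, where):
    """Keep records whose value for each key is in that key's allowed list."""
    return [
        r for r in records
        if all(r[key] in allowed for key, allowed in where.items())
    ]

def filter_by_conditions(records, conditions):
    """Keep records that satisfy every condition (logical AND)."""
    return [r for r in records if all(cond(r) for cond in conditions)]

north_east = filter_by_values(rows, {"region": ["North", "East"]})
older_south = filter_by_conditions(
    rows,
    [lambda r: r["age"] > 30, lambda r: r["region"] == "South"],
)
# Sort by income, highest first, mirroring order_by="income", descending=True
older_south.sort(key=lambda r: r["income"], reverse=True)

print([r["id"] for r in north_east])   # [1, 3, 4]
print([r["id"] for r in older_south])  # [2]
```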

Sample Properties

Access key information about your sample:

print(f"Number of records: {sample.n_records}\n")
print(f"Number of columns: {sample.n_columns}\n")
print(f"Number of strata: {sample.n_strata}\n")
print(f"Number of psus: {sample.n_psus}\n")

print(f"Strata: {sample.strata}")

# Access underlying data (defensive copy)
df_copy = sample.data
print(df_copy.head())

# Access design
design_copy = sample.design
print(design_copy)
Number of records: 5

Number of columns: 7

Number of strata: 3

Number of psus: 0

Strata: shape: (3, 1)
┌────────┐
│ region │
│ ---    │
│ str    │
╞════════╡
│ East   │
│ North  │
│ South  │
└────────┘
shape: (5, 6)
┌───────────────┬─────┬────────┬─────┬────────┬────────┐
│ svy_row_index ┆ id  ┆ region ┆ age ┆ income ┆ weight │
│ ---           ┆ --- ┆ ---    ┆ --- ┆ ---    ┆ ---    │
│ u32           ┆ i64 ┆ str    ┆ i64 ┆ i64    ┆ f64    │
╞═══════════════╪═════╪════════╪═════╪════════╪════════╡
│ 0             ┆ 1   ┆ North  ┆ 22  ┆ 45000  ┆ 1.0    │
│ 1             ┆ 2   ┆ South  ┆ 47  ┆ 62000  ┆ 1.2    │
│ 2             ┆ 3   ┆ North  ┆ 35  ┆ 51000  ┆ 0.9    │
│ 3             ┆ 4   ┆ East   ┆ 61  ┆ 78000  ┆ 1.1    │
│ 4             ┆ 5   ┆ South  ┆ 29  ┆ 43000  ┆ 0.8    │
└───────────────┴─────┴────────┴─────┴────────┴────────┘
╭───────────── Design ──────────────╮
 Field               Value         
 ───────────────────────────────── 
 Row index           svy_row_index 
 Stratum             region        
 PSU                 None          
 SSU                 None          
 Weight              weight        
 With replacement    False         
 Prob                None          
 Hit                 None          
 MOS                 None          
 Population size     None          
 Replicate weights   None          
╰───────────────────────────────────╯

Note: sample.data and sample.design return defensive copies, so inspecting them cannot modify the original Sample.
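
The defensive-copy pattern itself is easy to demonstrate with the standard library. The class below is a hypothetical stand-in (Holder is not part of svy) and only sketches the general technique: each property access returns a fresh copy, so changes made by the caller never reach the internal state.

```python
import copy

class Holder:
    """Hypothetical container that hands out copies of its internal state."""

    def __init__(self, data):
        self._data = copy.deepcopy(data)  # detach from the caller's object

    @property
    def data(self):
        # Return a fresh copy on every access so callers cannot mutate _data
        return copy.deepcopy(self._data)

holder = Holder({"age": [22, 47, 35, 61, 29]})

snapshot = holder.data
snapshot["age"].append(99)  # modifies only the copy

print(len(snapshot["age"]))     # 6
print(len(holder.data["age"]))  # 5
```

The trade-off of this pattern is an extra copy per access, so when working with large data it is usually better to store the returned object in a variable than to call the property repeatedly.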

Data Summaries

Unweighted Summary

# Quick statistical summary
summary = sample.describe()
print(summary)
╭────────────────────────────────────── Describe ──────────────────────────────────────╮
 Columns: 5                                                                           
 Weighted: False                                                                      
 drop_nulls: True                                                                     
 percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                           
 generated_at: 2026-02-09T15:09:16+00:00                                              
                                                                                      
 Numeric                                                                              
                                                                                      
   name    type        miss   mean      std    min    p25    p50    p75    max     sum
   ───────────────────────────────────────────────────────────────────────────────────
   id      Discrete       0      3  1.58114      1      2      3      4      5      15
   age     Discrete       0   38.8  15.4337     22     29     35     47     61     194
   income  Discrete       0  55800  14446.5  43000  45000  51000  62000  78000  279000
   weight  Continuous     0      1  0.15811    0.8    0.9      1    1.1    1.2       5
                                                                                      
 Categorical                                                                          
                                                                                      
   name     type      n   miss   levels   mode    top                                 
   ────────────────────────────────────────────────────────────────                   
   region   nominal   5      0        3   South   South: 2 (40.00%)                   
                                                  North: 2 (40.00%)                   
                                                  East: 1 (20.00%)                    
╰──────────────────────────────────────────────────────────────────────────────────────╯

Output includes:

  • Count and missing values
  • Mean and standard deviation
  • Minimum, quartiles, and maximum
  • For categorical columns: top categories and their frequencies
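
To see where the numeric figures come from, the age row can be recomputed with Python's statistics module. This is an independent check rather than svy's code; the sample standard deviation (n - 1 denominator) is used because it reproduces the 15.4337 reported for age, which suggests, but does not confirm, that describe() uses the same convention.

```python
import statistics

# The age column from the tutorial data
age = [22, 47, 35, 61, 29]

summary = {
    "n": len(age),
    "mean": statistics.mean(age),
    "std": statistics.stdev(age),    # sample standard deviation (n - 1 denominator)
    "min": min(age),
    "p50": statistics.median(age),
    "max": max(age),
    "sum": sum(age),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Quartiles are left out on purpose: different libraries use different interpolation rules, so p25 and p75 computed by hand may not match a given tool exactly.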

Weighted Summary

# Uses design weights if available
weighted_summary = sample.describe(weighted=True)
print(weighted_summary)
╭────────────────────────────────────── Describe ──────────────────────────────────────╮
 Columns: 5                                                                           
 Weighted: True (weight_col=weight)                                                   
 drop_nulls: True                                                                     
 percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                           
 generated_at: 2026-02-09T15:09:16+00:00                                              
                                                                                      
 Numeric                                                                              
                                                                                      
   name    type        miss   mean      std    min    p25    p50    p75    max     sum
   ───────────────────────────────────────────────────────────────────────────────────
   id      Discrete       0    2.9  1.58114      1      2      3      4      5    14.5
   age     Discrete       0  40.04  15.4337     22     29     35     47     61   200.2
   income  Discrete       0  57100  14446.5  43000  45000  51000  62000  78000  285500
   weight  Continuous     0   1.02  0.15811    0.8    0.9      1    1.1    1.2     5.1
                                                                                      
 Categorical                                                                          
                                                                                      
   name     type      n   miss   levels   mode    top                                 
   ──────────────────────────────────────────────────────────────────                 
   region   nominal   5      0        3   South   South: 2 (40.00%)                   
                                                  North: 1.9 (38.00%)                 
                                                  East: 1.1 (22.00%)                  
╰──────────────────────────────────────────────────────────────────────────────────────╯

Weighted summaries apply the design weights (here the weight column), so means, sums, and category shares estimate population quantities instead of describing only the sampled rows.
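
The weighted figures can be checked by hand. The sketch below recomputes the weighted sum and mean of age and the weighted share of each region from the tutorial data, using the standard formulas (weighted sum = Σ w·x, weighted mean = Σ w·x / Σ w). It is an independent calculation in plain Python, not svy's estimation code.

```python
# Columns from the tutorial data
age = [22, 47, 35, 61, 29]
region = ["North", "South", "North", "East", "South"]
weight = [1.0, 1.2, 0.9, 1.1, 0.8]

weight_total = sum(weight)
weighted_sum = sum(w * x for w, x in zip(weight, age))
weighted_mean = weighted_sum / weight_total

# Weighted share of each region: sum of weights in the region / total weight
region_weight = {}
for r, w in zip(region, weight):
    region_weight[r] = region_weight.get(r, 0.0) + w
region_share = {r: w / weight_total for r, w in region_weight.items()}

print(round(weighted_sum, 1))   # 200.2
print(round(weighted_mean, 2))  # 40.04
print({r: round(s, 2) for r, s in region_share.items()})
```

The region shares come out as 38% for North, 40% for South, and 22% for East, which is how a stratum with fewer sampled rows can still carry more weight in the population estimate.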

Sample is Immutable

Transformations return new Sample objects:

# Original sample unchanged
original = sample

# Wrangling creates new sample
cleaned = sample.wrangling.clean_names()

# Original still exists
print(f"Original columns: {original.data.columns}")
print(f"Cleaned columns: {cleaned.data.columns}")

# Chain operations
result = (sample
    .wrangling.clean_names()
    .wrangling.recode("region", {"North": ["North", "East"]})
)
Original columns: ['svy_row_index', 'id', 'region', 'age', 'income', 'weight']
Cleaned columns: ['svy_row_index', 'id', 'region', 'age', 'income', 'weight']

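This design prevents accidental data corruption and makes workflows easier to debug. In the output above the two column lists match because the tutorial's column names were already clean; the point is that cleaned is a separate object and sample itself is untouched.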
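
The "return a new object" pattern can be sketched with a frozen dataclass. Everything here is hypothetical and only illustrates the general technique: Table is not an svy class, and its clean_names rule (strip, lowercase, spaces to underscores) is an assumption, not svy's actual cleaning logic.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Table:
    """Hypothetical immutable table holding only a tuple of column names."""
    columns: tuple

    def clean_names(self):
        # Build a new Table instead of modifying self
        cleaned = tuple(c.strip().lower().replace(" ", "_") for c in self.columns)
        return replace(self, columns=cleaned)

    def rename(self, old, new):
        # Also returns a new Table; self is left as it was
        renamed = tuple(new if c == old else c for c in self.columns)
        return replace(self, columns=renamed)

original = Table(columns=("ID", "Region ", "Age"))
result = original.clean_names().rename("id", "person_id")  # chained calls

print(original.columns)  # ('ID', 'Region ', 'Age')
print(result.columns)    # ('person_id', 'region', 'age')
```

Because every method returns a new object, each intermediate step in a chain can be kept in a variable and inspected, which is what makes this style convenient to debug.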

Key Takeaways

Sample wraps data + design - Single object for all operations

Inspect easily - show_data(), show_records(), describe()

Immutable - Transformations return new objects

Gateway to functionality - Access .wrangling, .estimation, .glm

Defensive copies - .data and .design are safe to inspect

Next Steps

Now that you understand the Sample object, learn how to clean and transform your data: cleaning names, recoding values, binning variables, and creating new columns.

Mastered the basics?
Continue to Data Wrangling →