Quick Tour: The svy Sample Object in Python

5-minute introduction to data exploration, filtering, and summaries

Learn the svy Sample object, your central interface for survey data exploration, filtering, summaries, and analysis. This tour covers data inspection, weighted summaries, and immutable transformations.

Author

Mamadou S. Diallo, Ph.D.

Published

January 18, 2026

Modified

January 18, 2026

A 5-minute introduction to Sample, the core object you’ll use throughout these tutorials.

What is Sample?

Sample wraps your survey data (a Polars DataFrame) with design information, providing a unified interface for data exploration, wrangling, weighting, and estimation.

Think of Sample as:

Your survey dataset + design metadata
A gateway to all svy functionality
Immutable by default (transformations return new Sample objects)

import polars as pl
import svy

# Create sample data
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "region": ["North", "South", "North", "East", "South"],
    "age": [22, 47, 35, 61, 29],
    "income": [45000, 62000, 51000, 78000, 43000],
    "weight": [1.0, 1.2, 0.9, 1.1, 0.8],
})

# Define survey design
design = svy.Design(wgt="weight", stratum="region")

# Create Sample object
sample = svy.Sample(df, design=design)

print(sample)

╭─────────────────────────── Sample ────────────────────────────╮
│ Survey Data:                                                  │
│   Number of rows: 5                                           │
│   Number of columns: 7                                        │
│   Number of strata: 3                                         │
│   Number of PSUs: None                                        │
│                                                               │
│ Survey Design:                                                │
│                                                               │
│    Field               Value                                  │
│    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                          │
│    Row index           svy_row_index                          │
│    Stratum             region                                 │
│    PSU                 None                                   │
│    SSU                 None                                   │
│    Weight              weight                                 │
│    With replacement    False                                  │
│    Prob                None                                   │
│    Hit                 None                                   │
│    MOS                 None                                   │
│    Population size     None                                   │
│    Replicate weights   None                                   │
│                                                               │
╰───────────────────────────────────────────────────────────────╯

Quick Data Inspection

Preview Data

# First 3 rows
sample.show_data(how="head", n=3)

# Specific columns only
sample.show_data(columns=["id", "region", "age"], how="head", n=3)

# Last 2 rows, sorted by age in descending order
sample.show_data(how="tail", n=2, sort_by="age", descending=True)

# Random sample (reproducible with seed)
sample.show_data(how="sample", n=3, rstate=42)

Output of the last call (the random sample):

shape: (3, 6)

svy_row_index	id	region	age	income	weight
u32	i64	str	i64	i64	f64
2	3	"North"	35	51000	0.9
1	2	"South"	47	62000	1.2
4	5	"South"	29	43000	0.8
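
To make the idea of a reproducible random preview concrete, here is a minimal plain-Python sketch. It is not how svy implements `rstate`; it only assumes that a fixed seed drives a dedicated random generator, and the helper `sample_rows` is a hypothetical name used for illustration.

```python
import random

# Toy rows mirroring the tutorial data (plain dicts, not a svy Sample).
rows = [
    {"id": 1, "region": "North", "age": 22},
    {"id": 2, "region": "South", "age": 47},
    {"id": 3, "region": "North", "age": 35},
    {"id": 4, "region": "East", "age": 61},
    {"id": 5, "region": "South", "age": 29},
]


def sample_rows(rows, n, seed):
    """Draw n rows without replacement using a dedicated, seeded generator."""
    rng = random.Random(seed)  # hypothetical stand-in for the rstate argument
    return rng.sample(rows, n)


first = sample_rows(rows, n=3, seed=42)
second = sample_rows(rows, n=3, seed=42)

# The same seed yields the same rows in the same order.
print(first == second)  # True
```

Using a dedicated generator rather than the global random state keeps the preview reproducible without affecting other random draws in a session.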

Filter Records

# Filter by values (dictionary syntax)
sample.show_records(
    where={"region": ["North", "East"]},
    columns=["id", "region", "age"]
)

# Filter with expressions
from svy.core.expr import col

sample.show_records(
    where=[col("age") > 30, col("region") == "South"],
    order_by="income",
    descending=True
)

Output of the second call (the expression filter):

shape: (1, 6)

svy_row_index	id	region	age	income	weight
u32	i64	str	i64	i64	f64
1	2	"South"	47	62000	1.2
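
The two `where` styles can be mimicked in plain Python to make their semantics concrete. This is only a sketch, not svy's implementation. It assumes that a dictionary means "the value is one of the listed values" and that a list of conditions is combined with AND, which is consistent with the single row returned above. The helpers `filter_by_values` and `filter_by_all` are hypothetical names.

```python
# Toy rows mirroring the tutorial data (plain dicts, not a svy Sample).
rows = [
    {"id": 1, "region": "North", "age": 22, "income": 45000},
    {"id": 2, "region": "South", "age": 47, "income": 62000},
    {"id": 3, "region": "North", "age": 35, "income": 51000},
    {"id": 4, "region": "East", "age": 61, "income": 78000},
    {"id": 5, "region": "South", "age": 29, "income": 43000},
]


def filter_by_values(rows, where):
    """Keep rows whose value for every key is in that key's allowed list."""
    return [
        r for r in rows
        if all(r[key] in allowed for key, allowed in where.items())
    ]


def filter_by_all(rows, predicates):
    """Keep rows that satisfy every predicate (assumed AND semantics)."""
    return [r for r in rows if all(p(r) for p in predicates)]


north_east = filter_by_values(rows, {"region": ["North", "East"]})

older_south = filter_by_all(
    rows,
    [lambda r: r["age"] > 30, lambda r: r["region"] == "South"],
)
# Mirror order_by="income", descending=True
older_south.sort(key=lambda r: r["income"], reverse=True)

print([r["id"] for r in north_east])   # [1, 3, 4]
print([r["id"] for r in older_south])  # [2]
```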

Sample Properties

Access key information about your sample:

print(f"Number of records: {sample.n_records}\n")
print(f"Number of columns: {sample.n_columns}\n")
print(f"Number of strata: {sample.n_strata}\n")
print(f"Number of psus: {sample.n_psus}\n")

print(f"Strata: {sample.strata}")

# Access underlying data (defensive copy)
df_copy = sample.data
print(df_copy.head())

# Access design
design_copy = sample.design
print(design_copy)

Number of records: 5

Number of columns: 7

Number of strata: 3

Number of psus: 0

Strata: shape: (3, 1)
┌────────┐
│ region │
│ ---    │
│ str    │
╞════════╡
│ East   │
│ North  │
│ South  │
└────────┘
shape: (5, 6)
┌───────────────┬─────┬────────┬─────┬────────┬────────┐
│ svy_row_index ┆ id  ┆ region ┆ age ┆ income ┆ weight │
│ ---           ┆ --- ┆ ---    ┆ --- ┆ ---    ┆ ---    │
│ u32           ┆ i64 ┆ str    ┆ i64 ┆ i64    ┆ f64    │
╞═══════════════╪═════╪════════╪═════╪════════╪════════╡
│ 0             ┆ 1   ┆ North  ┆ 22  ┆ 45000  ┆ 1.0    │
│ 1             ┆ 2   ┆ South  ┆ 47  ┆ 62000  ┆ 1.2    │
│ 2             ┆ 3   ┆ North  ┆ 35  ┆ 51000  ┆ 0.9    │
│ 3             ┆ 4   ┆ East   ┆ 61  ┆ 78000  ┆ 1.1    │
│ 4             ┆ 5   ┆ South  ┆ 29  ┆ 43000  ┆ 0.8    │
└───────────────┴─────┴────────┴─────┴────────┴────────┘

╭───────────── Design ──────────────╮
│ Field               Value         │
│ ───────────────────────────────── │
│ Row index           svy_row_index │
│ Stratum             region        │
│ PSU                 None          │
│ SSU                 None          │
│ Weight              weight        │
│ With replacement    False         │
│ Prob                None          │
│ Hit                 None          │
│ MOS                 None          │
│ Population size     None          │
│ Replicate weights   None          │
╰───────────────────────────────────╯

Note: sample.data and sample.design return defensive copies, so you can inspect them without risk of modifying the original Sample.
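
As an illustration of what a defensive copy buys you, the sketch below uses a hypothetical `MiniSample` class built on plain Python containers. It does not reflect svy's internals; it only shows the general pattern of returning a copy from a property so that callers cannot change the wrapped data.

```python
import copy


class MiniSample:
    """Hypothetical stand-in for a wrapper that hands out defensive copies."""

    def __init__(self, data):
        # Copy on the way in so later changes to the caller's dict are ignored.
        self._data = copy.deepcopy(data)

    @property
    def data(self):
        # Copy on the way out so callers cannot mutate internal state.
        return copy.deepcopy(self._data)


mini = MiniSample({"age": [22, 47, 35, 61, 29]})

snapshot = mini.data
snapshot["age"].append(99)  # changes only the copy

print(len(snapshot["age"]))   # 6
print(len(mini.data["age"]))  # 5
```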

Data Summaries

Unweighted Summary

# Quick statistical summary
summary = sample.describe()
print(summary)

╭────────────────────────────────────── Describe ──────────────────────────────────────╮
│ Columns: 5                                                                           │
│ Weighted: False                                                                      │
│ drop_nulls: True                                                                     │
│ percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                           │
│ generated_at: 2026-02-09T15:09:16+00:00                                              │
│                                                                                      │
│ Numeric                                                                              │
│                                                                                      │
│   name    type           mis   mea       std   min    p25   p50    p75   max     sum │
│   ────────────────────────────────────────────────────────────────────────────────── │
│   id      Discrete         0     3   1.58114     1      2     3      4     5      15 │
│   age     Discrete         0   38.   15.4337    22     29    35     47    61     194 │
│   inco…   Discrete         0   558   14446.5   430   4500   510   6200   780   27900 │
│   weig…   Continuo…        0     1   0.15811   0.8    0.9     1    1.1   1.2       5 │
│                                                                                      │
│ Categorical                                                                          │
│                                                                                      │
│   name     type      n   miss   levels   mode    top                                 │
│   ────────────────────────────────────────────────────────────────                   │
│   region   nominal   5      0        3   South   South: 2 (40.00%)                   │
│                                                  North: 2 (40.00%)                   │
│                                                  East: 1 (20.00%)                    │
╰──────────────────────────────────────────────────────────────────────────────────────╯

Note: the narrow display truncates some headers (mis, mea) and values. For example, the mean of age is 38.8, and the income row stands for a mean of 55,800, a minimum of 43,000, a maximum of 78,000, and a sum of 279,000.

Output includes:

Count and missing values
Mean and standard deviation
Min, quartiles, and max
For categorical columns: top categories and frequencies
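
To cross-check a few of the unweighted figures, the following sketch recomputes them with Python's standard `statistics` module on the tutorial data. It is independent of svy, and the helper `summarize` is a hypothetical name. It assumes the std column uses the sample standard deviation (n - 1 denominator), which matches the values shown.

```python
import statistics

age = [22, 47, 35, 61, 29]
income = [45000, 62000, 51000, 78000, 43000]


def summarize(values):
    """Return a few unweighted summary statistics for a numeric column."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (n - 1 denominator)
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
    }


age_summary = summarize(age)
income_summary = summarize(income)

print(age_summary["mean"], round(age_summary["std"], 4))  # 38.8 15.4337
print(income_summary["mean"], income_summary["sum"])      # 55800 279000
```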

Weighted Summary

# Uses design weights if available
weighted_summary = sample.describe(weighted=True)
print(weighted_summary)

╭────────────────────────────────────── Describe ──────────────────────────────────────╮
│ Columns: 5                                                                           │
│ Weighted: True (weight_col=weight)                                                   │
│ drop_nulls: True                                                                     │
│ percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                           │
│ generated_at: 2026-02-09T15:09:16+00:00                                              │
│                                                                                      │
│ Numeric                                                                              │
│                                                                                      │
│   name    type           mis   mea       std   min    p25   p50    p75   max     sum │
│   ────────────────────────────────────────────────────────────────────────────────── │
│   id      Discrete         0   2.9   1.58114     1      2     3      4     5    14.5 │
│   age     Discrete         0   40.   15.4337    22     29    35     47    61   200.2 │
│   inco…   Discrete         0   571   14446.5   430   4500   510   6200   780   28550 │
│   weig…   Continuo…        0   1.0   0.15811   0.8    0.9     1    1.1   1.2     5.1 │
│                                                                                      │
│ Categorical                                                                          │
│                                                                                      │
│   name     type      n   miss   levels   mode    top                                 │
│   ──────────────────────────────────────────────────────────────────                 │
│   region   nominal   5      0        3   South   South: 2 (40.00%)                   │
│                                                  North: 1.9 (38.00%)                 │
│                                                  East: 1.1 (22.00%)                  │
╰──────────────────────────────────────────────────────────────────────────────────────╯

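Weighted summaries apply the design weights, so they estimate population quantities rather than describe the sample. In the output above, the mean, sum, and category shares change, while std, min, max, and the percentiles match the unweighted table. The narrow display again truncates some values: the weighted mean income is 57,100 and the weighted income sum is 285,500.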
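
The weighted means, sums, and category shares follow from standard weighted formulas, and they can be reproduced by hand. The sketch below is plain Python and independent of svy; the helpers `weighted_sum`, `weighted_mean`, and `weighted_shares` are hypothetical names.

```python
weight = [1.0, 1.2, 0.9, 1.1, 0.8]
age = [22, 47, 35, 61, 29]
income = [45000, 62000, 51000, 78000, 43000]
region = ["North", "South", "North", "East", "South"]


def weighted_sum(values, weights):
    """Sum of value * weight over all records."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Weighted sum divided by the sum of the weights."""
    return weighted_sum(values, weights) / sum(weights)


def weighted_shares(categories, weights):
    """Share of the total weight that falls in each category."""
    totals = {}
    for category, w in zip(categories, weights):
        totals[category] = totals.get(category, 0.0) + w
    grand_total = sum(totals.values())
    return {category: t / grand_total for category, t in totals.items()}


shares = weighted_shares(region, weight)

print(round(weighted_mean(age, weight), 2))     # 40.04
print(round(weighted_sum(age, weight), 2))      # 200.2
print(round(weighted_mean(income, weight), 2))  # 57100.0
print(round(weighted_sum(income, weight), 2))   # 285500.0
print({k: round(v * 100, 2) for k, v in shares.items()})
# {'North': 38.0, 'South': 40.0, 'East': 22.0}
```

Rounding before printing avoids small floating-point artifacts from summing the weights.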

Sample is Immutable

Transformations return new Sample objects:

# Original sample unchanged
original = sample

# Wrangling creates new sample
cleaned = sample.wrangling.clean_names()

# Original still exists
print(f"Original columns: {original.data.columns}")
print(f"Cleaned columns: {cleaned.data.columns}")

# Chain operations
result = (sample
    .wrangling.clean_names()
    .wrangling.recode("region", {"North": ["North", "East"]})
)

Original columns: ['svy_row_index', 'id', 'region', 'age', 'income', 'weight']
Cleaned columns: ['svy_row_index', 'id', 'region', 'age', 'income', 'weight']

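Because the column names in this example were already clean, the two lists are identical. Returning new objects instead of modifying data in place prevents accidental data corruption and makes workflows easier to debug.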
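
The "transformations return new objects" pattern can be sketched with a frozen dataclass. This is a generic Python illustration, not svy's implementation: the class `MiniSample` and its `rename` method are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class MiniSample:
    """Hypothetical immutable wrapper holding only column names."""

    columns: tuple

    def rename(self, mapping):
        # Build a new object instead of changing this one.
        new_columns = tuple(mapping.get(c, c) for c in self.columns)
        return replace(self, columns=new_columns)


original = MiniSample(columns=("ID", "Region"))
cleaned = original.rename({"ID": "id", "Region": "region"})

print(original.columns)  # ('ID', 'Region')
print(cleaned.columns)   # ('id', 'region')

# Direct mutation is rejected on a frozen dataclass.
try:
    original.columns = ("x",)
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)  # False
```

Because every step returns a fresh object, intermediate results can be kept and compared, which is what makes chained workflows straightforward to debug.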

Key Takeaways

✅ Sample wraps data + design - Single object for all operations

✅ Inspect easily - show_data(), show_records(), describe()

✅ Immutable - Transformations return new objects

✅ Gateway to functionality - Access .wrangling, .estimation, .glm

✅ Defensive copies - .data and .design are safe to inspect

Next Steps

Now that you understand the Sample object, learn how to clean and transform your data: clean names, recode values, bin variables, and create new columns.

Master the basics?
Continue to Data Wrangling →