Sample is a Python class that encapsulates the core functionality for working with a survey sample.
It wraps a Polars DataFrame and, optionally, a survey Design.
In the tutorials, we’ll use Sample to explore common tasks (previewing data, filtering rows, and producing quick summaries) before moving on to selection, weighting, and estimation.
This first tutorial focuses on those essentials so you can inspect your data and get oriented fast.
╭─────────────────────────── Sample ────────────────────────────╮
│ Survey Data:                                                  │
│   Number of rows: 5                                           │
│   Number of columns: 6                                        │
│   Number of strata: 3                                         │
│   Number of PSUs: None                                        │
│                                                               │
│ Survey Design:                                                │
│                                                               │
│   Field              Value                                    │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                           │
│   Row index          svy_row_index                            │
│   Stratum            region                                   │
│   PSU                None                                     │
│   SSU                None                                     │
│   Weight             w                                        │
│   With replacement   False                                    │
│   Prob               None                                     │
│   Hit                None                                     │
│   MOS                None                                     │
│   Population size    None                                     │
│   Replicate weights  None                                     │
│                                                               │
╰───────────────────────────────────────────────────────────────╯
Peek at data
show_data()
Grab a slice of the data: the head, the tail, or a random sample. You can also select columns and sort the rows.
# first 3 rows with two columns
s.show_data(columns=["id", "region"], how="head", n=3)

# random 3 rows (reproducible with a seed)
s.show_data(how="sample", n=3, rstate=123)

# tail 2 rows sorted by age (descending)
s.show_data(how="tail", n=2, sort_by="age", descending=True)
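shape: (2, 5)
┌───────────────┬─────┬────────┬─────┬─────┐
│ svy_row_index ┆ id  ┆ region ┆ age ┆ w   │
│ ---           ┆ --- ┆ ---    ┆ --- ┆ --- │
│ u32           ┆ i64 ┆ str    ┆ i64 ┆ f64 │
╞═══════════════╪═════╪════════╪═════╪═════╡
│ 4             ┆ 5   ┆ South  ┆ 29  ┆ 0.8 │
│ 0             ┆ 1   ┆ North  ┆ 22  ┆ 1.0 │
└───────────────┴─────┴────────┴─────┴─────┘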
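To make the how, n, sort_by, and seed options concrete, here is a minimal plain-Python sketch of the same idea. It is not svy's implementation: the peek function and the toy rows are hypothetical, and the sort-then-pick order is an assumption that agrees with the tail output above (ages 29 and 22).

```python
import random

# Hypothetical toy rows for illustration; not the tutorial's dataset.
rows = [
    {"id": 1, "region": "North", "age": 22},
    {"id": 2, "region": "South", "age": 47},
    {"id": 3, "region": "East", "age": 35},
    {"id": 4, "region": "East", "age": 61},
    {"id": 5, "region": "South", "age": 29},
]


def peek(rows, how="head", n=5, columns=None, sort_by=None,
         descending=False, seed=None):
    """Return n rows from the head, the tail, or a seeded random sample."""
    data = list(rows)
    if sort_by is not None:
        # Assumption: sorting is applied before rows are picked.
        data.sort(key=lambda r: r[sort_by], reverse=descending)
    if how == "head":
        picked = data[:n]
    elif how == "tail":
        picked = data[max(len(data) - n, 0):]
    elif how == "sample":
        # A dedicated Random instance makes the draw repeatable for a given seed.
        picked = random.Random(seed).sample(data, min(n, len(data)))
    else:
        raise ValueError(f"unknown how={how!r}")
    if columns is not None:
        picked = [{c: r[c] for c in columns} for r in picked]
    return picked


print(peek(rows, how="head", n=3, columns=["id", "region"]))
print(peek(rows, how="sample", n=3, seed=123))
print(peek(rows, how="tail", n=2, sort_by="age", descending=True))
```

Calling peek twice with the same seed returns the same rows, which is the property the rstate argument is described as providing.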
show_records()
Filter rows with a simple dict or a list of expressions, then order and limit the results.
import svy  # for svy.col/when etc.

# dict filter: equality or membership
s.show_records(where={"region": ["North", "East"]}, columns=["id", "region", "age"])

# expression filter: age > 30 and region == "South"
s.show_records(
    where=[svy.col("age") > 30, svy.col("region") == "South"],
    columns=["id", "region", "age"],
    order_by="age",
    descending=True,
    n=5,
)
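shape: (1, 3)
┌─────┬────────┬─────┐
│ id  ┆ region ┆ age │
│ --- ┆ ---    ┆ --- │
│ i64 ┆ str    ┆ i64 │
╞═════╪════════╪═════╡
│ 2   ┆ South  ┆ 47  │
└─────┴────────┴─────┘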
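The two filter styles can be sketched in plain Python. This is only an illustration of the semantics described above, not how svy evaluates filters: select_records, matches, and the toy rows are hypothetical, the dict rule (scalar means equality, list means membership) follows the comment in the example, and combining several conditions with a logical AND is an assumption that matches the single South row returned above.

```python
# Hypothetical toy rows for illustration; not the tutorial's dataset.
rows = [
    {"id": 1, "region": "North", "age": 22},
    {"id": 2, "region": "South", "age": 47},
    {"id": 3, "region": "East", "age": 35},
    {"id": 4, "region": "East", "age": 61},
    {"id": 5, "region": "South", "age": 29},
]


def matches(row, where):
    """Dict filter: a scalar means equality, a list/tuple/set means membership."""
    for column, wanted in where.items():
        if isinstance(wanted, (list, tuple, set)):
            if row[column] not in wanted:
                return False
        elif row[column] != wanted:
            return False
    return True


def select_records(rows, where=None, predicates=(), columns=None,
                   order_by=None, descending=False, n=None):
    """Keep rows that pass the dict filter and every predicate (logical AND)."""
    kept = [
        r for r in rows
        if (where is None or matches(r, where)) and all(p(r) for p in predicates)
    ]
    if order_by is not None:
        kept.sort(key=lambda r: r[order_by], reverse=descending)
    if n is not None:
        kept = kept[:n]  # limit after ordering
    if columns is not None:
        kept = [{c: r[c] for c in columns} for r in kept]
    return kept


north_or_east = select_records(rows, where={"region": ["North", "East"]})
older_south = select_records(
    rows,
    predicates=[lambda r: r["age"] > 30, lambda r: r["region"] == "South"],
    order_by="age",
    descending=True,
    n=5,
)
print(north_or_east)
print(older_south)
```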
Quick structure & summaries
Basic properties
# n_records
print(f"Number of records: {s.n_records}")
# n_columns
print(f"Number of columns: {s.n_columns}")
# strata
print(f"list of strata: {s.strata}")
# psus
print(f"list of PSUs: {s.psus}")

Number of records: 5
Number of columns: 6
list of strata: shape: (3, 1)
┌────────┐
│ region │
│ ---    │
│ str    │
╞════════╡
│ East   │
│ North  │
│ South  │
└────────┘
list of PSUs: shape: (0, 0)
┌┐
╞╡
└┘
describe()
describe() produces unweighted or weighted summaries of the columns in the current schema. With weighted=True, it tries to use the active design weight.
╭─────────────────────────────────────────── Describe ────────────────────────────────────────────╮
│ Columns: 4                                                                                      │
│ Weighted: False                                                                                 │
│ drop_nulls: True                                                                                │
│ percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                                      │
│ generated_at: 2025-11-01T18:31:56+00:00                                                         │
│                                                                                                 │
│ Numeric                                                                                         │
│                                                                                                 │
│  name  type        n  miss  mean  std       min  p25  p50  p75  max  sum                        │
│  ───────────────────────────────────────────────────────────────────────                        │
│  id    Discrete    5  0     3     1.58114   1    2    3    4    5    15                         │
│  age   Discrete    5  0     38.8  15.4337   22   29   35   47   61   194                        │
│  w     Continuous  5  0     1     0.158114  0.8  0.9  1    1.1  1.2  5                          │
│                                                                                                 │
│ String                                                                                          │
│                                                                                                 │
│  name    n  miss  unique  shortest  longest                                                     │
│  ──────────────────────────────────────────                                                     │
│  region  5  0     3       4         5                                                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
desc_w = s.describe(weighted=True)  # uses design.wgt if available
print(desc_w)
╭─────────────────────────────────────────── Describe ────────────────────────────────────────────╮
│ Columns: 4                                                                                      │
│ Weighted: True (weight_col=w)                                                                   │
│ drop_nulls: True                                                                                │
│ percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                                      │
│ generated_at: 2025-11-01T18:31:56+00:00                                                         │
│                                                                                                 │
│ Numeric                                                                                         │
│                                                                                                 │
│  name  type        n  miss  mean   std       min  p25  p50  p75  max  sum                       │
│  ──────────────────────────────────────────────────────────────────────────                     │
│  id    Discrete    5  0     2.9    1.58114   1    2    3    4    5    14.5                      │
│  age   Discrete    5  0     40.04  15.4337   22   29   35   47   61   200.2                     │
│  w     Continuous  5  0     1.02   0.158114  0.8  0.9  1    1.1  1.2  5.1                       │
│                                                                                                 │
│ String                                                                                          │
│                                                                                                 │
│  name    n  miss  unique  shortest  longest                                                     │
│  ──────────────────────────────────────────                                                     │
│  region  5  0     3       4         5                                                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
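The mean and sum columns of the weighted table are consistent with the usual weighted mean, sum(w*x)/sum(w), and the weighted total, sum(w*x). The sketch below checks that arithmetic in plain Python. Treat it as a reading of the printed numbers rather than a statement of how svy computes them: the w values follow from the printed n, min, quartiles, and max (assuming the usual quantile definitions), while the pairing of each age and id with a weight is an assumption chosen so that the totals match the table.

```python
# Values read off the describe() tables above. The row pairing of ids and
# ages with weights is an assumption chosen to reproduce the printed totals.
ids = [1, 2, 3, 4, 5]
age = [22, 47, 35, 61, 29]
w = [1.0, 1.2, 0.9, 1.1, 0.8]


def weighted_sum(x, weights):
    """Weighted total: sum of w_i * x_i."""
    return sum(xi * wi for xi, wi in zip(x, weights))


def weighted_mean(x, weights):
    """Weighted mean: weighted total divided by the sum of the weights."""
    return weighted_sum(x, weights) / sum(weights)


for name, values in [("id", ids), ("age", age), ("w", w)]:
    # Rounded for display; compare with the mean and sum columns above.
    print(name, round(weighted_mean(values, w), 2), round(weighted_sum(values, w), 1))
```

Note that the std column is identical in the weighted and unweighted tables, so this sketch makes no claim about how a weighted standard deviation would be defined.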
Sample.data and Sample.design return defensive copies, so you can inspect them without mutating the Sample's internals.
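For readers unfamiliar with the pattern, here is a generic sketch of what a defensive copy means. The Container class is a hypothetical stand-in written for this illustration; it says nothing about how Sample stores or copies its data internally.

```python
import copy


class Container:
    """Hypothetical stand-in that hands out copies of its internal state."""

    def __init__(self, data):
        self._data = copy.deepcopy(data)  # do not alias the caller's object

    @property
    def data(self):
        return copy.deepcopy(self._data)  # each access returns a fresh copy


box = Container({"age": [22, 29, 35]})
view = box.data
view["age"].append(99)  # changes only the copy held in `view`

print(view)      # {'age': [22, 29, 35, 99]}
print(box.data)  # {'age': [22, 29, 35]}
```

The trade-off is that every access pays for a copy, in exchange for the guarantee that inspection cannot corrupt the object's state.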
Next Steps
Ready to shape your data? Continue to Wrangling — basics, where you’ll clean names, recode values, create bins, clamp extremes, and finish with new variables via mutate().