Sample — quick tour

Peek, slice, and summarize your survey data

What is Sample?

Sample is a Python class that encapsulates the core functionality for working with a survey sample.

It wraps a Polars DataFrame and, optionally, a survey Design.

In the tutorials, we’ll use Sample for common tasks (previewing data, filtering rows, and producing quick summaries) before moving on to selection, weighting, and estimation.

This first tutorial focuses on those essentials so you can inspect your data and get oriented fast.

import polars as pl
from svy import Sample, Design

# Build a small example dataset
df = pl.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "region": ["North", "South", "North", "East", "South"],
        "age": [22, 47, 35, 61, 29],
        "w": [1.0, 1.2, 0.9, 1.1, 0.8],
    }
)

# Define the sampling design
dsg = Design(wgt="w", stratum="region")

s = Sample(df, design=dsg)

print(s) 
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 5                                           
   Number of columns: 6                                        
   Number of strata: 3                                         
   Number of PSUs: None                                        
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                          
    Row index           svy_row_index                          
    Stratum             region                                 
    PSU                 None                                   
    SSU                 None                                   
    Weight              w                                      
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   None                                   
                                                               
╰───────────────────────────────────────────────────────────────╯

Peek at data

show_data()

show_data() returns a slice of the data: the first rows (head), the last rows (tail), or a random sample. You can also select columns and sort the rows.

# first 3 rows with two columns
s.show_data(columns=["id","region"], how="head", n=3)

# random 3 rows (reproducible with a seed)
s.show_data(how="sample", n=3, rstate=123)

# tail 2 rows sorted by age (descending)
s.show_data(how="tail", n=2, sort_by="age", descending=True)
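shape: (2, 5)
┌───────────────┬─────┬────────┬─────┬─────┐
│ svy_row_index ┆ id  ┆ region ┆ age ┆ w   │
│ ---           ┆ --- ┆ ---    ┆ --- ┆ --- │
│ u32           ┆ i64 ┆ str    ┆ i64 ┆ f64 │
╞═══════════════╪═════╪════════╪═════╪═════╡
│ 4             ┆ 5   ┆ South  ┆ 29  ┆ 0.8 │
│ 0             ┆ 1   ┆ North  ┆ 22  ┆ 1.0 │
└───────────────┴─────┴────────┴─────┴─────┘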
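
To make the three modes concrete, here is a plain-Python sketch of the same selections on the tutorial's rows. It only mirrors the behaviour shown above (sort, then take the head, the tail, or a seeded random draw). It is not svy's implementation, and the rows a seeded `random.Random` picks need not match what `rstate=123` returns.

```python
import random

# Same rows as the tutorial DataFrame, written as plain dicts.
rows = [
    {"id": 1, "region": "North", "age": 22, "w": 1.0},
    {"id": 2, "region": "South", "age": 47, "w": 1.2},
    {"id": 3, "region": "North", "age": 35, "w": 0.9},
    {"id": 4, "region": "East", "age": 61, "w": 1.1},
    {"id": 5, "region": "South", "age": 29, "w": 0.8},
]

# head: first 3 rows, keeping only two columns
head3 = [{k: r[k] for k in ("id", "region")} for r in rows[:3]]

# tail: sort by age (descending), then keep the last 2 rows
tail2 = sorted(rows, key=lambda r: r["age"], reverse=True)[-2:]

# sample: two generators with the same seed produce the same draw
sample_a = random.Random(123).sample(rows, 3)
sample_b = random.Random(123).sample(rows, 3)

print([r["id"] for r in tail2])  # [5, 1]
print(sample_a == sample_b)      # True
```

The tail result (ids 5 and 1) agrees with the output above: the two youngest respondents come last once the rows are sorted by age in descending order.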

show_records()

show_records() filters rows with a simple dict (equality or membership) or a list of expressions, then orders the matches and limits how many are returned.

import svy  # for svy.col/when etc.

# dict filter: equality or membership
s.show_records(where={"region": ["North","East"]}, columns=["id","region","age"])

# expression filter: age > 30 and region == "South"
s.show_records(
    where=[svy.col("age") > 30, svy.col("region") == "South"],
    columns=["id","region","age"],
    order_by="age",
    descending=True,
    n=5
)
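shape: (1, 3)
┌─────┬────────┬─────┐
│ id  ┆ region ┆ age │
│ --- ┆ ---    ┆ --- │
│ i64 ┆ str    ┆ i64 │
╞═════╪════════╪═════╡
│ 2   ┆ South  ┆ 47  │
└─────┴────────┴─────┘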
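
The two filter styles can be read as follows; this is a plain-Python sketch of the semantics, not svy's code. The helper `matches` and the lambda conditions are hypothetical stand-ins for the dict filter and the `svy.col(...)` expressions. It assumes, as the comment in the example above indicates, that every expression in the list must hold.

```python
rows = [
    {"id": 1, "region": "North", "age": 22},
    {"id": 2, "region": "South", "age": 47},
    {"id": 3, "region": "North", "age": 35},
    {"id": 4, "region": "East", "age": 61},
    {"id": 5, "region": "South", "age": 29},
]


def matches(row, where):
    """Dict filter: a scalar means equality, a list means membership."""
    for col, wanted in where.items():
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if row[col] not in allowed:
            return False
    return True


# Dict filter: region is North or East
north_east = [r["id"] for r in rows if matches(r, {"region": ["North", "East"]})]

# Expression filter: all conditions must hold (logical AND)
conditions = [lambda r: r["age"] > 30, lambda r: r["region"] == "South"]
hits = [r for r in rows if all(cond(r) for cond in conditions)]
hits.sort(key=lambda r: r["age"], reverse=True)  # like order_by="age", descending=True
hits = hits[:5]                                   # like n=5

print(north_east)                                        # [1, 3, 4]
print([(r["id"], r["region"], r["age"]) for r in hits])  # [(2, 'South', 47)]
```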

Quick structure & summaries

Basic properties

# n_records
print(f"Number of records: {s.n_records}")

# n_columns
print(f"Number of columns: {s.n_columns}")

# strata
print(f"list of strata: {s.strata}")

# psus
print(f"list of PSUs: {s.psus}")
Number of records: 5
Number of columns: 6
list of strata: shape: (3, 1)
┌────────┐
│ region │
│ ---    │
│ str    │
╞════════╡
│ East   │
│ North  │
│ South  │
└────────┘
list of PSUs: shape: (0, 0)
┌┐
╞╡
└┘
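
As a quick cross-check of the record and strata counts, the same figures can be derived by hand from the `region` column. This is a plain-Python sketch; the sorted order matches the listing above, but whether svy guarantees that ordering is an assumption.

```python
from collections import Counter

# The stratum column from the tutorial dataset
regions = ["North", "South", "North", "East", "South"]

n_records = len(regions)          # one entry per row
strata = sorted(set(regions))     # distinct stratum labels
stratum_sizes = Counter(regions)  # rows per stratum

print(n_records)            # 5
print(strata)               # ['East', 'North', 'South']
print(stratum_sizes["North"], stratum_sizes["South"], stratum_sizes["East"])  # 2 2 1
```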

describe()

describe() produces unweighted or weighted summaries of the columns in the current schema. With weighted=True, it tries to use the weight column from the active design.

svy.DescribeResult.PRINT_WIDTH = 99

desc_unw = s.describe(top_k=5)

print(desc_unw)
╭─────────────────────────────────────────── Describe ────────────────────────────────────────────╮
 Columns: 4                                                                                      
 Weighted: False                                                                                 
 drop_nulls: True                                                                                
 percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                                      
 generated_at: 2025-11-01T18:31:56+00:00                                                         
                                                                                                 
 Numeric                                                                                         
                                                                                                 
   name   type         n   miss   mean        std   min   p25   p50   p75   max   sum            
   ──────────────────────────────────────────────────────────────────────────────────            
   id     Discrete     5      0      3    1.58114     1     2     3     4     5    15            
   age    Discrete     5      0   38.8    15.4337    22    29    35    47    61   194            
   w      Continuous   5      0      1   0.158114   0.8   0.9     1   1.1   1.2     5            
                                                                                                 
 String                                                                                          
                                                                                                 
   name     n   miss   unique   shortest   longest                                               
   ───────────────────────────────────────────────                                               
   region   5      0        3          4         5                                               
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
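
The unweighted figures for `age` can be reproduced with the standard library. In this sketch the `std` column is taken to be the sample standard deviation (n - 1 in the denominator); that reading is inferred from the numbers in the table, not from svy's documentation.

```python
import statistics

age = [22, 47, 35, 61, 29]

mean_age = statistics.mean(age)   # arithmetic mean
std_age = statistics.stdev(age)   # sample standard deviation (n - 1)
total_age = sum(age)

print(mean_age)           # 38.8
print(round(std_age, 4))  # 15.4337
print(total_age)          # 194
```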

desc_w = s.describe(weighted=True)  # uses design.wgt if available

print(desc_w)
╭─────────────────────────────────────────── Describe ────────────────────────────────────────────╮
 Columns: 4                                                                                      
 Weighted: True (weight_col=w)                                                                   
 drop_nulls: True                                                                                
 percentiles: (0.05, 0.25, 0.5, 0.75, 0.95)                                                      
 generated_at: 2025-11-01T18:31:56+00:00                                                         
                                                                                                 
 Numeric                                                                                         
                                                                                                 
   name   type         n   miss    mean        std   min   p25   p50   p75   max     sum         
   ─────────────────────────────────────────────────────────────────────────────────────         
   id     Discrete     5      0     2.9    1.58114     1     2     3     4     5    14.5         
   age    Discrete     5      0   40.04    15.4337    22    29    35    47    61   200.2         
   w      Continuous   5      0    1.02   0.158114   0.8   0.9     1   1.1   1.2     5.1         
                                                                                                 
 String                                                                                          
                                                                                                 
   name     n   miss   unique   shortest   longest                                               
   ───────────────────────────────────────────────                                               
   region   5      0        3          4         5                                               
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
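
The weighted `mean` and `sum` for `age` follow from the usual formulas: the weighted sum is Σ wᵢ·xᵢ and the weighted mean divides that by Σ wᵢ. The sketch below checks the two figures by hand in plain Python. It covers only these two columns of the table; how svy computes the other weighted statistics is not addressed here.

```python
age = [22, 47, 35, 61, 29]
w = [1.0, 1.2, 0.9, 1.1, 0.8]

# Weighted sum: each value multiplied by its weight
weighted_sum = sum(wi * xi for wi, xi in zip(w, age))

# Weighted mean: weighted sum divided by the total weight
weighted_mean = weighted_sum / sum(w)

print(round(weighted_sum, 2))   # 200.2
print(round(weighted_mean, 2))  # 40.04
```

Because the weights here add up to 5, the same as the number of rows, the weighted and unweighted means stay on a similar scale (40.04 versus 38.8).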

Sample.data and Sample.design return defensive copies, so you can inspect them without mutating the Sample's internals.
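
For readers new to the term, a defensive copy means the accessor hands back a copy of the internal state, never the state itself. The sketch below illustrates the idea with a hypothetical `TinySample` class; it is not svy.Sample and says nothing about how svy stores or copies its data.

```python
import copy


class TinySample:
    """Hypothetical stand-in used only to illustrate defensive copies."""

    def __init__(self, rows):
        # Copy on the way in, so later changes to `rows` do not leak in
        self._rows = copy.deepcopy(rows)

    @property
    def data(self):
        # Copy on the way out, so callers cannot modify the internals
        return copy.deepcopy(self._rows)


ts = TinySample([{"id": 1, "age": 22}])

view = ts.data
view[0]["age"] = 99       # changes only the caller's copy

print(view[0]["age"])     # 99
print(ts.data[0]["age"])  # 22
```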

Next Steps

Ready to shape your data? Continue to Wrangling — basics, where you’ll clean names, recode values, create bins, clamp extremes, and finish with new variables via mutate().