Replicate Weights for Variance Estimation in Python

BRR, Jackknife, Bootstrap, and SDR methods for complex survey designs

Learn how to create and adjust replicate weights for survey variance estimation using the svy library. Covers BRR, Jackknife (JKn and JK2), Bootstrap, and SDR methods with examples of propagating weight adjustments to replicates.
Author

Mamadou S. Diallo, Ph.D.

Published

January 18, 2026

Modified

January 26, 2026


Replicate weights provide a flexible approach to variance estimation for complex survey designs. Rather than relying on analytical formulas (Taylor linearization), replication methods estimate variance by repeatedly perturbing the sample weights and observing the resulting variation in estimates.

This tutorial covers:

  1. When to use replicate weights — comparing replication vs. Taylor linearization
  2. Creating replicate weights — BRR, Jackknife (JKn and JK2), Bootstrap, and SDR methods
  3. Preparing designs for replication — using create_variance_strata() to create paired PSU structures
  4. Adjusting replicate weights — propagating nonresponse, calibration, and other adjustments

This tutorial assumes you’ve already created base weights and applied any necessary adjustments. See the Weight Adjustment tutorial for details on nonresponse adjustment, poststratification, calibration, and raking.
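To make the replication idea concrete, here is a minimal sketch in plain Python (not the svy API) of how a variance is assembled from replicate estimates: compute the statistic once with the full-sample weights, once per replicate weight column, and sum the scaled squared deviations. The function names and toy values are hypothetical, and the scaling constant depends on the method; 1/R is the usual choice for standard BRR.

```python
def weighted_mean(y, w):
    """Weighted mean of y using weights w."""
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)


def replicate_variance(theta_full, theta_reps, scale):
    """Generic replication variance: scale * sum_r (theta_r - theta_full)^2."""
    return scale * sum((t - theta_full) ** 2 for t in theta_reps)


# Toy data: 2 strata x 2 PSUs, one unit per PSU (hypothetical values).
y = [10.0, 12.0, 8.0, 9.0]
base_w = [1.0, 1.0, 1.0, 1.0]

# Two BRR-style replicates: in each stratum one PSU is doubled, the other zeroed.
rep_factors = [
    [2.0, 0.0, 2.0, 0.0],
    [2.0, 0.0, 0.0, 2.0],
]

theta = weighted_mean(y, base_w)
theta_reps = [
    weighted_mean(y, [w * f for w, f in zip(base_w, factors)])
    for factors in rep_factors
]

# Standard BRR uses scale = 1 / R, where R is the number of replicates.
variance = replicate_variance(theta, theta_reps, scale=1.0 / len(rep_factors))
print(f"estimate={theta:.3f}, variance={variance:.4f}")
```

The same two-step pattern (re-estimate, then combine) applies to every method in this tutorial; only the replicate weights and the scale change.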

When to Use Replicate Weights

Replicate weights are especially useful when:

  • Estimating non-linear statistics (medians, percentiles, ratios) where Taylor linearization may be inaccurate
  • The number of PSUs per stratum is small, making linearization-based variance estimates unstable
  • Software limitations prevent proper specification of the complex design
  • Sharing data with secondary analysts who may not have access to design details
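
Approach              Strengths                                                      Limitations
--------------------  -------------------------------------------------------------  ----------------------------------------------------------------------------------
Taylor linearization  Computationally efficient; no additional columns needed        Requires correct design specification; may be inaccurate for non-linear statistics
Replication           Flexible; works well for non-linear statistics; easy to share  Increases file size; computationally heavier for many replicates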

Setting Up the Sample Data

We’ll create a sample dataset that satisfies the requirements for all replication methods. BRR requires exactly 2 PSUs per stratum, which is the most restrictive requirement.

import numpy as np
import polars as pl
import svy

rng = np.random.default_rng(42)

# Create sample with 4 strata, 2 PSUs each, 5 units per PSU
rows = []
y_means = {
    "S1_P1": 10, "S1_P2": 12,
    "S2_P1": 8,  "S2_P2": 9,
    "S3_P1": 15, "S3_P2": 13,
    "S4_P1": 11, "S4_P2": 10,
}

for s in range(1, 5):
    for p in range(1, 3):
        label = f"S{s}_P{p}"
        for i in range(5):
            rows.append({
                "unit_id": f"S{s}P{p}U{i + 1}",
                "stratum": f"S{s}",
                "psu": f"S{s}_P{p}",
                "base_wgt": rng.uniform(1.0, 3.0),
                "y": rng.normal(y_means[label], 2.0),
                "x_cat": rng.choice(["A", "B", "C"]),
                "resp_status": rng.choice(
                    ["respondent", "non-respondent"],
                    p=[0.85, 0.15]
                ),
            })

df = pl.DataFrame(rows)
print(f"Sample size: {df.shape[0]} units")
print(df.head(10))
Sample size: 40 units
shape: (10, 7)
┌─────────┬─────────┬───────┬──────────┬───────────┬───────┬────────────────┐
│ unit_id ┆ stratum ┆ psu   ┆ base_wgt ┆ y         ┆ x_cat ┆ resp_status    │
│ ---     ┆ ---     ┆ ---   ┆ ---      ┆ ---       ┆ ---   ┆ ---            │
│ str     ┆ str     ┆ str   ┆ f64      ┆ f64       ┆ str   ┆ str            │
╞═════════╪═════════╪═══════╪══════════╪═══════════╪═══════╪════════════════╡
│ S1P1U1  ┆ S1      ┆ S1_P1 ┆ 2.547912 ┆ 7.920032  ┆ B     ┆ respondent     │
│ S1P1U2  ┆ S1      ┆ S1_P1 ┆ 1.188355 ┆ 7.395641  ┆ C     ┆ respondent     │
│ S1P1U3  ┆ S1      ┆ S1_P1 ┆ 2.572129 ┆ 9.966398  ┆ C     ┆ respondent     │
│ S1P1U4  ┆ S1      ┆ S1_P1 ┆ 2.85353  ┆ 10.132061 ┆ B     ┆ respondent     │
│ S1P1U5  ┆ S1      ┆ S1_P1 ┆ 1.886828 ┆ 8.281415  ┆ A     ┆ respondent     │
│ S1P2U1  ┆ S1      ┆ S1_P2 ┆ 2.655262 ┆ 11.900148 ┆ B     ┆ respondent     │
│ S1P2U2  ┆ S1      ┆ S1_P2 ┆ 1.709052 ┆ 14.445083 ┆ B     ┆ respondent     │
│ S1P2U3  ┆ S1      ┆ S1_P2 ┆ 1.389277 ┆ 13.064618 ┆ C     ┆ respondent     │
│ S1P2U4  ┆ S1      ┆ S1_P2 ┆ 1.308579 ┆ 12.861642 ┆ C     ┆ non-respondent │
│ S1P2U5  ┆ S1      ┆ S1_P2 ┆ 1.651651 ┆ 10.372455 ┆ C     ┆ respondent     │
└─────────┴─────────┴───────┴──────────┴───────────┴───────┴────────────────┘
# Define the survey design
sample = svy.Sample(
    data=df,
    design=svy.Design(
        stratum="stratum",
        psu="psu",
        wgt="base_wgt"
    ),
)

print(sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 10                                       
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                          
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              base_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   None                                   
                                                               
╰───────────────────────────────────────────────────────────────╯

Replication Methods

The svy library supports four replication methods, with the jackknife available in two variants:

Method                                   Function                      Requirements                Typical Use Case
---------------------------------------  ----------------------------  --------------------------  -----------------------------------------
Balanced Repeated Replication (BRR)      create_brr_wgts()             Exactly 2 PSUs per stratum  Designs with paired PSUs
Jackknife (JKn)                          create_jk_wgts(paired=False)  ≥2 PSUs per stratum         General purpose; moderate # of PSUs
Jackknife (JK2)                          create_jk_wgts(paired=True)   2–3 PSUs per stratum        Paired designs; fewer replicates than JKn
Bootstrap                                create_bs_wgts()              ≥2 PSUs per stratum         Complex designs; non-linear statistics
Successive Difference Replication (SDR)  create_sdr_wgts()             Ordered/systematic samples  Systematic samples; time series

Balanced Repeated Replication (BRR)

BRR constructs balanced half-samples using a Hadamard matrix design. In each replicate, one PSU is selected from each stratum, and the weights of selected PSUs are doubled while the other PSU receives zero weight.

Requirements:

  • Exactly 2 PSUs per stratum (after any stratum collapsing)
  • Number of replicates defaults to the smallest Hadamard matrix size ≥ number of strata
brr_sample = sample.weighting.create_brr_wgts(
    rep_prefix="brr_wgt",
)

print(brr_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 14                                       
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              base_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=BRR,                 
                        prefix='brr_wgt', n_reps=4, df=4.0,    
                        fay=0.0)                               
                                                               
╰───────────────────────────────────────────────────────────────╯

Examine the replicate weight pattern:

print(
    brr_sample.show_data(
        columns=["stratum", "psu", "base_wgt", "brr_wgt1", "brr_wgt2", "brr_wgt3", "brr_wgt4"],
        n=20,
    )
)
shape: (20, 7)
┌─────────┬───────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ stratum ┆ psu   ┆ base_wgt ┆ brr_wgt1 ┆ brr_wgt2 ┆ brr_wgt3 ┆ brr_wgt4 │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ f64      ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞═════════╪═══════╪══════════╪══════════╪══════════╪══════════╪══════════╡
│ S1      ┆ S1_P1 ┆ 2.547912 ┆ 5.095824 ┆ 5.095824 ┆ 5.095824 ┆ 5.095824 │
│ S1      ┆ S1_P1 ┆ 1.188355 ┆ 2.376709 ┆ 2.376709 ┆ 2.376709 ┆ 2.376709 │
│ S1      ┆ S1_P1 ┆ 2.572129 ┆ 5.144257 ┆ 5.144257 ┆ 5.144257 ┆ 5.144257 │
│ S1      ┆ S1_P1 ┆ 2.85353  ┆ 5.70706  ┆ 5.70706  ┆ 5.70706  ┆ 5.70706  │
│ S1      ┆ S1_P1 ┆ 1.886828 ┆ 3.773657 ┆ 3.773657 ┆ 3.773657 ┆ 3.773657 │
│ …       ┆ …     ┆ …        ┆ …        ┆ …        ┆ …        ┆ …        │
│ S2      ┆ S2_P2 ┆ 2.329702 ┆ 0.0      ┆ 4.659403 ┆ 0.0      ┆ 4.659403 │
│ S2      ┆ S2_P2 ┆ 1.917832 ┆ 0.0      ┆ 3.835663 ┆ 0.0      ┆ 3.835663 │
│ S2      ┆ S2_P2 ┆ 2.336806 ┆ 0.0      ┆ 4.673612 ┆ 0.0      ┆ 4.673612 │
│ S2      ┆ S2_P2 ┆ 2.529998 ┆ 0.0      ┆ 5.059995 ┆ 0.0      ┆ 5.059995 │
│ S2      ┆ S2_P2 ┆ 1.6079   ┆ 0.0      ┆ 3.2158   ┆ 0.0      ┆ 3.2158   │
└─────────┴───────┴──────────┴──────────┴──────────┴──────────┴──────────┘

Notice how within each stratum, one PSU has doubled weights (2×) and the other has zero weights in each replicate.
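The half-sample pattern can be sketched without the library. The code below is an illustration only, not svy's implementation: it builds a Hadamard matrix with Sylvester's construction (which only yields orders that are powers of 2) and maps each +1/-1 entry to a pair of weight multipliers. The helper names `sylvester_hadamard` and `brr_factors` are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), Sylvester's construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_factors(n_strata):
    """Return factors[r][s] = (multiplier for first PSU, multiplier for second PSU)."""
    order = 1
    while order < n_strata:
        order *= 2  # smallest power of 2 >= number of strata (assumption of this sketch)
    h = sylvester_hadamard(order)
    factors = []
    for r in range(order):
        row = []
        for s in range(n_strata):
            # +1 keeps the first PSU (doubled); -1 keeps the second PSU.
            row.append((2.0, 0.0) if h[r][s] == 1 else (0.0, 2.0))
        factors.append(row)
    return factors


factors = brr_factors(n_strata=4)
for r, row in enumerate(factors, start=1):
    print(f"replicate {r}: {row}")
```

Because the columns of a Hadamard matrix are mutually orthogonal, every pair of strata sees each combination of half-samples equally often, which is what "balanced" refers to.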

BRR Parameters

Parameter   Type        Default        Description
----------  ----------  -------------  -----------------------------------------------------
n_reps      int | None  Auto           Number of replicates (rounded to valid Hadamard size)
rep_prefix  str | None  "svy_brr_wgt"  Prefix for replicate weight column names
fay_coef    float       0.0            Fay coefficient ρ ∈ [0, 1) for damped BRR
rstate      int | None  None           Random seed for PSU ordering within strata
drop_nulls  bool        False          Drop rows with missing values in design columns

Reproducible BRR with rstate

The rstate parameter controls how PSUs are ordered within each stratum before applying the Hadamard matrix. This affects which PSU gets the +1 (doubled) vs -1 (zeroed) treatment in each replicate.

# Create reproducible BRR weights
brr_sample1 = sample.weighting.create_brr_wgts(rep_prefix="brr1", rstate=42)
brr_sample2 = sample.weighting.create_brr_wgts(rep_prefix="brr2", rstate=42)

# Same seed produces identical results
arr1 = brr_sample1.data["brr11"].to_numpy()
arr2 = brr_sample2.data["brr21"].to_numpy()
print(f"Same seed produces identical weights: {np.allclose(arr1, arr2)}")
Same seed produces identical weights: True
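The role of a seed in fixing the PSU order can be illustrated with the standard library alone. This is a hypothetical helper, not the generator or ordering logic that svy uses internally:

```python
import random


def order_psus(psus, seed=None):
    """Return a shuffled copy of the PSU labels; a fixed seed fixes the order."""
    rng = random.Random(seed)
    shuffled = list(psus)
    rng.shuffle(shuffled)
    return shuffled


psus = ["P1", "P2", "P3", "P4"]
first = order_psus(psus, seed=42)
second = order_psus(psus, seed=42)

# Two calls with the same seed return the same ordering.
print(first == second)
```

Storing the seed alongside the replicate weights makes it possible to regenerate exactly the same replicates later.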

Fay-BRR (Generalized BRR)

Standard BRR can produce unstable estimates when PSU contributions vary greatly. Fay-BRR dampens the perturbation by using a coefficient ρ ∈ (0, 1):

  • Selected PSU weight: multiplied by (2 − ρ)
  • Non-selected PSU weight: multiplied by ρ

Common choices are ρ = 0.3 to 0.5. As ρ → 0, Fay-BRR approaches standard BRR; as ρ → 1, perturbation vanishes.

fay_sample = sample.weighting.create_brr_wgts(
    n_reps=8,
    rep_prefix="fay_wgt",
    fay_coef=0.5,
)

print(
    fay_sample.show_data(
        columns=["stratum", "psu", "base_wgt", "fay_wgt1", "fay_wgt2"],
        n=10,
    )
)
shape: (10, 5)
┌─────────┬───────┬──────────┬──────────┬──────────┐
│ stratum ┆ psu   ┆ base_wgt ┆ fay_wgt1 ┆ fay_wgt2 │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ f64      ┆ f64      ┆ f64      │
╞═════════╪═══════╪══════════╪══════════╪══════════╡
│ S1      ┆ S1_P1 ┆ 2.547912 ┆ 3.821868 ┆ 3.821868 │
│ S1      ┆ S1_P1 ┆ 1.188355 ┆ 1.782532 ┆ 1.782532 │
│ S1      ┆ S1_P1 ┆ 2.572129 ┆ 3.858193 ┆ 3.858193 │
│ S1      ┆ S1_P1 ┆ 2.85353  ┆ 4.280295 ┆ 4.280295 │
│ S1      ┆ S1_P1 ┆ 1.886828 ┆ 2.830243 ┆ 2.830243 │
│ S1      ┆ S1_P2 ┆ 2.655262 ┆ 1.327631 ┆ 1.327631 │
│ S1      ┆ S1_P2 ┆ 1.709052 ┆ 0.854526 ┆ 0.854526 │
│ S1      ┆ S1_P2 ┆ 1.389277 ┆ 0.694639 ┆ 0.694639 │
│ S1      ┆ S1_P2 ┆ 1.308579 ┆ 0.654289 ┆ 0.654289 │
│ S1      ┆ S1_P2 ┆ 1.651651 ┆ 0.825825 ┆ 0.825825 │
└─────────┴───────┴──────────┴──────────┴──────────┘

With fay_coef=0.5, selected PSUs get weight × 1.5 and non-selected get weight × 0.5 (instead of ×2 and ×0).
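As a rough sketch of the arithmetic (hypothetical helper names, not the svy API), the Fay multipliers and the matching variance scaling can be written as follows. The variance divides the sum of squared deviations by R(1 − ρ)² to compensate for the damped perturbation.

```python
def fay_factors(selected_first, rho):
    """Return (first PSU, second PSU) multipliers for one stratum in one replicate."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    up, down = 2.0 - rho, rho
    return (up, down) if selected_first else (down, up)


def fay_variance(theta_full, theta_reps, rho):
    """Fay-BRR variance: sum of squared deviations / (R * (1 - rho)^2)."""
    n_reps = len(theta_reps)
    ss = sum((t - theta_full) ** 2 for t in theta_reps)
    return ss / (n_reps * (1.0 - rho) ** 2)


print(fay_factors(True, 0.5))   # damped multipliers
print(fay_factors(True, 0.0))   # rho = 0 reduces to standard BRR

# First base weight from the table above, in the selected PSU with rho = 0.5.
print(round(2.547912 * fay_factors(True, 0.5)[0], 6))
```

Because no weight is ever set to zero when ρ > 0, every unit contributes to every replicate, which helps when a domain is concentrated in a single PSU.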

Jackknife Methods (JKn and JK2)

The svy library supports two jackknife variants controlled by the paired parameter:

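Variant                  paired  # Replicates     Description
-----------------------  ------  ---------------  ------------------------------------------------------
JKn (delete-one-PSU)     False   Total # of PSUs  Delete one PSU at a time across the entire sample
JK2 (paired/stratified)  True    # of strata      Delete one PSU from a single stratum in each replicate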

JKn: Delete-One-PSU Jackknife

JKn creates replicates by systematically deleting one PSU at a time and re-weighting the remaining PSUs in the same stratum; PSUs in other strata keep their original weights. The number of replicates equals the total number of PSUs across all strata.

jkn_sample = sample.weighting.create_jk_wgts(
    paired=False,  # JKn (delete-one-PSU)
    rep_prefix="jkn_wgt",
)

print(jkn_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 38                                       
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              base_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=BRR,                 
                        prefix='brr_wgt', n_reps=4, df=4.0,    
                        fay=0.0)                               
                                                               
╰───────────────────────────────────────────────────────────────╯
# With 4 strata × 2 PSUs = 8 PSUs, we get 8 replicates
print(
    jkn_sample.show_data(
        columns=["stratum", "psu", "base_wgt", "jkn_wgt1", "jkn_wgt2", "jkn_wgt3", "jkn_wgt4"],
        n=12,
    )
)
shape: (12, 7)
┌─────────┬───────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ stratum ┆ psu   ┆ base_wgt ┆ jkn_wgt1 ┆ jkn_wgt2 ┆ jkn_wgt3 ┆ jkn_wgt4 │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ f64      ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞═════════╪═══════╪══════════╪══════════╪══════════╪══════════╪══════════╡
│ S1      ┆ S1_P1 ┆ 2.547912 ┆ 0.0      ┆ 5.095824 ┆ 2.547912 ┆ 2.547912 │
│ S1      ┆ S1_P1 ┆ 1.188355 ┆ 0.0      ┆ 2.376709 ┆ 1.188355 ┆ 1.188355 │
│ S1      ┆ S1_P1 ┆ 2.572129 ┆ 0.0      ┆ 5.144257 ┆ 2.572129 ┆ 2.572129 │
│ S1      ┆ S1_P1 ┆ 2.85353  ┆ 0.0      ┆ 5.70706  ┆ 2.85353  ┆ 2.85353  │
│ S1      ┆ S1_P1 ┆ 1.886828 ┆ 0.0      ┆ 3.773657 ┆ 1.886828 ┆ 1.886828 │
│ …       ┆ …     ┆ …        ┆ …        ┆ …        ┆ …        ┆ …        │
│ S1      ┆ S1_P2 ┆ 1.389277 ┆ 2.778555 ┆ 0.0      ┆ 1.389277 ┆ 1.389277 │
│ S1      ┆ S1_P2 ┆ 1.308579 ┆ 2.617158 ┆ 0.0      ┆ 1.308579 ┆ 1.308579 │
│ S1      ┆ S1_P2 ┆ 1.651651 ┆ 3.303301 ┆ 0.0      ┆ 1.651651 ┆ 1.651651 │
│ S2      ┆ S2_P1 ┆ 1.378943 ┆ 1.378943 ┆ 1.378943 ┆ 0.0      ┆ 2.757885 │
│ S2      ┆ S2_P1 ┆ 2.339628 ┆ 2.339628 ┆ 2.339628 ┆ 0.0      ┆ 4.679256 │
└─────────┴───────┴──────────┴──────────┴──────────┴──────────┴──────────┘
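The delete-one-PSU pattern in this table can be sketched in a few lines of plain Python. The helpers below are hypothetical and are not svy's implementation; they assume at least 2 PSUs per stratum and use the usual stratified jackknife variance, which weights each squared deviation by (n_h − 1)/n_h.

```python
from collections import Counter


def jkn_factors(psu_strata):
    """psu_strata: one stratum label per PSU. Returns one multiplier row per replicate."""
    counts = Counter(psu_strata)
    if min(counts.values()) < 2:
        raise ValueError("each stratum needs at least 2 PSUs")
    replicates = []
    for deleted, del_stratum in enumerate(psu_strata):
        n_h = counts[del_stratum]
        replicates.append([
            0.0 if j == deleted                       # the deleted PSU
            else n_h / (n_h - 1) if s == del_stratum  # same-stratum PSUs are upweighted
            else 1.0                                  # other strata are unchanged
            for j, s in enumerate(psu_strata)
        ])
    return replicates


def jkn_variance(theta_full, theta_reps, psu_strata):
    """Sum over replicates of (n_h - 1) / n_h * (theta_r - theta_full)^2."""
    counts = Counter(psu_strata)
    return sum(
        (counts[s] - 1) / counts[s] * (t - theta_full) ** 2
        for s, t in zip(psu_strata, theta_reps)
    )


psu_strata = ["S1", "S1", "S2", "S2", "S3", "S3", "S4", "S4"]
reps = jkn_factors(psu_strata)
print(len(reps), reps[0], reps[2])
```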

JK2: Paired/Stratified Jackknife

JK2 is designed for paired PSU designs (2–3 PSUs per stratum). It creates one replicate per stratum, where in each replicate one PSU is deleted from that stratum and the remaining PSUs are upweighted.

Adjustment factor: When deleting one PSU from a stratum with n PSUs:

  • Deleted PSU: weight = 0
  • Remaining PSUs: weight × (n / (n-1))

For pairs (n=2), the factor is 2.0; for triplets (n=3), it is 1.5.

jk2_sample = sample.weighting.create_jk_wgts(
    paired=True,  # JK2 (paired/stratified)
    rep_prefix="jk2_wgt",
)

print(jk2_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 42                                       
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              base_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=BRR,                 
                        prefix='brr_wgt', n_reps=4, df=4.0,    
                        fay=0.0)                               
                                                               
╰───────────────────────────────────────────────────────────────╯
# With 4 strata, we get 4 replicates (one per stratum)
print(
    jk2_sample.show_data(
        columns=["stratum", "psu", "base_wgt", "jk2_wgt1", "jk2_wgt2", "jk2_wgt3", "jk2_wgt4"],
        n=20,
    )
)
shape: (20, 7)
┌─────────┬───────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ stratum ┆ psu   ┆ base_wgt ┆ jk2_wgt1 ┆ jk2_wgt2 ┆ jk2_wgt3 ┆ jk2_wgt4 │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ f64      ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞═════════╪═══════╪══════════╪══════════╪══════════╪══════════╪══════════╡
│ S1      ┆ S1_P1 ┆ 2.547912 ┆ 0.0      ┆ 2.547912 ┆ 2.547912 ┆ 2.547912 │
│ S1      ┆ S1_P1 ┆ 1.188355 ┆ 0.0      ┆ 1.188355 ┆ 1.188355 ┆ 1.188355 │
│ S1      ┆ S1_P1 ┆ 2.572129 ┆ 0.0      ┆ 2.572129 ┆ 2.572129 ┆ 2.572129 │
│ S1      ┆ S1_P1 ┆ 2.85353  ┆ 0.0      ┆ 2.85353  ┆ 2.85353  ┆ 2.85353  │
│ S1      ┆ S1_P1 ┆ 1.886828 ┆ 0.0      ┆ 1.886828 ┆ 1.886828 ┆ 1.886828 │
│ …       ┆ …     ┆ …        ┆ …        ┆ …        ┆ …        ┆ …        │
│ S2      ┆ S2_P2 ┆ 2.329702 ┆ 2.329702 ┆ 4.659403 ┆ 2.329702 ┆ 2.329702 │
│ S2      ┆ S2_P2 ┆ 1.917832 ┆ 1.917832 ┆ 3.835663 ┆ 1.917832 ┆ 1.917832 │
│ S2      ┆ S2_P2 ┆ 2.336806 ┆ 2.336806 ┆ 4.673612 ┆ 2.336806 ┆ 2.336806 │
│ S2      ┆ S2_P2 ┆ 2.529998 ┆ 2.529998 ┆ 5.059995 ┆ 2.529998 ┆ 2.529998 │
│ S2      ┆ S2_P2 ┆ 1.6079   ┆ 1.6079   ┆ 3.2158   ┆ 1.6079   ┆ 1.6079   │
└─────────┴───────┴──────────┴──────────┴──────────┴──────────┴──────────┘

Notice that:

  • Replicate 1 deletes a PSU from stratum S1, other strata unchanged
  • Replicate 2 deletes a PSU from stratum S2, other strata unchanged
  • And so on…
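A minimal sketch of the one-replicate-per-stratum logic follows. The function name and the `drop_index` argument are hypothetical: this sketch always drops a fixed position in each stratum, whereas the library selects the deleted PSU through rstate.

```python
def jk2_factors(strata_psus, drop_index=0):
    """strata_psus: dict mapping stratum -> list of PSU ids.

    Returns one dict of PSU multipliers per stratum (one replicate each).
    """
    replicates = []
    for target, psus in strata_psus.items():
        n = len(psus)
        if n < 2:
            raise ValueError(f"stratum {target} needs at least 2 PSUs")
        factors = {}
        for stratum, members in strata_psus.items():
            for k, psu in enumerate(members):
                if stratum != target:
                    factors[psu] = 1.0          # other strata unchanged
                elif k == drop_index:
                    factors[psu] = 0.0          # deleted PSU
                else:
                    factors[psu] = n / (n - 1)  # 2.0 for pairs, 1.5 for triplets
        replicates.append(factors)
    return replicates


design = {"S1": ["a", "b"], "S2": ["c", "d", "e"]}
for rep in jk2_factors(design):
    print(rep)
```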

Reproducible JK2 with rstate

For JK2, the rstate parameter controls which PSU is selected for deletion in each stratum. This is particularly useful when strata have more than 2 PSUs.

# Create JK2 weights twice with the same seed
jk2_a = sample.weighting.create_jk_wgts(paired=True, rep_prefix="jk2a", rstate=0)
jk2_b = sample.weighting.create_jk_wgts(paired=True, rep_prefix="jk2b", rstate=0)

# Same seed produces identical results
arr_a = jk2_a.data["jk2a1"].to_numpy()
arr_b = jk2_b.data["jk2b1"].to_numpy()
print(f"Same seed produces identical weights: {np.allclose(arr_a, arr_b)}")
Same seed produces identical weights: True

Jackknife Parameters

Parameter   Type        Default                         Description
----------  ----------  ------------------------------  -----------------------------------------------
paired      bool        False                           If False, use JKn; if True, use JK2
rep_prefix  str | None  "svy_jkn_wgt" or "svy_jk2_wgt"  Prefix for replicate weight column names
rstate      int | None  None                            Random seed for PSU selection (JK2 only)
drop_nulls  bool        False                           Drop rows with missing values in design columns

Bootstrap

Bootstrap replication draws PSUs with replacement within each stratum. The Rao-Wu rescaled bootstrap adjusts weights to maintain unbiasedness under the sampling design.

Requirements:

  • At least 2 PSUs per stratum
  • Stratum is optional
bs_sample = sample.weighting.create_bs_wgts(
    n_reps=200,
    rep_prefix="bs_wgt",
    rstate=rng,  # For reproducibility
)

print(bs_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 250                                      
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              base_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=Bootstrap,           
                        prefix='bs_wgt', n_reps=200,           
                        df=199.0)                              
                                                               
╰───────────────────────────────────────────────────────────────╯
# Bootstrap weights vary randomly across replicates
print(
    bs_sample.show_data(
        columns=["stratum", "psu", "base_wgt", "bs_wgt1", "bs_wgt2", "bs_wgt3"],
        n=10,
    )
)
shape: (10, 6)
┌─────────┬───────┬──────────┬──────────┬──────────┬──────────┐
│ stratum ┆ psu   ┆ base_wgt ┆ bs_wgt1  ┆ bs_wgt2  ┆ bs_wgt3  │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞═════════╪═══════╪══════════╪══════════╪══════════╪══════════╡
│ S1      ┆ S1_P1 ┆ 2.547912 ┆ 0.0      ┆ 5.095824 ┆ 2.547912 │
│ S1      ┆ S1_P1 ┆ 1.188355 ┆ 0.0      ┆ 2.376709 ┆ 1.188355 │
│ S1      ┆ S1_P1 ┆ 2.572129 ┆ 0.0      ┆ 5.144257 ┆ 2.572129 │
│ S1      ┆ S1_P1 ┆ 2.85353  ┆ 0.0      ┆ 5.70706  ┆ 2.85353  │
│ S1      ┆ S1_P1 ┆ 1.886828 ┆ 0.0      ┆ 3.773657 ┆ 1.886828 │
│ S1      ┆ S1_P2 ┆ 2.655262 ┆ 5.310525 ┆ 0.0      ┆ 2.655262 │
│ S1      ┆ S1_P2 ┆ 1.709052 ┆ 3.418104 ┆ 0.0      ┆ 1.709052 │
│ S1      ┆ S1_P2 ┆ 1.389277 ┆ 2.778555 ┆ 0.0      ┆ 1.389277 │
│ S1      ┆ S1_P2 ┆ 1.308579 ┆ 2.617158 ┆ 0.0      ┆ 1.308579 │
│ S1      ┆ S1_P2 ┆ 1.651651 ┆ 3.303301 ┆ 0.0      ┆ 1.651651 │
└─────────┴───────┴──────────┴──────────┴──────────┴──────────┘
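For intuition, here is a sketch of one common form of the Rao-Wu rescaled bootstrap, in which m = n − 1 PSUs are drawn with replacement from a stratum of n PSUs and each PSU's weight is multiplied by (n / m) × (number of times drawn). This is an assumption made for illustration; the number of draws and the rescaling used by svy may differ, and the helper name is hypothetical.

```python
import random


def rao_wu_factors(n_psus, rng):
    """Multipliers for one stratum in one replicate, drawing m = n - 1 PSUs."""
    if n_psus < 2:
        raise ValueError("need at least 2 PSUs per stratum")
    m = n_psus - 1
    counts = [0] * n_psus
    for _ in range(m):
        counts[rng.randrange(n_psus)] += 1  # draw one PSU with replacement
    return [n_psus / m * c for c in counts]


rng = random.Random(42)
stratum_sizes = {"S1": 2, "S2": 4}  # hypothetical PSU counts per stratum

replicates = [
    {s: rao_wu_factors(n, rng) for s, n in stratum_sizes.items()}
    for _ in range(5)
]
for r, rep in enumerate(replicates, start=1):
    print(r, rep)
```

Within each stratum the multipliers always sum to the number of PSUs, so the replicate weights preserve the stratum's weight total on average.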

Bootstrap Parameters

Parameter   Type         Default         Description
----------  -----------  --------------  -----------------------------------------------
n_reps      int          500             Number of bootstrap replicates
rep_prefix  str | None   "svy_boot_wgt"  Prefix for replicate weight column names
drop_nulls  bool         False           Drop rows with missing values in design columns
rstate      RandomState  None            Random state for reproducibility

Tip: Choosing the Number of Bootstrap Replicates

For simple statistics (means, totals), 200–500 replicates usually suffice. For percentiles or other non-linear statistics, consider 1,000+ replicates. More replicates reduce Monte Carlo error but increase computation time and file size.

Successive Difference Replication (SDR)

SDR is designed for systematic samples where units are ordered (e.g., by geography or time). It creates replicates based on successive differences between adjacent units, which better captures the correlation structure in ordered samples.

Requirements:

  • Units should be meaningfully ordered within strata
  • Works best with systematic or implicitly stratified samples
# Add an order column to simulate systematic sampling
df_ordered = df.with_columns(
    pl.arange(0, df.height).alias("sort_order")
)

ordered_sample = svy.Sample(
    data=df_ordered,
    design=svy.Design(stratum="stratum", psu="psu", wgt="base_wgt"),
)

sdr_sample = ordered_sample.weighting.create_sdr_wgts(
    n_reps=4,
    rep_prefix="sdr_wgt",
    order_col="sort_order",
)

print(sdr_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 15                                       
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              base_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=SDR,                 
                        prefix='sdr_wgt', n_reps=4, df=4.0)    
                                                               
╰───────────────────────────────────────────────────────────────╯
print(
    sdr_sample.show_data(
        columns=["stratum", "sort_order", "base_wgt", "sdr_wgt1", "sdr_wgt2", "sdr_wgt3", "sdr_wgt4"],
        n=12,
    )
)
shape: (12, 7)
┌─────────┬────────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ stratum ┆ sort_order ┆ base_wgt ┆ sdr_wgt1 ┆ sdr_wgt2 ┆ sdr_wgt3 ┆ sdr_wgt4 │
│ ---     ┆ ---        ┆ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ i64        ┆ f64      ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞═════════╪════════════╪══════════╪══════════╪══════════╪══════════╪══════════╡
│ S1      ┆ 0          ┆ 2.547912 ┆ 4.349558 ┆ 4.349558 ┆ 4.349558 ┆ 4.349558 │
│ S1      ┆ 1          ┆ 1.188355 ┆ 0.594177 ┆ 0.101945 ┆ 0.594177 ┆ 0.101945 │
│ S1      ┆ 2          ┆ 2.572129 ┆ 1.286064 ┆ 7.495732 ┆ 0.220654 ┆ 1.286064 │
│ S1      ┆ 3          ┆ 2.85353  ┆ 1.426765 ┆ 0.244794 ┆ 1.426765 ┆ 8.315796 │
│ S1      ┆ 4          ┆ 1.886828 ┆ 0.943414 ┆ 5.498621 ┆ 5.498621 ┆ 0.943414 │
│ …       ┆ …          ┆ …        ┆ …        ┆ …        ┆ …        ┆ …        │
│ S1      ┆ 7          ┆ 1.389277 ┆ 0.694639 ┆ 0.119181 ┆ 0.694639 ┆ 4.048651 │
│ S1      ┆ 8          ┆ 1.308579 ┆ 0.654289 ┆ 3.813479 ┆ 3.813479 ┆ 0.654289 │
│ S1      ┆ 9          ┆ 1.651651 ┆ 0.483757 ┆ 0.483757 ┆ 0.483757 ┆ 0.483757 │
│ S2      ┆ 10         ┆ 1.378943 ┆ 2.354002 ┆ 2.354002 ┆ 2.354002 ┆ 2.354002 │
│ S2      ┆ 11         ┆ 2.339628 ┆ 1.169814 ┆ 0.200708 ┆ 1.169814 ┆ 0.200708 │
└─────────┴────────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
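The textbook SDR multiplier for unit i in replicate r is 1 + 2^(-3/2)·h[a_i][r] − 2^(-3/2)·h[b_i][r], where h is a Hadamard matrix and (a_i, b_i) are two rows assigned to the unit, with variance 4/R times the sum of squared deviations. The sketch below implements that formula under simplifying assumptions: rows are assigned cyclically as (i, i + 1) and the Hadamard order is a power of 2. The library's row assignment and scaling may differ, so these multipliers need not reproduce the table above. All function names are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), Sylvester's construction."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def sdr_factors(n_units, n_reps):
    """factors[i][r] = 1 + c * (h[a][r] - h[b][r]) with c = 2 ** -1.5."""
    h = sylvester_hadamard(n_reps)
    c = 2.0 ** -1.5
    factors = []
    for i in range(n_units):
        a, b = i % n_reps, (i + 1) % n_reps  # simplified cyclic row assignment
        factors.append([1.0 + c * (h[a][r] - h[b][r]) for r in range(n_reps)])
    return factors


def sdr_variance(theta_full, theta_reps):
    """SDR variance: (4 / R) * sum_r (theta_r - theta_full)^2."""
    return 4.0 / len(theta_reps) * sum((t - theta_full) ** 2 for t in theta_reps)


factors = sdr_factors(n_units=5, n_reps=4)
for row in factors:
    print([round(f, 4) for f in row])
```

Under this formula each multiplier takes one of three values: 1 − 1/√2, 1, or 1 + 1/√2.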

SDR Parameters

Parameter   Type        Default        Description
----------  ----------  -------------  -----------------------------------------------
n_reps      int         4              Number of replicates (typically 4+)
rep_prefix  str | None  "svy_sdr_wgt"  Prefix for replicate weight column names
order_col   str | None  None           Column specifying sort order within strata
drop_nulls  bool        False          Drop rows with missing values in design columns

Preparing Designs for BRR and JK2

Many survey designs have more than 2 PSUs per stratum, which makes them incompatible with BRR (requires exactly 2) or produces many replicates with JK2. The create_variance_strata() method helps by creating new variance strata with paired PSUs.

The Problem

# Create a sample with 4 PSUs per stratum
rows_multi = []
for s in range(1, 4):  # 3 strata
    for p in range(1, 5):  # 4 PSUs each
        for i in range(3):  # 3 units per PSU
            rows_multi.append({
                "stratum": f"S{s}",
                "psu": f"S{s}_P{p}",
                "wgt": 1.0,
                "y": rng.normal(10, 2),
            })

df_multi = pl.DataFrame(rows_multi)
multi_sample = svy.Sample(
    data=df_multi,
    design=svy.Design(stratum="stratum", psu="psu", wgt="wgt"),
)

print(f"Original design: 3 strata × 4 PSUs = 12 PSUs total")
Original design: 3 strata × 4 PSUs = 12 PSUs total
# BRR will fail - requires exactly 2 PSUs per stratum
try:
    multi_sample.weighting.create_brr_wgts()
except ValueError as e:
    print(f"BRR Error: {e}")
BRR Error: BRR requires 2 PSUs per stratum, stratum 2 has 4

Solution: Create Variance Strata

The create_variance_strata() method pairs PSUs within each original stratum to create new variance strata suitable for BRR or JK2:

# Create variance strata for BRR (pairs PSUs: 1-2, 3-4)
brr_ready = multi_sample.weighting.create_variance_strata(
    method="brr",
    into="var_stratum",  # Name of new stratum column
)

print(brr_ready)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 36                                          
   Number of columns: 8                                        
   Number of strata: 6                                         
   Number of PSUs: 12                                          
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                          
    Row index           svy_row_index                          
    Stratum             var_stratum                            
    PSU                 psu                                    
    SSU                 None                                   
    Weight              wgt                                    
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   None                                   
                                                               
╰───────────────────────────────────────────────────────────────╯
# Now BRR works!
brr_ready = brr_ready.weighting.create_brr_wgts(rep_prefix="brr")

print(f"Created {brr_ready.design.rep_wgts.n_reps} BRR replicates")
Created 8 BRR replicates
# Examine the variance strata structure
print(
    brr_ready.show_data(
        columns=["stratum", "psu", "var_stratum", "brr1", "brr2"],
        n=24,
    )
)
shape: (24, 5)
┌─────────┬───────┬─────────────┬──────┬──────┐
│ stratum ┆ psu   ┆ var_stratum ┆ brr1 ┆ brr2 │
│ ---     ┆ ---   ┆ ---         ┆ ---  ┆ ---  │
│ str     ┆ str   ┆ i64         ┆ f64  ┆ f64  │
╞═════════╪═══════╪═════════════╪══════╪══════╡
│ S1      ┆ S1_P1 ┆ 0           ┆ 2.0  ┆ 2.0  │
│ S1      ┆ S1_P1 ┆ 0           ┆ 2.0  ┆ 2.0  │
│ S1      ┆ S1_P1 ┆ 0           ┆ 2.0  ┆ 2.0  │
│ S1      ┆ S1_P2 ┆ 0           ┆ 0.0  ┆ 0.0  │
│ S1      ┆ S1_P2 ┆ 0           ┆ 0.0  ┆ 0.0  │
│ …       ┆ …     ┆ …           ┆ …    ┆ …    │
│ S2      ┆ S2_P3 ┆ 3           ┆ 2.0  ┆ 0.0  │
│ S2      ┆ S2_P3 ┆ 3           ┆ 2.0  ┆ 0.0  │
│ S2      ┆ S2_P4 ┆ 3           ┆ 0.0  ┆ 2.0  │
│ S2      ┆ S2_P4 ┆ 3           ┆ 0.0  ┆ 2.0  │
│ S2      ┆ S2_P4 ┆ 3           ┆ 0.0  ┆ 2.0  │
└─────────┴───────┴─────────────┴──────┴──────┘
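
To show where factors such as 2.0 and 0.0 come from, here is a minimal sketch of BRR replicate factors built from a Hadamard matrix. It is an illustration under stated assumptions, not svy's implementation: the helpers sylvester_hadamard and brr_factors are hypothetical names, the Sylvester construction only yields orders that are powers of two, and the library may build its matrix or assign columns differently, so individual replicate columns need not match the output above.

```python
def sylvester_hadamard(min_order):
    """Sylvester Hadamard matrix of the smallest power-of-two order >= min_order."""
    h = [[1]]
    while len(h) < min_order:
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def brr_factors(n_strata, fay_coef=0.0):
    """Return factors[r][s] = (factor for first PSU, factor for second PSU).

    Assumption: one replicate per Hadamard row, one variance stratum per
    column, skipping column 0 because it is constant.
    """
    h = sylvester_hadamard(n_strata + 1)
    factors = []
    for row in h:
        rep = []
        for s in range(n_strata):
            sign = row[s + 1]
            # Standard BRR (fay_coef=0) gives 2 and 0; Fay's method gives 2-k and k
            first = 1 + sign * (1 - fay_coef)
            second = 1 - sign * (1 - fay_coef)
            rep.append((first, second))
        factors.append(rep)
    return factors


factors = brr_factors(n_strata=6)
print(len(factors))   # number of replicates
print(factors[1][:3])  # factors for the first three variance strata in replicate 2
```

With 6 variance strata the sketch produces 8 replicates, the same count reported above; 8 is the smallest Hadamard order of at least 6. Within each pair the two factors always sum to 2, so every replicate preserves the stratum's weight total when the paired PSUs carry equal weight.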

Handling Odd PSU Counts with JK2

When a stratum has an odd number of PSUs, BRR cannot be used. JK2, however, allows 2–3 PSUs per variance stratum:

# Create sample with 3 PSUs per stratum (odd count)
rows_odd = []
for s in range(1, 4):  # 3 strata
    for p in range(1, 4):  # 3 PSUs each (odd)
        for i in range(3):
            rows_odd.append({
                "stratum": f"S{s}",
                "psu": f"S{s}_P{p}",
                "wgt": 1.0,
                "y": rng.normal(10, 2),
            })

df_odd = pl.DataFrame(rows_odd)
odd_sample = svy.Sample(
    data=df_odd,
    design=svy.Design(stratum="stratum", psu="psu", wgt="wgt"),
)

print("Original: 3 strata × 3 PSUs = 9 PSUs (odd count per stratum)")
Original: 3 strata × 3 PSUs = 9 PSUs (odd count per stratum)
# BRR fails with odd PSU counts
try:
    odd_sample.weighting.create_variance_strata(method="brr")
except Exception as e:
    print(f"BRR Error: {type(e).__name__}")
BRR Error: DimensionError
# JK2 handles odd counts by creating triplets
jk2_ready = odd_sample.weighting.create_variance_strata(
    method="jk2",
    into="var_stratum",
)

# Now JK2 works with triplets
jk2_ready = jk2_ready.weighting.create_jk_wgts(paired=True, rep_prefix="jk2")

print(f"Created {jk2_ready.design.rep_wgts.n_reps} JK2 replicates")
Created 3 JK2 replicates
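
The grouping behind both examples can be sketched in plain Python. This is a minimal sketch under assumptions, not the library's code: group_psus is a hypothetical helper, PSUs are grouped in their listed order, variance strata are numbered from 0 across the whole sample, and a ValueError stands in for the DimensionError that svy raises.

```python
def group_psus(psus_by_stratum, method="brr"):
    """Map each PSU to a variance-stratum index by grouping consecutive PSUs.

    method="brr": strict pairs; an odd PSU count is an error.
    method="jk2": pairs, with a leftover PSU folded into the last group (a triplet).
    """
    if method not in ("brr", "jk2"):
        raise ValueError(f"unknown method: {method!r}")
    assignment = {}
    next_id = 0
    for stratum, psus in psus_by_stratum.items():
        if len(psus) < 2:
            raise ValueError(f"stratum {stratum!r} needs at least 2 PSUs")
        if method == "brr" and len(psus) % 2 == 1:
            raise ValueError(f"stratum {stratum!r} has an odd number of PSUs")
        n_groups = len(psus) // 2
        for i, psu in enumerate(psus):
            # min() sends the leftover PSU of an odd stratum to the last group
            assignment[psu] = next_id + min(i // 2, n_groups - 1)
        next_id += n_groups
    return assignment


# 3 strata x 4 PSUs, as in the BRR example
multi = {f"S{s}": [f"S{s}_P{p}" for p in range(1, 5)] for s in range(1, 4)}
brr_groups = group_psus(multi, method="brr")
print(brr_groups["S1_P1"], brr_groups["S2_P3"])  # 0 3

# 3 strata x 3 PSUs, as in the JK2 example
odd = {f"S{s}": [f"S{s}_P{p}" for p in range(1, 4)] for s in range(1, 4)}
jk2_groups = group_psus(odd, method="jk2")
print(len(set(jk2_groups.values())))  # 3
```

Under these assumptions the sketch reproduces the var_stratum values shown earlier (0 for S1_P1 and S1_P2, 3 for S2_P3 and S2_P4), and the odd design yields 3 variance strata, matching the 3 JK2 replicates above.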

Variance Strata Parameters

Parameter   Type                         Default             Description
method      "brr" | "jk2"                Required            Target replication method
order_by    str | Sequence[str] | None   None                Column(s) to sort PSUs before pairing
shuffle     bool                         False               Randomly shuffle PSUs before pairing
into        str                          "svy_var_stratum"   Name for new variance stratum column
rstate      int | None                   None                Random seed for shuffle

Ordering PSUs Before Pairing

You can control how PSUs are paired by sorting them first:

# Add a size variable for ordering
df_multi_size = df_multi.with_columns(
    pl.lit(rng.uniform(100, 1000, df_multi.height)).alias("pop_size")
)

sized_sample = svy.Sample(
    data=df_multi_size,
    design=svy.Design(stratum="stratum", psu="psu", wgt="wgt"),
)

# Pair similar-sized PSUs together
ordered_strata = sized_sample.weighting.create_variance_strata(
    method="brr",
    order_by="pop_size",  # Sort by population size before pairing
    into="var_stratum",
)

print("PSUs paired by population size within each stratum")
PSUs paired by population size within each stratum

Random Pairing

For reproducible random pairing:

random_strata = multi_sample.weighting.create_variance_strata(
    method="brr",
    shuffle=True,
    rstate=12345,
    into="var_stratum",
)

print("PSUs randomly paired within each stratum")
PSUs randomly paired within each stratum
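
The effect of ordering or shuffling before pairing can be sketched as follows. The helpers arrange_psus and adjacent_pairs are hypothetical, the sketch assumes one size value per PSU and an even PSU count, and svy's own sorting and random number handling may differ.

```python
import random


def arrange_psus(psus, sizes=None, shuffle=False, seed=None):
    """Return the PSUs of one stratum in the order they will be paired."""
    ordered = list(psus)
    if shuffle:
        # A seeded generator makes the random pairing reproducible
        random.Random(seed).shuffle(ordered)
    elif sizes is not None:
        # Sorting puts PSUs of similar size next to each other
        ordered.sort(key=lambda psu: sizes[psu])
    return ordered


def adjacent_pairs(ordered):
    """Pair neighbours: (1st, 2nd), (3rd, 4th), ... (assumes an even count)."""
    return [tuple(ordered[i:i + 2]) for i in range(0, len(ordered) - 1, 2)]


sizes = {"P1": 900, "P2": 120, "P3": 850, "P4": 150}

# Sorted by size: the two small PSUs pair up, as do the two large ones
print(adjacent_pairs(arrange_psus(list(sizes), sizes=sizes)))
# [('P2', 'P4'), ('P3', 'P1')]

# Same seed, same pairing
first = arrange_psus(list(sizes), shuffle=True, seed=12345)
second = arrange_psus(list(sizes), shuffle=True, seed=12345)
print(first == second)  # True
```

Pairing similar PSUs tends to make each variance stratum more homogeneous, while a seeded shuffle avoids any systematic pairing pattern but stays reproducible.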

Adjusting Replicate Weights

When you apply weight adjustments (nonresponse, poststratification, calibration, raking), you must also apply those same adjustments to each replicate weight. This ensures that variance estimates properly reflect the additional uncertainty introduced by the adjustment process.

The svy library handles this automatically. All adjustment methods include parameters for replicate weight handling:

Parameter            Type         Default          Description
rep_wgts_prefix      str | None   Auto-generated   Prefix for adjusted replicate weight columns
ignore_reps          bool         False            If True, skip replicate weight adjustment
update_design_wgts   bool         True             Update the design to reference new weights

Example: Full Adjustment Pipeline

Let’s walk through a complete pipeline: create replicate weights, then apply nonresponse adjustment and raking to both the main weight and all replicates.

Step 1: Create Replicate Weights

# Start fresh with base sample
pipeline_sample = svy.Sample(
    data=df,
    design=svy.Design(stratum="stratum", psu="psu", wgt="base_wgt"),
)

# Create bootstrap replicates
pipeline_sample = pipeline_sample.weighting.create_bs_wgts(
    n_reps=50,
    rep_prefix="rep_wgt",
    rstate=rng,
)

print(f"After creating replicates:")
print(pipeline_sample)
After creating replicates:
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 60                                       
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              base_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=Bootstrap,           
                        prefix='rep_wgt', n_reps=50, df=49.0)  
                                                               
╰───────────────────────────────────────────────────────────────╯

Step 2: Nonresponse Adjustment

Apply nonresponse adjustment to the main weight and all 50 replicates:

# Map each response code to the label used in the resp_status column.
# Codes: rr = respondent, nr = non-respondent, in = ineligible, uk = unknown
status_mapping = {
    "rr": "respondent",
    "nr": "non-respondent",
}

pipeline_sample = pipeline_sample.weighting.adjust_nr(
    resp_status="resp_status",
    by="stratum",  # Adjustment cells by stratum
    resp_mapping=status_mapping,
    wgt_name="nr_wgt",
    rep_wgts_prefix="nr_rep_wgt",  # Adjusted replicates get this prefix
    # ignore_reps=False,  # Default: adjust replicates too
)

print(f"After nonresponse adjustment:")
print(pipeline_sample)
After nonresponse adjustment:
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 37                                          
   Number of columns: 111                                      
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              nr_wgt                                 
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=Bootstrap,           
                        prefix='nr_rep_wgt', n_reps=50,        
                        df=49.0)                               
                                                               
╰───────────────────────────────────────────────────────────────╯

The design now references:

  • Main weight: nr_wgt
  • Replicate weights: nr_rep_wgt1, nr_rep_wgt2, …, nr_rep_wgt50
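
To make the idea of propagating an adjustment concrete, here is a minimal sketch of a weighting-class nonresponse adjustment applied independently to the main weight and to each replicate. It is not svy's implementation: adjust_nonresponse and the boolean responded flag are hypothetical, ineligible and unknown statuses are ignored, and every cell is assumed to have a positive respondent total in every weight column.

```python
from collections import defaultdict


def adjust_nonresponse(rows, weight_cols, cell="stratum", flag="responded"):
    """Inflate respondent weights so each cell keeps its full weighted total.

    The factor is recomputed for every weight column, so each replicate
    carries its own adjustment. Nonrespondents are dropped from the result.
    """
    respondents = [dict(r) for r in rows if r[flag]]
    for col in weight_cols:
        total = defaultdict(float)  # all sampled units in the cell
        resp = defaultdict(float)   # respondents only
        for r in rows:
            total[r[cell]] += r[col]
            if r[flag]:
                resp[r[cell]] += r[col]
        for r in respondents:
            r[col] *= total[r[cell]] / resp[r[cell]]
    return respondents


rows = [
    {"stratum": "S1", "responded": True, "wgt": 2.0, "rep1": 4.0, "rep2": 0.0},
    {"stratum": "S1", "responded": False, "wgt": 2.0, "rep1": 0.0, "rep2": 4.0},
    {"stratum": "S1", "responded": True, "wgt": 2.0, "rep1": 2.0, "rep2": 2.0},
]
adjusted = adjust_nonresponse(rows, ["wgt", "rep1", "rep2"])
for r in adjusted:
    print(r["wgt"], r["rep1"], r["rep2"])
```

In this toy cell the main weight is inflated by 1.5, replicate 1 by 1.0, and replicate 2 by 3.0. That replicate-to-replicate variation in the adjustment factor is what lets the variance estimate reflect the adjustment step. Dropping nonrespondents is also consistent with the row count falling from 40 to 37 in the output above.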

Step 3: Raking

Apply raking to match population margins:

# Define marginal controls
controls = {
    "x_cat": {"A": 15.0, "B": 12.0, "C": 13.0},
}

pipeline_sample = pipeline_sample.weighting.rake(
    controls=controls,
    wgt_name="final_wgt",
    rep_wgts_prefix="final_rep_wgt",
)

print(f"After raking:")
print(pipeline_sample)
After raking:
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 37                                          
   Number of columns: 162                                      
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              final_wgt                              
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=Bootstrap,           
                        prefix='final_rep_wgt', n_reps=50,     
                        df=49.0)                               
                                                               
╰───────────────────────────────────────────────────────────────╯

Step 4: Verify the Results

# Check main weight
print("Main weight (final_wgt) vs controls:")
main_wgt_sums = pipeline_sample.data.group_by("x_cat").agg(
    pl.col("final_wgt").sum().alias("weighted_sum")
).sort("x_cat")

for row in main_wgt_sums.iter_rows(named=True):
    cat = row["x_cat"]
    actual = row["weighted_sum"]
    target = controls["x_cat"][cat]
    diff = actual - target
    print(f"  {cat}: target={target:.2f}, actual={actual:.4f}, diff={diff:.6f}")

# Check a few replicate weights
print("\nReplicate weights vs controls:")
for rep_col in ["final_rep_wgt1", "final_rep_wgt2", "final_rep_wgt3"]:
    rep_sums = pipeline_sample.data.group_by("x_cat").agg(
        pl.col(rep_col).sum().alias("weighted_sum")
    ).sort("x_cat")

    total_diff = sum(
        abs(row["weighted_sum"] - controls["x_cat"][row["x_cat"]])
        for row in rep_sums.iter_rows(named=True)
    )
    print(f"  {rep_col}: total absolute deviation = {total_diff:.6f}")

# Overall totals
print(f"\nOverall total (should be {sum(controls['x_cat'].values())}):")
print(f"  Main weight sum: {pipeline_sample.data['final_wgt'].sum():.4f}")
Main weight (final_wgt) vs controls:
  A: target=15.00, actual=15.0000, diff=0.000000
  B: target=12.00, actual=12.0000, diff=0.000000
  C: target=13.00, actual=13.0000, diff=-0.000000

Replicate weights vs controls:
  final_rep_wgt1: total absolute deviation = 0.000000
  final_rep_wgt2: total absolute deviation = 0.000000
  final_rep_wgt3: total absolute deviation = 0.000000

Overall total (should be 40.0):
  Main weight sum: 40.0000
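
Raking is iterative proportional fitting: each margin is matched in turn until all margins agree with their controls. The sketch below is a plain-Python illustration, not svy's algorithm; the rake_column helper, the second margin (sex), and the toy data are all hypothetical. With a single margin, as in the x_cat example above, one proportional adjustment already matches the controls exactly.

```python
def rake_column(rows, weight_col, margins, max_iter=100, tol=1e-10):
    """Rake one weight column to the given margins and return the new weights."""
    w = [r[weight_col] for r in rows]
    for _ in range(max_iter):
        max_gap = 0.0
        for var, targets in margins.items():
            # Current weighted total for each category of this margin
            sums = {cat: 0.0 for cat in targets}
            for r, wi in zip(rows, w):
                sums[r[var]] += wi
            # Scale every unit so this margin matches its control exactly
            for i, r in enumerate(rows):
                w[i] *= targets[r[var]] / sums[r[var]]
            max_gap = max(max_gap, max(abs(sums[c] - targets[c]) for c in targets))
        if max_gap < tol:
            break
    return w


rows = [
    {"x_cat": "A", "sex": "M", "wgt": 1.0, "rep1": 2.0},
    {"x_cat": "A", "sex": "F", "wgt": 1.0, "rep1": 0.5},
    {"x_cat": "B", "sex": "M", "wgt": 1.0, "rep1": 1.0},
    {"x_cat": "B", "sex": "F", "wgt": 1.0, "rep1": 0.5},
]
margins = {"x_cat": {"A": 6.0, "B": 4.0}, "sex": {"M": 5.0, "F": 5.0}}

# The same procedure is run separately on the main weight and each replicate
raked = {col: rake_column(rows, col, margins) for col in ("wgt", "rep1")}
print([round(v, 4) for v in raked["wgt"]])
```

Each column converges to the same margins but generally to different unit-level weights, which is why the replicate checks above can show zero deviation from the controls while still carrying replicate-specific information.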

Skipping Replicate Adjustment

In some cases, you may want to skip replicate weight adjustment:

  • Preliminary analysis: Quick exploration before final weights are needed
  • Memory constraints: Large datasets with many replicates
  • External replicates: When replicates will be adjusted separately
# Apply adjustment to main weight only
quick_sample = sample.weighting.create_bs_wgts(n_reps=10, rep_prefix="rep", rstate=rng)

quick_sample = quick_sample.weighting.normalize(
    controls=100,
    wgt_name="norm_wgt",
    ignore_reps=True,  # Skip replicate adjustment
)

print(quick_sample)
╭─────────────────────────── Sample ────────────────────────────╮
 Survey Data:                                                  
   Number of rows: 40                                          
   Number of columns: 261                                      
   Number of strata: 4                                         
   Number of PSUs: 8                                           
                                                               
 Survey Design:                                                
                                                               
    Field               Value                                  
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  
    Row index           svy_row_index                          
    Stratum             stratum                                
    PSU                 psu                                    
    SSU                 None                                   
    Weight              norm_wgt                               
    With replacement    False                                  
    Prob                None                                   
    Hit                 None                                   
    MOS                 None                                   
    Population size     None                                   
    Replicate weights   RepWeights(method=Bootstrap,           
                        prefix='rep', n_reps=10, df=9.0)       
                                                               
╰───────────────────────────────────────────────────────────────╯
Important

If you skip replicate adjustment (ignore_reps=True), variance estimates computed from those replicates will not account for the additional variability introduced by the weight adjustment. This typically leads to underestimated standard errors.

Design Metadata

When you create replicate weights, svy stores metadata in the Sample.design.rep_wgts object:

bs_sample = sample.weighting.create_bs_wgts(n_reps=100, rep_prefix="bs", rstate=rng)

print(f"Method: {bs_sample.design.rep_wgts.method}")
print(f"Prefix: {bs_sample.design.rep_wgts.prefix}")
print(f"Number of replicates: {bs_sample.design.rep_wgts.n_reps}")
print(f"Degrees of freedom: {bs_sample.design.rep_wgts.df}")
print(f"Fay coefficient: {bs_sample.design.rep_wgts.fay_coef}")
Method: Bootstrap
Prefix: bs
Number of replicates: 100
Degrees of freedom: 99.0
Fay coefficient: 0.0

This metadata is used by estimation functions to automatically compute replicate-based variance estimates.
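
As a rough sketch of how such metadata can drive the computation, the function below turns a full-sample estimate and a list of replicate estimates into a variance using the textbook scale factors for BRR, Fay-BRR, bootstrap, and SDR. The replicate_variance helper is hypothetical and not part of svy; the library may center on the replicate mean or use a different denominator (for example R - 1 for the bootstrap). Jackknife methods are omitted because their scale factors depend on the number of PSUs per stratum.

```python
def replicate_variance(full_estimate, rep_estimates, method, fay_coef=0.0):
    """Replicate variance: scale * sum over replicates of (theta_r - theta)^2."""
    n_reps = len(rep_estimates)
    sum_sq = sum((est - full_estimate) ** 2 for est in rep_estimates)
    if method == "brr":
        # fay_coef = 0 is standard BRR; Fay's method divides by (1 - k)^2
        scale = 1.0 / (n_reps * (1.0 - fay_coef) ** 2)
    elif method == "bootstrap":
        scale = 1.0 / n_reps  # assumption: some implementations use 1 / (R - 1)
    elif method == "sdr":
        scale = 4.0 / n_reps
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return scale * sum_sq


reps = [9.0, 11.0, 10.5, 9.5]
print(replicate_variance(10.0, reps, "brr"))                # 0.625
print(replicate_variance(10.0, reps, "brr", fay_coef=0.5))  # 2.5
print(replicate_variance(10.0, reps, "sdr"))                # 2.5
```

The Fay coefficient stored in the metadata matters here: damped replicates vary less than full BRR replicates, so the sum of squares must be rescaled by 1 / (1 - k)^2 to recover the variance.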

Choosing a Replication Method

If your design has…               Recommended method
Exactly 2 PSUs per stratum        BRR (most efficient)
2–3 PSUs per stratum              JK2 (paired jackknife)
Varying PSUs per stratum          JKn or Bootstrap
Many strata, few PSUs each        Fay-BRR (dampened)
Systematic/ordered sample         SDR
Complex non-linear statistics     Bootstrap (≥500 reps)
Need to minimize file size        JK2 (fewest columns)
Need to convert multi-PSU design  create_variance_strata() first

Summary

This tutorial covered replicate weight methods for variance estimation:

Creating Replicate Weights:

Method      Function                        Key Feature
BRR         create_brr_wgts()               Balanced half-samples; requires 2 PSUs/stratum
Fay-BRR     create_brr_wgts(fay_coef=...)   Damped BRR for stability
JKn         create_jk_wgts(paired=False)    Delete-one-PSU; # replicates = # PSUs
JK2         create_jk_wgts(paired=True)     Paired deletion; # replicates = # strata
Bootstrap   create_bs_wgts()                Resample with replacement; most flexible
SDR         create_sdr_wgts()               For systematic/ordered samples

Preparing Designs:

Method            Function                               Use Case
Variance Strata   create_variance_strata(method="brr")   Convert multi-PSU to paired for BRR
Variance Strata   create_variance_strata(method="jk2")   Convert multi-PSU to 2–3 PSUs for JK2

Adjusting Replicate Weights:

  • All adjustment methods (adjust_nr, poststratify, rake, calibrate, normalize) automatically propagate to replicates
  • Use rep_wgts_prefix to control naming of adjusted replicates
  • Use ignore_reps=True to skip replicate adjustment (not recommended for final analysis)

Next Steps

With your replicate weights created and adjusted, you’re ready to compute estimates with proper variance estimation. Continue to the Estimation tutorial to learn how to use these weights.

References

  • Fay, R. E. (1989). Theory and application of replicate weighting for variance calculations. Proceedings of the Survey Research Methods Section, American Statistical Association, 212–217.
  • Judkins, D. R. (1990). Fay’s method for variance estimation. Journal of Official Statistics, 6(3), 223–239.
  • Rao, J. N. K., & Wu, C. F. J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83(401), 231–241.
  • Rust, K. F., & Rao, J. N. K. (1996). Variance estimation for complex surveys using replication techniques. Statistical Methods in Medical Research, 5(3), 283–310.
  • Valliant, R., Dever, J. A., & Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer.