Sample weighting is the mechanism that allows analysts to generalize results from the sample to the target population. The design (or base) weights are derived as the inverse of the final probability of selection. In large-scale surveys, these design weights are often further adjusted to correct for nonresponse, extreme values, or to align auxiliary variables with known population controls.
The weighting tutorial is presented in two parts. The first section introduces common adjustment techniques, such as nonresponse adjustment, poststratification, and calibration, that improve representativeness and reduce potential bias. The second section demonstrates how to create and use replicate weights for variance estimation using Bootstrap (BS), Balanced Repeated Replication (BRR), and Jackknife (JK) methods.
For more on sample-weight adjustments, see Valliant and Dever (2018), which provides a step-by-step guide to calculating survey weights.
To run the code in this notebook, we will use the World Bank (2023) synthetic sample data.
```python
import numpy as np
from rich import print as rprint

import svy
from svy import load_dataset

hld_data = load_dataset(name="hld_sample_wb_2023", limit=None)

print(f"The number of records in the household sample data is {hld_data.shape[0]}")
```
The number of records in the household sample data is 8000
Weight Adjustment Methods
In practice, base (design) weights derived from the selection probabilities are routinely adjusted to (i) correct for nonresponse and unknown eligibility, (ii) temper the influence of extreme/large weights, and (iii) align the weighted sample with known auxiliary controls. In this tutorial, you will:
Use adjust_nr() to adjust weights to account for unit nonresponse and unknown eligibility.
Use poststratify() to make weights match known control totals (or to force domain shares to known distributions).
Use calibrate() to adjust weights using auxiliary variables under a regression (GREG) framework, satisfying calibration constraints.
Use rake() to rescale weights so they sum to the chosen margins.
Use normalize() to rescale weights so they sum to a chosen constant (e.g., the sample size, a population total, or domain totals).
Design (Base) Weight
The design or base weight represents the inverse of the overall probability of selection. This probability is the product of the first-stage and second-stage selection probabilities, as explained in previous tutorials.
Let’s create the sample as follows:
```python
from svy import Design, Sample

# Define the sampling design
hld_design = Design(stratum=["geo1", "urbrur"], psu=["ea"], wgt="hhweight")

# Create the sample
hld_sample = Sample(data=hld_data, design=hld_design)
```
The dataset includes a household-level base weight variable named hhweight; we will rename it to base_wgt.
```
Sample
Survey Data:
  Number of rows:     8000
  Number of columns:  52
  Number of strata:   19
  Number of PSUs:     320

Survey Design:
  Field              Value
  ─────────────────  ─────────────
  Row index          svy_row_index
  Stratum            (geo1, urbrur)
  PSU                (ea,)
  SSU                None
  Weight             base_wgt
  With replacement   False
  Prob               None
  Hit                None
  MOS                None
  Population size    None
  Replicate weights  None
```
Response Status
The core idea of nonresponse adjustment is to redistribute the survey weights of eligible non-respondents to eligible respondents, and to do so within defined adjustment classes.
In practice, some units have unknown eligibility. How their weights are handled is survey-specific. Common options include:
Treat unknowns like eligibles: redistribute their weights to respondents in the same class.
Partition unknowns: allocate a fraction to ineligibles (reflecting the belief that some unknowns would have been ineligible) and the remainder to eligibles.
Exclude unknowns from redistribution: leave ineligible weights unchanged and redistribute only non-respondent (known eligible) weights.
In svy, we classify records into four response categories (eligible respondent, eligible non-respondent, ineligible, unknown). By default, unknowns are treated as potentially ineligible; therefore their weights are redistributed to the ineligible group as well.
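To make the redistribution concrete, here is a minimal NumPy sketch for a single adjustment class. The nr_adjust helper and its proportional split of the unknown mass between eligibles and ineligibles are illustrative assumptions, not svy’s implementation:

```python
import numpy as np

# Toy adjustment class: base weights and response statuses
# ("rr" respondent, "nr" eligible non-respondent, "in" ineligible, "uk" unknown)
wgt = np.array([10.0, 10.0, 12.0, 8.0, 10.0, 10.0])
status = np.array(["rr", "rr", "nr", "in", "uk", "rr"])

def nr_adjust(wgt, status, unknown_to_inelig=True):
    """Redistribute eligible non-respondent (and unknown) weight to respondents."""
    w = wgt.astype(float).copy()
    sum_rr = w[status == "rr"].sum()
    sum_nr = w[status == "nr"].sum()
    sum_in = w[status == "in"].sum()
    sum_uk = w[status == "uk"].sum()
    if unknown_to_inelig:
        # Split the unknown mass between eligibles and ineligibles
        # in proportion to their observed weight shares
        elig = sum_rr + sum_nr
        share_elig = elig / (elig + sum_in)
        sum_nr += sum_uk * share_elig
        w[status == "in"] *= 1.0 + sum_uk * (1.0 - share_elig) / sum_in
    else:
        # All unknown mass is treated as eligible
        sum_nr += sum_uk
    # Respondents absorb the eligible non-respondent (and unknown) mass
    w[status == "rr"] *= (sum_rr + sum_nr) / sum_rr
    w[status == "nr"] = 0.0
    w[status == "uk"] = 0.0
    return w

adj_true = nr_adjust(wgt, status, unknown_to_inelig=True)
adj_false = nr_adjust(wgt, status, unknown_to_inelig=False)
print(adj_true.sum(), adj_false.sum())  # total weight preserved: 60.0 60.0
```

Note that respondents receive smaller adjusted weights when the unknown mass is partly routed to ineligibles, which matches the behavior of the unknown_to_inelig flag described below.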
In the World Bank simulated data, the observed response rate is 100%. For this tutorial, we will simulate ineligibility and nonresponse and then compute nonresponse-adjusted weights by weighting class.
```python
import numpy as np
import polars as pl

rng = np.random.default_rng(12345)

RESPONSE_STATUS = rng.choice(
    ("ineligible", "respondent", "non-respondent", "unknown"),
    p=(0.03, 0.82, 0.10, 0.05),
    size=hld_sample.n_records,
)
hld_sample = hld_sample.wrangling.mutate({"resp_status": RESPONSE_STATUS})

# Show 9 eligible non-respondent records in geo_01
print(
    hld_sample.show_records(
        columns=["hid", "geo1", "urbrur", "resp_status"],
        where=[
            svy.col("resp_status") == "non-respondent",
            svy.col("geo1").is_in(["geo_01"]),
        ],
        n=9,
    )
)
```
svy expects a single variable containing standardized response statuses:
| Code | Meaning |
|------|---------|
| rr | Respondent |
| nr | Nonrespondent |
| uk | Unknown eligibility |
| in | Ineligible |
If your dataset uses different labels or codes, you must provide a mapping to these canonical values before using weighting or adjustment functions in svy. Hence, we create the mapping as follows:
```python
# Mapping of canonical response status codes to descriptive labels
status_mapping = {
    "in": "ineligible",
    "rr": "respondent",
    "nr": "non-respondent",
    "uk": "unknown",
}
```
Nonresponse Adjustment
Tip: Implementation in svy
Use Sample.adjust_nr() to adjust the sample weights for nonresponse. The method computes the adjusted weights and stores them in the sample object for downstream estimation.
The method adjust_nr() has a boolean parameter unknown_to_inelig that controls where the unknowns’ weights go:
unknown_to_inelig=True (default): Unknowns’ weights are redistributed to ineligibles (as well as being accounted for in the class totals). Because ineligibles absorb some of the unknown mass, the respondents’ adjusted weights are generally smaller than when this flag is false.
unknown_to_inelig=False: Unknowns’ weights are not given to ineligibles. Ineligibles’ weights therefore do not change due to unknowns, and—other things equal—respondents’ adjusted weights are larger than in the default setting.
We now display a small sample of records to confirm the nr_wgt column was created.
```python
# Show a random sample with the key columns
out = hld_sample.show_data(
    columns=["hid", "geo1", "geo2", "base_wgt", "resp_status", "nr_wgt"],
    how="sample",
    n=10,
    rstate=rng,
)
print(out)
```
```
Sample
Survey Data:
  Number of rows:     6629
  Number of columns:  54
  Number of strata:   19
  Number of PSUs:     320

Survey Design:
  Field              Value
  ─────────────────  ─────────────
  Row index          svy_row_index
  Stratum            (geo1, urbrur)
  PSU                (ea,)
  SSU                None
  Weight             nr_wgt
  With replacement   False
  Prob               None
  Hit                None
  MOS                None
  Population size    None
  Replicate weights  None
```
If you don’t specify wgt_name=…, svy creates the adjusted weight automatically, naming it by prefixing the original weight name with svy_adjusted_. For example, if the original weight column is base_wgt, the new column will be svy_adjusted_base_wgt.
If you want to replace the pre-adjusted variable with the adjusted one, set replace=True.
svy also updates the sample design internally so that the design’s weight reference points to the adjusted weight.
Poststratification
Poststratification helps compensate for under- or over-representation in the sample and can mitigate certain non-sampling errors. The most common approach adjusts sample weights so that, within poststratification classes, the weighted sums match known control totals from reliable sources.
Poststratification classes need not mirror the sampling design; they can be formed from additional variables. Typical choices in social science research include age group, gender, race/ethnicity, and education, alone or in combination.
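The adjustment itself is a simple ratio correction within each class: every weight in a class is multiplied by the ratio of the control total to the class's weighted sum. A minimal NumPy sketch with hypothetical classes and control totals:

```python
import numpy as np

# Hypothetical poststratification: known household counts per class (assumed)
region = np.array(["A", "A", "B", "B", "B"])
wgt = np.array([100.0, 120.0, 90.0, 110.0, 100.0])
controls = {"A": 500.0, "B": 700.0}

ps_wgt = wgt.copy()
for cls, total in controls.items():
    mask = region == cls
    # Adjustment factor = control total / weighted sum within the class
    ps_wgt[mask] *= total / wgt[mask].sum()

# Weighted class sums now match the controls
print({cls: round(ps_wgt[region == cls].sum(), 6) for cls in controls})
```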
Warning
Use current, reliable controls: Poststratifying to out-of-date or unreliable totals may not improve estimates and can introduce bias. Proceed with caution and document sources and reference dates.
Tip: Implementation in svy
Use Sample.poststratify() to poststratify the sample weights. The method computes the adjusted weights and stores them in the sample object for downstream estimation.
Let’s assume that we have a reliable source, e.g. a recent census, that provides the number of households per admin 1 level. The control totals are provided as a Python dictionary, census_households, with the following assumed values.
We now display a small sample of records to confirm the ps_wgt column was created.
```python
# Show a random sample with the key columns
out = hld_sample.show_data(
    columns=["hid", "geo1", "nr_wgt", "ps_wgt"],
    how="sample",
    n=10,
    sort_by="geo1",
    rstate=rng,
)
print(out)
```
And again svy updates the sampling design with the current “final” sample weight.
```python
print(hld_sample)
```
```
Sample
Survey Data:
  Number of rows:     6629
  Number of columns:  55
  Number of strata:   19
  Number of PSUs:     320

Survey Design:
  Field              Value
  ─────────────────  ─────────────
  Row index          svy_row_index
  Stratum            (geo1, urbrur)
  PSU                (ea,)
  SSU                None
  Weight             ps_wgt
  With replacement   False
  Prob               None
  Hit                None
  MOS                None
  Population size    None
  Replicate weights  None
```
Calibration
Calibration, see Deville and Särndal (1992), is a general method used in survey sampling to adjust sample weights so that certain totals in the sample align with known values for the entire population. The idea is to make the sample “look like” the population in terms of key auxiliary variables, while keeping the adjusted weights as close as possible to the original design weights.
The Generalized Regression (GREG) approach is a model-assisted version of calibration; see Särndal, Swensson, and Wretman (1992) for a thorough exposition of model-assisted survey sampling. It assumes that the survey variable of interest, such as income or health status, is related to one or more auxiliary variables (like age, region, or education) through a regression-type relationship. Rather than relying solely on the design weights, GREG calibration adjusts those weights using information from both:
the survey data (what we observe in the sample), and
external data (known population totals for the auxiliary variables).
In essence, GREG finds a new set of weights that stay as close as possible to the original design weights and, at the same time, make the weighted totals of the auxiliary variables match their known population values. When the auxiliary variables are strongly correlated with the study variable, the resulting GREG estimates tend to be more stable and efficient than simple design-based estimates.
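With the linear distance function, these calibrated weights have a closed form: w_i = d_i(1 + x_i'λ), where λ solves (X'diag(d)X)λ = T − X'd and T holds the known auxiliary totals. A NumPy sketch under assumed design weights and controls (not svy's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
d = np.full(n, 4.0)                                        # design weights
X = np.column_stack([np.ones(n), rng.uniform(20, 60, n)])  # intercept + an auxiliary (e.g. age)
totals = np.array([210.0, 8600.0])                         # assumed controls: N and total age

# Linear-method calibration (GREG): w_i = d_i * (1 + x_i' lam),
# with lam chosen so the calibrated totals X'w equal the controls
lam = np.linalg.solve(X.T @ (d[:, None] * X), totals - X.T @ d)
w = d * (1.0 + X @ lam)

print(np.round(X.T @ w, 6))  # matches the controls
```

The weights stay close to d because the perturbation 1 + x_i'λ is small whenever the design-weighted totals are already near the controls.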
Tip: Implementation in svy
Use Sample.calibrate() or Sample.calibrate_matrix() to adjust the sample weights under the GREG approach. The methods compute the adjusted weights and store them in the sample object for downstream estimation.
If you want to calibrate within domains, use the by parameter of calibrate().
Raking
Raking, also known as iterative proportional fitting (IPF), is a method used to adjust survey weights so that the weighted sample distributions match known population margins for several categorical variables.
Unlike calibration, which can align with multiple totals simultaneously through a regression framework, raking updates the weights iteratively—adjusting one margin at a time until all specified margins agree (within tolerance) with the population controls.
Raking is especially useful when only marginal totals are available (for example, totals by age group and totals by gender, but not their cross-tabulation). It preserves the relative structure of the sample while bringing the weighted distributions into alignment with known population information.
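The margin-by-margin iteration can be sketched in a few lines of NumPy; the toy data and marginal controls below are assumed for illustration:

```python
import numpy as np

# Marginal controls only (no cross-tabulation available)
region = np.array(["A", "A", "B", "B", "A", "B"])
sex = np.array(["m", "f", "m", "f", "f", "m"])
w = np.full(6, 10.0)

margins = [
    (region, {"A": 40.0, "B": 20.0}),
    (sex, {"m": 30.0, "f": 30.0}),
]

for _ in range(50):  # iterate: adjust one margin at a time until agreement
    for var, controls in margins:
        for level, total in controls.items():
            mask = var == level
            w[mask] *= total / w[mask].sum()

print(np.round([w[region == "A"].sum(), w[sex == "m"].sum()], 4))
# both margins agree with the controls
```

Each pass through one margin is exactly a poststratification step; iterating alternates between margins until all weighted margins stabilize at the controls.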
Tip: Implementation in svy
Use Sample.rake() to rake the sample weights. The method computes the adjusted weights and stores them in the sample object for downstream estimation.
```python
# Show a random sample with the key columns
out = hld_sample.show_data(
    columns=["hid", "statocc", "electricity", "ps_wgt", "rake_wgt"],
    how="sample",
    n=10,
    sort_by=("statocc", "electricity"),
    rstate=rng,
)
print(out)
```
Normalization
Surveys sometimes normalize weights to a convenient constant (e.g., the sample size n or 1,000) so results are easier to compare across analyses. Normalization multiplies every weight by the same factor. It does not change weighted means, proportions, or regression coefficients (the factor cancels), but it does change level estimates such as totals, and their standard errors, by the same factor.
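A minimal sketch of the scaling and its invariance property, using toy weights:

```python
import numpy as np

w = np.array([250.0, 300.0, 450.0, 500.0])
y = np.array([1.0, 2.0, 3.0, 4.0])

norm_w = w * (w.size / w.sum())  # rescale so the weights sum to n = 4

# The weighted mean is unchanged; only level estimates (totals) rescale
print(norm_w.sum())
print(np.average(y, weights=w), np.average(y, weights=norm_w))
```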
Tip: Implementation in svy
Use Sample.normalize() to scale the sample weights. The method computes the adjusted weights and stores them in the sample object for downstream estimation.
Replicate Weights
This section introduces three common replication methods for estimating sampling variance: Bootstrap (BS), Balanced Repeated Replication (BRR), and Jackknife (JK).
Replicate weights are constructed primarily for variance (uncertainty) estimation. They are especially useful when:
Estimating non-linear parameters, for which Taylor linearization may be inaccurate.
The number of PSUs per stratum is small (low degrees of freedom), making linearization unstable.
In this tutorial, we demonstrate how to create replicate weights using the ReplicateWeight class. Three replication methods are available:
Balanced Repeated Replication (BRR), including Fay-BRR
Bootstrap (BS)
Jackknife (JK)
Balanced Repeated Replication (BRR)
BRR assumes exactly two PSUs per stratum after any collapsing. The World Bank synthetic dataset includes strata with more than two PSUs; to use BRR you would pair PSUs within each stratum to create 2-PSU pseudo-strata (e.g., sort by size/geography and pair 1–2, 3–4, …). To keep this tutorial focused on demonstrating svy syntax rather than data engineering, we’ll instead construct a small BRR-compatible dummy sample. (If you need to retain all PSUs without pairing, consider jackknife or bootstrap replicates, which handle ≥2 PSUs per stratum.)
Let’s consider the following dataset.
```python
rows = []
y_means = {
    "S1_P1": 10, "S1_P2": 12,
    "S2_P1": 8,  "S2_P2": 9,
    "S3_P1": 15, "S3_P2": 13,
    "S4_P1": 11, "S4_P2": 10,
}
for s in range(1, 5):  # S1..S4
    for p in range(1, 3):  # P1..P2 (2 PSUs per stratum)
        label = f"S{s}_P{p}"
        for i in range(3):  # 3 units per PSU
            rows.append(
                {
                    "unit_id": f"S{s}P{p}U{i + 1}",
                    "stratum": f"S{s}",
                    "cluster": f"P{p}",
                    "weight": 1.0,  # base weight
                    "y": rng.normal(y_means[label], 1.0),  # outcome
                }
            )
df_rep = pl.DataFrame(rows)
print(df_rep)
```
BRR forms balanced half-samples within each stratum using a Hadamard design and—after any collapsing—requires exactly 2 PSUs per stratum.
By default, svy sets the number of replicates to the smallest multiple of 4 strictly greater than the number of strata. You can request more by passing n_reps (values ≤28 must be multiples of 4; larger values are rounded up to the next power of two).
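The half-sample construction can be sketched with a Sylvester-built Hadamard matrix. This is a toy 4-stratum example with assumed base weights; svy's internal bookkeeping may differ:

```python
import numpy as np

def hadamard(k):
    """Hadamard matrix of order 2**k (Sylvester construction)."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

n_strata = 4
H = hadamard(3)                       # order 8: smallest multiple of 4 > n_strata
n_reps = H.shape[0]
base = np.full((n_strata, 2), 50.0)   # base weight per (stratum, PSU) pair

rep = np.zeros((n_reps, n_strata, 2))
for r in range(n_reps):
    for h in range(n_strata):
        keep = 0 if H[r, h] == 1 else 1   # which half-sample PSU is retained
        rep[r, h, keep] = 2.0 * base[h, keep]

# Each replicate doubles one PSU per stratum and drops the other,
# so each stratum's total weight is preserved
print(rep.sum(axis=2)[0])
```

The orthogonal columns of the Hadamard matrix are what make the half-samples "balanced": over the full set of replicates, every pair of strata has its PSUs flipped in all four combinations equally often.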
```
Sample
Survey Data:
  Number of rows:     24
  Number of columns:  16
  Number of strata:   4
  Number of PSUs:     8

Survey Design:
  Field              Value
  ─────────────────  ───────────────────────────────────
  Row index          svy_row_index
  Stratum            stratum
  PSU                cluster
  SSU                None
  Weight             weight
  With replacement   False
  Prob               None
  Hit                None
  MOS                None
  Population size    None
  Replicate weights  BRR (n_reps=8, df=4, fay=0, cols=8)
```
Fay–BRR
Fay–BRR is a damped BRR: each replicate weight is a combination of the full-sample weight and the BRR half-sample weight. Choose a Fay factor \(\rho \in (0,1)\), commonly between 0.3 and 0.5, to reduce the perturbation and improve stability.
```
Sample
Survey Data:
  Number of rows:     24
  Number of columns:  28
  Number of strata:   4
  Number of PSUs:     8

Survey Design:
  Field              Value
  ─────────────────  ───────────────────────────────────
  Row index          svy_row_index
  Stratum            stratum
  PSU                cluster
  SSU                None
  Weight             weight
  With replacement   False
  Prob               None
  Hit                None
  MOS                None
  Population size    None
  Replicate weights  BRR (n_reps=8, df=4, fay=0, cols=8)
```
Jackknife (JK)
Jackknife forms replicates by deleting one PSU within each stratum and re-weighting the remainder. Each stratum must have two or more PSUs.
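A sketch of the delete-one-PSU re-weighting for a single stratum; the n_h/(n_h−1) inflation factor is the standard stratified (JKn) choice, and the weights are assumed for illustration:

```python
import numpy as np

# One stratum with 3 PSUs: delete each in turn and inflate the rest
psu_wgt = np.array([40.0, 35.0, 25.0])
n_h = psu_wgt.size

jk_reps = []
for j in range(n_h):
    rep = psu_wgt * n_h / (n_h - 1)  # re-weight the retained PSUs
    rep[j] = 0.0                     # drop PSU j
    jk_reps.append(rep)

print(np.array(jk_reps))  # one row per replicate
```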
```
Sample
Survey Data:
  Number of rows:     24
  Number of columns:  36
  Number of strata:   4
  Number of PSUs:     8

Survey Design:
  Field              Value
  ─────────────────  ───────────────────────────────────
  Row index          svy_row_index
  Stratum            stratum
  PSU                cluster
  SSU                None
  Weight             weight
  With replacement   False
  Prob               None
  Hit                None
  MOS                None
  Population size    None
  Replicate weights  BRR (n_reps=8, df=4, fay=0, cols=8)
```
Bootstrap (BS)
Bootstrap replicates are formed by re-sampling PSUs with replacement within each stratum, drawing the same number of PSUs as observed in the sample for every replicate. The selection is independent across replicates. We then rescale the weights (e.g., Rao–Wu rescaled bootstrap) so estimators remain unbiased under the design.
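A sketch of the Rao–Wu rescaled bootstrap for one stratum with equal PSU weights; with m_h = n_h − 1, the rescaling factor reduces to hits × n_h/(n_h − 1), where hits counts how often each PSU is drawn (the toy weights are assumed):

```python
import numpy as np

rng = np.random.default_rng(42)

# One stratum: resample m_h = n_h - 1 PSUs with replacement per replicate
w = np.full(4, 30.0)   # PSU-level base weights
n_h = w.size
m_h = n_h - 1
n_reps = 500

rep = np.empty((n_reps, n_h))
for r in range(n_reps):
    hits = np.bincount(rng.integers(0, n_h, size=m_h), minlength=n_h)
    # Rao-Wu rescaling with m_h = n_h - 1: w * hits * n_h / (n_h - 1)
    rep[r] = w * hits * n_h / (n_h - 1)

# With equal weights, every replicate preserves the stratum total exactly,
# and the replicate weights average back to roughly the base weights
print(rep.mean(axis=0))
```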
Replicate count: if n_reps is omitted, create_bs_wgts() defaults to 500 (increase for highly non-linear targets).