Sample Weighting
Sample weighting is the mechanism that allows analysts to generalize results from the sample to the target population. The design (or base) weights are derived as the inverse of the unit's overall (final) probability of selection. In large-scale surveys, these design weights are often further adjusted to compensate for nonresponse, to trim extreme values, and to align weighted totals of auxiliary variables with known population controls.
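As a quick, concrete illustration of the base-weight calculation, the sketch below (with made-up probabilities and column names) computes design weights as the inverse of the overall selection probability for a toy two-stage sample:

```python
import pandas as pd

# Toy two-stage sample: selection probabilities at each stage (hypothetical values).
samp = pd.DataFrame({
    "psu_prob": [0.10, 0.10, 0.25, 0.25],  # first-stage (PSU) selection probability
    "ssu_prob": [0.20, 0.50, 0.20, 0.40],  # second-stage (within-PSU) selection probability
})

# Design (base) weight = inverse of the overall probability of selection.
samp["design_wgt"] = 1.0 / (samp["psu_prob"] * samp["ssu_prob"])
print(samp)
```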
The weighting tutorial is presented in two parts. The first part introduces common adjustment techniques—such as nonresponse adjustment, poststratification, and calibration—that improve representativeness and reduce potential bias. The second part demonstrates how to create and use replicate weights for variance estimation with the Bootstrap (BST), Balanced Repeated Replication (BRR), and Jackknife (JNK) methods.
For more on sample-weight adjustments, see Valliant and Dever (2018), which provides a step-by-step guide to calculating survey weights.
Weight Adjustment Methods
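As a minimal illustration of one adjustment technique mentioned above, the sketch below poststratifies design weights so that weighted counts match known population controls; the data, column names, and control totals are all hypothetical.

```python
import pandas as pd

# Hypothetical respondent file with design weights and a poststratification variable.
resp = pd.DataFrame({
    "age_group": ["<40", "<40", "40+", "40+", "40+"],
    "design_wgt": [50.0, 60.0, 40.0, 45.0, 55.0],
})

# Known population counts for each poststratum (hypothetical control totals).
controls = {"<40": 1200.0, "40+": 1800.0}

# Poststratification factor = control total / weighted sample total within each cell.
cell_totals = resp.groupby("age_group")["design_wgt"].transform("sum")
resp["ps_factor"] = resp["age_group"].map(controls) / cell_totals
resp["ps_wgt"] = resp["design_wgt"] * resp["ps_factor"]

# The adjusted weights now reproduce the population controls exactly.
print(resp.groupby("age_group")["ps_wgt"].sum())
```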
Replicate Weights
This section introduces three common replication methods for estimating sampling variance: Bootstrap (BST), Balanced Repeated Replication (BRR), and Jackknife (JNK).
Replicate weights are constructed primarily for variance (uncertainty) estimation. They are especially useful when:
- Estimating non-linear parameters, for which Taylor linearization may be inaccurate.
- The number of PSUs per stratum is small (low degrees of freedom), making linearization unstable.
In this tutorial, we demonstrate how to create replicate weights using the ReplicateWeight class. Three replication methods are available:
- Balanced Repeated Replication (BRR), including Fay-BRR
- Bootstrap (BST)
- Jackknife (JNK)
Select the method at initialization via the method parameter: “brr”, “bootstrap”, or “jackknife”.
Acronyms used throughout: BST = Bootstrap, BRR = Balanced Repeated Replication, JNK = Jackknife.
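The sketch below shows one way the class might be used, assuming it is imported from a samplics-style weighting module; apart from the method parameter described above, the import path, the stratification argument, and the replicate() call signature are assumptions to be checked against the installed version's documentation.

```python
import pandas as pd
from samplics.weighting import ReplicateWeight  # assumed import path

# Hypothetical sample with two PSUs per stratum and design weights.
samp = pd.DataFrame({
    "stratum": [1, 1, 2, 2, 3, 3],
    "psu": [11, 12, 21, 22, 31, 32],
    "design_wgt": [120.0, 130.0, 90.0, 95.0, 110.0, 105.0],
})

# Select the replication method at initialization: "brr", "bootstrap", or "jackknife".
# The stratification flag is an assumption about the constructor.
brr = ReplicateWeight(method="brr", stratification=True)

# Assumed call signature: build replicate weights from the design weights,
# PSU identifiers, and stratum identifiers.
rep_wgt = brr.replicate(samp["design_wgt"], samp["psu"], samp["stratum"])
print(rep_wgt.head())
```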
Balanced Repeated Replication (BRR)
The core idea of Balanced Repeated Replication (BRR) is to repeatedly split the sample into complementary half-samples (replicates), taking one PSU from each stratum per half, and to estimate variance from the variability of the estimates across replicates. BRR is especially useful when exactly two primary sampling units (PSUs) are selected per stratum.
Concept
For each replicate \(r\):
- In every stratum, assign one PSU to half-sample A and the other to half-sample B following a balanced pattern, ensuring that each PSU appears equally often in A vs B across replicates.
- Construct replicate weights by doubling the weights in one half-sample and zeroing the weights in the other:
\[ w_i^{(r)} = \begin{cases} 2\,w_i, & \text{if PSU } i \text{ is in half-sample A for replicate } r,\\[4pt] 0, & \text{if PSU } i \text{ is in half-sample B for replicate } r. \end{cases} \]
- The pattern of A/B assignments across replicates is generated from a Hadamard matrix, whose orthogonal rows guarantee the required balance across replicates; a construction sketch is shown after this list.
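The sketch below constructs BRR replicate weights directly from a Hadamard matrix for a toy design with four strata and two PSUs per stratum (all weights and identifiers are made up; note that scipy's hadamard() only generates orders that are powers of two).

```python
import numpy as np
from scipy.linalg import hadamard

# Toy design: 4 strata, 2 PSUs per stratum, one design weight per PSU (hypothetical).
strata = np.repeat(np.arange(4), 2)      # stratum of each PSU: [0,0,1,1,2,2,3,3]
psu_in_stratum = np.tile([0, 1], 4)      # PSU index (0 or 1) within its stratum
design_wgt = np.array([10.0, 12.0, 8.0, 9.0, 15.0, 14.0, 11.0, 13.0])

# A Hadamard matrix of order >= number of strata defines the balanced A/B pattern.
n_strata = strata.max() + 1              # 4 strata; scipy requires a power of 2
H = hadamard(n_strata)
n_reps = H.shape[0]

# Replicate r, stratum h: H[r, h] = +1 puts PSU 0 in half-sample A (weight doubled,
# the other PSU zeroed); H[r, h] = -1 puts PSU 1 in half-sample A instead.
rep_wgt = np.zeros((n_reps, design_wgt.size))
for r in range(n_reps):
    in_A = (H[r, strata] == 1) == (psu_in_stratum == 0)
    rep_wgt[r] = np.where(in_A, 2.0 * design_wgt, 0.0)

print(rep_wgt)
```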
Fay-BRR
To avoid zero weights and improve numerical stability, the Fay-BRR modification introduces a Fay factor \(0 < f < 1\):
\[ w_i^{(r)} = \begin{cases} (1+f)\,w_i, & \text{if PSU } i \text{ is in A},\\[4pt] (1-f)\,w_i, & \text{if PSU } i \text{ is in B}. \end{cases} \]
Common choices are \(f=0.5\) (replicate factors of 1.5 and 0.5) or \(f=0.3\) (factors of 1.3 and 0.7).
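Once the replicate weights are formed, each replicate estimate \(\hat{\theta}^{(r)}\) is computed with the \(r\)-th set of replicate weights, and the variance is estimated from the spread of the replicates around the full-sample estimate \(\hat{\theta}\). Under the \((1 \pm f)\) parameterization above, the usual Fay-BRR variance estimator is
\[ \hat{V}(\hat{\theta}) = \frac{1}{R\,f^{2}} \sum_{r=1}^{R} \left( \hat{\theta}^{(r)} - \hat{\theta} \right)^{2}, \]
where \(R\) is the number of replicates; the doubled/zeroed weights of standard BRR correspond to \(f = 1\), which recovers \(\hat{V}(\hat{\theta}) = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}^{(r)} - \hat{\theta})^{2}\).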
Practical Notes
- The number of BRR replicates equals the order of the Hadamard matrix used; in practice, this is the smallest available order (a multiple of 4, e.g., 4, 8, 12, 16, …) that is at least the number of strata.
- When more than two PSUs are selected per stratum, you may form pseudo-strata with two PSUs each or use Jackknife (JNK) or Bootstrap (BST) instead.
- BRR is simple to implement and works well for designs with two PSUs per stratum.