Tutorials - Small Area Estimation with svy-sae
Small Area Estimation (SAE) refers to a class of statistical methods designed to produce reliable estimates for domains (or “small areas”) where direct survey estimates are unreliable or unavailable due to small sample sizes. SAE approaches improve precision by introducing statistical models that borrow strength across domains or units, enabling estimation even when domain-level sample sizes are limited or zero.
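A standard textbook way to see how a model borrows strength is the Fay-Herriot area-level model, shown here as a general illustration rather than as the exact formulation used in svy-sae. For domain $i$, let $\hat{\theta}_i^{\mathrm{DIR}}$ be the direct estimate with known sampling variance $\psi_i$, $\mathbf{x}_i$ a vector of domain-level auxiliary variables, and $\sigma_u^2$ the between-domain variance:

$$
\hat{\theta}_i^{\mathrm{DIR}} = \theta_i + e_i,\quad e_i \sim N(0,\psi_i); \qquad \theta_i = \mathbf{x}_i^\top\boldsymbol{\beta} + u_i,\quad u_i \sim N(0,\sigma_u^2)
$$

$$
\tilde{\theta}_i = \gamma_i\,\hat{\theta}_i^{\mathrm{DIR}} + (1-\gamma_i)\,\mathbf{x}_i^\top\tilde{\boldsymbol{\beta}}, \qquad \gamma_i = \frac{\sigma_u^2}{\sigma_u^2+\psi_i}
$$

Here $\tilde{\boldsymbol{\beta}}$ is the weighted least squares estimator with weights $1/(\sigma_u^2+\psi_i)$, and plugging in an estimate of $\sigma_u^2$ gives the EBLUP. A domain with a small sample has a large $\psi_i$, hence a small $\gamma_i$, so its estimate leans on the regression prediction fitted across all domains. A domain with no sample can still be estimated by the synthetic part $\mathbf{x}_i^\top\tilde{\boldsymbol{\beta}}$.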
In practice, SAE methods are commonly grouped into two broad categories. Area-level models are fitted to aggregated domain-level data, typically the direct estimate for each domain together with its sampling variance and domain-level auxiliary variables. Unit-level models are fitted to individual survey records, linking the outcome to unit-level auxiliary variables through a model structure shared across units and domains. svy-sae supports both frameworks, each with several methodological options and estimation workflows, which are introduced progressively throughout the tutorials.
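To make the area-level idea concrete, here is a minimal sketch in plain Python under the simplest Fay-Herriot assumptions: an intercept-only model, known sampling variances `psi`, and a known between-area variance `sigma2_u`. The function `fay_herriot_blup` is hypothetical and is not part of the svy-sae API. In practice `sigma2_u` is estimated (which turns the BLUP into an EBLUP) and covariates enter through a regression term.

```python
def fay_herriot_blup(y, psi, sigma2_u):
    """Hypothetical helper: intercept-only Fay-Herriot BLUP.

    y        -- direct estimates, one per area
    psi      -- their sampling variances (assumed known)
    sigma2_u -- between-area variance (assumed known here, not estimated)
    """
    if len(y) != len(psi) or not y:
        raise ValueError("y and psi must be non-empty and of equal length")
    # Weighted least squares estimate of the common mean,
    # with weights 1 / (sigma2_u + psi_i).
    weights = [1.0 / (sigma2_u + p) for p in psi]
    beta = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    # Shrinkage factors: close to 1 when the direct estimate is precise.
    gammas = [sigma2_u / (sigma2_u + p) for p in psi]
    # Composite estimate: weighted mix of direct estimate and pooled mean.
    estimates = [g * yi + (1.0 - g) * beta for g, yi in zip(gammas, y)]
    return beta, gammas, estimates


# Three hypothetical areas; the second has the noisiest direct estimate.
beta, gammas, estimates = fay_herriot_blup(
    y=[12.0, 21.0, 6.0], psi=[1.0, 3.0, 1.0], sigma2_u=1.0
)
print(beta, gammas, estimates)
```

The second area, whose direct estimate has the largest sampling variance, receives the smallest shrinkage factor and is pulled furthest toward the pooled mean.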
This tutorial series focuses on the practical application of Small Area Estimation methods rather than their underlying statistical theory. For a comprehensive theoretical treatment of SAE, we recommend:
Rao, J. N. K., & Molina, I. (2015). Small Area Estimation (2nd ed.). Wiley.
How to use this tutorial
The tutorials provide a hands-on introduction to small area estimation workflows using svy-sae. Examples are self-contained and build progressively, but sections can also be read independently depending on the reader’s background and objectives.
The emphasis is on end-to-end workflows: from data preparation and direct estimation to model fitting, diagnostics, and uncertainty assessment.
Tutorial structure
The tutorials are organized as follows:
1. Getting started
- Installation and setup
- Package structure and basic concepts
2. Area-level models
- Area-level modeling workflows
- Estimation and diagnostics
- Uncertainty measures
3. Unit-level models
- Unit-level modeling workflows
- Prediction and aggregation
- Practical considerations
Conventions used in the tutorials
- Code examples assume Python 3.11 or newer
- Data are represented using polars DataFrames
- Survey design objects and direct estimators are created using svy