populace

01 — the idea

The sampling frame, made executable.

Every population estimate rests on a frame: the list of units a sample is drawn from, and the weights that scale them back up to a country. In most pipelines that frame is implicit — scattered across data files, weight columns, and convention. When the convention breaks, the numbers break silently.

populace makes the frame a first-class datatype. Entity tables — people, households, tax units — with explicit links, typed weights that can never be silently zeroed, and a record of where every row came from. Imputation, calibration, and policy simulation are operators on that one object. The structure is built once and never re-derived.

02 — the stack

Six strategies, one frame.

Every strategy is an operator on the same weighted sampling frame, or a way of scoring what the operators did to it. Each has its own page — a 30-second explainer, the method, and a nested /paper as the stable citation target.

supportshipped

The support

Which records exist and what they carry: a spine of observed survey records with donor channels, completed by sequential zero-inflated quantile-regression-forest draws from the population's conditional distribution.

read the strategy →

calibrationshipped

The totals

Weights that hit administrative facts: gradient descent against thousands of hierarchical targets on a capped relative-error loss, with hard weight-ratio bounds against landmines.

read the strategy →

sparsityshipped

The economy

The same target surface on a fraction of the records: Hard Concrete gates select which candidate records survive into a deployable file while the weights fit the full surface.

read the strategy →

evaluationshipped

The referee

Any candidate population, scored on two currencies: fidelity to held-out survey views (the popdgp harness) and administrative target attainment, in-fit and held out.

read the strategy →

compositionin progress

The interactions

The support and the weights each pass their own single-operator checks; composition asks what happens when both run on the same file — realized landmines, repairability, and the target-vs-survey trade-off.

read the strategy →

dynamicsdesign stage

Time

Trajectory-weighted calibration and transitions as conditional models, under a scoring protocol where every claim resolves, backtests, or computes exactly from statute. First domain: U.S. Social Security.

read the strategy →

the support · the totals · the economy · the referee · the interactions · time

03 — the pipeline

How a release is built.

Every stage is an operator on one weighted frame, and the order is load-bearing: observations, then enrichment, then rules, then seeds, then simulation, then weights. Gates between stages make each hand-off refuse silently-broken inputs instead of passing them downstream. Two states are drawn below, both honestly: the pipeline the certified build actually runs — including the machinery its own postmortems condemned — and the operator algebra the open epics land (#395, #449, #463, #469).

shipping today — certified Build M (98bf731)

The certified build, drawn without flattery. The PUF twins and SCF/SIPP imputations still run on the ASEC spine before assembly; take-up seeding carries SSI-specific count-matching; materialization constructs an engine system per household batch; and the SSI reconcile loop re-materializes target families under fresh weights until its caps say stop. The dashed machinery is condemned by its own measurements — the deletion set of populace#463, #468 and #469.

intended — the operator algebra (#395 · #449 · #463 · #469)

The intended algebra. The multispine is assembled first and cloning and imputation become operators on the assembled frame (#395); take-up is seeded uniformly for every program at documented priors (#469); the engine materializes target columns exactly once, asserted by telemetry (#463); calibration is descent on a static matrix that includes the congressional-district surface (#288, #449); and the export answers to the hardened gate set, including the dollar-row fit and tail-concentration gates that certification of the current build motivated (#464).

04 — an application, not a seventh strategy

The stack, put to work on a geography without its own file.

The six strategies above build and validate one population. This strip is what happens when the same stack is pointed at a geography that has no comparable file of its own — applying the strategies rather than adding to them.

anywherepaper in progress

Open microsimulation anywhere

Most countries' household microdata is restricted; rules engines are no longer the bottleneck to modeling their tax and benefit systems, data access is. Recalibrating the open US support to a new geography's published totals — validated against the UK's held-out survey microdata, deployed in Belgium where no comparable microdata exists.

read the application →

localshipped

Down to state, district, and county

The same stack, resolved below the nation. An American Community Survey household spine carries PUMA-anchored geography, so a state, congressional district, or county is a filter over one national file — rather than a separate survey to source and calibrate for every locality. Calibrated to state administrative totals and state and congressional-district populations.

see the release →

firmspreprint

A new population: UK firms and the VAT threshold

The same recipe on business registers: synthetic firm records calibrated to HMRC and ONS band aggregates price level, shape, and rate reforms of the VAT registration threshold on one £184.8bn base — resolution at the notch that published statistics only report in coarse bands.

read the application →

05 — the sources

Where every layer comes from.

06 — the contract

Every input column, accounted for.

A reform only moves the numbers if the inputs it keys on carry signal. So every release declares a coverage contract: each input column the reference enhanced CPS exports is either present with non-default signal, or carries a reviewed exclusion naming a reason and a tracked issue. A hard release gate enforces it — a reform keyed on a missing input fails the build loudly, instead of silently scoring $0.

07 — releases

The current data, read live.

Every release publishes its manifests, calibration diagnostics, and reform validation next to the data. The rows below read the release registry directly, so this page cannot go stale: latest is the newest published build; certified is the build pinned as the policyengine.py default; local is the newest local-area build — published, not a default.

use it — the certified default in two lines

# pip install policyengine   (or: uv pip install policyengine)
from policyengine.tax_benefit_models.us import managed_microsimulation

sim = managed_microsimulation()   # policyengine_us.Microsimulation on Populace US 2024

managed_microsimulation() is pinned to the policyengine.py release bundle — the same certified build in the row above — so the dataset selection is reproducible, not ad-hoc. It returns a policyengine_us.Microsimulation; call its API directly.

Browse every target on the calibration strategy page · open the calibration dashboard · releases on Hugging Face

08 — evidence

It already matches the data it aims to replace.

PolicyEngine's enhanced Current Population Survey is the microdata behind millions of US policy calculations. The published populace-US release — built entirely from primary sources (the incumbent is the benchmark, never an input), with full variable parity — beats it on training, held-out, and full-surface loss in the matched-sample, symmetric-refit comparison.

training loss 0.18vs 1.09 lower is better

held-out loss 0.04vs 0.32 739 unseen targets

full-surface loss 0.21vs 1.41 all 3,704 targets

frozen benchmark · build populace-us-2024-5da5a95 · 2026-06-11 · matched 41,314 households, symmetric refit. Per individual target the incumbent still wins more often (2,528 of 3,704 to our 1,127) — we win big where we win and lose narrowly where we lose. Net short-term capital gains land on the signed PUF-anchored target (−$77.4B), and every donor is a primary survey; every remaining gap is itemized on the calibration strategy page. The two populations share an open-source unit-construction engine, so this measures synthesis quality on a partly shared scaffold. We report the gaps, not just the wins. This head-to-head is a fixed comparison as of that date; the current published build is read live under releases above.

Read the sparsity (L0) paper · Read the dynamics design paper · Browse every paper

09 — the commons

Toward one faithful record per person.

The long-run goal is a communal population that many parties improve — at full scale, one statistically faithful record for every person, carrying no one's private data. Contributions come in three forms, and they are exactly the three operators: records as new strata, conditional models trained on data a contributor holds, and facts as calibration targets.

A contribution merges only if it improves the population's score on held-out, rotating evidence without degrading any protected family. Privacy is enforced by provenance and measurement, not by blurring: public sources can be sharp, private evidence enters only through certified models, and the population must resemble held-out data — never anyone's training data.

→ records · a new stratum at honest weights
→ conditionals · certified P(y | x), never microdata
→ facts · targets with standard errors

Built in the open. Read it, break it, contribute.

github.com/PolicyEngine/populace The design charter

A nation is millions of households. We build a synthetic one that stands in for them all.

The sampling frame, made executable.

Six strategies, one frame.

The support

The totals

The economy

The referee

The interactions

Time

How a release is built.

The stack, put to work on a geography without its own file.

Open microsimulation anywhere

Down to state, district, and county

A new population: UK firms and the VAT threshold

Where every layer comes from.

Every input column, accounted for.

The current data, read live.

It already matches the data it aims to replace.

Toward one faithful record per person.

Built in the open. Read it, break it, contribute.

A nation is millions of households.
We build a synthetic one that stands in for them all.