ZIP radius matching

By default, two records with different ZIP codes don't match on ZIP. ZIP radius matching relaxes this — within a configurable radius (e.g. 30 miles), ZIPs are considered "close" and the ZIP field contributes a non-zero score based on actual geographic distance.

This is useful when your data has ZIP-level noise (corrupt data, outdated addresses, or address-type differences between billing and shipping).

How it works

When ZIP radius is enabled:

For every ZIP, the engine looks up its latitude/longitude from an in-region postal-code database.
For every source/master pair being scored, the geographic distance between ZIPs is computed from the coordinates.
The ZIP subscore becomes a function of distance:

zip_score = 100 × (1 − distance_miles / max_distance)

If the distance is 0 (same ZIP), the score is 100. If the distance equals the max radius, the score is 0. In between, the score is interpolated linearly; for example, 15 miles with a 30-mile max radius gives 50.
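The formula above can be sketched in a few lines of Python. This is an illustration only: the function name `linear_zip_score` is hypothetical, and clamping distances beyond the max radius to 0 is an assumption (the formula alone would go negative there).

```python
def linear_zip_score(distance_miles: float, max_distance: float = 30.0) -> float:
    """Linear ZIP subscore: 100 at distance 0, 0 at max_distance."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    if distance_miles <= 0:
        # Same ZIP (or identical centroids): full score.
        return 100.0
    score = 100.0 * (1.0 - distance_miles / max_distance)
    # Assumption: pairs farther apart than the max radius are clamped to 0
    # rather than receiving a negative score.
    return max(0.0, score)


if __name__ == "__main__":
    for miles in (0, 7.5, 15, 30, 45):
        print(miles, linear_zip_score(miles))
```

With the default 30-mile radius, a pair 7.5 miles apart scores 75 and a pair 15 miles apart scores 50.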

Settings

Enable ZIP radius matching (toggle; default: off)

Master on/off.

Max radius (miles; default: 30)

Distance at which ZIP score drops to 0. Configurable per match.

Country (US | CA | AU | other; default: auto-detect)

Determines the postal-code database to use. Auto-detected from sample data but overridable.

When to enable

Enable when:

Your data has old or inaccurate addresses that may no longer have the exact correct ZIP
You're matching across billing vs shipping addresses for the same customer
You're matching location-based records (field sales territories, delivery zones, service areas)
Either file is known to have ZIP-level noise

Skip when:

Both files have clean, current ZIP data (exact match is better)
You're matching via identifier (ZIP is irrelevant)
Privacy rules prevent using precise location matching

Common radius choices

5 miles — tight. Matches within the same metro area. Use for dense urban customer matching.
15 miles — local. Urban + inner suburbs. Typical for US retail customer matching.
30 miles (default) — regional. Metro area + suburbs. Typical for general-purpose matching.
100 miles — broad. Multi-city regional match. Use when data is known to be coarse.
500+ miles — state-level. Usually too loose; consider state-match instead of radius.

Data source & attribution

All 20 supported regions now resolve postal-code coordinates through a single unified database derived from GeoNames under the CC-BY 4.0 license. For the United Kingdom, Canada, and the Netherlands we also incorporate the GeoNames "full-precision" archives — which for GB include Royal Mail postcode-unit data © 2022.

This is a permanent data dependency we refresh quarterly. No external API calls are made at match-time; lookups are satisfied locally from our regional data store.

Per-region radius precision

Not every region ships with unit-level precision — where a national postal authority licenses the fine-grained data (e.g. Ireland's Eircode) the best publicly-redistributable form is coarser. The engine adapts its scoring curve to whatever precision is available:

Precision	Meaning	Regions
`full`	Unit-level postcode centroids — Haversine distance in miles.	US, UK, Canada, Netherlands, Australia, New Zealand, Germany, Austria, Switzerland, Italy, Norway, Denmark, France, Spain, Portugal, Sweden, Poland, Mexico
`city`	Municipality-level centroids only. Tier widened: 10 mi=95, 50 mi=70.	Brazil (CEP data redacted; 5,500 municipality centroids)
`district`	Routing-key level only — no distance, same-district / adjacent scoring.	Ireland (Eircode is licensed; GeoNames redacts unit-level precision)
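The `full` row refers to Haversine distance between postcode centroids. A minimal sketch of that calculation follows, assuming coordinates in decimal degrees and a mean Earth radius of 3958.8 miles. The function name is illustrative and is not the engine's API.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

The result is the distance between centroids, so two addresses at opposite edges of neighboring postal areas can be physically closer or farther apart than the computed value suggests.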

Full-precision scoring curve

For full-precision regions the curve is: same postal code = 100, ≤1 mi = 95, ≤5 mi = 85, ≤10 mi = 70, ≤25 mi = 50; beyond 25 mi the score decays linearly to 0.
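The tiers above can be sketched as a small lookup. Two points are assumptions rather than documented behavior: the linear tail is taken to start at 50 at 25 miles and reach 0 at the configured max radius, and any pair at or beyond the max radius scores 0. The name `full_precision_score` is hypothetical.

```python
# (upper bound in miles, score) pairs, taken from the tiers listed above.
FULL_TIERS = [(1.0, 95.0), (5.0, 85.0), (10.0, 70.0), (25.0, 50.0)]


def full_precision_score(distance_miles: float,
                         same_postal: bool = False,
                         max_distance: float = 30.0) -> float:
    """Tiered ZIP subscore for full-precision regions (illustrative sketch)."""
    if same_postal:
        return 100.0
    # Assumption: the max radius is a hard cutoff for every tier.
    if distance_miles >= max_distance:
        return 0.0
    for limit, score in FULL_TIERS:
        if distance_miles <= limit:
            return score
    # Assumption: linear decay from 50 at 25 mi down to 0 at max_distance.
    last_limit, last_score = FULL_TIERS[-1]
    return last_score * (max_distance - distance_miles) / (max_distance - last_limit)
```

Under these assumptions and the default 30-mile radius, a pair 27.5 miles apart scores 25.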

City-precision scoring curve

For Brazil the curve widens to accommodate municipality-only precision: same CEP = 100, ≤10 mi = 95, ≤25 mi = 85, ≤50 mi = 70, ≤100 mi = 50.
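As an illustration, the city-precision tiers can be expressed as a sorted table searched with `bisect`. The text does not say what happens beyond 100 miles, so returning 0 there is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_left

# Tier upper bounds (miles) and their scores, from the curve described above.
CITY_LIMITS_MILES = [10.0, 25.0, 50.0, 100.0]
CITY_SCORES = [95.0, 85.0, 70.0, 50.0]


def city_precision_score(distance_miles: float, same_cep: bool = False) -> float:
    """Tiered subscore for municipality-level centroids (illustrative sketch)."""
    if same_cep:
        return 100.0
    # bisect_left finds the first tier whose upper bound is >= the distance,
    # so a distance exactly on a boundary falls into the lower tier.
    tier = bisect_left(CITY_LIMITS_MILES, distance_miles)
    if tier == len(CITY_SCORES):
        return 0.0  # Assumption: behavior beyond 100 mi is not documented.
    return CITY_SCORES[tier]
```

Keeping the bounds and scores in parallel lists makes it easy to compare this curve against the full-precision tiers side by side.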

District-precision scoring

For Ireland: same routing key = 100, same first character (adjacent-district heuristic) = 50, otherwise 0.
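A minimal sketch of the district rule, assuming the routing key is the first three characters of an Eircode after removing spaces and upper-casing. The helper names are hypothetical, and scoring malformed input as 0 is an assumption.

```python
def routing_key(eircode: str) -> str:
    """First three characters of an Eircode, normalized (e.g. 'd02 x285' -> 'D02')."""
    return eircode.replace(" ", "").upper()[:3]


def district_score(eircode_a: str, eircode_b: str) -> float:
    """District-precision subscore: 100, 50, or 0 (illustrative sketch)."""
    key_a, key_b = routing_key(eircode_a), routing_key(eircode_b)
    if len(key_a) < 3 or len(key_b) < 3:
        return 0.0  # Assumption: input too short to contain a routing key.
    if key_a == key_b:
        return 100.0
    if key_a[0] == key_b[0]:
        # Same first character: the adjacent-district heuristic.
        return 50.0
    return 0.0
```

No coordinates are involved at this precision, so the max radius setting has no effect on the result in this sketch.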

Notes

Canada: full 6-character Canadian postal code lookup is included — Forward Sortation Area (FSA) is still available as a blocking key where full precision isn't desired.
Australia: 4-digit postcodes with coordinates.
UK: full unit-level postcode coordinates (e.g. SW1A 1AA) via the Royal Mail-enriched GeoNames archive.

Interaction with other settings

With state comparison

State comparison continues to work independently. A pair within the ZIP radius but in different states gets both a ZIP-radius score and a state mismatch. Net effect: cross-state "close" matches score lower than same-state close matches, which is usually the right behavior.

With address comparison

Street address is compared separately. ZIP radius does not influence address subscore.

With the match engine passes

Enabling ZIP radius can significantly expand the candidate set in the blocking pass: records in adjacent ZIPs that would otherwise land in different blocks now share a block. Expect roughly 30% longer match time when ZIP radius is enabled, and more for very large files.
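To see why the candidate set grows, consider a naive sketch that expands each ZIP's block to every ZIP within the radius. This is not the engine's blocking algorithm. The function name, the synthetic planar positions (in miles), and the default `math.dist` metric are all assumptions chosen to keep the example short.

```python
import math
from typing import Callable, Dict, Set, Tuple

Point = Tuple[float, float]


def radius_neighbors(positions: Dict[str, Point],
                     radius: float,
                     distance: Callable[[Point, Point], float] = math.dist
                     ) -> Dict[str, Set[str]]:
    """Map each code to the set of codes (itself included) within `radius`."""
    neighbors: Dict[str, Set[str]] = {code: {code} for code in positions}
    codes = sorted(positions)
    # Naive O(n^2) pairwise comparison; fine for a sketch, slow at scale.
    for i, a in enumerate(codes):
        for b in codes[i + 1:]:
            if distance(positions[a], positions[b]) <= radius:
                neighbors[a].add(b)
                neighbors[b].add(a)
    return neighbors


if __name__ == "__main__":
    # Synthetic positions in miles on a flat plane, for illustration only.
    demo = {"A": (0.0, 0.0), "B": (0.0, 20.0), "C": (0.0, 45.0)}
    print(radius_neighbors(demo, radius=30.0))
```

With exact ZIP matching every block contains one ZIP. With a 30-mile radius in this toy layout, B's block also pulls in A and C, so more pairs reach the scoring pass.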

Pitfalls

Urban vs rural

A 30-mile radius in Manhattan covers most of NYC and parts of NJ — millions of people. A 30-mile radius in rural Montana might cover three people. Pick a radius that's realistic for your data's geography.

Same address, different ZIP

Two records at the same physical address can have different ZIPs if one records the address ZIP (01841) and the other records the PO Box ZIP (01842). ZIP radius handles this gracefully; without it you'd get a false-negative ZIP mismatch.

Old ZIPs

ZIP codes change. A record from 1995 may have a ZIP that no longer exists. Our database handles deprecated ZIPs by mapping them to their current successor. If the ZIP was never a valid US ZIP (for example, a typo), the row is flagged with an "unknown ZIP" warning.
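The successor lookup described above might look like the following sketch. The function name, the shape of the `current` set and `successors` mapping, and the returned tuple are hypothetical; the real database layout is not documented here.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_zip(zip_code: str,
                current: Set[str],
                successors: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (resolved ZIP, warning). Follows deprecated -> successor links."""
    code = zip_code.strip()
    seen: Set[str] = set()
    while code not in current:
        # Stop on a ZIP with no known successor, or on a cycle in the mapping.
        if code in seen or code not in successors:
            return None, "unknown ZIP"
        seen.add(code)
        code = successors[code]
    return code, None
```

A current ZIP resolves to itself, a deprecated ZIP resolves to its successor (following chains if a successor was itself later retired), and anything else comes back with the "unknown ZIP" warning.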

How matching works — where ZIP scoring fits in the pipeline
Field mapping — making sure ZIP columns align cross-file
One-to-one vs one-to-many — ZIP radius + 1:1 is a common combination