Many matching scenarios require geographic proximity, not just exact location matches. A healthcare organization needs to find providers within 25 miles of a patient. A sales team wants to identify prospects near their field reps. A nonprofit needs to match donors to local chapters. In all these cases, you need to answer: are these two records geographically close?
The obvious approach is to use a geocoding API like Google Maps or Mapbox to convert addresses to coordinates, then calculate distances. But geocoding APIs have costs, rate limits, and privacy implications (you are sending addresses to a third party). For many matching use cases, there is a simpler approach: ZIP code centroid distance.
How ZIP Code Radius Matching Works
Every US ZIP code corresponds, at least roughly, to a geographic area. Centroid coordinates (latitude and longitude of the approximate center) are available from public datasets, such as the Census Bureau's ZCTA files, for most of the 40,000-plus ZIP codes in use. By comparing the centroids of two ZIP codes using a distance formula, you get a reasonable approximation of the distance between addresses in those ZIP codes.
The accuracy depends on ZIP code size. Urban ZIP codes cover a few square miles, so the centroid is a good proxy for any address in that ZIP. Rural ZIP codes can cover hundreds of square miles, so the centroid may be 10 or more miles from the actual address. For most matching use cases this is sufficient, provided the match radius is large relative to the ZIP codes involved.
The Haversine Formula
The Haversine formula calculates the great-circle distance between two points on a sphere given their latitude and longitude. It accounts for the curvature of the Earth, which matters for distances over a few miles.
The formula in plain terms:
- Convert both latitude/longitude pairs from degrees to radians
- Calculate the differences in latitude and longitude
- Apply the Haversine formula: a = sin(dlat/2)^2 + cos(lat1) * cos(lat2) * sin(dlon/2)^2
- Calculate the angular distance: c = 2 * asin(sqrt(a))
- Multiply by Earth's radius (3,959 miles or 6,371 km) to get the distance
For most programming languages, this is 5-10 lines of code. No API call needed.
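The steps above can be sketched in Python using only the standard library. The function name haversine_miles is illustrative, and the radius constant is the 3,959-mile figure quoted above.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # mean Earth radius, as quoted in the steps above


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    # Step 1: convert degrees to radians
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Step 2: differences in latitude and longitude
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Step 3: the Haversine term
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Step 4: angular distance (clamp guards against tiny floating-point overshoot)
    c = 2 * math.asin(math.sqrt(min(1.0, a)))
    # Step 5: scale by Earth's radius
    return EARTH_RADIUS_MILES * c
```

To work in kilometers instead, swap the constant for 6,371.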
Building a ZIP Code Distance Lookup
To implement ZIP code radius matching, you need two things: a database of ZIP code centroids and the Haversine formula.
Getting ZIP Code Data
Free sources of ZIP code centroid data include:
- US Census Bureau ZCTA files: ZIP Code Tabulation Areas with centroid coordinates, updated with each census.
- GeoNames: Open-source geographic database with ZIP code coordinates for the US and many other countries.
- OpenDataSoft: Hosts a clean, downloadable dataset of US ZIP codes with lat/lng.
- Python pyzipcode library: Bundles a SQLite database of US ZIP codes with coordinates, installable via pip.
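Whichever source you choose, the data usually ends up as a simple ZIP-to-coordinate lookup. The sketch below assumes a CSV with columns named zip, lat, and lng. Those column names are hypothetical and will differ between datasets, and the sample rows are approximate, illustrative values rather than authoritative coordinates.

```python
import csv
import io


def load_centroids(csv_file):
    """Build {zip: (lat, lng)} from a CSV with assumed columns zip, lat, lng."""
    centroids = {}
    for row in csv.DictReader(csv_file):
        # Keep the ZIP as text so leading zeros (e.g. 07030) survive
        zip_code = row["zip"].strip()
        try:
            centroids[zip_code] = (float(row["lat"]), float(row["lng"]))
        except ValueError:
            # Skip rows whose coordinates are blank or malformed
            continue
    return centroids


# Illustrative in-memory file; a real dataset would be opened from disk
sample = io.StringIO(
    "zip,lat,lng\n"
    "07030,40.7453,-74.0279\n"
    "10001,40.7506,-73.9972\n"
    "99999,,\n"
)
centroids = load_centroids(sample)
```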
Implementation Example
In Python with the pyzipcode library, finding all ZIP codes within 10 miles of a given ZIP is straightforward: install the library, create a database instance, and call the radius search method with your target ZIP and desired radius in miles. The library runs the search against its bundled database and returns the matching ZIP codes. If you need exact distances for scoring, compute them from the returned coordinates with the Haversine formula.
For matching two lists, the approach is: for each record in list A, find all ZIP codes within the desired radius, then match against records in list B that have any of those ZIP codes. This is far more efficient than calculating distances between every pair of records.
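A library-free sketch of that list-matching approach follows. The centroid table, record layout (dicts with id and zip keys), and function names are all assumptions made for illustration, and the coordinates are approximate sample values.

```python
import math
from collections import defaultdict

# Approximate, illustrative centroids; a real table would hold every ZIP code
CENTROIDS = {
    "10001": (40.7506, -73.9972),
    "10002": (40.7170, -73.9870),
    "07030": (40.7453, -74.0279),
    "90210": (34.1030, -118.4105),
}


def haversine_miles(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959.0 * 2 * math.asin(math.sqrt(min(1.0, a)))


def zips_within(zip_code, radius_miles, centroids=CENTROIDS):
    """Return {zip: distance} for every known ZIP within the radius."""
    origin = centroids.get(zip_code)
    if origin is None:
        return {}
    nearby = {}
    for other, (lat, lon) in centroids.items():
        distance = haversine_miles(origin[0], origin[1], lat, lon)
        if distance <= radius_miles:
            nearby[other] = distance
    return nearby


def match_by_radius(list_a, list_b, radius_miles):
    """Pair each record in list A with list B records in nearby ZIP codes."""
    # Index list B by ZIP once so each lookup is a dictionary access
    b_by_zip = defaultdict(list)
    for rec in list_b:
        b_by_zip[rec["zip"]].append(rec)
    matches = []
    for rec_a in list_a:
        for zip_code, distance in zips_within(rec_a["zip"], radius_miles).items():
            for rec_b in b_by_zip.get(zip_code, []):
                matches.append((rec_a["id"], rec_b["id"], distance))
    return matches


patients = [{"id": "A1", "zip": "10001"}]
providers = [
    {"id": "B1", "zip": "10002"},
    {"id": "B2", "zip": "07030"},
    {"id": "B3", "zip": "90210"},
]
matches = match_by_radius(patients, providers, radius_miles=5)
```

On large lists, caching the result of zips_within per distinct ZIP code avoids repeating the same radius search for every record that shares it.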
Scoring Proximity
Rather than a binary "within radius / not within radius," a proximity score provides more nuance. A common scoring approach uses distance tiers:
- Same ZIP code: 100% proximity score
- Within 1 mile: 95%
- Within 5 miles: 85%
- Within 10 miles: 70%
- Within 25 miles: 50%
- Beyond 25 miles: 0% (or linearly decaying to 0 at 50 miles)
This tiered score can be incorporated into a composite matching score alongside name, email, and other field matches. Two records with the same name and a proximity score of 85% are strong candidates for being the same person, even without an exact address match.
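The tiers above translate into a small lookup function. This is a sketch: the function name and the decay flag are illustrative, and the optional decay is assumed to fall linearly from 50 at 25 miles to 0 at 50 miles, which is one reading of the last tier.

```python
# (maximum distance in miles, score) pairs, taken from the tiers listed above
TIERS = [(1.0, 95), (5.0, 85), (10.0, 70), (25.0, 50)]


def proximity_score(zip_a, zip_b, distance_miles, decay=False):
    """Map a centroid distance to a 0-100 proximity score."""
    if zip_a == zip_b:
        return 100
    for max_miles, score in TIERS:
        if distance_miles <= max_miles:
            return score
    if decay and distance_miles < 50.0:
        # Assumed interpretation: linear fall from 50 at 25 miles to 0 at 50 miles
        return 50 * (50.0 - distance_miles) / 25.0
    return 0
```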
Handling Edge Cases
Missing ZIP Codes
If a record has no ZIP code, proximity matching cannot contribute to the score. Set the proximity score to neutral (neither helping nor hurting the overall match) or skip it entirely for that pair.
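One way to make a missing score neutral is to leave it out of a weighted average and renormalize the remaining weights. The field names and weights below are hypothetical, chosen only to show the mechanics.

```python
def composite_score(field_scores, weights):
    """Weighted average over the fields that have a score; None is skipped."""
    weighted_sum = 0.0
    total_weight = 0.0
    for field, weight in weights.items():
        score = field_scores.get(field)
        if score is None:
            # Missing field: contributes nothing, and its weight is dropped
            continue
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


# Hypothetical weights for illustration
WEIGHTS = {"name": 0.5, "email": 0.3, "proximity": 0.2}

with_zip = composite_score({"name": 90, "email": 100, "proximity": 50}, WEIGHTS)
without_zip = composite_score({"name": 90, "email": 100, "proximity": None}, WEIGHTS)
```

With this scheme a record without a ZIP code is judged on name and email alone, rather than being penalized as if its proximity score were zero.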
ZIP+4 Codes
ZIP+4 codes (like 90210-1234) provide more geographic precision than 5-digit ZIPs, but centroid databases typically only have 5-digit entries. Truncate to 5 digits before lookup, for example with zip_code.split('-')[0] (naming the variable zip_code avoids shadowing Python's built-in zip).
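A slightly more defensive version of that truncation is sketched below. The function name is illustrative, and accepting a space or no separator before the +4 part is an assumption about how messy input might look.

```python
import re

# Five digits, optionally followed by a hyphen or space and four more digits
ZIP_PATTERN = re.compile(r"(\d{5})(?:[-\s]?\d{4})?")


def normalize_zip(raw):
    """Return the 5-digit ZIP as a string, or None if the input is not a ZIP."""
    if raw is None:
        return None
    match = ZIP_PATTERN.fullmatch(str(raw).strip())
    return match.group(1) if match else None
```

Returning None for unparseable values lets later steps treat them the same way as a missing ZIP code.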
PO Box ZIP Codes
Some ZIP codes are assigned exclusively to PO Boxes and do not correspond to a geographic area in the usual sense. Their centroids point to the post office location, which may be miles from where the person actually lives. Be aware that proximity scores for PO Box ZIPs may be less accurate.
Military and Territory ZIP Codes
ZIP codes starting with 09 (military APO/FPO, Europe), 962-966 (military APO/FPO, Pacific), and codes for territories like Puerto Rico (00601-00988) and Guam (96910-96932) may not be in every centroid database. Handle missing lookups gracefully by returning a null distance rather than an error.
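A null-returning lookup might look like the following sketch. The centroid table is a small illustrative sample with approximate coordinates, and the function names are assumptions.

```python
import math

# Approximate, illustrative centroids; note the military ZIP is absent
CENTROIDS = {
    "10001": (40.7506, -73.9972),
    "07030": (40.7453, -74.0279),
}


def haversine_miles(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959.0 * 2 * math.asin(math.sqrt(min(1.0, a)))


def zip_distance_miles(zip_a, zip_b, centroids=CENTROIDS):
    """Distance between two ZIP centroids, or None if either ZIP is unknown."""
    point_a = centroids.get(zip_a)
    point_b = centroids.get(zip_b)
    if point_a is None or point_b is None:
        # Unknown ZIP (military, territory, typo): signal "no distance"
        return None
    return haversine_miles(point_a[0], point_a[1], point_b[0], point_b[1])
```

Callers can then treat None like a missing ZIP code and leave proximity out of the score for that pair.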
International Postal Code Matching
The same approach works for other countries with postal code systems. UK postcodes, Canadian postal codes, German PLZ codes, and most national postal systems have published centroid data. The Haversine formula is universal since it just needs lat/lng coordinates.
For UK postcodes specifically, the postcode area (the first 1-2 letters, like "SW" for South West London or "M" for Manchester) provides a coarse geographic grouping. Matching on postcode area gives you regional proximity without needing a full centroid database.
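Extracting the postcode area is a short string operation. In this sketch the function names are illustrative, and the pattern only checks for one or two leading letters followed by a digit rather than validating the full postcode format.

```python
import re

# One or two leading letters, followed by the first digit of the district
AREA_PATTERN = re.compile(r"([A-Z]{1,2})\d")


def postcode_area(postcode):
    """Return the UK postcode area (e.g. 'SW', 'M'), or None if not recognizable."""
    if not postcode:
        return None
    match = AREA_PATTERN.match(postcode.strip().upper())
    return match.group(1) if match else None


def same_postcode_area(postcode_a, postcode_b):
    """True only when both areas are recognizable and equal."""
    area_a = postcode_area(postcode_a)
    area_b = postcode_area(postcode_b)
    return area_a is not None and area_a == area_b
```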
Using ZIP Radius in ListMatchGenie
ListMatchGenie includes ZIP code radius matching as a built-in feature. When the AI detects a ZIP code column, it automatically enables proximity scoring using the tiered approach described above. You can adjust the maximum radius (default 25 miles) in the match configuration.
The tool also supports UK postcodes with the same proximity scoring, using postcode area centroids for distance calculation. No API key, no geocoding costs, and no address data sent to third parties. Your ZIP codes are compared against a bundled database that runs entirely within the processing environment.

