ZIP codes are one of the most frequently corrupted fields in CSV data. The problems are subtle, widespread, and cause real downstream issues: failed address validation, incorrect geographic segmentation, broken mail merges, and poor match rates when comparing lists.
This guide covers the most common ZIP code problems and how to fix them, whether you are working in Excel, Google Sheets, Python, or a data quality tool.
Problem 1: Missing Leading Zeros
This is the most common ZIP code issue. US ZIP codes in the northeast start with zero: 01234 (Massachusetts), 06510 (Connecticut), 07102 (New Jersey). When a CSV is opened in Excel or saved from a system that treats ZIP codes as numbers, the leading zero is stripped. "01234" becomes "1234." "06510" becomes "6510."
A 4-digit ZIP code is always a 5-digit code with a missing leading zero.
How to fix in Excel
Select the ZIP column. Format cells as Text. Then use this formula to pad: =TEXT(A2, "00000"). This converts 1234 to "01234" and leaves 90210 as "90210."
How to fix in Google Sheets
Same formula works: =TEXT(A2, "00000"). Alternatively, format the column as Plain Text before pasting data.
How to fix in Python
df['zip'] = df['zip'].astype(str).str.zfill(5)
Prevention
Always store ZIP codes as text (strings), never as numbers. When exporting from databases or APIs, explicitly cast the field to string type. When opening a CSV in Excel, use the Import Wizard and set the ZIP column type to Text before importing.
Problem 2: ZIP+4 Inconsistency
Some records have 5-digit ZIPs (90210) while others have ZIP+4 (90210-1234). This inconsistency causes exact matching to fail: "90210" does not equal "90210-1234" even though they represent the same general area.
How to fix
Decide on a standard. For most use cases, 5-digit ZIP is sufficient. Truncate ZIP+4 codes:
Excel: =LEFT(A2, 5)
Python: df['zip'] = df['zip'].astype(str).str[:5]
If you need ZIP+4 precision (for USPS bulk mail, for example), standardize the format to include the hyphen: 12345-6789. Some systems store it without the hyphen (123456789), which should be reformatted.
Problem 3: Invalid ZIP Codes
Not every 5-digit number is a valid ZIP code. Common invalid entries include:
- "00000" — often a placeholder or default value
- "99999" — another common placeholder
- "12345" — frequently used as test data
- 3-digit or 6-digit values that are not ZIP codes
- Non-numeric characters: "N/A," "none," "unknown"
How to validate
A basic validation checks that the value is exactly 5 digits (or 10 characters in ZIP+4 format with hyphen). A more thorough validation checks against a reference database of valid US ZIP codes. The USPS publishes this data, and many tools include it.
In Excel, a basic check: =AND(LEN(A2)=5, ISNUMBER(VALUE(A2)))
For real validation, you need a lookup against known ZIPs. ListMatchGenie includes a database of 43,000+ US ZIP codes and validates automatically during data cleansing.
Problem 4: Canadian Postal Codes Mixed with US ZIPs
If your data includes both US and Canadian addresses, you may have a mix of formats: "90210" (US) and "M5V 2T6" (Canadian). These require different handling.
Canadian postal codes follow the pattern: letter-digit-letter space digit-letter-digit (A1A 1A1). Some systems store them without the space: "M5V2T6."
How to handle
First, identify which format each row uses. If you have a Country column, use it. If not, detect by format: if the value contains letters, it is likely Canadian; if it is all digits, it is likely US.
Standardize Canadian codes to uppercase with space: "M5V 2T6." Standardize US codes to 5-digit padded: "01234."
Problem 5: ZIP Codes Stored as Floating Point
A particularly annoying issue. Some systems export ZIP codes as floating-point numbers: "1234.0" or "90210.0." This happens when the ZIP column was stored as a numeric type in a database and exported without formatting.
How to fix
Convert to integer first, then to zero-padded string:
Python: df['zip'] = df['zip'].astype(float).astype(int).astype(str).str.zfill(5)
Excel: =TEXT(INT(A2), "00000")
Problem 6: Multiple ZIP Codes in One Cell
Occasionally you encounter cells like "90210; 90211" or "90210/90211" where someone has entered multiple ZIPs. These need to be split or reduced to a single value. If the record represents a person or company, use the first (presumably primary) ZIP code.
Batch Fixing ZIP Codes
If you are cleaning a large file with multiple ZIP code problems, the manual formula approach becomes tedious. A systematic approach:
- Detect the issues: Count how many rows have 4-digit ZIPs, non-numeric ZIPs, ZIP+4 codes, and completely empty values.
- Apply fixes in order: Remove non-numeric entries or flag for review. Strip ZIP+4 to 5-digit. Pad short codes with leading zeros. Validate against a reference database.
- Verify: After fixing, re-profile the column. All values should be 5 digits, all numeric, and ideally all present in the valid ZIP reference.
Data quality tools automate this entire process. When you upload a CSV to ListMatchGenie, the cleansing engine detects ZIP code columns automatically (even if they are labeled "Postal Code" or "Zip_Code" or "postal"), applies leading-zero padding, strips ZIP+4 suffixes, and flags invalid codes. You review and approve the changes before they are applied.
Getting ZIP codes right matters more than most people realize. Geographic analysis, shipping cost calculations, territory assignments, and proximity matching all depend on clean, standardized ZIP codes. Spending ten minutes fixing them before import saves hours of troubleshooting later.

