ListMatchGenie
Back to blog
Guide 9 min read

Data Cleansing Before CRM Import: Step-by-Step

A practical guide to cleaning your CSV data before importing it into Salesforce, HubSpot, or any CRM. Avoid duplicates, formatting errors, and wasted time.

You have a CSV file with 5,000 contacts ready to import into your CRM. Maybe it is a purchased list, a trade show export, a partner referral file, or a consolidation from an old system. Before you hit that import button, stop. Importing dirty data into your CRM creates problems that compound over time: duplicate contacts, bounced emails, incorrect segmentation, and wasted sales effort.

This guide walks through a systematic data cleansing process you should follow before every CRM import. It applies whether you use Salesforce, HubSpot, Zoho, Pipedrive, or any other CRM.

Step 1: Profile Your Data

Before fixing anything, understand what you are working with. Open your CSV and check:

  • Row count: How many records? This tells you the scale of any issues you find.
  • Column inventory: What fields are present? Map them to your CRM fields mentally.
  • Completeness: What percentage of each column is filled? A column that is 80% empty may not be worth importing.
  • Uniqueness: Are there duplicate rows? Check by email, phone, or name + company combination.
  • Format consistency: Are dates in one format? Are phone numbers consistent? Are state fields using abbreviations or full names?

You can do this manually by scrolling through the file, or use a data profiling tool to get instant statistics. ListMatchGenie's free Data Health Check scans your CSV in the browser and reports these metrics without uploading your file anywhere.

Step 2: Remove Exact Duplicates

The easiest win. Sort by email address (or your primary identifier) and remove rows that are identical or share the same email. In Excel, use Remove Duplicates. In Google Sheets, use the UNIQUE function or the built-in duplicate removal.

Be careful with partial duplicates: "john@acme.com" and "john@acme.com " (with a trailing space) are different to a computer but the same person. Clean whitespace first (Step 3), then deduplicate.

Step 3: Fix Whitespace Issues

Whitespace problems are invisible but destructive. They cause CRM duplicate detection to fail, email validation to reject valid addresses, and search to miss records.

Common whitespace issues:

  • Leading spaces: " John" instead of "John"
  • Trailing spaces: "john@acme.com " instead of "john@acme.com"
  • Double spaces: "New York" instead of "New York"
  • Tab characters embedded in fields
  • Non-breaking spaces (character 160) that look like regular spaces but are not

In Excel, use the TRIM function on every text column. In a dedicated tool, this is usually a one-click operation across all columns.

Step 4: Standardize Casing

Inconsistent casing looks unprofessional in CRM reports and causes issues with deduplication. Common problems:

  • "JOHN SMITH" (all caps from a legacy system)
  • "john smith" (all lowercase from a web form)
  • "john Smith" (inconsistent)

Convert names to title case: "John Smith." Convert emails to lowercase: "john@acme.com." Leave company names as-is unless you have a clear standard, since company names often have intentional casing like "iPhone" or "eBay."

Step 5: Validate and Format Key Fields

Email addresses

Check for basic format validity. Remove rows with clearly invalid emails like "N/A," "none," "test@test.com," or emails missing the @ symbol. Convert all emails to lowercase. Check for common domain typos: "gmial.com" instead of "gmail.com" or "yahoo.con" instead of "yahoo.com."

Phone numbers

Standardize to a consistent format. If your CRM expects (555) 123-4567, convert all variations: "5551234567," "555-123-4567," "1-555-123-4567," "+15551234567." Remove phone numbers that are obviously invalid: all zeros, all ones, or fewer than 10 digits.

ZIP codes

US ZIP codes should be 5 digits. A common problem: Excel strips leading zeros, turning "01234" into "1234." If you see 4-digit ZIP codes, they probably need a leading zero. ZIP+4 codes (12345-6789) may need truncation depending on your CRM field.

State and country fields

Standardize to abbreviations (CA, NY, TX) or full names (California, New York, Texas) based on your CRM convention. Watch for variations: "Calif," "Ca," "California," "CA" should all map to the same value.

Dates

Convert to a single format. ISO 8601 (2026-04-03) is safest for imports. Watch for ambiguous dates: "04/03/2026" could be April 3 or March 4 depending on locale.

Step 6: Handle Null and Placeholder Values

Scan for values that represent missing data but are not actually empty: "N/A," "n/a," "NA," "None," "null," "NULL," "-," "unknown," "not provided," "TBD." These should either be converted to truly empty cells or removed, depending on your CRM import settings.

Also check for placeholder data that should not be imported: "Test Contact," "DO NOT IMPORT," "Sample Record," "John Doe" (often a test entry).

Step 7: Deduplicate Against Your Existing CRM

Even if your import file has no internal duplicates, it may contain contacts already in your CRM. Before importing:

  1. Export your current CRM contacts (at least email and name fields).
  2. Match the import file against the export using email as the primary key and fuzzy name matching as a secondary check.
  3. Flag duplicates and decide: skip them, update existing records, or create a review queue.

This step is where a matching tool pays for itself. Manually comparing a 5,000-row import against a 20,000-row CRM export is impractical in Excel. A fuzzy matching tool handles it in minutes.

Step 8: Validate Required Fields

Check your CRM's required fields for the object you are importing (Contact, Lead, Account). If Last Name is required, remove or fix rows with empty last names. If Company is required for Leads, ensure every row has a value.

Also verify that picklist values match your CRM options. If your CRM has a "Lead Source" picklist with "Trade Show," "Website," "Referral," make sure your import file uses those exact values, not "Tradeshow" or "Web" or "Referred."

Step 9: Create a Cleansing Checklist

If you import data regularly, create a standard checklist for your team:

  1. Profile the file (row count, completeness, format check)
  2. Trim whitespace from all text fields
  3. Standardize casing (title case for names, lowercase for emails)
  4. Validate email format and remove invalid entries
  5. Format phone numbers to CRM standard
  6. Fix ZIP codes (leading zeros, length validation)
  7. Standardize state/country abbreviations
  8. Replace placeholder values with empty cells
  9. Remove internal duplicates
  10. Match against existing CRM contacts
  11. Validate required fields and picklist values
  12. Review a sample of 20 rows visually before importing

Following this process consistently prevents the slow degradation of CRM data quality that most organizations experience. Clean data means better segmentation, accurate reporting, and higher deliverability on your email campaigns.

Topics

data cleansingCRM importSalesforceHubSpotCSV cleanupdata quality

Let the Genie handle the grunt work.

Free tier is real. No card. No forms. Just upload your first list and see the Genie clean and match it in under a minute.