Duplicate records in your CRM are not just an annoyance. They cause real business problems: sales reps contact the same prospect twice, marketing sends duplicate emails that trigger spam filters, reporting inflates your actual customer count, and forecasting becomes unreliable because revenue is split across duplicate accounts.
Commonly cited industry estimates suggest that CRM databases accumulate duplicates at a rate of 1-5% per month. After a year without deduplication, 10-30% of your records may be duplicates. After a data migration or list import, the rate can spike to 40% or higher.
This guide walks through a systematic deduplication process that works regardless of which CRM you use.
Step 1: Export Your CRM Data
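Start by exporting the records you want to deduplicate. For contacts, export at minimum: the CRM's own record ID, first name, last name, email, phone, company name, created and last-modified dates, and any custom identifiers (like account ID or external reference). For accounts or companies, export: record ID, company name, domain, phone, address, and industry. The record ID lets you map each match back to the right record when you merge, and the dates drive the merge rules in Step 6.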
In Salesforce, use Data Export or a report. In HubSpot, go to Contacts and use the Export button. Most CRMs export to CSV, which is the format you need for matching.
Export all records, not just recent ones. Duplicates often involve an old record and a new one, so you need the full dataset.
Step 2: Profile the Export
Before running any matching, understand your data quality:
- Email completeness: What percentage of records have an email address? Email is the strongest single matching field because it is usually unique per person.
- Name completeness: Are first and last names in separate fields, or combined? Are there records with only a company name and no person name?
- Phone format: Are phone numbers formatted consistently? Mixed formats reduce match accuracy.
- Obvious duplicates: Sort by email and quickly scan for exact email duplicates. These are the easiest wins.
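These checks are easy to script before any matching. The sketch below uses only Python's standard library and assumes hypothetical column names (`first_name`, `last_name`, `email`, `phone`); rename them to match your export. It reports email and full-name completeness, counts distinct phone "shapes" (every digit replaced by 9, so `(555) 010-2000` and `555.010.2000` count as two formats), and lists exact email duplicates after trimming and lowercasing.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical column names: rename to match your CRM export headers.
EMAIL, FIRST, LAST, PHONE = "email", "first_name", "last_name", "phone"


def profile(rows):
    """Return basic data-quality stats for a list of CSV row dicts."""
    total = len(rows)

    def pct(n):
        return round(100 * n / total, 1) if total else 0.0

    def val(row, field):
        return (row.get(field) or "").strip()

    with_email = sum(1 for r in rows if val(r, EMAIL))
    with_full_name = sum(1 for r in rows if val(r, FIRST) and val(r, LAST))
    # Replace every digit with 9 so each distinct formatting pattern counts once.
    phone_shapes = Counter(re.sub(r"\d", "9", val(r, PHONE)) for r in rows if val(r, PHONE))
    # Exact email duplicates after trimming and lowercasing.
    emails = Counter(val(r, EMAIL).lower() for r in rows if val(r, EMAIL))
    return {
        "records": total,
        "email_pct": pct(with_email),
        "full_name_pct": pct(with_full_name),
        "phone_formats": len(phone_shapes),
        "exact_email_dups": {e: n for e, n in emails.items() if n > 1},
    }


sample = """first_name,last_name,email,phone
Ann,Lee,ann@example.com,(555) 010-2000
Ann,Lee,ANN@example.com ,555.010.2000
Raj,,,555-010-3000
"""
stats = profile(list(csv.DictReader(io.StringIO(sample))))
print(stats)
```

A high `phone_formats` count relative to the number of records is a sign that phone normalization will matter in the fuzzy pass.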
Step 3: Run Exact Deduplication
Start with exact matching on your strongest identifier. For most CRMs, this is email address:
- Normalize emails: lowercase, trim whitespace, remove dots in Gmail addresses (j.smith@gmail.com equals jsmith@gmail.com).
- Group records with identical normalized emails.
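- Any group with more than one record is almost certainly a duplicate set. The main exception is a shared address such as info@ or sales@, which several different people may use.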
This typically finds 5-15% of all duplicates. These are safe to merge with minimal review since the email match is strong evidence.
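The normalize-and-group steps fit in a few lines of Python. This is a sketch, not a complete email normalizer: it assumes each record is a dict with hypothetical `id` and `email` keys, and it also strips Gmail `+tags` and treats `googlemail.com` as `gmail.com`, two Gmail rules beyond the dot rule described above. Dots are removed only for Gmail domains, because other mail providers may treat `j.smith` and `jsmith` as different mailboxes. Records with a missing or malformed email are skipped here and left for the fuzzy pass.

```python
from collections import defaultdict

# Gmail ignores dots (and anything after "+") in the local part; googlemail.com is an alias.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(raw):
    """Lowercase and trim; for Gmail addresses also drop dots and +tags."""
    email = (raw or "").strip().lower()
    local, at, domain = email.rpartition("@")
    if not at or not local or not domain:
        return ""  # missing or malformed address: leave it for fuzzy matching
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "")
        domain = "gmail.com"
    return f"{local}@{domain}"


def exact_duplicate_groups(records, email_field="email", id_field="id"):
    """Map each normalized email shared by 2+ records to those records' IDs."""
    groups = defaultdict(list)
    for rec in records:
        key = normalize_email(rec.get(email_field))
        if key:
            groups[key].append(rec[id_field])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"id": "C1", "email": "J.Smith@Gmail.com"},
    {"id": "C2", "email": " jsmith@gmail.com"},
    {"id": "C3", "email": "jsmith+news@googlemail.com"},
    {"id": "C4", "email": "j.smith@acme.com"},
    {"id": "C5", "email": ""},
]
print(exact_duplicate_groups(records))
```

Here C1, C2, and C3 form one group, while `j.smith@acme.com` stays separate because its dots are kept.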
Step 4: Run Fuzzy Deduplication
After removing exact duplicates, run fuzzy matching to find the harder cases. These are records where the same person exists twice but with different email addresses, or where the email field is empty on one or both records.
Effective fuzzy deduplication for CRM records uses multiple signals:
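- Name similarity: "Robert Johnson" and "Bob Johnson" at the same company are likely the same person. Use Jaro-Winkler for name fields, phonetic matching for spelling variations, and a nickname table for pairs like Bob and Robert, which neither string similarity nor phonetic codes will link.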
- Company match: Same person name at the same company is a strong duplicate signal. Normalize company names (remove Inc., LLC, Corp.) before comparing.
- Phone match: Normalize to digits only, then compare. Same phone number with different names may indicate a shared office line rather than a duplicate, so weight this lower than email.
- Address proximity: Same name within the same ZIP code is a moderate duplicate signal. Use ZIP radius matching for nearby but not identical codes.
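A minimal sketch of the name, company, and phone signals in plain Python follows. It implements Jaro-Winkler directly rather than relying on a library and applies a nickname lookup before comparing first names. The nickname entries, the legal-suffix list, and the North American phone assumption (dropping a leading country code 1) are illustrative assumptions; phonetic matching and ZIP radius lookups are left out to keep it short.

```python
import re

# Tiny illustrative nickname table; a real one needs far more entries.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert",
             "bill": "william", "will": "william", "liz": "elizabeth", "beth": "elizabeth"}
# Hypothetical list of legal suffixes to strip from company names.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd"}


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)
    s1_matches = []
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used[j] and s2[j] == ch:
                used[j] = True
                s1_matches.append(ch)
                break
    m = len(s1_matches)
    if m == 0:
        return 0.0
    s2_matches = [s2[j] for j in range(len(s2)) if used[j]]
    # Half the number of matched characters that appear in a different order.
    transpositions = sum(a != b for a, b in zip(s1_matches, s2_matches)) // 2
    return (m / len(s1) + m / len(s2) + (m - transpositions) / m) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


def name_similarity(first1, last1, first2, last2):
    """Average Jaro-Winkler of first and last names after nickname expansion."""
    f1, f2 = (NICKNAMES.get(f.strip().lower(), f.strip().lower()) for f in (first1, first2))
    return (jaro_winkler(f1, f2) + jaro_winkler(last1.strip().lower(), last2.strip().lower())) / 2


def normalize_company(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_phone(raw):
    """Digits only; assumes North American numbers, so a leading country code 1 is dropped."""
    digits = re.sub(r"\D", "", raw)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


print(name_similarity("Bob", "Johnson", "Robert", "Johnson"))
print(normalize_company("Acme, Inc."), normalize_company("ACME Corp"))
print(normalize_phone("+1 (555) 010-2000"), normalize_phone("555.010.2000"))
```

Each function returns either a similarity in [0, 1] or a normalized string you can compare for equality, which makes the results easy to feed into a combined score.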
Step 5: Score and Categorize Duplicates
Assign each potential duplicate pair a confidence score based on the combined field matches. Then categorize:
- High confidence (90%+): Auto-merge candidates. These share multiple strong identifiers like email + name or phone + name + company.
- Medium confidence (70-89%): Manual review required. Typically these share a name and one other field but have some conflicting information.
- Low confidence (50-69%): Probably not duplicates, but worth a quick glance. Common last names at large companies often fall here.
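One way to turn field matches into these bands is a capped points system. The point values below are hypothetical, picked so that the example combinations in the list land where the list says they should: email + name and phone + name + company reach 90+, while name + company falls in the 70-89 review band. Phone earns fewer points than email, as Step 4 recommends. Pairs under 50 are treated as non-duplicates, which the bands imply but do not state. Tune the points against pairs you have already reviewed by hand.

```python
# Hypothetical points per field, scaled by a similarity in [0, 1] for that field.
POINTS = {"email": 50, "name": 40, "company": 30, "phone": 30, "zip": 20}


def score_pair(similarities):
    """Combine per-field similarities (0 = no match, 1 = exact) into a 0-100 score."""
    total = sum(points * similarities.get(field, 0.0) for field, points in POINTS.items())
    return min(total, 100.0)


def categorize(score):
    """Map a 0-100 score onto the confidence bands above."""
    if score >= 90:
        return "high: auto-merge candidate"
    if score >= 70:
        return "medium: manual review"
    if score >= 50:
        return "low: quick glance"
    return "not a duplicate"


examples = {
    "email + name": {"email": 1.0, "name": 1.0},
    "phone + name + company": {"phone": 1.0, "name": 1.0, "company": 1.0},
    "name + company": {"name": 1.0, "company": 1.0},
    "similar name only": {"name": 0.9},
}
for label, sims in examples.items():
    score = score_pair(sims)
    print(f"{label}: {score:.0f} -> {categorize(score)}")
```

Using similarities rather than yes/no matches lets a Jaro-Winkler name score of, say, 0.93 contribute proportionally instead of being rounded to a match or a miss.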
Step 6: Define Merge Rules
Before merging, decide how to handle conflicting data between duplicate records:
- Keep the most recent value: For fields like phone number, address, and job title that change over time.
- Keep the most complete value: If one record has a full address and the other has only a city, keep the full address.
- Keep the earliest value: For fields like Created Date or First Touch, which represent historical events.
- Concatenate: For notes or description fields, combine both values so no information is lost.
- Preserve associations: When merging in Salesforce, ensure activities, opportunities, and cases from both records are preserved on the surviving record.
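The field rules can be expressed as a small survivorship function. This sketch assumes each record is a dict with hypothetical field names, ISO-formatted dates, and a `modified_date` field that decides which record is newer. It keeps the older record's ID as the survivor, falls back to the older value when the newer record's field is blank, and uses string length as a rough stand-in for "most complete"; all three are assumptions to adjust. Preserving related activities, opportunities, and cases is left to the CRM's own merge operation.

```python
# Hypothetical field names; map them to your CRM's API names.
MOST_RECENT = ["phone", "title"]
MOST_COMPLETE = ["address"]
EARLIEST = ["created_date"]
CONCATENATE = ["notes"]


def merge_records(a, b):
    """Combine two duplicate records into one using field-level survivorship rules."""
    # ISO-8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    newer, older = (a, b) if a["modified_date"] >= b["modified_date"] else (b, a)
    # Assumption: the older record's ID survives; the newer one is merged into it.
    merged = {"id": older["id"], "merged_from": newer["id"],
              "modified_date": newer["modified_date"]}
    for field in MOST_RECENT:
        merged[field] = newer.get(field) or older.get(field) or ""
    for field in MOST_COMPLETE:
        # Longest value as a rough proxy for "most complete".
        merged[field] = max(a.get(field) or "", b.get(field) or "", key=len)
    for field in EARLIEST:
        values = [v for v in (a.get(field), b.get(field)) if v]
        merged[field] = min(values) if values else ""
    for field in CONCATENATE:
        parts = [v for v in (older.get(field), newer.get(field)) if v]
        merged[field] = "\n".join(dict.fromkeys(parts))  # drop exact repeats, keep order
    return merged


old = {"id": "C1", "modified_date": "2023-02-01", "created_date": "2019-05-10",
       "phone": "555-0100", "title": "Analyst", "address": "Austin",
       "notes": "Met at expo"}
new = {"id": "C7", "modified_date": "2024-06-15", "created_date": "2022-01-03",
       "phone": "555-0199", "title": "", "address": "12 Main St, Austin, TX 78701",
       "notes": "Asked for pricing"}
print(merge_records(old, new))
```

The output is a preview of the surviving record, which is useful for spot-checking rules before running any merge in the CRM.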
Step 7: Merge and Verify
Execute the merge in your CRM. Most CRMs have a built-in merge function for individual records. For bulk merges:
- Salesforce: Use the built-in Merge Contacts feature (up to 3 at a time) or a third-party tool like DemandTools for bulk operations.
- HubSpot: Use the Manage Duplicates tool (Contacts > Actions > Manage duplicates), or the merge API for bulk operations.
- Other CRMs: Check for a native merge feature or use the API to update and delete records programmatically.
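For HubSpot, a bulk merge can be scripted against the CRM API. The sketch below uses only Python's standard library. It assumes the v3 contacts merge endpoint (`POST /crm/v3/objects/contacts/merge` with `primaryObjectId` and `objectIdToMerge` in the JSON body) and a private app token sent as a Bearer header; check both against HubSpot's current API reference before use. The ID pairs would come from your reviewed duplicate list. By default it only builds the requests, so you can inspect them before sending anything.

```python
import json
import time
import urllib.request

# HubSpot CRM v3 merge endpoint as we understand it; confirm against HubSpot's current docs.
MERGE_URL = "https://api.hubapi.com/crm/v3/objects/contacts/merge"


def build_merge_request(primary_id, duplicate_id, token):
    """Build (but do not send) a request merging duplicate_id into primary_id."""
    body = json.dumps({"primaryObjectId": primary_id, "objectIdToMerge": duplicate_id})
    return urllib.request.Request(
        MERGE_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def merge_pairs(pairs, token, send=False, pause_seconds=0.2):
    """Build a request per (primary_id, duplicate_id) pair; send them only if send=True."""
    reqs = [build_merge_request(p, d, token) for p, d in pairs]
    if send:
        for req in reqs:
            with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
                print(resp.status, req.data.decode("utf-8"))
            time.sleep(pause_seconds)  # hypothetical throttle to stay under rate limits
    return reqs


# Dry run: inspect what would be sent.
for req in merge_pairs([("101", "202")], token="YOUR_PRIVATE_APP_TOKEN"):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Keeping `send=False` as the default makes an accidental run harmless; merges cannot easily be undone, so a dry run on a handful of pairs is worth doing first.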
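After merging, verify the results. The total record count should drop by exactly the number of records merged away (for example, 10,000 contacts with 400 losing records merged should leave 9,600), and a sample review of merged records should show the expected field values and associations.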
Step 8: Prevent Future Duplicates
Deduplication is not a one-time project. Set up ongoing prevention:
- Enable duplicate detection rules in your CRM (Salesforce Duplicate Rules and Matching Rules, HubSpot duplicate management).
- Standardize data entry with required fields and validation rules.
- Run monthly deduplication scans to catch new duplicates before they accumulate.
- Use a matching tool on every list import to catch duplicates before they enter the CRM.
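An exact-email gate on list imports is the simplest version of that last check. In this sketch, `existing_emails` would come from a fresh CRM export (an assumption about your workflow), only trimming and lowercasing are applied, and rows without an email pass straight through, so they still need the fuzzy matching from Step 4.

```python
def normalize_email(raw):
    """Trim and lowercase; extend with the Gmail rules from Step 3 if you use them."""
    return (raw or "").strip().lower()


def split_import(import_rows, existing_emails, email_field="email"):
    """Split an import file into rows safe to load and rows that duplicate a known email."""
    known = {normalize_email(e) for e in existing_emails} - {""}
    new_rows, duplicate_rows = [], []
    for row in import_rows:
        key = normalize_email(row.get(email_field))
        if key and key in known:
            duplicate_rows.append(row)
        else:
            new_rows.append(row)
            if key:
                known.add(key)  # also catches repeats inside the import file itself
    return new_rows, duplicate_rows


existing = ["Ann@Example.com", "lee@example.com"]
incoming = [
    {"email": "ann@example.com "},
    {"email": "raj@example.com"},
    {"email": "RAJ@example.com"},
    {"email": ""},
]
to_load, to_review = split_import(incoming, existing)
print(len(to_load), "to load;", len(to_review), "to review")
```

The second Raj row is flagged even though Raj is not yet in the CRM, because the first occurrence is added to the known set as the file is scanned.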
ListMatchGenie simplifies the entire process. Export your CRM data as a CSV, upload it, and the engine finds duplicates using five matching passes. Review the results, export the duplicate pairs with confidence scores, and use them to drive your CRM merge process. The free tier handles up to 1,000 records, enough to test the approach on a subset of your data.

