Deduplicate a customer list

If your customer list has duplicates, nothing downstream works right. Email campaigns double-send. Sales calls the same person twice. Reports overcount. Integrations fight each other. And every tool that reads the list inherits the mess.

This guide walks you through cleaning a single customer list end-to-end. By the end you'll have an export with duplicates flagged or merged, plus a clear record of every decision so you can push the cleaned data back to your source system with confidence.

When to use this guide

Use this workflow when:

You have one file (not two) and want to find duplicates inside it.
Duplicates are not necessarily exact — names may be spelled differently, emails may differ by capitalization, some rows have more complete data than others.
You want the output as a cleaned list you can re-import somewhere, or as a list of decisions ("merge row 47 into row 23") you can apply in your source system.

If you're trying to find overlap between two lists, see Find overlap between two lists instead.

Before you start

Prepare your file:

CSV, TSV, XLSX, or XLS format. See supported file formats for specifics.
Include all identity columns — name, email, phone, and company if applicable. The more identity signal the Genie has, the better dedup works.
Trim columns you don't need. Notes, tags, and system metadata are fine to keep if you want them preserved, but dedup runs faster and cleaner on narrower files.
If possible, include a unique source ID (your CRM's record ID). This makes applying decisions back in the source system trivial.
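
If you want to sanity-check a file before uploading, a short script can tell you which identity columns are present and how well each one is filled. This is a minimal sketch that runs outside ListMatchGenie. The column names (`first_name`, `last_name`, `email`, `phone`, `company`, `crm_id`) and the sample rows are assumptions, so substitute the headers your own export uses.

```python
import csv
import io

# Hypothetical column names; substitute the headers your own export uses.
IDENTITY_COLUMNS = ["first_name", "last_name", "email", "phone", "company"]


def preflight(csv_text, id_column="crm_id"):
    """Report which identity columns are present and how well each is filled."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    rows = list(reader)

    present = [c for c in IDENTITY_COLUMNS if c in headers]
    missing = [c for c in IDENTITY_COLUMNS if c not in headers]

    # Fill rate: share of rows with a non-blank value in each identity column.
    fill_rate = {}
    for col in present:
        filled = sum(1 for r in rows if (r[col] or "").strip())
        fill_rate[col] = filled / len(rows) if rows else 0.0

    return {
        "row_count": len(rows),
        "present": present,
        "missing": missing,
        "fill_rate": fill_rate,
        "has_source_id": id_column in headers,
    }


# Made-up sample data for illustration.
sample = """crm_id,first_name,last_name,email,phone
001,Ana,Lopez,ana@example.com,555-0100
002,Ana,Lopez,ANA@example.com,
003,Ben,Ortiz,,555-0199
"""

report = preflight(sample)
print(report)
```

A low fill rate on email or phone is a hint that matching will lean mostly on names, which is worth knowing before you choose weights.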

The workflow

Upload your list
From the dashboard, click New Match. In the upload step, drop your file on the Source tile.
You don't need a master file for this workflow. The Contact dedupe profile runs the source against itself.
Review the column profile
The Genie will profile every column. Confirm:
- Email columns are detected as email
- Phone columns are detected as phone
- Name columns are detected as first_name / last_name
- Any ID column is detected as identifier
If any detection is wrong, override it inline. Dedup quality depends on the Genie knowing what each column represents.
Let the Genie cleanse
Advance to the Cleanse step. Read the cleansing report — the summary narrative at the top tells you the big picture in one paragraph.
The dedup report will already show you how many exact and near-exact duplicates exist before you've done anything else. This is often an eye-opener.
Advance when you're ready. Don't overthink cleansing decisions; they're all reversible.
Pick the Contact dedupe profile
On the Configure step, pick Contact dedupe as your match profile.
Set the confidence threshold. For dedupe, 65 (slightly lower than the default) is a good starting point — within-file duplicates are usually cleaner than cross-file matches, so the engine can be more confident at lower scores.
If you have strong identity signals (email + phone + name), leave the default weights. If your file is name-only, give more weight to first_name and last_name.
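To build intuition for how weights and the threshold interact, here is a rough sketch of a weighted similarity score on a 0 to 100 scale. It is not ListMatchGenie's scoring engine. The weight values, the use of Python's `difflib` for string similarity, and the sample rows are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Illustrative weights only; the real profile's defaults are not documented here.
DEFAULT_WEIGHTS = {"email": 0.4, "phone": 0.3, "first_name": 0.15, "last_name": 0.15}
NAME_ONLY_WEIGHTS = {"first_name": 0.5, "last_name": 0.5}


def field_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]; blank values score 0."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def pair_score(row_a, row_b, weights):
    """Weighted average of per-field similarity, scaled to 0-100."""
    total = sum(weights.values())
    weighted = sum(
        w * field_similarity(row_a.get(f, ""), row_b.get(f, ""))
        for f, w in weights.items()
    )
    return 100 * weighted / total


# Made-up rows: same email apart from capitalization, slightly different first name.
a = {"first_name": "Jon", "last_name": "Smith", "email": "jon@example.com", "phone": ""}
b = {"first_name": "John", "last_name": "Smith", "email": "JON@example.com", "phone": ""}

THRESHOLD = 65
score = pair_score(a, b, DEFAULT_WEIGHTS)
name_only_score = pair_score(a, b, NAME_ONLY_WEIGHTS)
print(round(score, 1), score >= THRESHOLD)
print(round(name_only_score, 1), name_only_score >= THRESHOLD)
```

In this sketch a blank field contributes nothing, so a pair with missing phone numbers scores lower under the default weights than under name-only weights. That is one reason shifting weight toward the name columns can help when the other identity columns are sparse.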
Run the dedupe
Click Run match. The engine scans every row pair, scores similarity, and groups likely duplicates into clusters.
On the Review step, you'll see:
- Cluster count — how many groups of probable duplicates the engine found
- Rows in clusters — how many total rows are flagged as a duplicate of something
- Cluster size distribution — pairs are most common, but you may see clusters of 3, 4, or more
Every cluster has a _lmg_cluster_id that will appear in the export.
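The pair-scoring and grouping described above can be pictured with a small sketch: score every pair, link the pairs that meet the threshold, and read off the connected groups. This is an assumption about the general technique (pairwise comparison plus union-find), not the engine's actual implementation, and the similarity function and sample rows are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Similarity of two rows' combined text, scaled to 0-100."""
    key_a = " ".join(a).lower()
    key_b = " ".join(b).lower()
    return 100 * SequenceMatcher(None, key_a, key_b).ratio()


def cluster_rows(rows, threshold=65):
    """Group row indexes whose pairwise score meets the threshold."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of rows; link the two when they score high enough.
    for i, j in combinations(range(len(rows)), 2):
        if similarity(rows[i], rows[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    # Only groups with more than one row count as duplicate clusters.
    return [members for members in groups.values() if len(members) > 1]


# Made-up sample rows: (name, email).
rows = [
    ("Jon Smith", "jon@example.com"),
    ("John Smith", "JON@example.com"),
    ("Maria Chen", "maria.chen@example.org"),
    ("Jon Smyth", "jon@example.com"),
]
clusters = cluster_rows(rows)
print(clusters)
```

Because links are transitive, three rows can land in one cluster even if only two of the three pairs clear the threshold, which is why clusters larger than a pair show up in the size distribution.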
Work through the review queue
For each cluster, the review queue shows all rows side-by-side with differing fields highlighted. You have three options per cluster:
- Merge: collapse all rows into one, keeping the most complete value for each column. The export marks which row survived (the winner) and which rows were merged into it (the losers).
- Keep separate: the rows look similar but are actually different people. The engine won't flag them again.
- Delete some: keep the primary row and delete the others outright.
Work through clusters highest-score-first — those are the most obvious duplicates, easiest to decide. Lower scores take more judgment.
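As a rough picture of what "keeping the most complete value for each column" means in a merge, the sketch below collapses a cluster into one row. It treats "most complete" as "longest non-blank string", which is an assumption; the product may apply different rules. The sample rows are made up.

```python
def merge_cluster(rows):
    """Collapse a cluster into one row, taking the longest non-blank value per column."""
    merged = {}
    for row in rows:
        for column, value in row.items():
            value = (value or "").strip()
            # "Most complete" is approximated here as "longest non-blank string".
            if len(value) > len(merged.get(column, "")):
                merged[column] = value
            else:
                merged.setdefault(column, "")
    return merged


# Made-up cluster: each row is missing something the other has.
cluster = [
    {"first_name": "Jon", "last_name": "Smith", "email": "jon@example.com", "phone": ""},
    {"first_name": "Jonathan", "last_name": "Smith", "email": "", "phone": "555-0100"},
]
merged = merge_cluster(cluster)
print(merged)
```

A length-based rule can pick the wrong value when the longer string is the stale one, so it is worth scanning the highlighted differences before you accept a merge.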
Tip: bulk actions save hours. If your list is large and mostly clean, use the "Accept all clusters above 85%" bulk action to knock out the obvious ones in one click, then work through the 65–85 band by hand.
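The same split can be expressed as a simple triage rule. The sketch below is illustrative only: the `cluster_id` and `score` keys are hypothetical names, and the cutoffs mirror the 85 and 65 values mentioned above.

```python
def triage(clusters, auto_accept_above=85, review_floor=65):
    """Split clusters into an auto-accept pile and a manual-review queue."""
    auto, manual = [], []
    for cluster in clusters:
        if cluster["score"] > auto_accept_above:
            auto.append(cluster)
        elif cluster["score"] >= review_floor:
            manual.append(cluster)
    # Review the remaining clusters highest-score-first.
    manual.sort(key=lambda c: c["score"], reverse=True)
    return auto, manual


# Made-up clusters with hypothetical keys.
clusters = [
    {"cluster_id": "c1", "score": 97},
    {"cluster_id": "c2", "score": 71},
    {"cluster_id": "c3", "score": 88},
    {"cluster_id": "c4", "score": 80},
]
auto, manual = triage(clusters)
print([c["cluster_id"] for c in auto])
print([c["cluster_id"] for c in manual])
```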
Export the cleaned list
Advance to Export. You have two useful output shapes:
- Cleaned list — one row per unique entity, with merged values. Drop this straight back into your system.
- Decision log — every original row with a _lmg_cluster_id, _lmg_review_decision, and _lmg_master_row_id pointing to the surviving row. Use this when you want to apply dedupe decisions in your CRM without letting ListMatchGenie do the write.
XLSX gives you both in one workbook. CSV gives you whichever you pick.

Applying the cleanup in your system

Once you have the export, you have three common patterns:

Pattern 1: Replace the list

If the list isn't the source-of-truth (e.g. it was exported from somewhere as a snapshot), just re-import the cleaned version wherever it needs to go.

Pattern 2: Merge in-system

If the list is from a CRM and you want to keep the CRM IDs, use the decision log. For each cluster:

Identify the winner row (_lmg_review_decision = approved).
On each loser row, use _lmg_master_row_id to look up the winner and its CRM ID.
In your CRM, merge the loser records into the winner (most CRMs have a "merge contacts" function).
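
If you have many clusters, you can turn the decision log into a merge plan with a short script. The sketch below is hedged: the `_lmg_*` columns come from the export described in this guide, but `row_id` and `crm_id` are assumed names for the row identifier and your CRM's record ID, and the sample values are made up. Check your own export's headers and values before relying on it.

```python
import csv
import io
from collections import defaultdict

# The _lmg_* columns are named in the guide; `row_id` and `crm_id` are assumed
# names for the row identifier and your CRM's record ID. Sample values are made up.
decision_log = """row_id,crm_id,email,_lmg_cluster_id,_lmg_review_decision,_lmg_master_row_id
23,C-1001,jon@example.com,7,approved,23
47,C-2040,JON@example.com,7,approved,23
52,C-3100,maria@example.org,,,
"""


def build_merge_plan(csv_text):
    """Map each winner's CRM ID to the CRM IDs of the losers to merge into it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_row_id = {r["row_id"]: r for r in rows}

    plan = defaultdict(list)
    for r in rows:
        master_id = r["_lmg_master_row_id"]
        # A loser is any row that points at a surviving row other than itself.
        if not master_id or master_id == r["row_id"]:
            continue
        winner = by_row_id[master_id]
        # Per the guide, the winner row carries the "approved" decision.
        if winner["_lmg_review_decision"] == "approved":
            plan[winner["crm_id"]].append(r["crm_id"])
    return dict(plan)


plan = build_merge_plan(decision_log)
for winner, losers in plan.items():
    print(f"merge {losers} into {winner}")
```

The resulting plan is just a lookup table. You still perform the merges through your CRM's own merge function or API.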

Pattern 3: Mark and re-review

If you want a second set of eyes before merging, export with the decisions and share it as a spreadsheet. _lmg_review_decision is a human-readable column.

Common gotchas

Family members at the same address. The engine may cluster them. The most reliable way to tell them apart is email or phone: if those differ, keep the rows separate.
Legitimate rebrands. Same company, different name. These are fuzzy duplicates but should stay separate if they represent different time periods.
Copied-then-edited rows. Someone duplicated a row to create a new contact at the same company, then forgot to update half the fields. These should merge only if you can confirm they represent the same person.

Related

Dedup report: the underlying report structure
Match profiles: the profile being used
Master vs source files: why dedupe is often a pre-step before a cross-file match
