ListMatchGenie

Deduplicate a customer list

Find and merge duplicate contacts inside a single list, even when the duplicates are fuzzy, incomplete, or hidden behind typos. A full end-to-end walkthrough.

If your customer list has duplicates, nothing downstream works right. Email campaigns double-send. Sales calls the same person twice. Reports overcount. Integrations fight each other. And every tool that reads the list inherits the mess.

This guide walks you through cleaning a single customer list end-to-end. By the end you'll have an export with duplicates flagged or merged, plus a clear record of every decision so you can push the cleaned data back to your source system with confidence.

When to use this guide

Use this workflow when:

  • You have one file (not two) and want to find duplicates inside it.
  • Duplicates are not necessarily exact — names may be spelled differently, emails may differ by capitalization, some rows have more complete data than others.
  • You want the output as a cleaned list you can re-import somewhere, or as a list of decisions ("merge row 47 into row 23") you can apply in your source system.

If you're trying to find overlap between two lists, see Find overlap between two lists instead.

Before you start

Prepare your file:

  • CSV, TSV, XLSX, or XLS format. See supported file formats for specifics.
  • Include all identity columns — name, email, phone, and company if applicable. The more identity signal the Genie has, the better dedup works.
  • Trim columns you don't need. Notes, tags, and system metadata are fine to keep if you want them preserved in the output, but dedup runs faster and cleaner on narrower files.
  • If possible, include a unique source ID (your CRM's record ID). This makes applying decisions back in the source system trivial.
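If you'd rather trim the file before uploading, here is a minimal sketch using Python's standard `csv` module. It keeps only the identity columns plus the source ID and drops everything else. The column names in `KEEP` (including `crm_id`) are hypothetical placeholders; rename them to match your headers, and add back any notes or tag columns you want preserved.

```python
import csv
import io

# Hypothetical header names: replace with the ones in your file.
KEEP = ["crm_id", "first_name", "last_name", "email", "phone", "company"]

def trim_columns(src, dst, keep=KEEP):
    """Copy only the columns listed in `keep` (if present) from src to dst."""
    reader = csv.DictReader(src)
    present = [c for c in keep if c in (reader.fieldnames or [])]
    # extrasaction="ignore" silently drops columns not in `present`.
    writer = csv.DictWriter(dst, fieldnames=present, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return present

# Usage with in-memory files; swap in open("contacts.csv", newline="") etc.
src = io.StringIO("crm_id,first_name,email,notes\n1,Ann,ann@x.com,VIP\n")
out = io.StringIO()
kept = trim_columns(src, out)
```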

The workflow

  1. Upload your list

    From the dashboard, click New Match. In the upload step, drop your file on the Source tile.

    You don't need a master file for this workflow. The Contact dedupe profile runs the source against itself.

  2. Review the column profile

    The Genie will profile every column. Confirm:

    • Email columns are detected as email
    • Phone columns are detected as phone
    • Name columns are detected as first_name / last_name
    • Any ID column is detected as identifier

    If any detection is wrong, override it inline. Dedup quality depends on the Genie knowing what each column represents.
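    To see why a detection can go wrong, consider a naive value-based heuristic. The sketch below is purely illustrative and is not how the Genie profiles columns: it labels a column `email` or `phone` when at least 80% of its non-empty sample values look like one (the 80% cutoff and the 7 to 15 digit phone rule are assumptions).

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_phone(value):
    """7-15 digits once punctuation is removed, and no letters (assumed rule)."""
    digits = re.sub(r"\D", "", value)
    return 7 <= len(digits) <= 15 and not re.search(r"[A-Za-z]", value)

def guess_type(values, min_share=0.8):
    """Guess a column type from sample values (illustrative heuristic only)."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return "unknown"

    def share(pred):
        return sum(1 for v in sample if pred(v)) / len(sample)

    if share(lambda v: EMAIL_RE.match(v) is not None) >= min_share:
        return "email"
    if share(looks_like_phone) >= min_share:
        return "phone"
    return "text"
```

    Note that a numeric record ID column such as `["10023456", "10023457"]` would be guessed as `phone` by this heuristic. Mistakes of that kind are exactly what the inline override is for.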

  3. Let the Genie cleanse

    Advance to the Cleanse step. Read the cleansing report — the summary narrative at the top tells you the big picture in one paragraph.

    The dedup report will already show you how many exact and near-exact duplicates exist before you've done anything else. This is often an eye-opener.
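    Exact and near-exact duplicates mostly come down to normalization: two rows that differ only in email capitalization or phone punctuation are the same value once cleaned. A minimal sketch of that idea, assuming rows are dicts with `email` and `phone` keys; the rule that strips a leading US country code `1` is an illustrative assumption, and this is not the Genie's own cleanser.

```python
import re
from collections import defaultdict

def normalize_email(value):
    return (value or "").strip().lower()

def normalize_phone(value):
    digits = re.sub(r"\D", "", value or "")
    # Assumption: treat 11-digit numbers starting with "1" as US numbers.
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits

def near_exact_groups(rows):
    """Group row indexes that share a normalized email or phone."""
    groups = defaultdict(set)
    for i, row in enumerate(rows):
        email = normalize_email(row.get("email"))
        phone = normalize_phone(row.get("phone"))
        if email:
            groups[("email", email)].add(i)
        if phone:
            groups[("phone", phone)].add(i)
    return [sorted(g) for g in groups.values() if len(g) > 1]

rows = [
    {"email": "Ann@X.com"},
    {"email": "ann@x.com "},
    {"phone": "+1 (555) 123-4567"},
    {"phone": "555-123-4567"},
    {"email": "bob@y.com"},
]
dupes = near_exact_groups(rows)
```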

    Advance when you're ready. Don't overthink cleansing decisions; they're all reversible.

  4. Pick the Contact dedupe profile

    On the Configure step, pick Contact dedupe as your match profile.

    Set the confidence threshold. For dedupe, 65 (slightly lower than the default) is a good starting point: within-file duplicates are usually cleaner than cross-file matches, so a pair that scores lower is still likely to be a true duplicate.

    If you have strong identity signals (email + phone + name), leave the default weights. If your file is name-only, give more weight to first_name and last_name.
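    Why reweighting matters for name-only files can be shown with a simple weighted score. This is a sketch under stated assumptions, not the Genie's scoring model: the weights are made up, each field is compared with `difflib.SequenceMatcher`, and a missing field contributes 0. Under the default weights a name-only pair can never reach 65, no matter how well the names match.

```python
from difflib import SequenceMatcher

THRESHOLD = 65
# Hypothetical weights (each set sums to 1.0); the real defaults aren't documented here.
DEFAULT_WEIGHTS = {"email": 0.4, "phone": 0.3, "first_name": 0.15, "last_name": 0.15}
NAME_ONLY_WEIGHTS = {"first_name": 0.5, "last_name": 0.5}

def field_sim(a, b):
    """Similarity in [0, 1]; a value missing on either side scores 0."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()

def score(r1, r2, weights=DEFAULT_WEIGHTS):
    """Weighted similarity on a 0-100 scale."""
    return 100 * sum(w * field_sim(r1.get(f), r2.get(f)) for f, w in weights.items())

a = {"first_name": "Jon", "last_name": "Smith"}
b = {"first_name": "John", "last_name": "Smith"}
default_score = score(a, b)                       # names carry only 30% of the weight
name_only_score = score(a, b, NAME_ONLY_WEIGHTS)  # names carry all of it
```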

  5. Run the dedupe

    Click Run match. The engine scans every row pair, scores similarity, and groups likely duplicates into clusters.

    On the Review step, you'll see:

    • Cluster count — how many groups of probable duplicates the engine found
    • Rows in clusters — how many total rows are flagged as a duplicate of something
    • Cluster size distribution — pairs are most common, but you may see clusters of 3, 4, or more

    Every cluster has a _lmg_cluster_id that will appear in the export.
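    The scan-score-cluster step can be sketched with a pairwise loop and a union-find structure, which also yields the three review statistics. This is an illustration, not the Genie's engine: `simple_score` is a stand-in scorer, the `c1`, `c2` cluster IDs are a hypothetical format, and comparing every pair is O(n²), which is fine for a few thousand rows but slow for large files.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

def simple_score(r1, r2):
    """Stand-in scorer (0-100): exact email match, else full-name similarity."""
    if r1.get("email") and r1["email"].lower() == r2.get("email", "").lower():
        return 100.0
    n1 = f"{r1.get('first_name', '')} {r1.get('last_name', '')}".lower()
    n2 = f"{r2.get('first_name', '')} {r2.get('last_name', '')}".lower()
    return 100 * SequenceMatcher(None, n1, n2).ratio()

def cluster(rows, score=simple_score, threshold=65):
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):  # every row pair
        if score(rows[i], rows[j]) >= threshold:
            parent[find(i)] = find(j)  # union the two rows' clusters

    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    summary = {
        "cluster_count": len(clusters),
        "rows_in_clusters": sum(len(g) for g in clusters),
        "size_distribution": dict(Counter(len(g) for g in clusters)),
    }
    cluster_ids = {i: f"c{k}" for k, g in enumerate(clusters, 1) for i in g}
    return clusters, summary, cluster_ids

rows = [
    {"first_name": "Ann", "last_name": "Lee", "email": "ann@x.com"},
    {"first_name": "Ann", "last_name": "Lee", "email": "ANN@X.COM"},
    {"first_name": "Ann", "last_name": "Lee", "email": "ann@x.com"},
    {"first_name": "Bo", "last_name": "Kim", "email": "bo@y.com"},
    {"first_name": "Bo", "last_name": "Kim", "email": "bo@y.com"},
    {"first_name": "Zed", "last_name": "Quorum", "email": "zq@z.com"},
]
clusters, summary, cluster_ids = cluster(rows)
```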

  6. Work through the review queue

    For each cluster, the review queue shows all rows side-by-side with differing fields highlighted. You have three options per cluster:

    • Merge — collapse all rows into one, keeping the most complete value for each column. Marks the winners and losers in the export.
    • Keep separate — the rows look similar but are actually different people. The engine won't flag them again.
    • Delete some — keep the primary row, delete the others outright.
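    A Merge that keeps "the most complete value for each column" can be sketched as below. Two rules here are assumptions, not documented Genie behavior: "most complete" is read as the longest non-empty value, and the winner is the row with the most non-empty fields (ties go to the earliest row).

```python
def count_filled(row):
    return sum(1 for v in row.values() if str(v or "").strip())

def merge_cluster(rows):
    """Collapse a cluster into one record and label each row winner/loser."""
    # Assumption: the winner is the row with the most non-empty fields.
    primary = max(range(len(rows)), key=lambda i: count_filled(rows[i]))
    columns = list(dict.fromkeys(c for r in rows for c in r))  # keep column order
    merged = {}
    for col in columns:
        values = [str(r.get(col) or "").strip() for r in rows]
        values = [v for v in values if v]
        # Assumption: "most complete" means the longest non-empty value.
        merged[col] = max(values, key=len) if values else ""
    roles = ["winner" if i == primary else "loser" for i in range(len(rows))]
    return merged, roles

merged, roles = merge_cluster([
    {"name": "Ann Lee", "email": "ann@x.com", "phone": ""},
    {"name": "Ann M. Lee", "email": "", "phone": "555-1234"},
])
```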

    Work through clusters highest-score-first — those are the most obvious duplicates, easiest to decide. Lower scores take more judgment.

    Bulk actions save hours

    If your list is large and mostly clean, use the "Accept all clusters above 85%" bulk action to knock out the obvious ones in one click, then work through the 65–85 band by hand.
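    The review order and the bulk action amount to a sort plus a split on score. A minimal sketch, assuming each cluster is a dict with a `score` field (a hypothetical shape) and reading "above 85%" as strictly greater than 85, so a cluster at exactly 85 lands in the manual band.

```python
def triage(clusters, auto_accept=85, floor=65):
    """Split clusters into a bulk-accept set and a manual queue, highest score first."""
    ordered = sorted(clusters, key=lambda c: c["score"], reverse=True)
    bulk = [c for c in ordered if c["score"] > auto_accept]
    manual = [c for c in ordered if floor <= c["score"] <= auto_accept]
    return bulk, manual

bulk, manual = triage([
    {"id": "c1", "score": 70},
    {"id": "c2", "score": 97},
    {"id": "c3", "score": 85},
    {"id": "c4", "score": 90},
])
```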

  7. Export the cleaned list

    Advance to Export. You have two useful output shapes:

    • Cleaned list — one row per unique entity, with merged values. Drop this straight back into your system.
    • Decision log — every original row with a _lmg_cluster_id, _lmg_review_decision, and _lmg_master_row_id pointing to the surviving row. Use this when you want to apply dedupe decisions in your CRM without letting ListMatchGenie do the write.

    XLSX gives you both in one workbook. CSV gives you whichever you pick.

Applying the cleanup in your system

Once you have the export, you have three common patterns:

Pattern 1: Replace the list

If the list isn't the source-of-truth (e.g. it was exported from somewhere as a snapshot), just re-import the cleaned version wherever it needs to go.

Pattern 2: Merge in-system

If the list is from a CRM and you want to keep the CRM IDs, use the decision log. For each cluster:

  1. Identify the winner row (_lmg_review_decision = approved).
  2. For each loser row, follow its _lmg_master_row_id to the winner row and note the winner's CRM ID.
  3. In your CRM, merge the loser records into the winner (most CRMs have a "merge contacts" function).
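Turning the decision log into a list of CRM merges can be sketched as follows. Assumptions: the log is a CSV, `row_id` is the column that _lmg_master_row_id refers to, and `crm_id` holds your CRM's record ID; both column names are hypothetical. A row counts as a loser when its _lmg_master_row_id is set and points at a different row. The `merged` decision value in the sample data is made up for illustration.

```python
import csv
import io

def crm_merge_plan(log_file, row_id_col="row_id", crm_id_col="crm_id"):
    """Return (winner_crm_id, loser_crm_id) pairs from a decision-log CSV."""
    rows = list(csv.DictReader(log_file))
    crm_by_row = {r[row_id_col]: r[crm_id_col] for r in rows}
    plan = []
    for r in rows:
        master = (r.get("_lmg_master_row_id") or "").strip()
        if master and master != r[row_id_col]:  # a loser pointing at its winner
            plan.append((crm_by_row[master], r[crm_id_col]))
    return plan

log = io.StringIO(
    "row_id,crm_id,_lmg_cluster_id,_lmg_review_decision,_lmg_master_row_id\n"
    "1,C-100,c1,approved,1\n"
    "2,C-205,c1,merged,1\n"
    "3,C-310,,,\n"
)
plan = crm_merge_plan(log)
```

Each pair reads as "merge the second record into the first" in your CRM.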

Pattern 3: Mark and re-review

If you want a second set of eyes before merging, export with the decisions and share it as a spreadsheet. _lmg_review_decision is a human-readable column.

Common gotchas

  • Family members at the same address. The engine may cluster them. The most common signal to disambiguate is email or phone: if those differ, keep the rows separate.
  • Legitimate rebrands. Same company, different name. These look like fuzzy duplicates but should stay separate if they represent different time periods.
  • Copied-then-edited rows. Someone duplicated a row to create a new contact at the same company, then forgot to update half the fields. These should merge only if you can confirm they represent the same person.
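To spot the family-member case before you click Merge, you can list which identity signals conflict inside a cluster. A minimal sketch, assuming rows are dicts with `email` and `phone` keys; it only surfaces conflicts for a human to judge and makes no merge decision itself.

```python
import re

def conflicting_signals(rows):
    """List identity fields whose non-empty values disagree across a cluster."""
    def norm(field, value):
        value = (value or "").strip().lower()
        return re.sub(r"\D", "", value) if field == "phone" else value

    conflicts = []
    for field in ("email", "phone"):
        distinct = {norm(field, r.get(field)) for r in rows} - {""}
        if len(distinct) > 1:
            conflicts.append(field)
    return conflicts

family = [
    {"last_name": "Lee", "email": "ann@x.com", "phone": "555-1234"},
    {"last_name": "Lee", "email": "tom@x.com", "phone": "(555) 1234"},
]
same_person = [
    {"email": "Ann@x.com", "phone": ""},
    {"email": "ann@x.com", "phone": "555 1234"},
]
```

Here `conflicting_signals(family)` flags `email` (a hint to keep separate), while `same_person` shows no conflict once case and punctuation are normalized.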