Find overlap between two lists

Sometimes you don't have a "source" and "master" — you have two lists of equal weight and you want to know which records appear in both. Mergers and acquisitions. Email suppression against a do-not-contact list. Territory planning where you need to know which accounts two teams both own. Attribution reconciliation.

This guide walks you through finding and exporting the overlap between two arbitrary lists.

When to use this guide

Use this workflow when:

You have two lists of roughly equal importance (neither is canonical truth).
You want to know: which records are in both, which are only in list A, and which are only in list B.
You may want the result as a single file with a provenance column, or as three separate files.

If one list is a clear "master" you want to enrich against, use Match leads to CRM instead — the direction matters there.

The three-way classification

Every record in either list ends up in one of three buckets:

In both — record exists in list A and list B
Only in A — record exists in list A but not list B
Only in B — record exists in list B but not list A

ListMatchGenie classifies records via a match run in one direction, which handles "in both" and "only in A". To get "only in B" you run the match a second time in the reverse direction. This guide covers both.
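For the exact-identifier case, the three buckets are plain set operations. The sketch below assumes both lists carry an email column and that trimming and lowercasing is enough normalization. It only illustrates the classification; it is not how ListMatchGenie scores fuzzy matches, and the function names are made up for this example.

```python
def normalize_email(value: str) -> str:
    """Trim and lowercase so trivially different spellings compare equal."""
    return value.strip().lower()


def classify_overlap(list_a, list_b):
    """Split two iterables of emails into the three buckets."""
    keys_a = {normalize_email(e) for e in list_a if e.strip()}
    keys_b = {normalize_email(e) for e in list_b if e.strip()}
    return {
        "in_both": keys_a & keys_b,    # present in A and in B
        "only_in_a": keys_a - keys_b,  # present in A, absent from B
        "only_in_b": keys_b - keys_a,  # present in B, absent from A
    }


buckets = classify_overlap(
    ["ann@example.com", "Bob@Example.com ", "cy@example.com"],
    ["bob@example.com", "dee@example.com"],
)
```

Note that "only in B" needs list B's keys as the starting point, which is the same reason the tool needs a second, reversed pass.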

Before you start

Both files should have the same identity columns (or mappable equivalents). You can't find overlap if one list has emails and the other only has phone numbers.
Clean each list individually first — duplicates inside either list will produce spurious "possible match" results.
Decide how strict "overlap" should be. Exact email match only? Fuzzy name + address? This drives your match profile and threshold choice.
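One quick way to see whether a list needs cleaning before you upload it is to count repeated identity values. This is a minimal sketch that assumes rows are dictionaries (for example from `csv.DictReader`) and that the identity column is called `email`; both are assumptions for illustration, and it only catches exact repeats after trimming and lowercasing.

```python
from collections import Counter


def duplicate_keys(rows, key_column):
    """Return normalized key values that occur more than once in one list."""
    counts = Counter(
        row[key_column].strip().lower()
        for row in rows
        if row.get(key_column, "").strip()  # skip rows with a blank key
    )
    return {key: n for key, n in counts.items() if n > 1}


list_a = [
    {"email": "ann@example.com"},
    {"email": "ANN@example.com"},
    {"email": "bob@example.com"},
    {"email": ""},
]
dupes = duplicate_keys(list_a, "email")
```

Any key that shows up here is worth resolving before the match run.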

The workflow

Run A against B
Upload list A as the source, list B as the master. Pick a profile that matches your identity columns:
- Both have people → Person
- Both have companies → Company
- Both have emails/IDs → Identifier (with email as the identifier column)
Set the confidence threshold based on how strict you want "overlap" to be. For M&A overlap, where you want to err on the side of catching everything, use a loose threshold of around 60 to 65. For compliance-grade email suppression, use 85.
Run the match.

Interpret the first pass
You now know, for each row in list A:
- match or review approved → in both
- unmatched or review rejected → only in A
But you don't yet know which rows in list B are missing from list A, because the output contains only list A rows.

Run B against A (reverse pass)
Start a second match. Upload list B as the source and list A as the master.
Use the same profile and threshold you used in the first pass — consistency matters so the overlap is symmetric.
Run the match. This output tells you, for each row in list B:
- match/review approved → in both (you already know this from pass 1, but it's a consistency check)
- unmatched/review rejected → only in B

Combine the outputs
Download both exports. You now have everything you need:
- From pass 1, filter to _lmg_match_status = match (plus approved reviews) → the "in both" set, written from list A's perspective with list B's columns enriched.
- From pass 1, filter to _lmg_match_status = unmatched (plus rejected reviews) → "only in A".
- From pass 2, filter to _lmg_match_status = unmatched (plus rejected reviews) → "only in B".
Stack them into one file with a provenance column (in_both, only_in_a, only_in_b) if you want a single sheet.
The "in both" set is the authoritative overlap list. The two "only in" sets are the non-overlap parts of each list.
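The stacking step can be sketched in a few lines once both exports are loaded as rows. The `_lmg_match_status` column and the `match` and `unmatched` values come from the export described above; the two review status strings and the `provenance` column name are hypothetical placeholders, so check them against your own export before relying on this.

```python
STATUS_COLUMN = "_lmg_match_status"
# "match" and "unmatched" appear in the export. The two review values are
# hypothetical placeholders; replace them with the strings your export uses.
IN_BOTH_STATUSES = {"match", "review_approved"}
ONLY_STATUSES = {"unmatched", "review_rejected"}


def combine_passes(pass_1_rows, pass_2_rows):
    """Stack both exports into one list of rows with a provenance column."""
    combined = []
    for row in pass_1_rows:
        status = row[STATUS_COLUMN]
        if status in IN_BOTH_STATUSES:
            combined.append({**row, "provenance": "in_both"})
        elif status in ONLY_STATUSES:
            combined.append({**row, "provenance": "only_in_a"})
    for row in pass_2_rows:
        # Matched rows from pass 2 are skipped: pass 1 already covers "in both".
        if row[STATUS_COLUMN] in ONLY_STATUSES:
            combined.append({**row, "provenance": "only_in_b"})
    return combined


pass_1 = [
    {"email": "ann@example.com", STATUS_COLUMN: "unmatched"},
    {"email": "bob@example.com", STATUS_COLUMN: "match"},
]
pass_2 = [
    {"email": "bob@example.com", STATUS_COLUMN: "match"},
    {"email": "dee@example.com", STATUS_COLUMN: "unmatched"},
]
combined = combine_passes(pass_1, pass_2)
```

Rows with a status outside the two sets (for example, reviews nobody has resolved yet) are dropped silently in this sketch; in practice you would probably want to count and report them.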

Sanity-check the counts
Some quick arithmetic:
```
pass_1_matched + pass_1_unmatched = rows in list A
pass_2_matched + pass_2_unmatched = rows in list B
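pass_1_matched ≈ pass_2_matched (expect a small gap; the two counts need not be identical)
```
If pass_1_matched and pass_2_matched differ by more than a few percent, one side is catching matches the other isn't. The usual causes are a threshold that is too loose or duplicates left inside one list (two rows in A that match the same row in B count twice in pass 1 but once in pass 2). Tighten the threshold or dedupe, then re-run.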
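The same arithmetic can be automated as a small check. This sketch treats "matched" as matches plus approved reviews and "unmatched" as unmatched plus rejected reviews; the 3% default tolerance is an assumed stand-in for "a few percent", not a product setting.

```python
def check_counts(rows_a, rows_b, p1_matched, p1_unmatched,
                 p2_matched, p2_unmatched, tolerance=0.03):
    """Return a list of problems; an empty list means the counts add up."""
    problems = []
    if p1_matched + p1_unmatched != rows_a:
        problems.append("pass 1 does not account for every row in list A")
    if p2_matched + p2_unmatched != rows_b:
        problems.append("pass 2 does not account for every row in list B")
    larger = max(p1_matched, p2_matched)
    # Compare the gap relative to the larger matched count.
    if larger and abs(p1_matched - p2_matched) / larger > tolerance:
        problems.append("matched counts differ by more than the tolerance")
    return problems


ok = check_counts(1000, 800, 300, 700, 305, 495)
bad = check_counts(1000, 800, 300, 700, 360, 440)
```

A row-count failure often means some rows are still sitting in review, so resolve those before judging the threshold.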

Common use cases

M&A customer overlap

During due diligence you want to know: of the target company's customers, how many are already ours?

Target customer list = A, your customer list = B
Run A → B at threshold 65 (looser, because customer data is often inconsistent)
"In both" is the overlap: customers you both already serve
"Only in A" is the acquisition upside: target's customers you'd newly gain

Email suppression

You're sending a campaign and need to suppress anyone on a do-not-contact list.

Campaign list = A, suppression list = B
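Run A → B at threshold 85 (strict, so automatic matches are high-confidence). Over-suppressing a few contacts is acceptable, but missing a suppressed contact is a compliance problem, so do not send to rows that are still awaiting review.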
"Only in A" is your safe-to-send list
Ignore "only in B" and "in both" entirely, which also means you can skip the reverse pass

Territory reconciliation

Two sales teams both claim certain accounts. Which ones overlap?

Team 1 account list = A, Team 2 account list = B
Run A → B with Company profile at threshold 70
"In both" is the territory conflict list
Discuss each case at your next ops sync

One-pass shortcut with a union file

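For small lists, you can find the overlap in a single pass: concatenate both lists, add a provenance column recording which file each row came from, then run a contact dedupe pass. Any cluster that contains rows from both files is overlap; a cluster whose rows all come from one file is just an in-list duplicate. This works up to a few thousand rows; at scale, the two-pass approach is cleaner.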
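Reading the dedupe result comes down to checking which sources each cluster contains. The sketch below assumes the dedupe output gives every row a cluster identifier; the `cluster_id` and `provenance` column names are hypothetical and will likely differ in a real export.

```python
from collections import defaultdict


def overlap_clusters(rows, cluster_column="cluster_id", source_column="provenance"):
    """Return ids of clusters that contain rows from more than one source file."""
    sources_by_cluster = defaultdict(set)
    for row in rows:
        sources_by_cluster[row[cluster_column]].add(row[source_column])
    return {cid for cid, sources in sources_by_cluster.items() if len(sources) > 1}


union_rows = [
    {"email": "bob@example.com", "provenance": "A", "cluster_id": 1},
    {"email": "Bob@example.com", "provenance": "B", "cluster_id": 1},
    {"email": "ann@example.com", "provenance": "A", "cluster_id": 2},
    {"email": "ann@example.com", "provenance": "A", "cluster_id": 2},
    {"email": "dee@example.com", "provenance": "B", "cluster_id": 3},
]
shared = overlap_clusters(union_rows)
```

Cluster 2 holds two rows but both come from list A, so it is an in-list duplicate rather than overlap.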

Related guides

Confidence scores — the two-threshold system that makes overlap-finding tunable
Match profiles — picking the right profile for your identity columns
Deduplicate a customer list — the one-pass shortcut via dedup
