Master vs source files

Every match in ListMatchGenie involves exactly two files: a source and a master. The direction matters — swapping them changes what the results mean.

Source file — the list you want looked up against something. Usually new, incoming, or messy data. Leads from a trade show, rows from a supplier export, contacts pasted from a partner.
Master file — your reference truth. Usually established, cleaned, and curated data you want to enrich or deduplicate against. Your CRM, your account database, the NPI registry, an official product catalog.

The match engine takes every row in the source and tries to find the best corresponding row in the master. You get back source rows enriched with the matched master-row data.
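The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not ListMatchGenie's actual engine: the `difflib` similarity score, the 0.85 threshold, and the `matched`, `score`, and `master_` output names are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0 (stand-in scorer)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_source_to_master(source, master, key, threshold=0.85):
    """Return one output row per source row, enriched with its best master row."""
    results = []
    for src in source:
        best_row, best_score = None, 0.0
        for mst in master:
            score = similarity(src[key], mst[key])
            if score > best_score:
                best_row, best_score = mst, score

        enriched = dict(src)
        if best_row is not None and best_score >= threshold:
            # Hypothetical convention: prefix pulled-through master columns.
            enriched.update({f"master_{k}": v for k, v in best_row.items()})
            enriched["matched"] = True
        else:
            enriched["matched"] = False
        enriched["score"] = round(best_score, 2)
        results.append(enriched)
    return results


source = [
    {"name": "Acme Corp."},
    {"name": "Globex Intl"},
]
master = [
    {"name": "Acme Corp", "account_id": "A-001"},
    {"name": "Initech", "account_id": "A-002"},
    {"name": "Umbrella Group", "account_id": "A-003"},
]

rows = match_source_to_master(source, master, key="name")
for row in rows:
    print(row)
```

The output always has exactly as many rows as the source. Master rows that no source row points at simply do not appear.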

Why the direction matters

Consider two scenarios with the same two files — a 500-row new-lead list and a 50,000-row CRM export — run in opposite directions.

Leads as source, CRM as master: you get every lead enriched with the CRM record it matches (or flagged as a new prospect). This is almost always what you want. Output is 500 rows.

CRM as source, leads as master: you get every CRM contact enriched with the lead it matches (or flagged as not-in-leads). Output is 50,000 rows, and almost all of them will be unmatched because the CRM is 100 times larger than the lead list. That result is rarely useful.

Rule of thumb: source is the thing you're curious about, master is what you already know well.
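A scaled-down version of the two scenarios makes the difference concrete. The sketch below assumes exact matching on an `email` field purely for simplicity; the `match` helper and the sample data are hypothetical.

```python
def match(source, master, key):
    """One output row per source row, flagged by exact match on `key`."""
    index = {row[key]: row for row in master}
    return [{**row, "matched": row[key] in index} for row in source]


# 5 leads and 500 CRM contacts stand in for the 500 / 50,000 example.
leads = [{"email": f"lead{i}@example.com"} for i in range(5)]
crm = [{"email": f"contact{i}@example.com"} for i in range(500)]

# Pretend two of the leads already exist in the CRM.
crm[0]["email"] = "lead0@example.com"
crm[1]["email"] = "lead1@example.com"

leads_vs_crm = match(leads, crm, "email")  # leads as source
crm_vs_leads = match(crm, leads, "email")  # CRM as source

print(len(leads_vs_crm), sum(r["matched"] for r in leads_vs_crm))  # 5 2
print(len(crm_vs_leads), sum(r["matched"] for r in crm_vs_leads))  # 500 2
```

Both directions find the same two overlapping records, but the second run buries them in 498 unmatched rows.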

How the app treats each file

Master files are treated as reusable assets:

They're saved to your Master Files page so you can re-use them across matches.
They get versioned — re-uploading a master file creates a new version, with a 30-day undo window.
Tier limits cap how many distinct masters you can keep saved at once.

Source files are treated as transient:

They're saved automatically for 30 days so you can re-run a match against a new master without re-uploading.
They don't count against your "saved masters" tier limit.
They're deleted automatically once the 30-day retention window ends.

See Files for how to manage both.

One master, many sources

A well-built master pays dividends. Once you've uploaded and cleaned your CRM as a master, every incoming lead list can match against it in seconds — no re-upload, no re-cleanse. The typical usage pattern is:

Build a clean master once (customers, accounts, providers, products).
Match every new source file against that master.
Periodically refresh the master as it changes upstream.

ListMatchGenie is optimized for this pattern. Master-file storage is cheap, source-file matching is fast, and tier limits let you keep multiple masters in parallel (e.g. one for customers, one for suppliers).

Can a file be both?

Yes, in two specific cases:

Contact dedupe

When you want to find duplicates inside a single list, both roles are played by the same file. Upload once, pick the Contact dedupe profile, and the engine matches the file against itself (skipping self-matches). Output groups probable duplicates into clusters via the _lmg_cluster_id column.
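As a rough illustration of self-matching, the sketch below compares every pair of rows in one list and groups similar rows with a small union-find. Only the `_lmg_cluster_id` column name comes from the product; the similarity function, the 0.85 threshold, the all-pairs comparison, and the cluster numbering are assumptions, and the real engine may work quite differently.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Stand-in similarity test; not the product's scoring."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_duplicates(rows, key, threshold=0.85):
    """Tag each row with a cluster id shared by its probable duplicates."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # combinations() never pairs a row with itself, so self-matches are skipped.
    for i, j in combinations(range(len(rows)), 2):
        if similar(rows[i][key], rows[j][key], threshold):
            parent[find(j)] = find(i)

    # Number clusters in order of first appearance (hypothetical numbering).
    ids = {}
    out = []
    for i, row in enumerate(rows):
        cluster_id = ids.setdefault(find(i), len(ids) + 1)
        out.append({**row, "_lmg_cluster_id": cluster_id})
    return out


contacts = [
    {"name": "Jane Doe"},
    {"name": "jane doe"},
    {"name": "John Smith"},
    {"name": "Jon Smith"},
    {"name": "Maria Garcia"},
]

clustered = cluster_duplicates(contacts, key="name")
print([row["_lmg_cluster_id"] for row in clustered])  # [1, 1, 2, 2, 3]
```

Comparing all pairs is quadratic in the list size, which is fine for a demonstration but not something to assume about a production matcher.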

Cross-match

When you want to find overlap between two equal-weight lists (M&A, email suppression, territory planning), the direction is arbitrary and either file can be the source. The output has one row per source row, so its size follows whichever file you pick as source. Run the match in both directions if you want a symmetric view.
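One way to picture the symmetric view is to run the match once in each direction and combine the results. The sketch below assumes exact matching on a normalized email address; the `match` helper, the list names, and the result keys are hypothetical.

```python
def match(source, master):
    """One (email, hit) pair per source entry; exact match on normalized email."""
    known = {email.strip().lower() for email in master}
    return [(email, email.strip().lower() in known) for email in source]


acquirer = ["ann@example.com", "bob@example.com", "cy@example.com"]
target = ["BOB@example.com", "dee@example.com"]

a_vs_t = match(acquirer, target)  # acquirer as source: 3 rows
t_vs_a = match(target, acquirer)  # target as source: 2 rows

# Combine both runs into a single symmetric summary.
symmetric = {
    "in_both": [email for email, hit in a_vs_t if hit],
    "only_acquirer": [email for email, hit in a_vs_t if not hit],
    "only_target": [email for email, hit in t_vs_a if not hit],
}
print(symmetric)
```

Each single run only reveals which of its own source rows lack a partner, which is why the second direction is needed to see what is unique to the other list.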

Common mistakes to avoid

Using a dirty file as master. The master is a reference, so its quality sets a ceiling on your match quality. Clean the master first (run a dedupe pass on it alone) before using it for other matches.
Swapping direction to "match more". If you're getting low match rates, the problem is almost never the source/master direction. It's usually data quality, profile choice, or threshold. See Troubleshooting.
Uploading the master fresh every time. If the file hasn't changed, re-use the saved master — it's faster and you don't burn upload quota. The app flags files that look like an older version of a saved master and offers to update in place.

Keep master files narrow

A master file only needs two kinds of columns: the ones you'll actually match against, and the ones you want pulled through into the matched output. Trimming a master from 60 columns to 12 speeds up matching and makes the results easier to read.
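If your export tool can't drop columns for you, a short script can trim the file before upload. This is a minimal sketch using Python's standard `csv` module; the column names and sample data are made up, and in-memory text stands in for real files.

```python
import csv
import io


def trim_columns(csv_text: str, keep: list) -> str:
    """Return CSV text containing only the `keep` columns, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in keep if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"columns not found: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns outside `keep` are silently dropped
    return out.getvalue()


wide = (
    "account_id,name,email,fax,notes\n"
    "A-001,Acme Corp,ops@acme.example,555-0100,legacy\n"
)
narrow = trim_columns(wide, keep=["account_id", "name", "email"])
print(narrow)
```

Failing loudly on a missing column is a deliberate choice here: a typo in a column name should stop the trim rather than quietly produce a master without its match key.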

Files — managing saved master files
Three-stage pipeline — where source and master fit in the flow
Find overlap between two lists — the cross-match pattern