ListMatchGenie


Match two CSV files — fuzzy join without writing code.

You exported two CSVs from two different systems. Now you need to know which rows correspond. The join keys are the same data but the formatting drifted — names spelled differently, emails with aliases, phones formatted six ways. Drag both files in. The Genie does the fuzzy join and exports a single CSV with both sides' columns and a match status per row.

The problem

The two CSVs aren't perfectly aligned. Exact-match joins miss every pair whose keys differ by even one character.

  • Two CRM exports of the same contacts, but System A wrote 'Sarah Patel' and System B wrote 'sarah patel'. A case-sensitive exact-match merge misses it.

  • Two ERP exports of the same vendors, one with 'LLC' suffixes and one without. An exact SQL JOIN drops every vendor whose name differs only by the suffix.

  • Email addresses with aliases — 'sarah.patel@globex.com' in one file, 'spatel@globex.com' in the other. Same person, different join key.

  • Doing this in Python (pandas merge + fuzzywuzzy) requires writing code, debugging the fuzzy threshold, handling Unicode edge cases, and re-doing it next quarter.

  • Doing it in Excel with the Fuzzy Lookup add-in works on Windows only and breaks past ~50K rows.

  • Doing it in SQL requires loading both files into a database and writing fuzzy-match SQL with custom UDFs (or using Postgres trigram extensions), which costs a DBA an afternoon.
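To make the first two bullets concrete, here is a minimal Python sketch of why exact keys fail and what a normalization step buys you. The helper names and the suffix list are illustrative assumptions, not the Genie's implementation, and the suffix list is far from exhaustive.

```python
import re

# Hypothetical, deliberately short list of legal suffixes to strip.
_SUFFIXES = re.compile(r"\b(?:llc|inc|ltd|corp)\b\.?")

def normalize_name(name: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(name.lower().split())

def normalize_company(name: str) -> str:
    """Lowercase, drop common legal suffixes, collapse punctuation and spaces."""
    name = _SUFFIXES.sub("", name.lower())
    name = re.sub(r"[\W_]+", " ", name)
    return name.strip()

# Exact comparison treats these as different keys.
print("Sarah Patel" == "sarah patel")
# After normalization they compare equal.
print(normalize_name("Sarah Patel") == normalize_name("sarah patel"))
print(normalize_company("Globex, LLC") == normalize_company("Globex"))
```

Normalization alone only covers formatting drift. Aliases such as 'spatel' versus 'sarah.patel' still need fuzzy, multi-field evidence.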

How the Genie solves it

Two CSVs in. One merged CSV out. Typically a 5–15 minute workflow.

Drag-and-drop CSV-to-CSV match

Source CSV + master CSV. The Genie auto-detects the schema on each, suggests which columns to join on, and lets you tune the match profile (which fields, how heavily weighted) before running.

Fuzzy matching on multiple fields together

Match on name, email, and company combined. Each field contributes calibrated, weighted evidence to a single score, so agreement on several fields outweighs disagreement on one. Nicknames, aliases, format variants, and company suffix variations are all handled.
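The weighted-evidence idea can be sketched in a few lines of Python. This is an illustration only: the weights are made up, the field similarity uses the standard library's `difflib.SequenceMatcher` as a stand-in, and none of it is claimed to be how the Genie scores matches.

```python
from difflib import SequenceMatcher

# Hypothetical weights; a real profile would be tuned per dataset.
WEIGHTS = {"name": 0.4, "email": 0.4, "company": 0.2}

def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def composite_score(left: dict, right: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-field similarity over fields present on both sides."""
    total = 0.0
    used = 0.0
    for field_name, weight in weights.items():
        a, b = left.get(field_name), right.get(field_name)
        if not a or not b:
            continue  # a missing field is treated as no evidence, not as disagreement
        total += weight * field_similarity(a, b)
        used += weight
    return total / used if used else 0.0

crm = {"name": "Sarah Patel", "email": "sarah.patel@globex.com", "company": "Globex Inc"}
event = {"name": "sarah patel", "email": "sarah.patel@globex.com", "company": "Globex"}
print(round(composite_score(crm, event), 2))
```

Skipping missing fields and renormalizing the weights is one design choice among several. It keeps a blank phone or company from dragging down an otherwise strong match.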

Review queue with confidence + 'why' per cluster

Every match comes back with a confidence score and an explanation of what aligned and what didn't. Cluster by pattern, bulk-accept high-confidence groups, eyeball the borderline cases.
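As a rough picture of what clustering by pattern and bulk-accepting could look like, the sketch below groups candidate matches by the set of signals that aligned and accepts a whole group only when its weakest member clears a threshold. The record shape, the signal names, and the 0.9 cutoff are all assumptions made for illustration.

```python
from collections import defaultdict

def triage(matches, accept_at=0.9):
    """Split matches into (accepted, review) by signal pattern.

    Each match is a dict with hypothetical keys "confidence" (float)
    and "signals" (list of strings naming what aligned).
    """
    groups = defaultdict(list)
    for match in matches:
        pattern = tuple(sorted(match["signals"]))
        groups[pattern].append(match)

    accepted, review = [], []
    for members in groups.values():
        # Accept a pattern in bulk only if every member is high confidence.
        if min(m["confidence"] for m in members) >= accept_at:
            accepted.extend(members)
        else:
            review.extend(members)
    return accepted, review
```

Gating on the minimum confidence in a group is conservative on purpose: one borderline member sends the whole pattern to manual review.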

Output: one merged CSV with both sides' columns

Export the matched file with all columns from both inputs side-by-side, plus match status (matched / review / unmatched), confidence, and master ID per row. Open in Excel, load to your DB, push to BI — same way you'd consume any CSV.
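If you do want to post-process the export in code, it reads like any other CSV. The sketch below uses Python's standard `csv` module on a small inline sample. The column names `match_status`, `confidence`, and `master_id`, and the ID values, are assumptions for illustration; check the header row of your actual export.

```python
import csv
import io
from collections import Counter

# Inline stand-in for an exported file; column names and IDs are hypothetical.
SAMPLE = """firstname,lastname,email,match_status,confidence,master_id
Sarah,Patel,sarah.patel@globex.com,matched,0.94,M-001
Bob,Tan,bob.tan@initech.co,matched,0.91,M-002
Mike,Johnson,m.johnson@acme.org,unmatched,,
"""

def summarize(csv_text: str):
    """Return all rows plus a count of rows per match status."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["match_status"] for row in rows)
    return rows, counts

def rows_with_status(rows, status: str):
    """Filter rows down to one status, e.g. the ones still needing review."""
    return [row for row in rows if row["match_status"] == status]

rows, counts = summarize(SAMPLE)
print(dict(counts))
print([r["email"] for r in rows_with_status(rows, "unmatched")])
```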

Save the match profile for re-runs

Configure once — match by email + name with company as a tiebreaker, fuzzy thresholds set — save as a profile. Next quarter's pair of files: drag, click, export.
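For readers who think in config files, a saved profile is conceptually a small bundle of settings: which fields, how heavily weighted, and where the thresholds sit. The Python sketch below is a hypothetical shape for such a profile, not the Genie's actual format. Every field name and default is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class MatchProfile:
    """Hypothetical match profile; names and defaults are illustrative only."""
    name: str
    weights: dict = field(
        default_factory=lambda: {"email": 0.5, "name": 0.4, "company": 0.1}
    )
    accept_at: float = 0.90  # at or above: matched
    review_at: float = 0.60  # at or above (but below accept_at): review

    def status(self, confidence: float) -> str:
        if confidence >= self.accept_at:
            return "matched"
        if confidence >= self.review_at:
            return "review"
        return "unmatched"

def save_profile(profile: MatchProfile, path: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(asdict(profile), fh, indent=2)

def load_profile(path: str) -> MatchProfile:
    with open(path, encoding="utf-8") as fh:
        return MatchProfile(**json.load(fh))
```

Here company carries a small weight so that it acts as a tiebreaker rather than a primary key, mirroring the configuration described above.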

No code, no Python, no DB

Browser workflow. No fuzzywuzzy / rapidfuzz / recordlinkage Python install, no Postgres trigram setup, no Apps Script. Useful when the analyst doesn't want to code or the engineer doesn't want to maintain a one-off match script.

Real example

Matching a CRM export against an event-attendance export

Same workflow whether your two CSVs come from CRMs, marketing tools, ERPs, or custom systems.

Source file

crm_contacts.csv · firstname, lastname, email, phone, company

Master file

event_attendees.csv · first, last, work_email, mobile, organization

Source: Sarah Patel · sarah.patel@globex.com · 415-555-0188 · Globex Inc

Master: S. Patel · spatel@globex.com · 4155550188 · Globex

Status: matched

Why: Surname exact, first initial matches, email alias on a shared domain, phone has the same digits in a different format, company variant. Composite confidence 0.94.

Source: Bob Tan · bob.tan@initech.co · 617-555-1234 · Initech

Master: Robert Tan · robert.tan@initech.co · (no phone) · Initech Inc

Status: matched

Why: Nickname pair, email alias, company variant. Confidence 0.91 even with the phone missing in the master file.

Source: Mike Johnson · m.johnson@acme.org · 312-555-7700 · ACME Foundation

Master: (no match)

Status: unmatched

Why: No similar contact in the event attendance file. Had this contact registered for the event, a candidate would have surfaced.
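The individual signals named in these explanations are simple to express on their own. The Python sketch below shows three of them: phone digits, shared email domain, and first-name agreement via nickname or initial. The nickname table is a tiny illustrative stand-in, and these helpers are assumptions about how such signals could be computed, not the Genie's code.

```python
import re

# Tiny illustrative nickname table; real tables hold hundreds of pairs.
NICKNAMES = {"bob": "robert", "mike": "michael"}

def phone_digits(phone: str) -> str:
    """Keep only the digits so '415-555-0188' and '4155550188' compare equal."""
    return re.sub(r"\D", "", phone or "")

def email_domain(email: str) -> str:
    """Lowercased part after the last '@', or '' if there is none."""
    return email.rsplit("@", 1)[-1].lower() if "@" in email else ""

def canonical_first(name: str) -> str:
    """Lowercase, strip a trailing period, and map known nicknames."""
    cleaned = name.lower().strip().strip(".")
    return NICKNAMES.get(cleaned, cleaned)

def first_names_agree(a: str, b: str) -> bool:
    """True on exact or nickname agreement, or when one side is a matching initial."""
    a, b = canonical_first(a), canonical_first(b)
    if not a or not b:
        return False
    if a == b:
        return True
    return (len(a) == 1 or len(b) == 1) and a[0] == b[0]

print(phone_digits("415-555-0188") == phone_digits("4155550188"))
print(email_domain("sarah.patel@globex.com") == email_domain("spatel@globex.com"))
print(first_names_agree("Bob", "Robert"), first_names_agree("S.", "Sarah"))
```

An initial match is weak evidence by itself, which is why it only counts in combination with the other fields.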

Before and after

What changes when you use ListMatchGenie

Without ListMatchGenie

  • Write a Python script with pandas + fuzzywuzzy (or rapidfuzz). Tune the threshold. Handle Unicode, empty strings, missing fields. Debug the false positives. Re-run when next quarter's data arrives.
  • Or: load both CSVs into Postgres with the trigram extension; write a JOIN ... ON similarity(...) > 0.7 query; explain what 0.7 means to your stakeholder.
  • Or: install Excel's Fuzzy Lookup add-in on a Windows machine; hope your CSV fits under 50K rows.
  • In every case, no review queue, no confidence-per-cluster, no 'why this matched' explanation. You get matches and the burden of trusting them.
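To see why a bare 0.7 is hard to explain, it helps to look at what a trigram similarity measures. The Python sketch below approximates the idea behind Postgres's pg_trgm `similarity()`: each word is lowercased and padded, split into three-character chunks, and the score is shared chunks divided by total distinct chunks. It is a simplified approximation for illustration, not a reimplementation of the extension.

```python
import re

def trigrams(text: str) -> set:
    """Set of 3-character chunks, padding each word with two leading spaces and one trailing."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(similarity("Globex", "globex"))
print(round(similarity("Globex Inc", "Globex"), 2))
```

Under this sketch, 'Globex Inc' versus 'Globex' shares 7 of 11 distinct trigrams, about 0.64, so a 0.7 cutoff would reject a pair most people would call the same company.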

With ListMatchGenie

  • Drag both CSVs in. The Genie profiles each, suggests join columns, and lets you tune the match profile.
  • Run the match. Get a clustered review queue with confidence scores and per-cluster explanations.
  • Bulk-accept high-confidence patterns; eyeball the borderline ones.
  • Export the merged CSV with both sides' columns + match metadata. Total time: 5–15 minutes for a typical mid-size match.

FAQ

Questions about matching two CSV files


Let the Genie handle the grunt work.

Free tier is real. No card. No forms. Just upload your first list and see the Genie clean and match it in under a minute.