ListMatchGenie

The cleansing report

Before any match runs, the Genie cleanses your file and tells you exactly what changed. This is the cleansing report — your receipt for every transformation applied.

Every file uploaded to ListMatchGenie goes through a cleanse pass before matching runs. The cleansing report is a structured record of that pass: every transformation the Genie applied to each column, every data-quality issue it detected, and every rule you can override.

This page explains how to read it. The Cleanse step of the match wizard is where you see it in practice.

Why cleansing happens

Raw uploads are noisy. The same ZIP code appears as 01841, 1841, 01841-2100. The same phone number appears as 555-123-4567, 5551234567, +1 555 123 4567. Matching on raw values would fail those pairs on trivial formatting — and you'd spend hours in VLOOKUP hell trying to figure out why.

Cleansing standardizes these so matching can focus on what matters: whether the values actually represent the same thing.
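As a rough illustration of what "standardizing" means for the phone example above, here is a minimal sketch. The function name normalize_phone and the handling of the +1 country code are assumptions made for illustration, not the Genie's actual rules.

```python
import re

def normalize_phone(raw: str) -> str:
    """Hypothetical helper: reduce a US phone number to its 10 digits."""
    # Strip everything that is not a digit (dashes, spaces, plus signs).
    digits = re.sub(r"\D", "", raw)
    # Assumption: an 11-digit value starting with 1 carries the US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

variants = ["555-123-4567", "5551234567", "+1 555 123 4567"]
normalized = {normalize_phone(v) for v in variants}
```

After normalization the three spellings collapse into a single value, so an exact comparison no longer fails on formatting alone.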

The structure of the report

Every report has three sections: summary, column profile, and issue list.

Summary

A paragraph written by the Genie describing the overall data quality and the most impactful fixes. Example:

"Your file has 4,812 rows and 11 columns. Data quality is good overall — 94% of cells have clean values. The Genie standardized phone number formatting on 1,840 rows (all converted to digits-only), expanded 47 state abbreviations, and removed 23 exact duplicate rows. Two columns flagged for attention: email has 3% malformed addresses, and company has inconsistent capitalization across 412 rows."

Read this first. It surfaces the fixes and flagged columns most likely to affect your match.

Column profile

For every column, the report shows:

  • Detected type — email, phone, date, currency, identifier, free text, etc.
  • Null rate — what percent of rows have a missing or empty value
  • Distinct values — how many unique values, which is useful for spotting columns that should be categorical but have 4,000 variants from casing
  • Top values — the 5 most common values
  • Cleansing rules applied — the specific transformations (whitespace trim, casing, format normalization, etc.)

The column profile is what the match engine uses later to pick the right comparison method per column. A column profiled as email gets lowercased and exact-compared; a column profiled as first_name gets nickname-aware fuzzy comparison.
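To make the profile fields concrete, the sketch below computes a null rate, a distinct count, and the top values for one column. The function profile_column and its output keys are hypothetical names chosen for this example; the report's real structure may differ.

```python
from collections import Counter

def profile_column(values, top_n=5):
    """Hypothetical sketch of per-column profile statistics."""
    total = len(values)
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "null_rate": (total - len(present)) / total if total else 0.0,
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),
    }

profile = profile_column(["MA", "ma", "Ma", "", None, "NH", "MA"])
```

Here the distinct count is 4 even though only two states are present, which is the kind of casing-driven inflation the distinct-values figure helps you spot.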

Issue list

A list of specific issues the Genie detected, each with a count and an action:

Stray whitespace (fixed automatically)

Leading and trailing spaces were trimmed, and internal runs of whitespace were collapsed to a single space. You don't need to do anything; matching is unaffected.
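A minimal sketch of this kind of whitespace cleanup, assuming runs are collapsed to a single space (clean_whitespace is a hypothetical name):

```python
def clean_whitespace(value: str) -> str:
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return " ".join(value.split())
```

For example, clean_whitespace("  Acme   Corp ") gives "Acme Corp".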

Inconsistent casing (fixed automatically)

A consistent casing rule was applied based on column type (lowercase for emails, title case for names, uppercase for state codes).
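One way to picture a type-driven casing rule is a lookup from column type to a string function. The type labels and the apply_casing helper below are assumptions for illustration only.

```python
# Hypothetical mapping from detected column type to a casing function.
CASING_RULES = {
    "email": str.lower,
    "name": str.title,
    "state": str.upper,
}

def apply_casing(value: str, column_type: str) -> str:
    rule = CASING_RULES.get(column_type)
    # Column types without a rule (e.g. free text) are left unchanged.
    return rule(value) if rule else value
```

Note that Python's str.title is naive (it turns "mcdonald" into "Mcdonald"), so a production rule for names would likely need special cases.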

Malformed emails (flagged for review)

Values that don't match basic email shape (e.g. joe@ or joe@.com). These are left as-is but flagged — you can decide whether to drop, fix, or leave them.
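A "basic email shape" check can be sketched with a deliberately loose regular expression. The pattern below is an assumption, not the Genie's actual rule: it only requires something before the @, and a domain with at least one dot and no empty labels.

```python
import re

# Hypothetical shape check: local part, "@", then dot-separated non-empty labels.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$")

def flag_malformed(values):
    """Return the values that fail the shape check, leaving the input untouched."""
    return [v for v in values if not EMAIL_SHAPE.match(v)]

flagged = flag_malformed(["joe@", "joe@.com", "joe@example.com"])
```

The function only reports the offending values; it does not modify or drop them, mirroring the flag-for-review behavior described above.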

Mixed date formats (fixed automatically)

Dates in MM/DD/YYYY, DD-MM-YYYY, and ISO formats were all normalized to ISO 8601 (YYYY-MM-DD). Ambiguous dates (e.g. 03/04/2026) default to the US interpretation (month first, so March 4, 2026); override this in column settings if your data is European.
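The sketch below shows how a US-first default with a per-column override might behave. The to_iso function and its day_first flag are hypothetical stand-ins for the column setting, and the list of formats is limited to the three mentioned above.

```python
from datetime import datetime

def to_iso(raw: str, day_first: bool = False):
    """Hypothetical sketch: normalize a date string to YYYY-MM-DD, or None."""
    raw = raw.strip()
    # The slash format is the ambiguous one; day_first models the European override.
    slash_fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    for fmt in ("%Y-%m-%d", slash_fmt, "%d-%m-%Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognized format: leave for review

us = to_iso("03/04/2026")
eu = to_iso("03/04/2026", day_first=True)
```

With the default, 03/04/2026 becomes 2026-03-04; with the override it becomes 2026-04-03.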

Exact duplicate rows (fixed automatically)

Rows where every column is identical to another row were removed. The dedup report has the details.
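Exact-duplicate removal can be sketched as keeping the first occurrence of each row and counting the rest. This is an illustrative helper (drop_exact_duplicates is a made-up name) that assumes rows are lists of hashable cell values.

```python
def drop_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical row; count removals."""
    seen = set()
    kept = []
    removed = 0
    for row in rows:
        key = tuple(row)  # every column participates in the comparison
        if key in seen:
            removed += 1
            continue
        seen.add(key)
        kept.append(row)
    return kept, removed

rows = [["Ann", "MA"], ["Bob", "NH"], ["Ann", "MA"], ["Ann", "NH"]]
kept, removed = drop_exact_duplicates(rows)
```

Only the third row is dropped; "Ann, NH" differs in one column, so it stays.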

CSV-injection prefixes (fixed automatically)

Cells starting with =, +, -, or @ had a leading apostrophe added to prevent formula execution when opened in Excel. See CSV injection protection.
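The rule described above is simple enough to sketch directly. The helper name neutralize_formula is hypothetical.

```python
FORMULA_PREFIXES = ("=", "+", "-", "@")

def neutralize_formula(cell: str) -> str:
    """Prefix risky cells with an apostrophe so spreadsheets treat them as text."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    if cell.startswith(FORMULA_PREFIXES):
        return "'" + cell
    return cell
```

A cell such as =SUM(A1:A9) comes back as '=SUM(A1:A9), while ordinary text is returned unchanged. Note that this rule also prefixes legitimate values like -5 or +1 555 123 4567.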

Short or malformed ZIP codes (fixed automatically)

US ZIPs under 5 digits were left-padded with zeros; ZIP+4 values were either split into the 5-digit ZIP and the 4-digit extension or kept intact, depending on the column setting.
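A minimal sketch of ZIP normalization follows. The normalize_zip function and its keep_plus4 flag are hypothetical; in particular, when the +4 part is not kept, this sketch simply returns the 5-digit base rather than storing the extension separately.

```python
import re

def normalize_zip(raw: str, keep_plus4: bool = False) -> str:
    """Hypothetical sketch: left-pad short US ZIPs and handle ZIP+4."""
    raw = raw.strip()
    m = re.fullmatch(r"(\d{1,5})(?:-(\d{4}))?", raw)
    if not m:
        return raw  # not a recognizable US ZIP: leave untouched
    base = m.group(1).zfill(5)  # restore leading zeros lost to numeric parsing
    if keep_plus4 and m.group(2):
        return f"{base}-{m.group(2)}"
    return base
```

With the default setting, 1841, 01841, and 01841-2100 all normalize to 01841.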

Unrecognized characters / encoding issues (fixed automatically)

Non-UTF-8 content was decoded and converted; accented characters were preserved in the display column and transliterated in the match column (so García and Garcia match without losing the original).
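Keeping a display value alongside a transliterated match value can be sketched with Python's standard unicodedata module. The match_key helper is a hypothetical name, and stripping combining marks is just one possible transliteration approach.

```python
import unicodedata

def match_key(display: str) -> str:
    """Hypothetical sketch: strip accents for matching, leaving display untouched."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", display)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

display_value = "García"
match_value = match_key(display_value)
```

The display value stays García while the match value becomes Garcia. Letters that do not decompose into a base letter plus accent (for example ø or ß) pass through unchanged, so a fuller transliteration table would be needed for those.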

Overriding cleansing behavior

Every rule the Genie applies is listed on the cleanse screen with a toggle. You can:

  • Disable a rule for a specific column — e.g. preserve leading zeros in a column that the Genie flagged as numeric but that is actually an ID.
  • Change the default — e.g. switch date format interpretation from US to European for one column.
  • Override column type — e.g. force the Genie to treat a column as identifier even if it looks like free text.

Overrides persist for the current match only. If you want persistent per-column rules (e.g. across every upload of your CRM), save them as part of a custom match profile.

Cleansing affects matching, not the original

Cleansing results are used by the match engine and appear in the display output, but your original column values are preserved in the export. You never lose data.

Where to find the report

  • Live — during the cleanse step of the match wizard, before the match runs
  • Persisted — on the job detail page for any completed match, under the "Cleansing" section
  • Exported — the summary text is included in XLSX exports as a dedicated sheet; the full issue list is available via the API for integrations