The Cleanse step shows you exactly what the Genie is about to change in your data before matching runs. Read the summary, scan the issue list, override anything you disagree with, and advance.
Cleansing is never silent. If something changes, it's on this screen.
What you see
Narrative summary
At the top is the Genie's paragraph-long summary of both files. It describes data quality, the biggest fixes, and anything that might warrant attention. Example:
"Your source file has 4,812 rows across 11 columns; data quality is 94%. The Genie will standardize 1,840 phone numbers, expand 47 state abbreviations, and remove 23 exact duplicates. Your master file has 142,000 rows with 99% quality — no significant issues."
Per-file reports
Tabs for Source and Master, each showing:
- Issue list — every detected issue with count, type, and action (auto-fix, merge, flag)
- Column profile — type detection and sample values (same as Upload step, for reference)
- Dedup report — exact, near-exact, and fuzzy duplicate counts
See Cleansing report and Dedup report for the full structure of each.
Overriding cleansing behavior
Every rule the Genie plans to apply is listed with a toggle. You can:
- Disable a rule for a specific column — e.g. skip phone normalization for a notes column that happens to contain phone-like strings.
- Change a default — e.g. flip date interpretation from US to European on a specific column.
- Adjust duplicate thresholds — raise the fuzzy-duplicate threshold to flag fewer near-dupes, or lower it to flag more.
Overrides apply to the current match only. To persist them across matches, save them to the file's metadata (via the Files page) or save a custom match profile.
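To make two of these overrides concrete, here is a minimal sketch in Python. It is not the Genie's implementation: the function names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the Genie actually uses. It shows why the same string becomes a different date under US and European interpretation, and why raising a similarity threshold flags fewer near-duplicates.

```python
from datetime import date, datetime
from difflib import SequenceMatcher
from itertools import combinations


def parse_date(text: str, day_first: bool = False) -> date:
    """Parse a slash-separated date as US (month first) or European (day first)."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


def flag_near_duplicates(values: list[str], threshold: float) -> list[tuple[str, str]]:
    """Return value pairs whose similarity ratio is at or above the threshold.

    SequenceMatcher is an assumption made for illustration only.
    """
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    ]


# The same string is a different date under each interpretation.
us = parse_date("03/04/2024")                        # month first
european = parse_date("03/04/2024", day_first=True)  # day first

# A lower threshold flags more pairs; a higher one flags fewer.
names = ["Acme Corp", "Acme Corp.", "Acme Corporation", "Zenith Ltd"]
loose = flag_near_duplicates(names, threshold=0.70)
strict = flag_near_duplicates(names, threshold=0.95)
```

In this sketch `us` is March 4 and `european` is April 3, which is why the interpretation is worth checking on any column where day and month are both 12 or less.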
What cleansing does not do
Cleansing never deletes columns, never renames columns, and never reorders rows. Your schema is preserved. The only rows cleansing removes are duplicates: exact and near-exact duplicates are removed by default, but both can be toggled off.
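The guarantee above can be sketched as an exact-duplicate pass that leaves columns and row order untouched. This is an illustration under assumptions, not the Genie's code: the function name is hypothetical, and keeping the first occurrence of each duplicate is an assumption this page does not confirm.

```python
def drop_exact_duplicates(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Remove rows that are identical in every column.

    Surviving rows keep their original order and columns.
    Keeping the first occurrence is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for row in rows:
        # Sort the items so the key does not depend on column order.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"id": "1", "city": "Austin"},
    {"id": "2", "city": "Boston"},
    {"id": "1", "city": "Austin"},   # exact duplicate of the first row
    {"id": "3", "city": "Chicago"},
]
deduped = drop_exact_duplicates(rows)
```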
Advancing
The Next button advances to Configure. Cleansing rules are applied when you click it: all transformations happen in a single pass as the match is queued.
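As a rough sketch of what a single pass with per-column toggles could look like, the Python below visits each row once and applies every rule that has not been disabled for that column. The rule name, the `disabled` set, and the ten-digit phone heuristic are all hypothetical and chosen only to illustrate the idea.

```python
import re
from typing import Callable

Rule = Callable[[str], str]


def normalize_phone(value: str) -> str:
    """Reduce a value to its digits if it contains exactly ten (illustrative heuristic)."""
    digits = re.sub(r"\D", "", value)
    return digits if len(digits) == 10 else value


def apply_rules(
    rows: list[dict[str, str]],
    rules: dict[str, Rule],
    disabled: set[tuple[str, str]],
) -> list[dict[str, str]]:
    """Apply every enabled rule to every cell in one pass; inputs are not mutated."""
    cleansed = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            for rule_name, rule in rules.items():
                # Skip rules the user switched off for this column.
                if (rule_name, column) not in disabled:
                    value = rule(value)
            new_row[column] = value
        cleansed.append(new_row)
    return cleansed


rows = [{"phone": "(555) 123-4567", "notes": "call 555-123-4567"}]
rules = {"normalize_phone": normalize_phone}

# Without an override, the phone-like string in the notes column is rewritten too.
default_run = apply_rules(rows, rules, disabled=set())
# With the rule disabled for "notes", only the phone column changes.
override_run = apply_rules(rows, rules, disabled={("normalize_phone", "notes")})
```

The first run shows the failure mode that a per-column toggle prevents: a notes field that merely contains a phone-like string would lose its surrounding text.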
Cleansing is reversible
Cleansing results are stored alongside the raw file, not on top of it. If you're unhappy with the cleansed output, you can re-run the match with different rules or inspect the raw file. Exports always preserve the original values alongside the cleansed ones.
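One way to picture "alongside, not on top of" is an export row that carries both values. The sketch below is an assumption about the shape, not the actual export format: the `_cleansed` suffix, the function names, and the three-entry state table are all hypothetical.

```python
from typing import Callable

# Tiny illustrative lookup; a real table would cover every state.
STATE_NAMES = {"CA": "California", "NY": "New York", "TX": "Texas"}


def expand_state(value: str) -> str:
    """Expand a known state abbreviation; leave anything else unchanged."""
    return STATE_NAMES.get(value.strip().upper(), value)


def export_row(
    raw_row: dict[str, str],
    cleansers: dict[str, Callable[[str], str]],
    suffix: str = "_cleansed",  # hypothetical naming convention
) -> dict[str, str]:
    """Return a copy of the raw row with cleansed values added in separate columns."""
    out = dict(raw_row)
    for column, cleanse in cleansers.items():
        if column in raw_row:
            out[column + suffix] = cleanse(raw_row[column])
    return out


raw = {"name": "Ada", "state": "ca"}
exported = export_row(raw, {"state": expand_state})
```

Because the raw value is never overwritten, re-running with different rules only changes the added column.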
Related reading
- Cleansing report — report structure detail
- Dedup report — duplicate-detection detail
- Three-stage pipeline — how cleansing feeds matching
