The Upload step is where every match begins. You pick the two files to compare (or just one for a dedupe or cleanse-only run), and the Genie profiles every column so downstream steps know what they're working with.
The interface
Two drop zones, side by side:
- Source — the list you're looking things up against
- Master — the canonical list you're matching into
Each zone accepts drag-and-drop or click-to-browse. For master files, a Use saved master dropdown lets you pick from files saved on the Files page.
What happens when you upload
As soon as a file lands in either zone:
- The file is validated for format, size, and row count
- Encoding is detected and converted to UTF-8 if needed
- The first few hundred rows are sampled for column profiling
- Every column gets a detected type, null rate, and sample values displayed inline
All of this takes under a couple of seconds for small files and up to about 10 seconds for 500k-row files.
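The encoding and sampling steps above can be sketched roughly as follows. This is an illustrative sketch for CSV input only, not the Genie's actual implementation: the function names, the fallback order of encodings, and the 300-row sample size (standing in for "the first few hundred rows") are all assumptions.

```python
import csv
import io
from itertools import islice

SAMPLE_ROWS = 300  # assumption: stands in for "the first few hundred rows"


def decode_to_text(raw: bytes) -> str:
    """Decode uploaded bytes, trying UTF-8 (with or without BOM) before legacy encodings."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte, so this last resort cannot fail.
    return raw.decode("latin-1")


def sample_rows(raw: bytes, limit: int = SAMPLE_ROWS) -> list[dict[str, str]]:
    """Return up to `limit` rows as dicts keyed by the header row."""
    text = decode_to_text(raw)
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(islice(reader, limit))
```

Trying UTF-8 first is a common heuristic because non-UTF-8 files with accented characters almost always fail strict UTF-8 decoding, which makes the fallback safe to trigger.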
The column profile
Below each file you'll see a column-by-column breakdown:
- Detected type: email, phone, date, currency, identifier, name, address component, free text, etc. Drives cleansing and matching decisions downstream.
- Null %: Percentage of rows with a missing or empty value in this column. High null % on an identity column (e.g. 40% of rows missing email) is worth noticing before you configure matching.
- Distinct values: Number of unique values. A low count suggests a categorical column; an unexpectedly high count can reveal casing or formatting variation.
- Sample values: The five most common values, so you can sanity-check detection at a glance.
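To make the null %, distinct values, and sample values figures concrete, here is a minimal sketch of how they could be computed for one column. The function name `profile_column` and the rule that whitespace-only strings count as empty are assumptions for illustration, not the product's documented behavior.

```python
from collections import Counter
from typing import Optional


def profile_column(values: list[Optional[str]]) -> dict:
    """Compute null %, distinct count, and the five most common values."""
    total = len(values)
    # Assumption: None and blank/whitespace-only strings both count as missing.
    present = [v for v in values if v is not None and v.strip() != ""]
    counts = Counter(present)
    null_pct = 100.0 * (total - len(present)) / total if total else 0.0
    return {
        "null_pct": round(null_pct, 1),
        "distinct": len(counts),
        "samples": [value for value, _ in counts.most_common(5)],
    }


emails = ["a@x.com", "A@x.com", "", None, "a@x.com"]
profile = profile_column(emails)
# null_pct is 40.0 (2 of 5 missing); distinct is 2 because of the casing variation.
```

In this example the two spellings of the same address count as separate distinct values, which is the kind of casing variation a surprisingly high distinct count can point to.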
Overriding detected types
If a column's type is detected incorrectly (a common case: ZIP codes detected as number, which strips leading zeros), click the column's type badge and pick the correct type from the dropdown. Overrides persist for the current match and can optionally be saved to the file's metadata so future matches inherit them.
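The ZIP code case is easy to reproduce. The sketch below uses a deliberately naive detector and normalizer (both hypothetical, as is the `postal_code` type name) to show why a numeric misdetection loses leading zeros and how an override avoids it.

```python
def detect_type(values: list[str]) -> str:
    """Naive detector (hypothetical): all-digit columns look like numbers."""
    return "number" if all(v.isdigit() for v in values) else "text"


def normalize(values: list[str], column_type: str) -> list[str]:
    """Numbers are parsed and re-rendered; every other type is kept as text."""
    if column_type == "number":
        return [str(int(v)) for v in values]  # int("02134") == 2134
    return [v.strip() for v in values]


zips = ["02134", "10001", "00501"]
detected = detect_type(zips)               # "number"
damaged = normalize(zips, detected)        # leading zeros are gone

overrides = {"zip": "postal_code"}         # what a user override might record
effective = overrides.get("zip", detected)
preserved = normalize(zips, effective)     # values kept as typed
```

Here `damaged` comes out as `["2134", "10001", "501"]`, while `preserved` keeps all three ZIP codes intact.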
Multi-sheet Excel
If you upload an XLSX with multiple sheets, a Sheet selector appears above the drop zone. Pick the sheet that holds the data, skipping cover sheets, pivot tables, and the like.
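If you want to check which sheets a workbook contains before uploading, the names can be read with the Python standard library alone, since an XLSX file is a ZIP archive whose `xl/workbook.xml` lists the sheets. This is an independent sketch, not how the Sheet selector is implemented; `list_sheet_names` is a hypothetical helper, and it assumes the common (transitional) OOXML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by standard (transitional) SpreadsheetML workbooks.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(xlsx_file) -> list[str]:
    """Return sheet names in workbook order. Accepts a path or a binary file object."""
    with zipfile.ZipFile(xlsx_file) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.attrib["name"] for sheet in root.iter(f"{MAIN_NS}sheet")]
```

Calling `list_sheet_names("customers.xlsx")` on a workbook with a cover sheet followed by a data sheet would return both names in tab order, which makes it easy to spot the one to select.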
Tier limits
Upload size and row count are tier-gated. See File size and row limits for the full table. Hitting a limit shows a clear error with an upgrade link — no silent truncation.
Advancing
The Next button unlocks once both files (or only the source, for cleanse and dedupe) have been uploaded and profiled. Clicking it advances to Cleanse.
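The unlock rule can be written as a small predicate. The `UploadState` class, its field names, and the mode strings below are hypothetical, chosen only to illustrate the rule as described.

```python
from dataclasses import dataclass


@dataclass
class UploadState:
    mode: str                      # "match", "dedupe", or "cleanse" (assumed names)
    source_profiled: bool = False  # source uploaded and profiled
    master_profiled: bool = False  # master uploaded and profiled


def can_advance(state: UploadState) -> bool:
    """Single-file modes need only the source; a match needs both files."""
    if state.mode in ("dedupe", "cleanse"):
        return state.source_profiled
    return state.source_profiled and state.master_profiled
```

For example, `can_advance(UploadState("dedupe", source_profiled=True))` is true, while a match with only the source profiled stays locked.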
Related reading
- Supported file formats
- Column profile — what the profile feeds into
- Files — saved master management
