ListMatchGenie

Supported file formats

Every file format ListMatchGenie accepts, what it does with each, and the specific rules that apply to uploads.

ListMatchGenie accepts the five file formats customers actually have — CSV, TSV, XLSX, XLS, and pipe-delimited. Everything else is rejected with a clear error at upload time.

Accepted formats

CSV (.csv)

Standard comma-separated values. UTF-8 preferred; other encodings auto-detected and converted. First row must be a header row.

TSV (.tsv, .tab)

Tab-separated values. Same treatment as CSV — header row required, UTF-8 preferred.

Pipe-delimited (.psv, or .txt with pipe separator)

Pipe-separated values (| as delimiter). Detected automatically by delimiter detection on the file's first rows. Useful for data exported from legacy systems where commas and tabs appear in values.

XLSX (.xlsx)

Excel 2007+ format. Multi-sheet workbooks are supported, and a sheet selector appears on upload. Formulas are replaced with their computed values at upload time.

XLS (.xls)

Legacy Excel format (pre-2007). Same handling as XLSX but slower on large files. Multi-sheet workbooks supported.

File shape requirements

  • Header row required. The first row must contain column names. An empty or missing header row causes rejection.
  • Minimum 2 columns and at least 1 data row. A single-column file can't be matched (there is no identity to compare).
  • Maximum columns is tier-gated. See File size and row limits.
  • Rows must have consistent column counts. Rows with more or fewer fields than the header row are flagged and (by default) dropped with a warning in the cleansing report.
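
The shape rules above can be sketched as a small pre-check. This is a minimal illustration, not ListMatchGenie's implementation: the function name `check_shape`, the warning wording, and the choice to count data rows before inconsistent rows are dropped are all assumptions. The tier-gated column cap is left out because its values are not given here.

```python
import csv
import io


def check_shape(text, delimiter=","):
    """Return (header, kept_rows, warnings), or raise ValueError on rejection.

    Hypothetical helper: names and messages are illustrative only.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))

    # Rule 1: the first row must be a non-empty header row.
    if not rows or not any(cell.strip() for cell in rows[0]):
        raise ValueError("header row is empty or missing")
    header = rows[0]

    # Rule 2: at least 2 columns and at least 1 data row.
    if len(header) < 2:
        raise ValueError("at least 2 columns are required")
    if len(rows) < 2:
        raise ValueError("at least 1 data row is required")

    # Rule 4: rows whose field count differs from the header are dropped
    # (the default behaviour described above) and reported as warnings.
    kept, warnings = [], []
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            warnings.append(
                f"row {row_no}: expected {len(header)} fields, got {len(row)}; dropped"
            )
        else:
            kept.append(row)
    return header, kept, warnings
```

Calling `check_shape("name,email\nAda,ada@example.com\nBob\n")` keeps the first data row and reports the short second one as a warning rather than failing the whole file.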

Rejected formats

Upload is refused for:

  • PDF, Word, PowerPoint — not tabular data
  • Archives (ZIP, RAR, TAR, 7z) — extract first and upload the contents
  • Executables and scripts (EXE, BAT, SH, JS) — security; also not data
  • Macro-enabled Excel (.xlsm, .xlsb) — security (macros can execute malicious code). Save as .xlsx to strip macros
  • Images and video — not data
  • Files above the tier size cap — upgrade or split

Rejections happen at upload time, before anything is stored. You see a clear error explaining which rule the file violated.
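
As a rough sketch of how an upload-time gate like this could look, the snippet below checks the extension and the size before anything is stored. The function name `check_upload`, the reason strings, the `.docx`/`.pptx` extensions standing in for Word and PowerPoint, and the `max_bytes` parameter (representing the tier cap) are assumptions for illustration. A real check would likely inspect file content as well as the extension.

```python
from pathlib import Path

# Extensions from the accepted-formats list. A .txt file would additionally
# need to be pipe-separated, which this extension-only sketch does not verify.
ACCEPTED = {".csv", ".tsv", ".tab", ".psv", ".txt", ".xlsx", ".xls"}

# Illustrative (not exhaustive) reasons for some of the rejected types.
REJECT_REASONS = {
    ".pdf": "not tabular data",
    ".docx": "not tabular data",
    ".pptx": "not tabular data",
    ".zip": "archive: extract first and upload the contents",
    ".rar": "archive: extract first and upload the contents",
    ".tar": "archive: extract first and upload the contents",
    ".7z": "archive: extract first and upload the contents",
    ".exe": "executable or script",
    ".bat": "executable or script",
    ".sh": "executable or script",
    ".js": "executable or script",
    ".xlsm": "macro-enabled Excel: save as .xlsx to strip macros",
    ".xlsb": "macro-enabled Excel: save as .xlsx to strip macros",
}


def check_upload(filename, size_bytes, max_bytes):
    """Return (accepted, message). max_bytes stands in for the tier size cap."""
    ext = Path(filename).suffix.lower()
    if ext in REJECT_REASONS:
        return False, REJECT_REASONS[ext]
    if ext not in ACCEPTED:
        return False, f"unsupported file type {ext or '(none)'}"
    if size_bytes > max_bytes:
        return False, "file exceeds the size cap for this tier: upgrade or split"
    return True, "ok"
```

Returning the reason alongside the verdict mirrors the behaviour described above, where the error names the rule the file violated.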

Encoding

All text is normalized to UTF-8 internally. On upload:

  1. The file's byte signature is examined for a BOM (byte-order mark) — if present, the encoding is read from it.
  2. If no BOM, statistical encoding detection is used to guess the encoding from content.
  3. Non-UTF-8 content is converted. Byte sequences that cannot be decoded are replaced with the Unicode replacement character U+FFFD and flagged in the cleansing report.

The supported source encodings are: UTF-8, UTF-8-BOM, UTF-16, Latin-1 (ISO-8859-1), Windows-1252 (CP1252), Shift-JIS, GB2312, and Big5. See Encoding and characters for detail on how this handles international data.
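
The three steps can be illustrated with Python's standard codecs. Note the simplification: where the product uses statistical detection, this sketch swaps in a much cruder fallback (strict UTF-8 first, then Windows-1252), so it will not recognize Shift-JIS, GB2312, or Big5. The function name `decode_upload` and the returned tuple are assumptions.

```python
import codecs

# BOM signatures checked first. The codec named here strips the BOM on decode.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_upload(raw):
    """Return (text, encoding_used, replaced_count) for raw upload bytes."""
    # Step 1: if a BOM is present, take the encoding from it.
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            text = raw.decode(encoding, errors="replace")
            return text, encoding, text.count("\ufffd")

    # Step 2 (simplified stand-in for statistical detection): try strict UTF-8.
    try:
        return raw.decode("utf-8"), "utf-8", 0
    except UnicodeDecodeError:
        pass

    # Step 3: convert, replacing undecodable bytes with U+FFFD and counting
    # them so they can be reported.
    text = raw.decode("cp1252", errors="replace")
    return text, "cp1252", text.count("\ufffd")
```

For example, `decode_upload(b"caf\xe9")` falls through to Windows-1252 and yields `café` with no replacements, while a byte that Windows-1252 leaves undefined (such as `0x81`) comes back as U+FFFD with a count of 1.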

Delimiter detection (for CSV/TSV/pipe)

The first 100 rows are sampled to detect the delimiter. Candidates are comma, tab, pipe, and semicolon. The candidate that produces the most consistent column count across the sampled rows wins.

If delimiter detection is ambiguous, you can override it on upload via the Delimiter dropdown.
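
One way to picture this kind of detection is to parse the sample once per candidate and score how consistently the rows split. The scoring below (share of rows with the most common width, then the width itself as a tiebreaker) is an assumption made for illustration, as is the function name `detect_delimiter`. The product's actual scoring is not documented here.

```python
import csv
import io
from collections import Counter

CANDIDATES = [",", "\t", "|", ";"]


def detect_delimiter(text, sample_rows=100):
    """Return the most consistent candidate delimiter, or None if none fits."""
    best, best_score = None, (0.0, 0)
    for delim in CANDIDATES:
        # Collect the field count of up to sample_rows non-empty rows.
        counts = []
        for row in csv.reader(io.StringIO(text), delimiter=delim):
            if row:
                counts.append(len(row))
            if len(counts) >= sample_rows:
                break
        if not counts:
            continue
        width, freq = Counter(counts).most_common(1)[0]
        if width < 2:
            # This candidate does not split most rows at all.
            continue
        score = (freq / len(counts), width)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = delim, score
    return best
```

A tab-separated file whose values contain commas, such as `"name\tcity\nAda\tLondon, UK\nBob\tParis\n"`, resolves to tab because the comma splits only one of the three rows. A `None` result or a tie is the kind of case where a manual override is useful.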

Quoting and escaping

CSV quoting follows RFC 4180:

  • Fields containing the delimiter, newlines, or double-quotes must be wrapped in double-quotes.
  • Literal double-quotes inside quoted fields are escaped by doubling ("").

Non-RFC-compliant CSV (e.g. backslash-escaped quotes) is supported via best-effort parsing but may generate warnings in the cleansing report.
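
If you generate files programmatically, Python's standard `csv` module produces RFC 4180-style quoting with its default settings. The sketch below writes and reads such a file, and also shows one possible way to read backslash-escaped input. The helper names are hypothetical, and the backslash reader only illustrates the idea of best-effort parsing, not how ListMatchGenie does it.

```python
import csv
import io


def write_rfc4180(rows):
    """Serialize rows with minimal quoting, doubled quotes, and CRLF line ends."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf, quoting=csv.QUOTE_MINIMAL, doublequote=True, lineterminator="\r\n"
    )
    writer.writerows(rows)
    return buf.getvalue()


def read_rfc4180(text):
    """Parse text in which literal quotes are escaped by doubling."""
    return list(csv.reader(io.StringIO(text, newline=""), doublequote=True))


def read_backslash_escaped(text):
    """Parse the non-compliant variant in which quotes are escaped as \\"."""
    reader = csv.reader(
        io.StringIO(text, newline=""), escapechar="\\", doublequote=False
    )
    return list(reader)
```

Writing the row `["Acme, Inc.", 'He said "hi"']` produces `"Acme, Inc.","He said ""hi"""`: both fields are wrapped in double-quotes, and the embedded quotes are doubled.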

Sheet selection (XLSX/XLS)

When you upload a multi-sheet workbook:

  1. The app scans every sheet and presents a picker with row count and column count per sheet.
  2. Sheets with no data or only headers are flagged but selectable.
  3. Sheets with charts or pivot tables as their primary content are flagged (if you select one, the data is taken from the source cell values, not the derived view).

Only one sheet can be used as the source for a match. If you need multiple sheets, export each as a separate file.

Special characters in column names

Column names are preserved exactly — including spaces, punctuation, and non-ASCII characters. This matters for downstream tools that join by column name.

Caveats:

  • Column names are trimmed of leading/trailing whitespace on upload (internal whitespace preserved).
  • Duplicate column names are suffixed (Name, Name_2, Name_3) so every column is uniquely addressable.
  • Very long column names (>100 chars) are truncated with a warning.
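
To predict what your column names will look like after upload, the caveats above can be approximated as follows. The order of operations (trim, then truncate, then de-duplicate), the handling of a suffix that collides with an existing name, and the function name `normalize_headers` are assumptions, not documented behaviour.

```python
def normalize_headers(names, max_len=100):
    """Return (normalized_names, warnings) following the caveats above."""
    used = set()
    normalized, warnings = [], []
    for raw in names:
        # Trim leading/trailing whitespace only; internal whitespace is kept.
        name = raw.strip()

        # Truncate very long names and record a warning.
        if len(name) > max_len:
            warnings.append(f"column name truncated to {max_len} characters")
            name = name[:max_len]

        # Suffix duplicates: Name, Name_2, Name_3, ...
        candidate, n = name, 1
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        normalized.append(candidate)
    return normalized, warnings
```

For instance, the headers `[" Name ", "Name", "Name"]` come out as `["Name", "Name_2", "Name_3"]`. In this sketch a suffix is added after truncation, so a truncated duplicate can end up slightly longer than the limit.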

Formulas in XLSX

Spreadsheet formulas are replaced with their computed values at upload time. The formula itself is not preserved, only the result. This means:

  • Volatile formulas (NOW(), TODAY()) give you whatever value they had when the workbook was last saved.
  • Formulas that reference other sheets work, but you only get the value, not the reference.
  • Errors (#N/A, #VALUE!, #REF!) become empty cells with a warning.

If you need formulas to remain re-evaluable, that workflow is out of scope for ListMatchGenie: we're a matching tool, not a spreadsheet engine.