ListMatchGenie

International matching

Match records across 20 regions, one engine.

From US CRM exports to Spanish full-name fields like 'María Isabel García López Hernández', the Genie handles the quirks of each country's naming conventions, postal codes, and diacritics.

Supported regions

20 regions, validated end-to-end.

Each regional module ships with validated handling of naming conventions, particles, diacritics, and postal-code formats.

English-speaking

6 regions:
  • United States
  • United Kingdom
  • Ireland
  • Canada
  • Australia
  • New Zealand

English-language naming conventions, regional postal formats (US ZIP, UK postcode, Canadian postal code, Irish Eircode, Australian postcode, NZ postcode), and Québécois St-Pierre ↔ Saint-Pierre equivalence.
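Postal codes are a frequent source of format drift between two files from the same country. As a minimal sketch, assuming a hypothetical `normalize_postal` helper that covers only the US and Canadian formats named above, a matcher could canonicalize them before comparing:

```python
import re
from typing import Optional

# Hypothetical helper for illustration only; not ListMatchGenie's API.
_US_ZIP = re.compile(r"^(\d{5})(?:-?(\d{4}))?$")
# Canadian postal codes alternate letter and digit: A1A 1A1.
_CA_POSTAL = re.compile(r"^([A-Z]\d[A-Z])[ -]?(\d[A-Z]\d)$")


def normalize_postal(raw: str, country: str) -> Optional[str]:
    """Return a canonical postal code for matching, or None if it doesn't fit."""
    value = raw.strip().upper()
    if country == "US":
        m = _US_ZIP.match(value)
        if m:
            # Assumption: compare on the 5-digit ZIP so ZIP and ZIP+4 records agree.
            return m.group(1)
    elif country == "CA":
        m = _CA_POSTAL.match(value)
        if m:
            # Re-insert the conventional single space between the two halves.
            return f"{m.group(1)} {m.group(2)}"
    return None
```

Truncating ZIP+4 to five digits is a design assumption here: it trades a little precision for recall when one file carries the +4 extension and the other does not.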

Western Europe (DACH + Benelux)

4 regions:
  • Germany
  • Austria
  • Switzerland
  • Netherlands

German umlaut folding (Müller ↔ Mueller, Schloß ↔ Schloss), Swiss multilingual data (German/French/Italian), and Dutch particle handling (van, van der, van den, de).

Southern Europe

4 regions:
  • France
  • Spain
  • Italy
  • Portugal

Spanish two-surname convention (García López), Portuguese compound surnames with particles (da Silva, dos Santos), French accent folding (é/è/ê/ë → e), and Italian regional particles (di, della).
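French-style accent folding can be approximated with Unicode decomposition. The sketch below is an illustration with a hypothetical `fold_accents` function: decompose to NFD, drop the combining marks, and case-fold.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Fold French accents (é/è/ê/ë -> e, ç -> c) for use in a match key."""
    # NFD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") and case-fold the rest.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").casefold()
```

The same generic approach is wrong for other languages, which is why regional handling matters: it would fold German ü to u rather than ue, and characters such as œ have no Unicode decomposition at all, so they pass through unchanged.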

Nordic

3 regions:
  • Sweden
  • Norway
  • Denmark

Region-aware folding for Å/Ø/Æ — Åke ↔ Ake in Swedish, Søren ↔ Soren in Danish — and Nordic postal formats.

Eastern Europe

1 region:
  • Poland

Polish diacritic folding (ć/ł/ń/ś/ź/ż) and the Polish postal format (NN-NNN).
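Polish is a useful case because the generic decompose-and-strip approach almost works. Most Polish letters decompose (ż is z plus a combining dot above), but ł does not, so it needs an explicit mapping. A minimal sketch with hypothetical function names, covering both the folding and the NN-NNN postal format:

```python
import re
import unicodedata
from typing import Optional

# ł and Ł have no Unicode decomposition, so NFD alone would leave them in place.
_POLISH_EXTRA = str.maketrans({"ł": "l", "Ł": "L"})
# Polish postal codes are two digits, a hyphen, then three digits (NN-NNN).
_PL_POSTAL = re.compile(r"^(\d{2})-?(\d{3})$")


def fold_polish(text: str) -> str:
    """Fold Polish diacritics for a match key (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text.translate(_POLISH_EXTRA))
    # Remove the combining marks left by decomposition, then case-fold.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").casefold()


def normalize_pl_postal(raw: str) -> Optional[str]:
    """Return a postal code as NN-NNN, accepting input with or without the hyphen."""
    m = _PL_POSTAL.match(raw.strip())
    return f"{m.group(1)}-{m.group(2)}" if m else None
```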

Latin America

2 regions:
  • Mexico
  • Brazil

Spanish two-surname convention in Mexican data, Brazilian Portuguese particle matching (da Silva, dos Santos, de Oliveira), and local postal formats (Mexican CP, Brazilian CEP).
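Both Latin American postal formats reduce to a digit count once punctuation is removed: a Mexican CP has five digits and a Brazilian CEP has eight, conventionally written NNNNN-NNN. The helpers below are hypothetical, and the leading-zero repair for Mexico is an assumption about spreadsheet exports, not a documented feature.

```python
import re
from typing import Optional


def normalize_mx_cp(raw: str) -> Optional[str]:
    """Mexican código postal: five digits."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: a 4-digit value lost its leading zero in a numeric spreadsheet column.
    if len(digits) == 4:
        digits = "0" + digits
    return digits if len(digits) == 5 else None


def normalize_br_cep(raw: str) -> Optional[str]:
    """Brazilian CEP: eight digits, conventionally written NNNNN-NNN."""
    digits = re.sub(r"\D", "", raw)
    return f"{digits[:5]}-{digits[5:]}" if len(digits) == 8 else None
```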

Canonical list (20): US, UK, Ireland, Canada, Australia, New Zealand, Germany, Austria, Switzerland, Netherlands, France, Spain, Italy, Portugal, Sweden, Norway, Denmark, Poland, Mexico, Brazil.

Hard cases, handled

Real problems, not demo data.

A few of the specific patterns the engine handles that off-the-shelf fuzzy matchers get wrong.

Spanish full-name field with 4–5 tokens

A Spanish customer file often has a single column containing 'María Isabel García López Hernández'. The engine recognizes this pattern, identifies given names vs. paternal/maternal surnames, handles particles like 'de la' and 'del', and matches correctly against a master file with separate first_name and last_name columns — without treating 'López Hernández' as a foreign string.
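The hard part of this case is deciding where given names end and surnames begin. One way to sketch it, assuming a given-name lexicon (the tiny `GIVEN_NAMES` set below is hypothetical) and assuming that the first surname unit is paternal and everything after it is maternal:

```python
import unicodedata
from typing import NamedTuple

# Hypothetical, deliberately tiny lexicon; a real parser would need a large one.
GIVEN_NAMES = {"maría", "isabel", "josé", "juan", "carmen", "luis"}
# Particles that attach to the following surname ("de la Fuente", "del Río").
PARTICLES = {"de", "del", "la", "las", "los"}


class SpanishName(NamedTuple):
    given: str
    paternal: str
    maternal: str


def split_spanish_full_name(full: str) -> SpanishName:
    tokens = unicodedata.normalize("NFC", full).split()
    # Leading tokens found in the lexicon are given names; keep at least one surname token.
    i = 0
    while i < len(tokens) - 1 and tokens[i].casefold() in GIVEN_NAMES:
        i += 1
    if i == 0 and len(tokens) > 1:
        i = 1  # fallback: assume the first token is a given name
    # Group the remaining tokens into surname units, carrying particles forward.
    units, pending = [], []
    for tok in tokens[i:]:
        pending.append(tok)
        if tok.casefold() not in PARTICLES:
            units.append(" ".join(pending))
            pending = []
    if pending:  # trailing particles with nothing after them
        units.append(" ".join(pending))
    paternal = units[0] if units else ""
    maternal = " ".join(units[1:])
    return SpanishName(" ".join(tokens[:i]), paternal, maternal)
```

Particles are glued to the surname that follows them, so 'de la Fuente' stays one unit. The result can then be compared with a master file's first_name and last_name columns, for example by joining `paternal` and `maternal` into a single last name.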

Dutch particle 'van der' preserved and matched

'Johan van der Berg' in one file matches 'J. van der Berg' in another, and both match 'Van der Berg, Johan' in a file that lists the surname first. The particle chain ('van', 'van der', 'van den', 'de') is preserved in display and folded appropriately for the match key.
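A match key that makes all three forms agree can be sketched as the particle-bearing surname plus a first initial, both case-folded. The function name and the particle set (limited to the ones listed above) are assumptions for illustration:

```python
# Particles named in the text above; Dutch has others (e.g. "ter", "ten") not covered here.
DUTCH_PARTICLES = {"van", "der", "den", "de"}


def dutch_match_key(raw: str) -> tuple[str, str]:
    """Return (surname with particles, first initial), both case-folded."""
    if "," in raw:
        # Surname-first form: "Van der Berg, Johan".
        surname, given = (part.strip() for part in raw.split(",", 1))
    else:
        tokens = raw.split()
        # The surname starts at the first particle, or is the last token if there is none.
        start = next(
            (i for i, t in enumerate(tokens) if t.casefold() in DUTCH_PARTICLES),
            len(tokens) - 1,
        )
        given, surname = " ".join(tokens[:start]), " ".join(tokens[start:])
    return (" ".join(surname.casefold().split()), given[:1].casefold())
```

Only the key is case-folded; the original string remains available for display. Keying on a first initial is deliberately loose and fits candidate generation better than a final match decision.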

German umlaut folding per convention

Müller ↔ Mueller ↔ Muller — the engine applies German-specific folding (ä/ö/ü/ß → ae/oe/ue/ss) so all three records agree on the same match key. Display preserves the original character; matching operates on the folded form.
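The digraph table by itself puts Müller and Mueller on one key, but not the plain-ASCII Muller. One way to reconcile all three, sketched here as an assumption rather than the engine's documented method, is a second, looser key that also collapses the digraphs:

```python
import unicodedata

# Hypothetical table for illustration; not ListMatchGenie's implementation.
GERMAN_FOLD = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
                             "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})


def german_strict_key(name: str) -> str:
    """Fold umlauts and ß to their conventional digraphs: Müller -> mueller."""
    # NFC first so a decomposed "ü" (u + diaeresis) is seen as one character.
    return unicodedata.normalize("NFC", name).translate(GERMAN_FOLD).casefold()


def german_loose_key(name: str) -> str:
    """Also collapse ae/oe/ue to a/o/u, so an ASCII-stripped 'Muller' lands on the same key."""
    key = german_strict_key(name)
    for digraph, vowel in (("ae", "a"), ("oe", "o"), ("ue", "u")):
        key = key.replace(digraph, vowel)
    return key
```

The trade-off is recall for precision: the loose key also merges unrelated pairs such as Bauer and Baur, so it suits candidate generation better than final decisions.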

Québécois hyphenated saint-names

St-Pierre, Saint-Pierre, and St. Pierre all refer to the same Québécois surname. The engine normalizes saint-prefix abbreviations in French-Canadian data so these three variants land on the same match key without false positives against unrelated 'Pierre' records.
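One way to sketch saint-prefix normalization is a prefix rewrite that only fires when the abbreviation is followed by a hyphen or space. The regular expression and the `saint_key` name are hypothetical:

```python
import re

# Hypothetical normalizer: "St", "St.", "Ste", "Saint", "Sainte" followed by a hyphen or space.
_SAINT_PREFIX = re.compile(r"^(?:saint|st)(e)?\.?[\s-]+", re.IGNORECASE)


def saint_key(surname: str) -> str:
    """Rewrite saint-prefix abbreviations to a single spelled-out form."""
    s = surname.strip()
    m = _SAINT_PREFIX.match(s)
    if not m:
        return s.casefold()
    # Group 1 captures the feminine "e" (Ste / Sainte) when present.
    prefix = "sainte" if m.group(1) else "saint"
    return f"{prefix}-{s[m.end():].casefold()}"
```

Requiring the separator is what keeps surnames such as Stephenson intact, and a bare 'Pierre' never gains a prefix, so it stays on a different key from the saint-names.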

Brazilian Portuguese da/dos/de particles

'Ana Paula da Silva dos Santos' in one file matches 'Ana P. Da Silva Dos Santos' in another. The engine treats Portuguese connector particles as surname-chain elements rather than independent tokens, so differences in particle casing and spacing don't fragment the match.
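Treating particles as part of the surname chain can be sketched by gluing each particle to the token that follows it, then comparing unit by unit with allowance for initials. Everything below, including the particle set and function names, is an illustrative assumption:

```python
# Connector particles treated as part of the following surname (hypothetical list).
PT_PARTICLES = {"da", "das", "de", "do", "dos"}


def pt_units(name: str) -> list[str]:
    """Split a name into case-folded units, gluing particles to the next token."""
    units, pending = [], []
    for tok in name.split():  # split() also absorbs irregular spacing
        low = tok.casefold()
        pending.append(low)
        if low not in PT_PARTICLES:
            units.append(" ".join(pending))
            pending = []
    if pending:
        units.append(" ".join(pending))
    return units


def _unit_matches(a: str, b: str) -> bool:
    if a == b:
        return True
    # Treat "p." as an abbreviation of any unit that starts with "p".
    return any(short.endswith(".") and full.startswith(short[:-1])
               for short, full in ((a, b), (b, a)))


def pt_names_match(a: str, b: str) -> bool:
    ua, ub = pt_units(a), pt_units(b)
    return len(ua) == len(ub) and all(_unit_matches(x, y) for x, y in zip(ua, ub))
```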

Scandinavian Å/Ø/Æ regional folding

Swedish, Norwegian, and Danish use overlapping but distinct folding conventions. The engine applies region-aware tables — Åke ↔ Ake in Swedish, Søren ↔ Soren in Danish, Ærø ↔ Aero in Danish geography — rather than a lossy one-size-fits-all ASCII strip.
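A one-size-fits-all strip also misses letters outright: ø and æ have no Unicode decomposition, so a decompose-and-strip pass leaves them unchanged. A minimal sketch of region-aware tables follows; the table contents are assumptions chosen to reproduce the examples above, and Danish å → aa follows the traditional spelling seen in names like Aarhus.

```python
import unicodedata

# Hypothetical per-region tables, chosen to reproduce the examples above.
# "å" maps differently per region: "a" in Swedish, "aa" in Danish.
NORDIC_TABLES = {
    "sv": str.maketrans({"å": "a", "ä": "a", "ö": "o"}),
    "da": str.maketrans({"æ": "ae", "ø": "o", "å": "aa"}),
}


def nordic_key(text: str, region: str) -> str:
    # NFC first so a decomposed "å" (a + ring) is seen as one character,
    # then casefold so the lowercase tables also cover capitals.
    return unicodedata.normalize("NFC", text).casefold().translate(NORDIC_TABLES[region])
```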

Roadmap

What's not in today's release.

We're transparent about scope. The following are on the roadmap but aren't in the product yet — we'll ship them when we can do them as well as we handle the current 20 regions:

  • CJK (Chinese, Japanese, Korean) — name-order conventions, transliteration ambiguity, and script composition need dedicated product work.
  • Right-to-left scripts (Arabic, Hebrew, Persian) — bidi rendering, multiple accepted romanizations, and patronymic conventions.
  • Indic languages (Hindi, Bengali, Tamil, and others) — transliteration complexity and multi-script realities in typical data.
  • Thai and Vietnamese — tone marks, name-order, and syllable segmentation.
  • Finnish — distinct linguistic structure; we may add it later if demand warrants.


Let the Genie handle the grunt work.

Free tier is real. No card. No forms. Just upload your first list and see the Genie clean and match it in under a minute.