If your customer list includes EU residents, the vendor whose tool you use to match or dedupe it acts as a data processor under GDPR, with you as the controller. That means vendor choice matters, not just for the features but for the audit trail and the legal exposure.
This guide covers what GDPR actually requires of matching tools, what to look for when evaluating a vendor, and the architectural patterns that make ongoing compliance easier.
What GDPR requires of a matching tool
The matching engine itself doesn't store your data forever — it loads, processes, produces a result, and typically discards the working copy. But "typically" isn't good enough for GDPR. Specifically:
- Lawful basis and purpose limitation. Data matching has to serve a specified, documented purpose and rest on a lawful basis. "Clean up our CRM" can usually rely on legitimate interest; "build a profile of every customer's spouse" almost certainly cannot.
- Data minimization. Upload the fields you need to match on, not every column. If you're matching on email + name, you don't need to upload salary.
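- Storage limitation and international transfers. Keep personal data only as long as the matching purpose requires. Separately, EU personal data that leaves the EEA needs an approved transfer mechanism, such as an adequacy decision or Standard Contractual Clauses; keeping EU data in the EU avoids the question entirely. If you're matching Frankfurt-office customer data on a US-hosted tool, that's a cross-border transfer that needs this legal support.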
- Right to erasure. When a customer requests deletion, the vendor needs a documented path to remove the data from every place it was processed — including any intermediate working copies.
- Sub-processor transparency. If the matching vendor uses sub-processors (a cloud provider, an AI service), you need to know who they are and what data goes to them. Under Art. 28, the vendor needs your authorization before engaging them.
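Data minimization is the one requirement you can enforce yourself before anything reaches a vendor. Below is a minimal sketch using Python's standard `csv` module, assuming the match runs on email plus first and last name. `MATCH_FIELDS` and `minimize_csv` are hypothetical names for illustration. The helper writes out only the listed columns and fails loudly if one is missing.

```python
import csv
import io

# Hypothetical: the only columns this particular match job needs.
MATCH_FIELDS = ["email", "first_name", "last_name"]


def minimize_csv(source: io.TextIOBase, fields: list[str]) -> str:
    """Return CSV text containing only the requested columns.

    Raises ValueError if a requested column is absent, so a typo
    cannot silently produce an empty match key.
    """
    reader = csv.DictReader(source)
    missing = [f for f in fields if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" drops every column not listed (salary, phone, ...).
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


raw = io.StringIO(
    "email,first_name,last_name,salary\n"
    "ana@example.com,Ana,Silva,72000\n"
)
print(minimize_csv(raw, MATCH_FIELDS))
```

Running this step before upload means the salary column never leaves your systems, whatever the vendor does afterwards.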
What to look for when evaluating a vendor
1. Regional data residency as a first-class feature
The cleanest GDPR posture is a tool that lets you choose where your data lives. EU data in Frankfurt (eu-central-1), UK data in London (eu-west-2), US data in the US — and data never crosses those boundaries. Ask the vendor: "If I upload a file for an EU customer, where does it go, and does any part of processing happen outside the EU?"
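One way to make the answer to that question structural, rather than a promise, is a routing table with no default entry. The sketch below assumes a residency code is known for each upload. The three AWS regions come from the text above. The bucket names and the `storage_target` / `assert_same_region` helpers are hypothetical.

```python
# Hypothetical mapping: residency code -> (AWS region, bucket name).
# Regions match the ones named above; bucket names are placeholders.
STORAGE_BY_RESIDENCY = {
    "US": ("us-east-1", "example-matching-us"),
    "EU": ("eu-central-1", "example-matching-eu"),
    "UK": ("eu-west-2", "example-matching-uk"),
}


def storage_target(residency: str) -> tuple[str, str]:
    """Return (region, bucket) for a residency code, with no fallback."""
    key = residency.strip().upper()
    if key not in STORAGE_BY_RESIDENCY:
        # Deliberately no silent default to a US region.
        raise ValueError(f"no storage region configured for residency {residency!r}")
    return STORAGE_BY_RESIDENCY[key]


def assert_same_region(residency: str, processing_region: str) -> None:
    """Refuse to process a file anywhere other than the region it is stored in."""
    expected, _ = storage_target(residency)
    if processing_region != expected:
        raise RuntimeError(
            f"{residency} data must be processed in {expected}, not {processing_region}"
        )


print(storage_target("eu"))
assert_same_region("EU", "eu-central-1")  # passes silently
```

When evaluating a vendor, the useful question is whether an equivalent of `assert_same_region` exists for every processing step, not just for storage.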
2. No PII in the metadata layer
Good architecture separates control-plane metadata (file IDs, match job configs, aggregate stats) from the actual data plane (the file contents). The metadata database can be multi-region and replicated; the data stays in regional storage. Ask the vendor which tables, logs, and caches could ever contain customer records.
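To make the split concrete, here is a sketch of what a control-plane row might hold, assuming the summary is computed inside the data plane and only the aggregates are written to the central database. `MatchJobRecord` and `summarize` are hypothetical. The exact-match key is a simplification that stands in for whatever comparison a real engine uses.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MatchJobRecord:
    """Control-plane row: references and aggregate counts only, no customer values."""
    job_id: str
    file_id: str                   # opaque ID; the file itself stays in regional storage
    region: str                    # where the data plane lives, e.g. "eu-central-1"
    match_fields: tuple[str, ...]  # column names, never column values
    rows_in: int
    duplicate_groups: int


def summarize(job_id: str, file_id: str, region: str,
              match_fields: tuple[str, ...], rows: list[dict[str, str]]) -> MatchJobRecord:
    """Runs in the data plane; only the returned aggregates leave it."""
    # Simplified exact-match key (trimmed, lowercased), for illustration only.
    keys = Counter(
        tuple(row[f].strip().lower() for f in match_fields) for row in rows
    )
    return MatchJobRecord(
        job_id=job_id,
        file_id=file_id,
        region=region,
        match_fields=match_fields,
        rows_in=len(rows),
        duplicate_groups=sum(1 for n in keys.values() if n > 1),
    )


rows = [
    {"email": "ana@example.com", "name": "Ana Silva"},
    {"email": "ANA@example.com ", "name": "Ana Silva"},
    {"email": "ben@example.com", "name": "Ben Okoro"},
]
record = summarize("job-1", "file-7f3a", "eu-central-1", ("email",), rows)
print(asdict(record))
```

The record is useful for dashboards and billing, yet nothing in it identifies a customer.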
3. One-click delete that actually deletes
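Account closure should remove all your data from all regions within a well-defined window. GDPR Art. 17 requires erasure "without undue delay", and Art. 12(3) gives controllers one month to respond to a request, so 30 days is a sensible upper bound. Ask the vendor for their deletion runbook; if they can't describe what happens step by step, that's a red flag.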
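A runbook is easier to trust when it ends with a verification step. The sketch below covers every regional store, including the scratch space where workers keep working copies, and then re-checks each store. `RegionalStore` is a hypothetical in-memory stand-in for a region's bucket. Real code would call the storage provider's API and re-list the bucket to verify. The 30-day constant mirrors the target above; it is not a legal constant.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Target window discussed above (Art. 12(3)'s one month, expressed as 30 days).
DELETION_WINDOW = timedelta(days=30)


@dataclass
class RegionalStore:
    """Hypothetical in-memory stand-in for one region's storage (account -> object keys)."""
    region: str
    objects: dict[str, set[str]] = field(default_factory=dict)

    def delete_account(self, account_id: str) -> int:
        """Remove every object for the account; return how many were removed."""
        return len(self.objects.pop(account_id, set()))

    def has_account(self, account_id: str) -> bool:
        return bool(self.objects.get(account_id))


def close_account(account_id: str, stores: list[RegionalStore], requested_on: date) -> dict:
    """Delete an account's data in every store, then verify none remains."""
    deadline = requested_on + DELETION_WINDOW
    deleted = {s.region: s.delete_account(account_id) for s in stores}
    # Verification pass. Against real storage this should be a fresh listing,
    # not a reuse of whatever the delete call reported.
    leftovers = [s.region for s in stores if s.has_account(account_id)]
    if leftovers:
        raise RuntimeError(f"data for {account_id} still present in {leftovers}")
    return {"deadline": deadline.isoformat(), "deleted_objects": deleted}


stores = [
    RegionalStore("us-east-1", {"acct-1": {"a.csv"}}),
    RegionalStore("eu-central-1", {"acct-1": {"b.csv", "b.result.csv"}, "acct-2": {"c.csv"}}),
    # Worker scratch space holds intermediate working copies and must be covered too.
    RegionalStore("eu-central-1-scratch", {"acct-1": {"tmp-0001"}}),
]
report = close_account("acct-1", stores, date(2024, 11, 15))
print(report)
```

The returned report, with its per-region counts and deadline, is the kind of artifact a vendor's runbook should be able to produce on request.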
4. DPA and sub-processor list
The vendor should have a Data Processing Agreement ready to sign (not "we'll draft one for you") and a publicly listed sub-processor list. Sub-processors typically include a cloud provider (AWS / GCP / Azure), possibly an AI/LLM vendor, and possibly a transactional email service. You should be able to see who they are without asking.
5. Audit trail for every match
Every match job should leave a record: who ran it, which files, what the result was, when it completed. If a regulator or a customer asks "why was this person in your deduplicated CRM output on 2024-11-15?", you need to be able to answer.
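Below is a minimal sketch of such a record, kept as append-only JSON lines. It stores IDs only, so the audit log does not become another copy of customer PII. The person-level answer comes from opening the referenced files in regional storage. `AuditEntry`, `append_entry`, and `jobs_on` are hypothetical names. The in-memory list stands in for an append-only file or table. The date lookup assumes all timestamps are recorded in UTC.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One match job. Holds IDs only, so the log is not itself a copy of customer PII."""
    job_id: str
    run_by: str                      # user ID of whoever started the job
    input_file_ids: tuple[str, ...]
    output_file_id: str
    result: str                      # e.g. "completed" or "failed"
    completed_at: str                # ISO 8601 timestamp in UTC


def append_entry(log: list[str], entry: AuditEntry) -> None:
    """Append one JSON line; existing entries are never edited or removed."""
    log.append(json.dumps(asdict(entry), sort_keys=True))


def jobs_on(log: list[str], day: str) -> list[dict]:
    """Return entries completed on a given UTC day (YYYY-MM-DD)."""
    return [e for e in map(json.loads, log) if e["completed_at"].startswith(day)]


def utc(*args: int) -> str:
    return datetime(*args, tzinfo=timezone.utc).isoformat()


log: list[str] = []  # stand-in for an append-only file or table
append_entry(log, AuditEntry("job-42", "user-17", ("file-a", "file-b"),
                             "file-out-9", "completed", utc(2024, 11, 15, 9, 30)))
append_entry(log, AuditEntry("job-43", "user-17", ("file-c",),
                             "file-out-10", "failed", utc(2024, 11, 16, 8, 0)))

for entry in jobs_on(log, "2024-11-15"):
    print(entry["job_id"], entry["run_by"], entry["input_file_ids"], entry["output_file_id"])
```

For the 2024-11-15 question, this narrows things to one job, one user, and two input files. Those files then explain why the record ended up in the output.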
Architectural patterns that make compliance easier
Some product architectures are inherently more GDPR-friendly than others. The difference is worth understanding because it affects your implementation cost.
- Region-routed storage. Rather than a single global S3 bucket, use separate buckets per region (US, EU, UK), with routing based on customer data residency. This settles the cross-border transfer question for stored customer data at the architecture level instead of by policy.
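- PII-free central database. The database stores file IDs, match configurations, aggregate stats, and user accounts, but zero customer data. Because no customer records are in it, the database can live anywhere while the customer PII stays in its region. (Your own users' account details are still personal data, so they need GDPR coverage too.)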
- Ephemeral worker processing. The match engine loads data from regional storage, processes in memory, writes the result back to storage, and discards the working copy. No long-lived copy of PII outside the regional data plane.
- AI isolation. If the tool uses AI (for summaries, narratives, etc.), the AI call should receive schema-level data only — column names, aggregate stats, sample patterns — never raw records. Look for explicit documentation of what goes to the LLM.
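For the AI-isolation pattern, here is a sketch of a schema-level profile of the kind described above: column names, counts, and format patterns, with no raw values. `shape` and `schema_profile` are hypothetical helpers, and the masking scheme is one possible approach. It is not a description of any particular vendor's implementation. On very small files, distinct counts and rare patterns can still hint at individuals, so this reduces exposure rather than eliminating it.

```python
import json
from collections import Counter


def shape(value: str) -> str:
    """Mask a value to its format: letters -> 'A', digits -> '9', runs collapsed.

    'ana@example.com' becomes 'A@A.A'; punctuation and spaces are kept.
    """
    out: list[str] = []
    for ch in value:
        cls = "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        # Collapse repeated letter/digit classes; keep punctuation as-is.
        if out and cls in ("A", "9") and out[-1] == cls:
            continue
        out.append(cls)
    return "".join(out)


def schema_profile(rows: list[dict[str, str]]) -> dict:
    """Column-level summary intended for an LLM prompt: no raw cell values."""
    columns = list(rows[0]) if rows else []
    profile: dict = {"row_count": len(rows), "columns": []}
    for col in columns:
        values = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in values if v]
        patterns = Counter(shape(v) for v in present)
        profile["columns"].append({
            "name": col,
            "non_empty": len(present),
            "distinct": len(set(present)),
            "top_patterns": [p for p, _ in patterns.most_common(3)],
        })
    return profile


rows = [
    {"email": "ana@example.com", "phone": "+44 20 7946 0958"},
    {"email": "ben@example.org", "phone": ""},
]
payload = json.dumps(schema_profile(rows), indent=2)
print(payload)
```

A payload like this is enough for an LLM to describe a file ("two email-shaped columns, one phone column half empty"). It is also easy to audit, because you can check it for raw values before it is sent.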
How ListMatchGenie handles it
ListMatchGenie implements all four of the patterns above. Your data lives in the S3 bucket for your region (us-east-1, eu-central-1, or eu-west-2). The PostgreSQL database has zero PII — only file references and aggregated statistics. The AI layer (AWS Bedrock) receives schema and stats only, never raw customer records. Account deletion removes all regional data within the platform's standard GDPR window.
The full DPA and sub-processor list are available on the security and DPA pages.
The short version
For GDPR compliance in data matching, the two questions that matter most when evaluating a vendor are:
- Where does my data live? Answer should be a specific region, and it should match where your customers are.
- What happens on account deletion? Answer should be specific and verifiable, not "we comply with GDPR."
Vague answers on either question mean the vendor hasn't thought about this carefully. Pick one that has.

