ListMatchGenie

Setting the confidence threshold

The confidence threshold is the most-tuned dial in ListMatchGenie. Here's what moving it does, what bands to use for what situations, and how to iterate without breaking your data.

The confidence threshold is a single number (default 70) that decides the cut between match and review. Raising it makes the engine stricter (fewer matches, more review cases). Lowering it makes the engine more permissive (more matches, lower precision).

The review threshold (default 55, which is 15 points below the match threshold) is the cut between review and unmatched. By default the two thresholds move together; in custom profiles you can tune them independently.
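As a rough illustration of how the two cuts partition scores, here is a minimal Python sketch. The function name `classify`, the constants, and the choice to treat a score exactly on a threshold as belonging to the higher band are assumptions made for illustration, not ListMatchGenie's actual implementation.

```python
from typing import Optional

DEFAULT_MATCH_THRESHOLD = 70
DEFAULT_REVIEW_GAP = 15  # by default the review cut sits 15 points below the match cut


def classify(score: float,
             match_threshold: float = DEFAULT_MATCH_THRESHOLD,
             review_threshold: Optional[float] = None) -> str:
    """Return 'match', 'review', or 'unmatched' for one confidence score.

    Assumption: a score equal to a threshold falls in the higher band.
    """
    if review_threshold is None:
        # Default behaviour: the review threshold follows the match threshold.
        review_threshold = match_threshold - DEFAULT_REVIEW_GAP
    if review_threshold > match_threshold:
        raise ValueError("review threshold cannot exceed match threshold")
    if score >= match_threshold:
        return "match"
    if score >= review_threshold:
        return "review"
    return "unmatched"
```

With the defaults, `classify(72)` returns `"match"`, `classify(60)` returns `"review"`, and `classify(40)` returns `"unmatched"`.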

What moving the threshold actually does

Raising the threshold (e.g. 70 → 85)

  • Fewer source rows classified as match
  • More classified as review
  • More classified as unmatched (since the review threshold also rises)

The headline number "match rate" goes down. But your precision goes up — what remains in match is very high-confidence.

Use when: false positives are expensive (e.g. merging CRM records, compliance workflows, billing).

Lowering the threshold (e.g. 70 → 60)

  • More source rows classified as match
  • Fewer as review (because the review band narrows at the top)
  • Fewer as unmatched (review threshold also drops)

Match rate goes up. Precision goes down — some matches will be wrong.

Use when: you need to catch everything and manually filter (research, discovery, M&A overlap analysis).
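To get a feel for the trade-off before changing anything, you can count how a fixed set of scores falls into the three bands at different thresholds. The sketch below assumes you have exported the scores to a plain list; the sample scores are made up, and `band_counts` is a hypothetical helper, not part of the product. It keeps the default 15-point gap so both thresholds move together.

```python
def band_counts(scores, match_threshold, review_gap=15):
    """Count scores per band for a given match threshold.

    The review threshold follows the match threshold at a fixed gap,
    mirroring the default behaviour described above.
    """
    review_threshold = match_threshold - review_gap
    counts = {"match": 0, "review": 0, "unmatched": 0}
    for score in scores:
        if score >= match_threshold:
            counts["match"] += 1
        elif score >= review_threshold:
            counts["review"] += 1
        else:
            counts["unmatched"] += 1
    return counts


# Made-up scores for illustration only.
sample_scores = [96, 91, 88, 82, 78, 73, 71, 66, 62, 58, 52, 47, 30, 12, 5]

for threshold in (60, 70, 85):
    print(threshold, band_counts(sample_scores, threshold))
```

The scores are computed once and only re-bucketed for each threshold, which is why trying several values is cheap.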

Scenario                          Threshold   Rationale
First-time match, unknown data    70          Default, balanced
Dedupe customer list              65          Within-file dupes tend to be cleaner
Match leads to CRM                70          Standard B2C/B2B
Match against NPI registry        80          Regulated, false positives expensive
Email suppression list check      85          Compliance: false negatives OK, false positives not
M&A overlap discovery             60          Catch everything, filter later
Finance record merge              90          Precision trumps recall
International person matching     65          Spelling variation needs headroom

The iteration workflow

Don't try to pick the perfect threshold in advance. Iterate:

  1. Run at the default (70)

    Don't overthink the first run. Match, look at results, note the shape.

  2. Look at the score distribution chart

    On the job detail page, the score distribution histogram shows you what's happening at every score band. You're looking for:

    • A clear top peak (genuine matches, usually 85+)
    • A clear bottom cluster (genuine non-matches, near 0)
    • The middle — how smeared it is, where the density dies off

    The threshold should sit in the low-density part of the middle. If your distribution has a clear gap at 75–80, put the threshold there. If the middle is smooth, threshold choice is genuinely a trade-off.

  3. Spot-check the review queue

    Look at 10 cases in each of the score bands around your threshold:

    • Just above (if threshold is 70, look at cases scoring 70–75)
    • Just below (65–69)

    If the "above" cases look clearly right, your threshold is fine or could go lower. If the "below" cases also look clearly right, lower the threshold. If the "above" cases include obviously wrong matches, raise it.

  4. Adjust by 5, re-run

    Change the threshold in 5-point increments. Don't micro-tune — a move of 2 points rarely produces visibly different output.

    Re-running is fast (the match engine re-classifies without re-scoring, so it completes in seconds).

  5. Save the tuned setting

    Once you're happy, save the profile under a name. Future runs of this workflow inherit your tuning — you never have to re-discover it.
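If you export scores from a job, steps 2 and 3 can be approximated offline. The sketch below is one possible approach under stated assumptions: scores run from 0 to 100, rows are dictionaries with a `score` key, and the helper names (`histogram`, `suggest_threshold`, `spot_check`) are hypothetical. It picks the emptiest 5-point bin in the middle of the range, preferring the one closest to your current threshold when several tie, and samples review cases on either side of a threshold.

```python
import random


def histogram(scores, bin_width=5):
    """Count scores per bin; bin i covers [i * bin_width, (i + 1) * bin_width)."""
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for score in scores:
        # A score of exactly 100 is placed in the top bin.
        index = min(int(score // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


def suggest_threshold(scores, current=70, low=50, high=90, bin_width=5):
    """Return the lower edge of the lowest-density bin between low and high.

    Ties are broken in favour of the bin closest to the current threshold.
    """
    counts = histogram(scores, bin_width)
    candidates = range(low // bin_width, high // bin_width)
    best = min(candidates,
               key=lambda i: (counts[i], abs(i * bin_width - current)))
    return best * bin_width


def spot_check(rows, threshold, width=5, n=10, seed=0):
    """Sample up to n rows just above and just below the threshold."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    above = [r for r in rows if threshold <= r["score"] <= threshold + width]
    below = [r for r in rows if threshold - width <= r["score"] < threshold]
    return (rng.sample(above, min(n, len(above))),
            rng.sample(below, min(n, len(below))))
```

A suggestion from `suggest_threshold` is only a starting point; the spot-check in step 3 is what tells you whether the cut is actually in the right place.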

Moving the review threshold independently

In custom profiles you can move the match and review thresholds independently. Common useful configurations:

Narrow review band

Match = 75, review = 70. Only cases very close to the match threshold go to review; the rest are unmatched. Use when your review capacity is limited and you'd rather miss marginal cases than review them.

Wide review band

Match = 75, review = 50. Many borderline cases go to review. Use when review capacity is high (e.g. a team of analysts) and catching every possible match matters more than minimizing review workload.

Match threshold very high, review threshold normal

Match = 90, review = 55. Only the most obvious matches auto-accept; anything else is reviewed. Use for compliance, billing, or other zero-tolerance workflows.
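The three configurations above can be compared side by side with a small sketch. `ThresholdProfile` is a hypothetical class written for this illustration, not the product's profile format; it only shows how the same score lands in different bands depending on the pair of thresholds, again assuming a score on a threshold falls in the higher band.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdProfile:
    """Hypothetical stand-in for a custom profile's two thresholds."""
    name: str
    match: int
    review: int

    def __post_init__(self):
        # The review cut must not sit above the match cut.
        if not 0 <= self.review <= self.match <= 100:
            raise ValueError("expected 0 <= review <= match <= 100")

    @property
    def review_band_width(self) -> int:
        return self.match - self.review

    def classify(self, score: float) -> str:
        if score >= self.match:
            return "match"
        if score >= self.review:
            return "review"
        return "unmatched"


PROFILES = [
    ThresholdProfile("narrow review band", match=75, review=70),
    ThresholdProfile("wide review band", match=75, review=50),
    ThresholdProfile("high match, normal review", match=90, review=55),
]

for profile in PROFILES:
    bands = {score: profile.classify(score) for score in (60, 72, 80)}
    print(profile.name, profile.review_band_width, bands)
```

A score of 60 is unmatched under the narrow band but goes to review under the other two, which is the practical difference in review workload.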

Symptoms of a wrong threshold

Too high (stricter than your data warrants)

  • Low match rate compared to what you expected
  • Review queue is huge with many clearly-right cases
  • The unmatched pile contains rows you can tell are in the master by inspection

Lower the threshold by 5–10 points.

Too low (more permissive than your data warrants)

  • Match rate suspiciously high
  • Sample matches contain obvious wrong cases
  • Review queue is small but contains mostly garbage

Raise the threshold by 5–10 points.

Right threshold, wrong profile

  • Match rate is low but you can tell the Genie is comparing on the wrong fields (e.g. weighting address when your data is person-level)

Changing the threshold won't help. Revisit your profile choice.