Question: How much reasoning capability do LLMs lose when you mask PII?
Answer: With generic redaction (`<PERSON>`), context retention drops to 17-27% in our benchmarks. With semantic masking (`{Name_hash}`), it stays at 92-100%.
You want to use LLMs on sensitive documents (HR files, support tickets, medical records). Compliance says you can't send raw PII. So you mask it.
But masking destroys context:
Original: "John's manager Sarah approved the request."
Masked: "`<PERSON>`'s manager `<PERSON>` approved the request."
Now the LLM can't answer "Who approved the request?" — everyone is `<PERSON>`.
Replace entities with distinguishable placeholders:
Semantic: "{Name_a3f2}'s manager {Name_b7c9} approved the request."
The LLM answers {Name_b7c9}. We unmask it → Sarah. ✅
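As an illustrative sketch only (not the actual `privalyse-mask` API, which detects entities automatically via Presidio; here the names are hand-labeled), the round trip is a reversible mapping — mask before sending to the LLM, unmask its answer afterwards:

```python
# Minimal sketch of semantic masking: replace each entity with a
# distinguishable placeholder and keep a reverse map for unmasking.
# Placeholder suffixes here are sequential for clarity; the real
# library derives hash-style suffixes like {Name_a3f2}.

def mask(text: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    mapping = {}
    for i, name in enumerate(entities):
        tag = f"{{Name_{i:04x}}}"   # e.g. {Name_0000}, {Name_0001}
        mapping[tag] = name
        text = text.replace(name, tag)
    return text, mapping

def unmask(text: str, mapping: dict[str, str]) -> str:
    for tag, name in mapping.items():
        text = text.replace(tag, name)
    return text

masked, mapping = mask("John's manager Sarah approved the request.",
                       ["John", "Sarah"])
print(masked)   # {Name_0000}'s manager {Name_0001} approved the request.
# The LLM, asked "Who approved the request?", answers with a placeholder:
print(unmask("{Name_0001}", mapping))   # Sarah
```

The key property: the LLM only ever sees placeholders, and the mapping never leaves your side of the API boundary.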
You can check out the repo here: https://github.com/Privalyse/privalyse-mask
Test: Can the LLM track "who did what" across a document with multiple people?
| Strategy | Context Retention |
|---|---|
| Original (baseline) | 100% |
| Generic Redaction (`<PERSON>`) | 27% |
| Semantic Masking (`{Name_hash}`) | 100% |
Script: context_research/01_coreference_benchmark.py
Results: results/coref_benchmark.json
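The failure mode behind the 27% number is easy to reproduce without any LLM at all: generic redaction collapses every person into one indistinguishable token, so the coreference chain is destroyed before the model ever sees the text. A quick sketch (synthetic sentence, hand-listed names — not the benchmark script itself):

```python
import re

doc = "John's manager Sarah approved the request after Tom escalated it."
names = ["John", "Sarah", "Tom"]

# Generic redaction: every name becomes the same token.
generic = doc
for n in names:
    generic = generic.replace(n, "<PERSON>")

# Semantic masking: every name becomes a *distinct* token.
semantic = doc
for i, n in enumerate(names):
    semantic = semantic.replace(n, f"{{Name_{i}}}")

# Count how many distinct "people" survive each strategy.
distinct_generic = len(set(re.findall(r"<PERSON>|\{Name_\d+\}", generic)))
distinct_semantic = len(set(re.findall(r"<PERSON>|\{Name_\d+\}", semantic)))
print(distinct_generic)   # 1 -- all three people collapsed into one entity
print(distinct_semantic)  # 3 -- "who did what" is still answerable
```

Both strategies remove the real names; only one preserves the structure the question depends on.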
Test: After retrieving a masked document, can the LLM answer relationship questions?
| Strategy | Context Retention |
|---|---|
| Original (baseline) | 100% |
| Generic Redaction | 17% |
| Semantic Masking | 92% |
Script: context_research/02_rag_qa_benchmark.py
Results: results/rag_qa_benchmark.json
```shell
# Install
pip install privalyse-mask presidio-analyzer presidio-anonymizer openai

# Set API key (for LLM evaluation)
export OPENAI_API_KEY="sk-..."

# Run Coreference Benchmark
python context_research/01_coreference_benchmark.py

# Run RAG QA Benchmark
python context_research/02_rag_qa_benchmark.py
```

- Seed: 42 (all randomness is seeded)
- Data: 100% synthetic (no real PII)
- Evaluator: GPT-4o-mini (temperature=0)
- Embedding: text-embedding-3-small
```
privalyse-research/
├── README.md                        # This file
├── context_research/
│   ├── 01_coreference_benchmark.py  # Entity tracking test
│   └── 02_rag_qa_benchmark.py       # RAG QA test
├── results/
│   └── rag_qa_benchmark.json        # Latest results
└── _archive/                        # Old experiments (for reference)
```
The LLM doesn't need to know WHO the person is.
It just needs to know that Person A ≠ Person B.
Semantic placeholders preserve the relationship graph while removing the actual identities.
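One way to get placeholders that are both stable and distinguishable (a sketch — the exact scheme `privalyse-mask` uses may differ) is to derive the suffix from a salted hash of the name. The same person then always maps to the same tag, so every node in the relationship graph stays linked to itself, while different people get different tags:

```python
import hashlib

def placeholder(name: str, salt: str = "doc-123") -> str:
    # Same (salt, name) pair -> same tag, so all mentions of one person
    # stay linked. Different names -> different digests, so Person A and
    # Person B remain distinct nodes in the relationship graph.
    # Note: a 4-hex-char truncation can collide in principle; a real
    # implementation needs collision handling or longer suffixes.
    digest = hashlib.sha256(f"{salt}:{name}".encode()).hexdigest()[:4]
    return f"{{Name_{digest}}}"

print(placeholder("Sarah") == placeholder("Sarah"))  # True: deterministic
print(placeholder("John") != placeholder("Sarah"))   # True: distinguishable
```

Varying the salt per document also prevents linking the same person across unrelated documents, if that is a threat model you care about.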
MIT