-
Notifications
You must be signed in to change notification settings - Fork 418
MSC4382: Peppered hash verification for E2EE content moderation #4382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
seuros
wants to merge
7
commits into
matrix-org:main
Choose a base branch
from
seuros:peppered-hash-verification
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+168
−0
Open
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
cacd4ef
Add MSC: Peppered hash verification for E2EE content moderation
seuros 0b3cd39
Update MSC number to 4382
seuros 8fc3dc7
Replace #### placeholders with MSC4382
seuros c51c711
Update unstable prefix to MSC4382
seuros 939e874
Fix MSC4382 based on review feedback
seuros b99e6ee
Add report endpoint extension and clarify plaintext structure
seuros 8a835ab
Add unstable prefix for report plaintext field
seuros File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,168 @@ | ||
| # MSC4382: Peppered Hash Verification for E2EE Content Moderation | ||
|
|
||
| When users report problematic content in encrypted rooms, server | ||
| administrators have no way to verify that the reported plaintext matches | ||
| what was actually sent. Reporters can claim an encrypted message contains | ||
| policy violations when it does not, enabling false reporting and | ||
| coordinated brigading attacks. | ||
|
|
||
| This proposal adds a `verification_hash` field to encrypted events that | ||
| allows servers to cryptographically verify reported content without | ||
| requiring decryption keys or breaking E2EE privacy guarantees. | ||
|
|
||
| ## Proposal | ||
|
|
||
| Add a `verification_hash` field within the `content` of `m.room.encrypted` | ||
| events, constructed as follows: | ||
|
|
||
| ``` | ||
| verification_hash = base64(SHA-256(canonical_json(plaintext) || ciphertext)) | ||
| ``` | ||
|
|
||
| Where: | ||
| - `plaintext` is the unencrypted message body, encoded as canonical JSON | ||
| - `ciphertext` is the encrypted payload from `content.ciphertext` | ||
| - `||` represents concatenation | ||
| - The result is base64-encoded for JSON representation | ||
|
|
||
| ### Example Event | ||
|
|
||
| ```json | ||
| { | ||
| "type": "m.room.encrypted", | ||
| "sender": "@alice:example.org", | ||
| "content": { | ||
| "algorithm": "m.megolm.v1.aes-sha2", | ||
| "ciphertext": "AwgAEtAB...", | ||
| "session_id": "SESSION_ID", | ||
| "verification_hash": "+djHtqXk08IwN5cQ..." | ||
| }, | ||
| "signatures": { ... } | ||
| } | ||
| ``` | ||
|
|
||
| ### Report Endpoint Extension | ||
|
|
||
| The content reporting endpoint is extended to include the plaintext event: | ||
|
|
||
| ``` | ||
| POST /_matrix/client/v3/rooms/{roomId}/report/{eventId} | ||
| { | ||
| "reason": "Human-readable explanation", | ||
| "plaintext": { | ||
| "type": "m.room.message", | ||
| "content": {"msgtype": "m.text", "body": "..."}, | ||
| "room_id": "!room:server" | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| The `plaintext` field contains the full plaintext event structure that | ||
| was fed into the encryption algorithm, as specified in the Megolm | ||
| documentation. This is the same structure that clients decrypt when | ||
| receiving encrypted events. | ||
|
|
||
| ### Verification Process | ||
|
|
||
| When a user reports encrypted content, the server verifies: | ||
|
|
||
| ```python | ||
| # Server receives report with event_id and claimed plaintext | ||
| report = receive_report(room_id, event_id) | ||
|
|
||
| # Fetch stored encrypted event from database (local operation) | ||
| event = database.get_event(event_id) | ||
|
|
||
| # Extract verification data (all already stored locally) | ||
| ciphertext = event['content']['ciphertext'] | ||
| verification_hash = event['content']['verification_hash'] | ||
|
|
||
| # Compute hash of reported plaintext with stored ciphertext | ||
| plaintext_event = canonical_json(report['plaintext']) | ||
| computed = base64(sha256(plaintext_event + ciphertext)) | ||
|
|
||
| # Single comparison - no decryption, no network requests | ||
| if computed == verification_hash: | ||
| # Report verified - plaintext is authentic | ||
| else: | ||
| # Report is false - reporter is lying | ||
| ``` | ||
|
|
||
| The server never needs decryption keys or access to the encryption | ||
| session. All verification data (ciphertext and verification_hash) is | ||
| already stored in the database. Verification requires only a local | ||
| database fetch and a single SHA-256 computation. | ||
|
|
||
| ### Security Properties | ||
|
|
||
| **Prevents rainbow table attacks**: Each message has a unique pepper | ||
| (its ciphertext), preventing pre-computation of common phrases. An | ||
| attacker would need to generate a new rainbow table for every single | ||
| message, making mass surveillance infeasible. | ||
|
|
||
| **Prevents pattern matching**: Same plaintext produces different hashes | ||
| in different messages since each has unique ciphertext. | ||
|
|
||
| **Allows targeted verification**: Servers can verify suspected content | ||
| in specific messages when they have reason to investigate (the intended | ||
| moderation use case). | ||
|
|
||
| **Does not enable mass surveillance**: Checking if a keyword exists | ||
| requires testing it against every individual message's unique pepper, | ||
| which does not scale. | ||
|
|
||
| ## Potential Issues | ||
|
|
||
| **Backwards compatibility**: Existing clients don't send | ||
| `verification_hash`. The field should be optional during transition, | ||
| with servers falling back to social trust when absent. During the | ||
| unstable period, use `org.matrix.msc4382.verification_hash`. | ||
|
|
||
| **Event size increase**: Adds approximately 43 bytes for base64-encoded | ||
| SHA-256 output, plus field name and JSON delimiters (~60 bytes total). | ||
| For typical messages this represents ~10-15% overhead, which is | ||
| considered acceptable for the moderation benefit provided. | ||
|
|
||
| **Targeted attacks possible**: If a server suspects specific content in | ||
| a specific message, it can verify that suspicion. This is by design - | ||
| it's the intended moderation use case and requires prior knowledge about | ||
| which messages to check. | ||
|
|
||
| **Server could cache plaintexts**: Malicious servers could cache | ||
| reported plaintext-hash pairs. However, servers can already cache | ||
| reported content in the current system, so this proposal doesn't make | ||
| the situation worse. | ||
|
|
||
| ## Security Considerations | ||
|
|
||
| The hash uses SHA-256, which provides 2^128 collision resistance. The | ||
| probability of accidental collision is negligible, and malicious | ||
| collisions are infeasible with current technology. | ||
|
|
||
| The `verification_hash` is included in the signed `content` object, so | ||
| it cannot be forged without the sender's signing key (Ed25519). | ||
|
|
||
| This proposal trades perfect unlinkability for practical moderation | ||
| capability. It prevents mass surveillance and pattern matching while | ||
| enabling verification of specific reported messages. Servers can verify | ||
| suspected content in targeted messages (similar to lawful intercept | ||
| capabilities), but cannot scan all messages for keywords at scale. | ||
|
|
||
| The threat model assumes honest-but-curious or malicious servers. A | ||
| malicious server could attempt targeted verification of high-value | ||
| messages, but this requires prior suspicion about specific messages and | ||
| cannot be done at scale. This is comparable to existing lawful intercept | ||
| capabilities. | ||
|
|
||
| ## Unstable Prefix | ||
|
|
||
| During development, implementations should use: | ||
| - Field name: `org.matrix.msc4382.verification_hash` | ||
| - Report field: `org.matrix.msc4382.plaintext` | ||
| - Client capability: `org.matrix.msc4382.peppered_hash_verification` | ||
|
|
||
| ## Dependencies | ||
|
|
||
| This MSC has no dependencies on other proposals. It builds on the | ||
| existing E2EE implementation (Megolm/Olm), event signing (Ed25519), and | ||
| content reporting mechanisms. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation requirements: