Full-history deduplication with encrypted content
Problem
teeclip currently deduplicates only against the most recent entry:
# history.py:181-186
last = conn.execute(
    "SELECT hash FROM clips ORDER BY id DESC LIMIT 1"
).fetchone()
if last and last["hash"] == content_hash:
    return None
This means if a user copies "hello", then "world", then "hello" again, three entries are stored — the duplicate "hello" is not caught because it's not consecutive. An idx_clips_hash index exists on the hash column but no query uses it.
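The scenario above can be reproduced with a minimal sketch. The two-column schema and the `save_last_only` helper are assumptions made for illustration; only the `SELECT ... ORDER BY id DESC LIMIT 1` check mirrors the snippet from history.py.

```python
import hashlib
import sqlite3


def save_last_only(conn, text):
    """Insert text unless its hash matches the most recent entry."""
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    last = conn.execute(
        "SELECT hash FROM clips ORDER BY id DESC LIMIT 1"
    ).fetchone()
    if last and last["hash"] == content_hash:
        return None
    cur = conn.execute(
        "INSERT INTO clips (content, hash) VALUES (?, ?)",
        (text, content_hash),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Hypothetical minimal schema; the real clips table has more columns.
conn.execute(
    "CREATE TABLE clips (id INTEGER PRIMARY KEY, content TEXT, hash TEXT)"
)
conn.execute("CREATE INDEX idx_clips_hash ON clips(hash)")

for text in ["hello", "world", "hello"]:
    save_last_only(conn, text)

# The second "hello" is not consecutive, so it is stored again.
stored = conn.execute("SELECT COUNT(*) FROM clips").fetchone()[0]
print(stored)  # 3
```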
For unencrypted content, full-history dedup is straightforward — WHERE hash = ? with the SHA-256 digest. But with encryption enabled, the hash column stores HMAC-SHA-256 keyed with the encryption key. This creates two complications:
- Key rotation: If the encryption key changes (user runs --decrypt then --encrypt with a different key, or switches auth methods), all existing HMACs become incomparable to new ones. A WHERE hash = ? lookup would find no matches even for identical plaintext.
- No way to verify without decryption: Without the key, there's no way to confirm whether two entries with different HMACs contain the same plaintext. With the key, HMAC comparison works, but only within a single key epoch.
The result is that encrypted history accumulates duplicates that plaintext history would not.
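The epoch behavior can be illustrated with the standard library alone. `compute_hash` is a hypothetical helper, not teeclip's actual function, and how teeclip derives the HMAC key from the encryption key is not shown here; the sketch only demonstrates that the same plaintext hashes identically under one key and differently under another.

```python
import hashlib
import hmac
from typing import Optional


def compute_hash(data: bytes, key: Optional[bytes] = None) -> str:
    """SHA-256 digest for plaintext clips, HMAC-SHA-256 when a key is given."""
    if key is None:
        return hashlib.sha256(data).hexdigest()
    return hmac.new(key, data, hashlib.sha256).hexdigest()


old_key = b"key-for-epoch-1"  # placeholder keys for illustration
new_key = b"key-for-epoch-2"

same_epoch = compute_hash(b"hello", old_key) == compute_hash(b"hello", old_key)
cross_epoch = compute_hash(b"hello", old_key) == compute_hash(b"hello", new_key)

print(same_epoch)   # True: a hash lookup finds the duplicate
print(cross_epoch)  # False: identical plaintext, but the lookup misses
```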
Proposed solution
Implement full-history deduplication that works correctly across both encrypted and unencrypted content, with awareness of key epochs.
Behavior for unencrypted clips (simple case):
SELECT id FROM clips WHERE hash = ? AND encrypted = 0 LIMIT 1
If a match exists, skip the insert (or optionally update the timestamp to "bump" the existing entry).
Behavior for encrypted clips (HMAC-keyed):
Within the same key epoch, HMAC comparison works:
SELECT id FROM clips WHERE hash = ? AND encrypted = 1 LIMIT 1
Same plaintext + same key = same HMAC, so this catches duplicates without decryption.
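Both lookups differ only in the encrypted flag, so they could share one helper. The sketch below assumes a hypothetical `find_duplicate` function and a reduced schema; the hash values are placeholders standing in for real digests.

```python
import sqlite3


def find_duplicate(conn, content_hash, encrypted):
    """Return the id of a clip with this hash and encryption state, or None."""
    row = conn.execute(
        "SELECT id FROM clips WHERE hash = ? AND encrypted = ? LIMIT 1",
        (content_hash, 1 if encrypted else 0),
    ).fetchone()
    return row["id"] if row else None


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Hypothetical minimal schema for illustration only.
conn.execute(
    "CREATE TABLE clips (id INTEGER PRIMARY KEY, hash TEXT, encrypted INTEGER)"
)
conn.execute("CREATE INDEX idx_clips_hash ON clips(hash)")
conn.execute("INSERT INTO clips (hash, encrypted) VALUES ('aaa', 0)")
conn.execute("INSERT INTO clips (hash, encrypted) VALUES ('bbb', 1)")

plain_match = find_duplicate(conn, "aaa", encrypted=False)  # matches row 1
cross_match = find_duplicate(conn, "aaa", encrypted=True)   # no match
enc_match = find_duplicate(conn, "bbb", encrypted=True)     # matches row 2
```

Filtering on the encrypted flag keeps a SHA-256 digest from ever being compared against an HMAC, which would only match by accident.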
Key rotation scenario:
When encrypt_history() or decrypt_history() runs, all hashes are recomputed with the new key (or reverted to SHA-256). After rotation, any HMAC computed under the old key is stale, but hashes computed under the new key are consistent going forward. No special handling is needed: dedup works naturally within each epoch, and cross-epoch duplicates are an acceptable edge case (the alternative is decrypting everything to compare, which defeats the purpose).
Deduplication strategy options
Option A: Most-recent only (current behavior)
- Skip insert if hash matches the single most recent entry
- Index idx_clips_hash is unused
- Simple, but allows non-consecutive duplicates to accumulate
Option B: Full-history exact match
existing = conn.execute(
    "SELECT id FROM clips WHERE hash = ? LIMIT 1",
    (content_hash,)
).fetchone()
if existing:
    return None  # or bump timestamp
- Uses idx_clips_hash for fast lookup
- Catches all duplicates within a key epoch
- Decision needed: skip silently, or "bump" the existing entry's timestamp?
Option C: Full-history with bump
existing = conn.execute(
    "SELECT id FROM clips WHERE hash = ? LIMIT 1",
    (content_hash,)
).fetchone()
if existing:
    conn.execute(
        "UPDATE clips SET timestamp = ? WHERE id = ?",
        (timestamp, existing["id"])
    )
    conn.commit()
    return None
- Same as B but updates the timestamp so the entry floats to the top of --list
- Better UX: re-copying something makes it "recent" again without creating a duplicate
- Slightly more complex
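A self-contained sketch of Option C, run against the hello/world/hello scenario from the Problem section. The `save_with_bump` name, the reduced schema, and the assumption that --list orders by timestamp descending are all illustrative, not confirmed details of teeclip.

```python
import hashlib
import sqlite3


def save_with_bump(conn, text, timestamp):
    """Insert text, or bump the timestamp of an existing identical entry."""
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    existing = conn.execute(
        "SELECT id FROM clips WHERE hash = ? LIMIT 1", (content_hash,)
    ).fetchone()
    if existing:
        conn.execute(
            "UPDATE clips SET timestamp = ? WHERE id = ?",
            (timestamp, existing["id"]),
        )
        conn.commit()
        return None
    cur = conn.execute(
        "INSERT INTO clips (content, hash, timestamp) VALUES (?, ?, ?)",
        (text, content_hash, timestamp),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Hypothetical minimal schema; integer timestamps keep the example simple.
conn.execute(
    "CREATE TABLE clips ("
    "id INTEGER PRIMARY KEY, content TEXT, hash TEXT, timestamp INTEGER)"
)
conn.execute("CREATE INDEX idx_clips_hash ON clips(hash)")

for ts, text in enumerate(["hello", "world", "hello"], start=1):
    save_with_bump(conn, text, ts)

# Assumed --list ordering: newest timestamp first.
listing = [
    row["content"]
    for row in conn.execute("SELECT content FROM clips ORDER BY timestamp DESC")
]
print(listing)  # ['hello', 'world']
```

Note that a listing ordered by id instead of timestamp would not reflect the bump, so the ordering column matters if this option is chosen.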
Option D: Configurable dedup scope
[history]
dedup = "last" # "last" (current), "all", or "none"
- Lets users choose their preference
- "none" is useful for clipboard audit trails where every copy matters
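One way the three scope values might map onto queries is sketched below. `is_duplicate` is a hypothetical helper, the scope string is assumed to have already been read from the [history] config table, and the hashes are placeholders.

```python
import sqlite3


def is_duplicate(conn, content_hash, scope="last"):
    """Decide whether a save should be skipped under the given dedup scope."""
    if scope == "none":
        return False  # audit-trail mode: every copy is stored
    if scope == "last":
        row = conn.execute(
            "SELECT hash FROM clips ORDER BY id DESC LIMIT 1"
        ).fetchone()
        return row is not None and row[0] == content_hash
    if scope == "all":
        row = conn.execute(
            "SELECT 1 FROM clips WHERE hash = ? LIMIT 1", (content_hash,)
        ).fetchone()
        return row is not None
    raise ValueError(f"unknown dedup scope: {scope!r}")


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema for illustration only.
conn.execute("CREATE TABLE clips (id INTEGER PRIMARY KEY, hash TEXT)")
conn.execute("CREATE INDEX idx_clips_hash ON clips(hash)")
conn.executemany("INSERT INTO clips (hash) VALUES (?)", [("h1",), ("h2",)])

# "h1" exists in history but is not the most recent entry.
results = {s: is_duplicate(conn, "h1", s) for s in ("last", "all", "none")}
print(results)  # {'last': False, 'all': True, 'none': False}
```

Rejecting unknown values with an error, rather than silently falling back to "last", makes a typo in the config visible immediately.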
Design considerations
- HMAC consistency within key epoch: As long as the encryption key doesn't change between saves, HMAC-SHA-256 is deterministic — same plaintext produces the same hash. Full-history dedup works without decryption.
- Cross-epoch duplicates: After key rotation, old HMACs are incomparable to new ones. Accepting this is better than requiring full decryption for dedup.
- Bump vs skip: "Bump" (Option C) is more useful: if you re-copy a URL you copied yesterday, you probably want it at the top of --list without a duplicate entry. But some users may want strict chronological history.
- Index justification: idx_clips_hash currently exists but is unused. Full-history dedup would give it a purpose and make inserts with dedup O(log n) instead of O(n).
- Performance: Hash index lookup is fast. Even with 10,000 entries, this adds negligible overhead to save().
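Whether the dedup query actually uses the index can be checked with SQLite's EXPLAIN QUERY PLAN. The sketch below runs against a reduced, hypothetical schema; the exact wording of the plan text varies between SQLite versions, so it only checks that the index name appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; only the indexed column matters here.
conn.execute("CREATE TABLE clips (id INTEGER PRIMARY KEY, hash TEXT)")
conn.execute("CREATE INDEX idx_clips_hash ON clips(hash)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM clips WHERE hash = ? LIMIT 1",
    ("abc",),
).fetchall()

# The fourth column of each plan row holds the human-readable detail.
details = [row[3] for row in plan]
uses_index = any("idx_clips_hash" in detail for detail in details)
print(uses_index)  # True
```

A check like this could also back a regression test, so a later query change that falls back to a full table scan is caught.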
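Acceptance criteria
- idx_clips_hash is used by the dedup query
- --list ordering is correct after bump (if bump is chosen)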
Related issues
Analysis
See notes/cli/2026-02-17__20-47-29__both_encrypted-metadata-architecture.md for the column-by-column encryption audit that surfaced this gap.