Security: 16-bit pasta IDs allow enumeration and collisions

## Summary

Pasta IDs are generated as 16-bit random numbers (`u16`), giving only 65,536 possible values. This causes two problems:

1. **Enumeration**: `--private` (unlisted) pastas are trivially discoverable by brute-force
2. **Collisions**: no uniqueness check, so new pastas silently shadow existing ones

## Root Cause

```rust
// src/endpoints/create.rs:125
let mut new_pasta = Pasta {
    id: rand::thread_rng().gen::<u16>() as u64,
    // ...
};
```

The ID is used directly without checking for duplicates:

```rust
// src/endpoints/create.rs:377-378
let mut pastas = data.pastas.lock().unwrap();
pastas.push(new_pasta);
```

The encoding layer (animal names or `--hash-ids`) does not add entropy — both are reversible mappings of the same 16-bit number:
- Animal names: 64 words, base-64 encoding -> 1-3 word slugs cover the full u16 space
- Hash IDs: same unsalted `Harsh` instance, deterministically reversible
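
To make the "no added entropy" point concrete, here is a minimal sketch of a base-64 word encoding and its inverse. The placeholder words (`w0` to `w63`), the hyphen separator, and the names `encode_slug` / `decode_slug` are assumptions for illustration only; they are not the project's actual word list or functions.

```rust
/// Hypothetical stand-in for the 64-entry animal list: digit d maps to "w{d}".
fn word(d: u64) -> String {
    format!("w{d}")
}

/// Encode an ID as base-64 "words", most significant digit first.
fn encode_slug(mut id: u64) -> String {
    let mut digits = vec![id % 64];
    id /= 64;
    while id > 0 {
        digits.push(id % 64);
        id /= 64;
    }
    digits.reverse();
    digits.into_iter().map(word).collect::<Vec<_>>().join("-")
}

/// Invert `encode_slug` (lenient parsing; a sketch only).
/// Returns None for an unknown word or on overflow.
fn decode_slug(slug: &str) -> Option<u64> {
    slug.split('-').try_fold(0u64, |acc, w| {
        let d: u64 = w.strip_prefix('w')?.parse().ok()?;
        if d >= 64 {
            return None;
        }
        acc.checked_mul(64)?.checked_add(d)
    })
}
```

Because every slug decodes back to exactly one number, an attacker who can run the encoder can produce every valid URL; the slug is a change of notation, not a secret.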

## Enumeration

The `--private` flag makes pastas "unlisted" (hidden from the listing page). The only access control is knowing the URL. With 65,536 possible IDs:

- An attacker iterates all values: `for id in 0..65536 { GET /upload/{encode(id)} }`
- At 100 requests/second, full enumeration takes ~11 minutes
- Every existing pasta (public, unlisted, burn-after-read) is found
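
The ~11 minute figure is plain arithmetic; a small sketch to re-derive it. The function name is hypothetical, and the 100 requests/second rate is the same assumption used in the list above.

```rust
/// Seconds to request every value in an ID space of `bits` bits
/// at a sustained rate of `rps` requests per second.
fn full_scan_seconds(bits: u32, rps: f64) -> f64 {
    2f64.powi(bits as i32) / rps
}
```

`full_scan_seconds(16, 100.0)` is 655.36 seconds, i.e. just under 11 minutes.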

## Collisions

With only 65,536 possible IDs, collision probability grows fast. The chance that at least two pastas share an ID is `1 - (65535/65536) * (65534/65536) * ... * ((65536-N+1)/65536)` for N pastas:

| Active pastas | Collision probability |
|---|---|
| 100 | ~7% |
| 256 | ~39% |
| 300 | ~50% |

A collision means a new pasta silently shadows an older one — the older pasta becomes inaccessible. No error or warning is shown.
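
The table values can be checked by evaluating the product formula directly. This is a minimal sketch that assumes IDs are drawn uniformly and independently; `collision_probability` is a name chosen here for illustration.

```rust
/// Exact probability that at least two of `n` uniformly random IDs
/// collide in a space of `space` possible values.
fn collision_probability(n: u32, space: u32) -> f64 {
    let mut all_distinct = 1.0_f64;
    for k in 0..n {
        // The (k+1)-th ID must avoid the k IDs already taken.
        all_distinct *= (space as f64 - k as f64) / space as f64;
    }
    1.0 - all_distinct
}
```

Calling it with `space = 65_536` and `n` set to 100, 256, and 300 gives values that round to the percentages in the table.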

## Impact

- `--private` (unlisted) pastas provide no real privacy
- Burn-after-read pastas can be found and consumed before the intended recipient
- Data loss from collisions on any deployment with more than a few hundred pastas
- Combined with the forgeable owner_token issue, an attacker can enumerate AND repeatedly read burn-after-read pastas

## Suggested Fix

Change `gen::<u16>() as u64` to `gen_range(0..=9_007_199_254_740_991)` (2^53 - 1), and add a uniqueness check before insertion.
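
A rough sketch of what the retry loop could look like. To keep it free of external crates, the random source is passed in as a closure; `fresh_id`, `MAX_ID`, and the `existing` slice are hypothetical names, and in the real handler the check would run against `pastas` while the mutex is held.

```rust
/// Largest ID that survives a round trip through a JavaScript number (2^53 - 1).
const MAX_ID: u64 = (1 << 53) - 1;

/// Draw candidate IDs from `next_random` until one is not already in use.
/// `next_random` stands in for the RNG call; with the `rand` crate it could be
/// `|| rand::thread_rng().gen_range(0..=MAX_ID)` (an assumption, not tested here).
fn fresh_id(existing: &[u64], mut next_random: impl FnMut() -> u64) -> u64 {
    loop {
        // Masking keeps the candidate within 0..=MAX_ID; in-range values are unchanged.
        let candidate = next_random() & MAX_ID;
        if !existing.contains(&candidate) {
            return candidate;
        }
    }
}
```

The linear scan matches the `Vec`-based storage shown under Root Cause; with a 53-bit space the loop body will almost never run more than once.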

### Why 2^53 and not larger?

The `id` field is `u64` throughout the codebase. Using the full u64 range hits compatibility limits:

- **SQLite**: `id INTEGER PRIMARY KEY` uses signed i64. Values above i64::MAX (2^63 - 1) would overflow.
- **JSON / JavaScript**: `Number.MAX_SAFE_INTEGER` is 2^53 - 1 (9,007,199,254,740,991). IDs above this lose precision when parsed by JS clients (the web frontend, any API consumers).
- **Animal names**: with u64, URLs grow to 10-11 animal words. With 2^53, they stay around 8-9 — longer than today but still reasonable.
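
Two of these limits are easy to check in code. The sketch below assumes each animal word carries 6 bits (64 words) and models a JavaScript `Number` as an IEEE 754 double; both function names are made up for this example.

```rust
/// Number of base-64 words needed for the largest value of a `bits`-bit ID.
fn max_words(bits: u32) -> u32 {
    (bits + 5) / 6 // ceil(bits / 6), since each word carries 6 bits
}

/// True if `id` survives conversion to an f64 and back, which is how
/// JavaScript's `Number` stores integers. The comparison is done in u128
/// so that values rounding up to 2^64 are not hidden by saturation.
fn survives_f64_round_trip(id: u64) -> bool {
    (id as f64) as u128 == id as u128
}
```

`max_words` returns 3, 9, and 11 for 16, 53, and 64 bits. `survives_f64_round_trip` holds for 2^53 - 1 but fails for 2^53 + 1, which is the precision loss a JS client would see.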

### Is 2^53 secure?

At 100 requests/second, enumerating all 2^53 values takes ~2.85 million years. The collision probability among 1 million active pastas is ~0.006% (roughly 1 in 18,000), and the uniqueness check removes even that residual risk. More than sufficient.
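
Both estimates in this section can be re-derived with a short calculation. This sketch uses the standard birthday approximation `1 - exp(-n(n-1) / (2 * space))` rather than the exact product, since the product is impractical at this scale; the function names and the 365.25-day year are assumptions.

```rust
const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 3600.0;

/// Years to request every value in a `bits`-bit ID space at `rps` requests/second.
fn full_scan_years(bits: u32, rps: f64) -> f64 {
    2f64.powi(bits as i32) / rps / SECONDS_PER_YEAR
}

/// Birthday approximation of the chance that at least two of `n`
/// uniformly random IDs collide in a `bits`-bit space.
fn approx_collision_probability(n: f64, bits: u32) -> f64 {
    let space = 2f64.powi(bits as i32);
    // exp_m1 keeps precision when the exponent is very close to zero.
    -(-(n * (n - 1.0)) / (2.0 * space)).exp_m1()
}
```

Use `full_scan_years(53, 100.0)` and `approx_collision_probability(1e6, 53)` to reproduce the two numbers.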

### Why not UUID?

Changing `id` from `u64` to UUID would require:
- Rewriting the SQLite schema (`INTEGER PRIMARY KEY` -> `TEXT`)
- Rewriting all animal-name and hashid encoding/decoding (both take `u64`)
- Changing all endpoint handlers (URL -> u64 lookup)
- Changing filesystem paths (`attachments/{id_as_animals()}/`)
- Changing readonly key material (`encrypt(id.to_string(), ...)`)
- Breaking all existing deployments (database migration, orphaned attachment directories)

Capping to 2^53 solves the security problem with a small, localized change (the new range plus the uniqueness check) and is fully backwards compatible.

## Backwards Compatibility

Fully backwards compatible. No changes to:

- **Pasta struct or database schema** — the `id` field remains `u64`. SQLite `INTEGER PRIMARY KEY` and JSON serialization work unchanged.
- **Existing pastas** — old pastas keep their small IDs. They coexist with new larger IDs. No migration needed.
- **Existing links** — all URLs to existing pastas continue to work. The animal-name and hashid decoding functions accept any u64 value, regardless of magnitude.
- **Encoding functions** — `to_animal_names(u64)` and `to_hashids(u64)` are unchanged. Larger IDs produce longer slugs (8-9 animal words instead of 1-3), but the encoding is the same algorithm.

The one addition is the uniqueness check: the new `while pastas.iter().any(|p| p.id == new_pasta.id)` loop runs inside the existing mutex lock, so it correctly checks against both old and new pastas.

The only observable difference: new pastas get longer URLs. Existing pastas and links are unaffected.

## Potentially Related Issues

- #257 — "Allow shorter pasta ID until a longer one is needed" — discusses ID length as a UX concern; a commenter notes "reducing to only one animal name is a major security flaw"
- #54 — "[Feature request] option to use random characters instead of animal names"

Neither frames the 16-bit space as a security vulnerability.

---

*This issue was drafted with AI assistance (Claude). If any facts are incorrect or misrepresented, please point them out and I'll correct them.*
