
Obfuscating content IDs in tombstone requests to limit bad actors trying to collect "sensitive content" lists #282

Open
pirate opened this issue Aug 10, 2024 · 5 comments

Comments

pirate commented Aug 10, 2024

So that you can't just use those IDs to find the list of banned content and be evil with it.

wilwade commented Aug 12, 2024

Hmm... I'm not sure that alone would work to prevent discovery of the delete-requested content, but it would slow it down / increase the cost of it. Perhaps that is enough?

Here's how to recover the full id:

  1. Index all of the user's message ids (these remain available even after tombstoning)
  2. Take the shortened id and search the index for the (single) matching id
  3. The full id is recovered

There was a discussion early on about using the hash of the id instead, but that ends up with the same issue. That said, this does make it harder to generally locate deleted content for an arbitrary user, so perhaps a worthwhile action?
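
To make the recovery concrete, here's a minimal sketch (TypeScript; the ids and helper names are illustrative, not part of DSNP) of how an observer who has indexed a user's announcements could undo either a truncated id or an unsalted hash:

```ts
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Attacker's index: every message id the user has ever announced
// (these stay visible even after the content is tombstoned).
const observedIds: string[] = ["dsnp://123/abc", "dsnp://123/def"]; // illustrative values

// Truncated id: just find the indexed id that starts with it.
function recoverFromTruncated(shortId: string): string | undefined {
  return observedIds.find((id) => id.startsWith(shortId));
}

// Unsalted hash: hash every indexed id and compare.
function recoverFromHash(hashedId: string): string | undefined {
  return observedIds.find((id) => sha256(id) === hashedId);
}
```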

pirate commented Aug 12, 2024

Some salted hash of the ID could work too. The threat model I was thinking of is this:

  1. User A publishes tiananmen_square_protest_plans.txt on the network
  2. User B joins the network later (malicious government spy looking for dissidents and sensitive files, they have User A on a watchlist)
  3. User A (while being closely watched by User B) tries to delete the file from the network
  4. User B sees a deletion request for that file, and now adds it to a list of potentially sensitive files that they can use to search for more dissidents who might be hosting that file

Or even darker:

  1. Bad User A publishes terrible_csam.mp4 to the network
  2. Good User B, a moderator, catches it, tombstones the content, and blocklists User A + reports them to police
  3. Bad User C has been lurking, and is on a mission to collect CSAM. They monitor the tombstone list for banned content and now have a new hash of a file they can look for on other distributed filesystems / BitTorrent / hosts that haven't deleted it / etc.

Another option is to have some kind of handshake where the user announcing the tombstone only releases the first half of the hash. User B, who might have the file, responds to User A with a hash of the second half. User A then checks that, and if it matches the one they're trying to tombstone, they send an Ack back and User B deletes the file (or something similar). It might have the downside of generating a flood of handshakes that DoS the original user, though?
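
A rough sketch of that split-hash handshake (function names and payload shapes are hypothetical, just to show the exchange, not a proposed DSNP API):

```ts
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// User A (announcer): publish only the first half of the content id's hash.
function makeTombstoneHint(contentId: string): string {
  const full = sha256(contentId);
  return full.slice(0, full.length / 2);
}

// User B (possible holder): if a hosted id matches the hint, answer with a hash
// of the *second* half to prove knowledge without revealing anything new.
function answerHint(hint: string, hostedIds: string[]): string | undefined {
  for (const id of hostedIds) {
    const full = sha256(id);
    if (full.startsWith(hint)) return sha256(full.slice(full.length / 2));
  }
  return undefined; // no match: B learns nothing beyond the half-hash hint
}

// User A: verify B's answer; only then send the Ack that tells B to delete.
function verifyAnswer(contentId: string, answer: string): boolean {
  const full = sha256(contentId);
  return answer === sha256(full.slice(full.length / 2));
}
```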

wilwade commented Aug 12, 2024

I'm not sure we can protect against the first threat model if we assume a state-level actor will always try to collect all of the content, just as I assume they do now via scraping and agreements.

The second one it might help with, as that scenario assumes a lower level of ability to collect and process information. I think shortening does help decrease this sort of opportunistic discovery.

Handshakes assume interactivity on behalf of users or providers that DSNP currently doesn't have, so while possible, they would require additional support structures.

Salting the hash with the id is an interesting idea. It drastically increases the search scope, so given enough volume, it would at least limit the misuse to scaled organizations (someone could still generate a database to build the reverse index). It also doesn't introduce any new data into the mix.

It does make it harder to validate tombstone announcements (as the assumption is that the tombstone announcement doesn't disclose the id of the sender, only the provider).
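
One possible reading of that, sketched below; using the sender's undisclosed user id as the salt is my assumption, not anything specified in DSNP:

```ts
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// The tombstone carries sha256(senderUserId + contentId); the announcement
// names only the provider, not the sender, so the salt isn't public.
function saltedTombstoneHash(senderUserId: string, contentId: string): string {
  return sha256(senderUserId + contentId);
}

// A node that hosts the content and knows who published it can still match it.
function matchesHosted(hash: string, senderUserId: string, hostedId: string): boolean {
  return saltedTombstoneHash(senderUserId, hostedId) === hash;
}

// An observer now has to search (user id x content id) pairs under the provider,
// rather than matching against a single flat list of content ids.
function bruteForce(hash: string, idsByUser: Map<string, string[]>): string | undefined {
  for (const [uid, ids] of idsByUser) {
    for (const cid of ids) {
      if (matchesHosted(hash, uid, cid)) return cid;
    }
  }
  return undefined;
}
```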

Please expand on the idea however if you have additional thoughts. Perhaps there is a path there.

pirate commented Aug 12, 2024

Yeah, you basically get the idea of my concerns, and I agree it is a hard problem to solve when state-level actors are assumed to have broad network visibility. There are edge cases that can be made more secure (e.g. when a state-level actor joins an existing network for the first time, limiting their historical access to sensitive deletions), but that's up to your team to weigh the various tradeoffs and decide on a policy.

Just keep in mind those two scenarios as you develop in the future, as they happen all the time on distributed storage networks. Because distributed storage uniquely attracts political dissidents who want anonymity and privacy, it also attracts all the governments/CSAM collectors/hackers trying to chase them and mine the network's deletion activity for sensitive material.

I just wanted to raise this discussion on GitHub because I spoke with some of your team members at DWeb and brought up these concerns, and they suggested I open an issue :)

pirate changed the title from "Chop off some character from content IDs when propagating tombstone requests" to "Obfuscating content IDs in tombstone requests to limit bad actors trying to collect 'sensitive content' lists" on Aug 12, 2024
wesbiggs commented

Happy to discuss this item on the next DSNP spec community call on September 6. Time and link here: https://vimeo.com/showcase/dsnp-public-spec-meeting

I agree this is not a problem unique to DSNP... it would be interesting to understand how other projects think about these issues, and if there could be commonality around a solution architecture.
