Working Collision? #1
Yes! I can confirm that both images generate the exact same hashes on my iPhone. And they are identical to what you generated here. |
@dxoigmn Can you generate an image for any given hash (preimage attack) or do you need access to the source image first (second preimage attack)? |
hmmm |
@fuomag9 Interesting, but one ethical question remains -- how did they obtain the CSAM hashes? I was under the impression that the NeuralHash outputs of the NCMEC images were not readily available. This strongly suggests that the authors must have obtained child pornography, hashed it, and then generated spoofs. |
Or they used images generated by that site as the starting point. As I noted in my previous comment, it's impossible to know without having the NeuralHash NCMEC database. |
This is not only a collision, it is a pre-image, which breaks the algorithm even more. Collision: Pre-image: |
If I'm not mistaken, the DB with the hashes is stored locally and you can extract it from the iOS 15 beta. Edit: the hashes are stored locally but not in a way that makes them recoverable to the end user, see below |
Holy Shit. |
@erlenmayr No, this is likely a second preimage attack. In a preimage attack you're just given the hash, not the image. |
|
This is from Apple's PDF on the technical details of their implementation. Feel free to correct me, but from my understanding the blinded hash is the CSAM hash DB. Edit: I was wrong and we cannot extract them |
https://twitter.com/angelwolf71885/status/1427922279778881538?s=20 this seems interesting tho, a CSAM sample hash seems to exist? |
Not sure. My sense is that
No one has obtained the CSAM hashes, AFAIK. This repo is just a neural network that takes an input and produces a hash. It's as if someone released a new hash algorithm. (What you do with those hashes is a different story and the comparison of the hash with CSAM hashes is a whole other system.) This just shows that an image of a dog has the same hash value as a noisy gray image. That is, ask the model for the hash of the dog image, then ask the model how to change a gray image to make it output the same hash as the dog image. So it's a second-preimage, which we know has to be possible (by the pigeonhole principle); but it's really just a matter of feasibility. The interesting thing about neural network models (when compared to cryptographic hashes) is that they give you a gradient which tells you how you can change stuff to optimize some objective. This is the basis of deep dream, adversarial examples, and even just plain old training of neural network models. |
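For concreteness, here is a minimal sketch of that gradient-guided search, assuming the ONNX model has been converted to a differentiable PyTorch module (the `neuralhash` module returning the 96 pre-threshold outputs is a hypothetical stand-in, not part of this repo):

```python
# Sketch: second-preimage search by gradient descent, assuming a PyTorch
# port of the NeuralHash network ("neuralhash") that returns the 96
# pre-threshold outputs. All names here are illustrative assumptions.
import torch

def second_preimage(neuralhash, target_img, start_img, steps=2000, lr=0.01):
    # Hash of the target image; sign() gives the 96 hash bits (+1/-1).
    with torch.no_grad():
        target_bits = torch.sign(neuralhash(target_img))

    x = start_img.clone().requires_grad_(True)          # e.g. a gray image
    opt = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        logits = neuralhash(x)
        # Hinge-style loss: push every output to the same side of zero
        # as the corresponding target bit.
        loss = torch.clamp(-target_bits * logits, min=0).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                           # stay a valid image
        if loss.item() == 0:                             # all 96 bits match
            break
    return x.detach()
```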
From what I understood, the story the HN comment is referring to is AI-generated porn, which can match as "real" porn. Aka the "send dunes" story: https://petapixel.com/2017/12/20/uk-police-porn-spotting-ai-gets-confused-desert-photos/ This issue here, however, is a hash collision. This is huge, if confirmed. Really ugly stuff. |
If they compare on the server then Apple is able to know the outcome of the comparison even if there is only a single match. The idea is that Apple can only know that a certain user has matching images in their iCloud account as soon as a certain threshold number of matches is reached. That's my understanding as a non-expert in this field. |
I think they also inject false positives for results on the device side, so they don't know if there is only a single match. They only know when the threshold is reached. It was in the PSI paper. |
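For intuition only, here is a toy Shamir secret-sharing example (not Apple's actual protocol) showing how a threshold scheme can hide whether any single match occurred: any 5 shares reconstruct the secret, while any 4 are consistent with every possible secret.

```python
# Toy Shamir secret sharing over a prime field: a secret (e.g. a decryption
# key) is split so that any `threshold` shares reconstruct it, but fewer
# reveal nothing. Purely illustrative; Apple's protocol is more involved.
import random

P = 2**127 - 1  # a Mersenne prime, used as the field modulus

def make_shares(secret, threshold, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(secret=123456789, threshold=5, n=30)
assert reconstruct(shares[:5]) == 123456789   # 5 shares: secret recovered
# Any 4 shares are consistent with every possible secret, so they leak nothing.
```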
Every practical hashing algorithm maps many inputs to one output, so it is only a matter of time before collisions appear. This repo contains the model from an older release, and we don't really know how Apple has tweaked the parameters on their side, or even how much the huge data set improves the accuracy. The NN has many layers, so training is rather important. |
Let me quote from this article by someone who can explain this better than me: https://www.hackerfactor.com/blog/index.php?/archives/929-One-Bad-Apple.html
and:
Let that sink in. Now back to our issue: While hash detection is a really bad idea in general for such purposes, the specific implementation from Apple does not matter, because it is closed source. You literally don't know what Apple is doing, with what data, and what result comes out of it. Even if Apple's specific algorithm is the best in the world and does not have these drawbacks: you would not know. You would have to live in constant fear of your phone "thinking" you may be doing something wrong. That's scary. |
That is very true. We can either give total trust or nothing at all on closed systems. |
It's not. It's encrypted using elliptic-curve cryptography first. They keep the actual CSAM neuralhash db on their server. Why do they do that? A) To prevent reversing the hashes, as mentioned. B) So they can derive the encryption keys for the image locally, which is what the blinded hash table is also used for. I'm leaving out a lot of details that are mentioned in the whitepaper they released, so read that. |
@weskerfoot Yeah I edited my other comment with an explanation of the blinded hash. |
All we need now is a hash of what iPhone considers 'CSAM', and to start making images collide with it. I'm not sure how to get a hash without having such a file to hash, though... Oh, just curious: since the hashing is done on the client side, would it be possible to tell iCloud that the hash matched every time? And if so, what's stopping you from just flooding it with random shit? |
I wonder if this code can be ported so you can generate the collision on an Android device... |
can we take any given image and then make its hash totally different, despite the image looking basically identical? |
cropping the image seems to work since the algorithm (at least what we have now) is vulnerable to that |
ah yes, because pedos are so sophisticated as to use end-to-end encryption these days that we need to backdoor everyone's phones, because think of the children, but they're NOT sophisticated enough to ..checks notes.. crop images, |
It's easier to find two images that match than it is to find an image that hashes to a known hash, even in this case. But this is always the case: even for a black-box arbitrary hash function, finding a collision needs only about the square root of the work needed to find a preimage, even with a totally generic attack, due to the birthday paradox. Both attacks (preimage and collision) are easy with this hash function because it has a fundamentally weak design. For stronger hash functions like, say, MD5, one can now compute collisions easily but a second preimage is still unachievable. The poster you were responding to was emphasizing the point that the attack was a preimage attack rather than a collision, because we all expect collision attacks to be much easier, but they're less useful for the attacker because the attacker has to influence both inputs rather than just one. |
I just don't see it because one hash always has to be held constant when comparing it to another. You can vary both, or you can vary one twice as many times. It depends on the type of attack, and when it is brute force - just varying inputs and comparing outputs, you can vary one or both for the same cost. |
It's a surprising fact. The reason a collision is fundamentally easier in a generic attack is that the number of things you match against increases. Imagine that you want to find some x such that H(x) matches either H(y) or H(z). Clearly that's easier than finding something that matches H(y) alone, because you get two chances instead of one: it takes half as many comparisons on average. So when you want to find x, y such that H(x) == H(y), you can alternate between varying x and varying y, and each time you compare against all the previously computed values, not just the most recent. With each operation the chance of success increases because you're comparing against more and more alternatives. This takes roughly the square root of the number of operations needed when changing only one input. It's the same mathematical fact that makes a room with surprisingly few people likely to contain at least two people sharing a birthday: with 23 people the odds of two sharing a birthday are >50%, even though there are 365 days in a year.

There are fancy algorithms that eliminate the storage and search costs of comparing against all the prior hashes by using a little more computation. A good place to read about memory-efficient collision algorithms is the Wikipedia article on cycle detection, because if you choose the next value to try based on the hash of the prior value, finding a collision is equivalent to finding a cycle in the graph created by feeding the output of the hash back into itself.

That applies to generic attacks that work even without any knowledge of the internal structure. With the Apple hash function the hash is differentiable, so we're not limited to generic attacks. In this case it's easier to find a collision because we're actually not limited to updating one image at a time. Instead, we can construct a function that computes the difference between two images' hashes and use autodiff to find the partial derivatives of that difference with respect to all of the pixels at once, modifying both images in the same step. The extra dimensions give the attack a lot more degrees of freedom. |
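As a toy demonstration of the generic birthday attack, the sketch below searches for a collision in a deliberately truncated 40-bit hash (the first 5 bytes of SHA-256); it succeeds after roughly 2^20 evaluations, whereas a preimage on the same 40-bit hash would take around 2^40. The cycle-detection variant mentioned above removes the lookup table at the cost of a bit more computation.

```python
# Toy birthday-style collision search against a truncated hash (first 5 bytes
# of SHA-256, i.e. 40 bits). Finding a collision takes on the order of 2^20
# hash evaluations; a preimage against the same 40-bit hash would take ~2^40.
import hashlib

def h40(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:5]

def find_collision():
    seen = {}
    counter = 0
    while True:
        msg = counter.to_bytes(8, "big")
        digest = h40(msg)
        if digest in seen and seen[digest] != msg:
            return seen[digest], msg, digest
        seen[digest] = msg
        counter += 1

m1, m2, digest = find_collision()
print(f"collision after ~{int.from_bytes(m2, 'big')} hashes: "
      f"{m1.hex()} and {m2.hex()} both map to {digest.hex()}")
```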
Here are some more good-looking examples (prior examples here). Altering Lena to match Barbara: $ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 130310372-d6b2c633-c5d3-4b3e-bc12-d3b2f3104ec6.png
a426dae78cc63799d01adc32
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 130310383-9fcc80ba-117e-4383-9177-0df24bc99f9a.png
a426dae78cc63799d01adc32 |
Hey @gmaxwell What happens if you resize the image down to say 1/4 then upsample back to the original size, then run the hash again? Do they still collide? |
They're still fairly close, e.g. using the lena/barbara above: $ convert -scale 90x90 130310372-d6b2c633-c5d3-4b3e-bc12-d3b2f3104ec6.png x.png ; convert -scale 360x360 x.png 1.png
$ convert -scale 90x90 130310383-9fcc80ba-117e-4383-9177-0df24bc99f9a.png x.png ; convert -scale 360x360 x.png 2.png
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 1.png
a426dae78cc23399501acc32
$ ./nnhash.py ./model.onnx ./neuralhash_128x96_seed1.dat 2.png
a42692e78ce22b99d11ace32
I suspect the property could be met by making the search work that way, e.g. stick an upsampler on the computation graph for the model so that it works with a 90x90 image, and do the search there. Then upsample it and use it as a starting point for the full-res search (or jointly test the full-res and down/up path). Is there some particular reason this property is important? I could imagine that doing the search in a multiscale way might result in better-looking images... but is there something about the Apple setup that works this way? |
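A rough sketch of that multiscale idea, reusing the hypothetical differentiable `neuralhash` module from above: the optimized variable is 90x90, and a fixed upsampler sits in front of the network so gradients flow back to the low-resolution pixels.

```python
# Sketch: search over a low-resolution image by placing an upsampler in
# front of the (assumed) differentiable NeuralHash port, so the optimized
# variable is 90x90 while the network still sees a full-resolution input.
import torch
import torch.nn.functional as F

def low_res_search(neuralhash, target_bits, steps=2000, lr=0.02):
    x_small = torch.rand(1, 3, 90, 90, requires_grad=True)   # optimized variable
    opt = torch.optim.Adam([x_small], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        x_full = F.interpolate(x_small, size=(360, 360), mode="bilinear",
                               align_corners=False)
        logits = neuralhash(x_full)
        loss = torch.clamp(-target_bits * logits, min=0).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_small.clamp_(0.0, 1.0)
    # The result should tend to survive a down/up cycle better, since the
    # network only ever saw the upsampled 90x90 image during the search.
    return x_small.detach()
```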
I won't be posting any more preimages for the moment. I've come to learn that Apple has begun responding to this issue by telling journalists that they will deploy a different version of the hash function. Given Apple's consistent dishonest conduct on the subject I'm concerned that they'll simply add the examples here to their training set to make sure they fix those, without resolving the fundamental weaknesses of the approach, or that they'll use improvements in the hashing function to obscure the gross recklessness of their whole proposal. I don't want to be complicit in improving a system with such a potential for human rights abuses. I'd like to encourage people to read some of my posts on the Apple proposal to scan users' data which were made prior to the hash function being available. I'm doubtful they'll meaningfully fix the hash function-- this entire approach is flawed-- but even if they do, it hardly improves the ethics of the system at all. In my view the gross vulnerability of the hash function is mostly relevant because it speaks to a pattern of incompetence and a failure to adequately consider attacks and their consequences.
And this post written after:
|
FWIW, the best defense we have against these kind of attacks is adversarial training. Adversarial training is exactly what you describe: construct evasions/collisions, train on them, and repeat. My initial interest in all this was to see whether Apple employed such a training method. It's clear they did not, but I see no reason they couldn't do this for their next version of this model. |
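For reference, a condensed sketch of such an adversarial-training loop in generic PyTorch (the model, data loader, and perturbation budget are illustrative assumptions, not anything Apple has published):

```python
# Generic adversarial training loop: craft a perturbation for each batch with
# projected gradient descent (PGD), then train on the perturbed inputs.
# Everything here (model, loader, epsilon) is illustrative.
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=8/255, alpha=2/255, steps=10):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()       # ascend the loss
            delta.clamp_(-eps, eps)                  # stay inside the budget
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_training(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            x_adv = pgd(model, x, y)                 # construct evasions
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()                               # train on them, repeat
    return model
```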
Assuming Apple doesn't renege before the feature is released, I suspect the best way to compel them to do so is to blow it so widely open as soon after launch as possible that they have no choice but to retract the system. Still, I haven't seen an answer to what I think may be a very significant question: |
I'm not sure I agree. One can build a system with fairly strong guarantees against false positives at the expense of false negatives: normalize, then downsample and quantize the pixels to the lowest resolution that still leaves images reliably recognizable. Use sha256 on the result. This will give guaranteed protection against adversarial false positives (assuming the security of sha256). It also won't catch as many fuzzed/recompressed images, though it will catch some. No perceptual hash will strongly protect against evasion by parties willing to modify their images, in any case. The false positive vs false negative tradeoff that is best depends on how much you value users' privacy vs how many images you're willing to miss. I don't think Apple should be deploying this system at all, so obviously I also don't think exposing users to false positives to avoid a few false negatives is a good tradeoff. :) I view the use of a complex perceptual hash as a product of decisions made by people engaging in a one-sided risk assessment that underweights privacy harms and overweights missed images. The simple cryptographic-hash-based approach I mention above also has the advantage of being extremely simple to build (or the disadvantage, if you consider justifying headcount an advantage). If someone really must justify headcount, the simple approach could be improved further, e.g. with a neural network post-processing "error correction" filter that helps quantize pixels consistently, without removing a strong cryptographic protection against adversarial false positives (e.g. a guarantee that any positive match will have all pixels +/-1 from the true image in a normalized downsample, unless sha256 is broken).
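A minimal sketch of that scheme with Pillow and hashlib; the normalization, target resolution, and quantization step are assumptions that would need tuning, and the "error correction" filter mentioned above is omitted:

```python
# Sketch of the "normalize, downsample, quantize, then SHA-256" idea: any
# reported match is an exact match on the quantized thumbnail, so adversarial
# false positives require breaking SHA-256. Parameters are assumptions.
import hashlib
from PIL import Image

def quantized_hash(path, size=(32, 32), levels=16):
    img = Image.open(path).convert("L")                  # normalize to grayscale
    img = img.resize(size, Image.LANCZOS)                # downsample
    step = 256 // levels
    quantized = bytes(p // step for p in img.getdata())  # coarse quantization
    return hashlib.sha256(quantized).hexdigest()

# Two lightly recompressed copies of the same photo will often still match,
# but an adversarially constructed false positive would need a SHA-256
# second preimage on the quantized thumbnail.
print(quantized_hash("example.jpg"))
```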
I expect that extensive adversarial training would make it harder, but still leave construction of preimages possible. In the domain of cryptography it's long been recognized that systems which appear to be secure because their creators attacked them and failed are often not secure at all. Competently constructed cryptosystems are built in ways where security is provable outright or traceable to a small number of well-understood assumptions. Above I describe a scheme for downsampling with "error-corrected" rounding that could give security against adversarial false positives traceable to the security of sha256 against second preimages. It would be less fuzzy, but if you are not valuing avoiding false negatives over all other considerations, less fuzzy should be an option. :) Even better could be done if they didn't insist on hiding the database.
The idea is that this second 'secret' hash will only be applied to images that match the first. So arguably there is still some value in having the first, to keep users' data hidden from Apple absent false positives on the first function. I find their counter unconvincing because: Making it secret protects it much more against review than it does against attackers-- attackers may steal the function (via insider threats) or just be handed it, as is the case for the state actors who produce these databases (and many of the concerns here are that state actors may be the attackers-- even if you trust your state, no one is safe from a potentially unfriendly regime change in the long run). If the second function has a similar structure, it will likely have similar vulnerabilities in any case. |
Lotte no Omocha! |
I was wondering whether true positives would still match, yet false positives would not match. As in, if there's some micro property of the false image that causes the hash to match that can be destroyed by down/upsampling. The theory being that true positives have an image structure that matches much more closely at varying resolutions. I don't know much about how the adversarial images are arrived at, but I just wondered if this was a property that wouldn't hold for false matches but would for true ones. |
Because of the way mine are constructed, I wouldn't be shocked if they survived up- and down-sampling better than naturally constructed images: I penalize heavily towards making the changes in lower-frequency components rather than higher-- probably much more than I should for the best image quality, but it was pretty much the first thing I thought of. :) I'm pretty unfamiliar with the machine learning tools; I'm sure someone who knows what they're doing can do a lot better than me if they give it a good try. :) |
@tomhicks There is a general technique you can use if you intend for an adversarial perturbation to be robust to a particular set of transformations (e.g. down/upsampling). For many types of transformations, the adversarial examples by default won't be robust to that type of transformation (e.g. rotation), but if you know a priori what transformation you want the example to be robust to, you can make it so. At a high level, this can be done by incorporating the transformation into the optimization process. Suppose you have some set of transformations T you want the example to be robust to; then the optimization problem becomes finding an example x' that matches the target in expectation over T, roughly argmin_{x'} E_{t~T}[ loss(H(t(x')), target) ], while keeping x' close to the starting image. |
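A sketch of that idea in the same hypothetical PyTorch setting as earlier: sample a transformation t from T at every optimization step and differentiate through it (Expectation Over Transformation), so the finished example keeps matching under the whole family T.

```python
# Sketch of Expectation Over Transformation (EOT): at each step, sample a
# transformation t ~ T (here a random down/up-sampling factor) and take the
# gradient through it, so the finished example keeps colliding under T.
# The differentiable `neuralhash` module is an assumption, as above.
import random
import torch
import torch.nn.functional as F

def random_resample(x):
    scale = random.uniform(0.25, 1.0)                # t ~ T
    small = F.interpolate(x, scale_factor=scale, mode="bilinear",
                          align_corners=False)
    return F.interpolate(small, size=x.shape[-2:], mode="bilinear",
                         align_corners=False)

def robust_collision(neuralhash, target_bits, start_img, steps=3000, lr=0.01):
    x = start_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = neuralhash(random_resample(x))      # optimize through t(x)
        loss = torch.clamp(-target_bits * logits, min=0).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)
    return x.detach()
```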
@anishathalye interesting. That raises two more questions, if you don't mind!
|
It is the same concept at a high level: find an input to a neural network that produces a desired output (and perhaps satisfies some extra properties, such as "looks similar to a particular image" or "doesn't look like glitch art"). There has been a lot of research in the area of adversarial examples (thousands of research papers), and researchers have demonstrated adversarial examples on all sorts of neural networks, like image classification, object detection, semantic segmentation, speech-to-text, and many others (including weird things like style transfer that may not make sense to "attack" from a security perspective). Informally, I did find it harder to attack the neural hash algorithm compared to standard image classifiers like those trained on ImageNet. It took some amount of parameter tuning and such to find collisions for NeuralHash fairly reliably, and even then, I don't think the results look fantastic. Whereas for standard image classifiers, you can basically do whatever attack (like a single step of gradient descent) and they fall over right away; and it doesn't even take much work to get adversarial examples with imperceptibly small perturbations. Here's one informal / intuitive explanation for why this may be the case: for NeuralHash collisions, we're trying to "encode" 96 bits of information into an image, whereas for attacking an ImageNet classifier, we're only trying to encode log2(1000) ≈ 10 bits of information. (Though of course, this is not a formal proof, nor is it even a fully satisfying intuitive explanation. The semantic segmentation or style transfer adversarial examples, for example, encode more than 96 bits of information into the image.) Apple might have also done adversarial training (a standard technique to make networks more robust to adversarial examples); I don't remember off the top of my head if there is research on identifying whether a given trained model has been trained using such techniques (and afaik Apple hasn't published details on how NeuralHash was trained).
In my experience, yes, this is generally true. Unless the adversarial examples have been made robust to the particular kind of transformation being considered. With the example above, of the adversarial example robust to rotation, I wouldn't be surprised if the adversarial example is more robust to rotation than the original image. |
Thanks for that. As far as I know, Apple's only details on how NeuralHash was trained are broadly: shown similar images and told to match, shown dissimilar images and told to not match. You would hope there's some adversarial training in there. Although wouldn't that just push the problem to a different type of adversarial images?
Interesting. That would be interesting to see. |
Don't forget the UK website blocklist, which was created to "stop child porn" and is now exclusively used to block sites like ThePirateBay, 1337x, and such. Why isn't DNSSEC standard already? |
The point of collision resistance is to make it computationally harder, not impossible. I wish we had a proof of one-way functions, but we don't. The collision resistance of SHA256 (and friends) relies upon a similar methodology as this model: just a small set of researchers trying to understand the underlying algorithm and find more efficient methods of finding collisions. |
By possible I meant practically possible, though thanks for clarifying for those for whom it wasn't clear. For cryptographic hashes we have decades of experience, powerful tools, mountains of research. Moreover, for a neural network to be trainable it must be significantly piecewise differentiable and locally approximately linear-- and avoiding that kind of behavior is an explicit requirement in the design of transformations in symmetric crypto. Much of the analysis and review in symmetric cryptography goes into proving how free the function is from those behaviors. :) So I don't think it's at all reasonable to draw a parallel here: using a neural network as a hash is more like the state of symmetric crypto 40 years ago, but with the added problem that it must have properties that we know make security hard. |
Apple claims that they have a second hashing algorithm running server-side to prevent false positives. That second algorithm could very well be PhotoDNA. Who's up for the challenge to find an image pair where both NeuralHash and PhotoDNA collide? You can grab PhotoDNA here: https://github.com/jankais3r/jPhotoDNA |
No one is using these hashes to secure communications between two parties, though, which I think is an important distinction. This doesn't make the system "insecure" in any way. By generating a collision, all you enable is for Apple to look at the photo that you supplied. This makes the system marginally less useful to Apple for avoiding server work. |
Except that instead of Apple employees looking at the flagged photos, totalitarian governments could easily force themselves into this manual review process. Then all they need to do is to disseminate pro-democracy images that collide with actual CSAM images in Apple's database and the system will flag dissidents for them. |
Why not just target the server? You're gonna need to do that anyway so you might as well just do that. |
@tomhicks The system that Apple proposes leaks the cryptographic keys to decrypt photos conditional on the user being in possession of a threshold number of NeuralHash-matching images. In the attack @jankais3r describes, an authoritarian regime compromises Apple through hacks, leaks, moles, bribes, or a National Security Letter. They then either distribute to the public pro-freedom images modified to match the NeuralHashes of known child porn, OR they take child porn, modify it to match circulated pro-freedom messages, and submit it to the agencies that build the lookup databases. Your computing device processes your private files and leaks the decryption keys to Apple, demonstrating your possession of the targeted political speech and allowing the attacker to begin more detailed tracking, or just send out a group to execute you.

An analogous attack could also be performed without attacking NeuralHash: just put the ideologically identifying images in the database directly. Because the comparison is performed using a private set intersection, no one can directly tell what the database contains. However, Apple has made unfalsifiable claims of various self-enforced ad hoc protection mechanisms which should make this attack harder: e.g. they claim their database is an intersection of databases from multiple governments, so absent a NeuralHash attack the governments would need to collude to include non-child-abuse images, similar to how 'five eyes' countries spy on each other's citizens to evade the hard legal prohibitions against spying on their own. Targeted false NeuralHash matches undermine those protections. Moreover, the potential for adversarial collisions means that if the plaintext of the database is leaked or otherwise made public, or just shared with third-party reviewers, any inappropriately included material is potentially deniable due to the possibility of false matches.

This is particularly unfortunate, because there are simple and obvious schemes which, assuming the database has no false entries in it, make false positives computationally intractable with extremely high confidence and would eliminate the potential for the above attacks in a way which is independently verifiable. They have the disadvantage of having more false negatives. Since detection here requires the target running software that betrays their self-interest, and especially since it's extremely easy to change images so their NeuralHashes don't match, any such system is already only going to be effective against people who are not trying to evade it. Yet at no point has Apple attempted to justify the use of a function which has a meaningful amount of false positives over ones that are, for practical purposes, free of them. But why would they, without people showing that false positives are a potential issue through attacks on the hash function like the ones presented in this issue? |
This is spam, @amir32 should be banned |
Can you verify that these two images collide?
Here's what I see from following your directions: