
"Some metadata and usage information stored in iCloud remains under standard data protection, even when Advanced Data Protection is enabled. For example, dates and times when a file or object was modified are used to sort your information, and checksums of file and photo data are used to help Apple de-duplicate and optimize your iCloud and device storage..."

Photo checksums can't be e2e encrypted, huh? They reported today that they abandoned their plans to do CSAM scanning on people's devices[1], and connecting the dots, it seems like they won't need to, since they can just do it in the cloud.

[1] https://www.wired.com/story/apple-photo-scanning-csam-commun...



The abandoned plan was perceptual hashing, which should return the same hash for very similar photos, while the new one is a checksum, which should return the same hash only for identical photos. I don’t think that invalidates the point, but it does seem relevant. It certainly makes it much less useful for CSAM scanning or enforcing a local dictator's whims, since it’s now trivial to defeat if you actually try to.
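
To make the difference concrete (a minimal sketch with Python's hashlib, not anything Apple ships): flipping a single bit of a file yields a completely different cryptographic checksum, which is exactly why an exact-match list is trivial to defeat, and why perceptual hashes were designed to survive small edits in the first place.

    import hashlib

    def checksum(data: bytes) -> str:
        # A raw byte checksum: any change to the input changes the digest entirely.
        return hashlib.sha256(data).hexdigest()

    original = b"...the exact bytes of some known photo..."   # placeholder content
    tweaked = original[:-1] + bytes([original[-1] ^ 0x01])    # flip one bit

    print(checksum(original))
    print(checksum(tweaked))   # bears no resemblance to the digest above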


The big difference is that with photos end-to-end encrypted, Apple can't (by choice or by force) have human "content reviewers" look at photos to inspect them for unlawful content, as was the intention under Apple's 2021 plan [1] once a threshold of 30 hash matches was met.

Although it was starting with CSAM material, it wasn't clear which other illegal activities Apple would assist governments in tracking. In countries in which [being gay is illegal](https://www.humandignitytrust.org/lgbt-the-law/map-of-crimin...), having Apple employees aid law enforcement by pointing out photographic evidence of unlawful behaviour (for example, a man hugging his husband) would have been a recipe for grotesque human rights abuses.

With photos encrypted, Apple can't be pressured to hire human reviewers to inspect them, and thus cannot be pressured by governments that enforce absurd laws to pass on information on who might be engaging in "unlawful" activities.

[1] https://www.eff.org/deeplinks/2021/08/apples-plan-think-diff...


>The abandoned plan was perceptual hashing, which should return the same hash for very similar photos . . .

Is there any proof they actually abandoned this? NeuralHash seems alive and well in iOS 16[1]. Supposedly the rest of the machinery around comparing these hashes to a blind database, encrypting those matches, and sending them to Apple et al. to be reviewed has all been axed. However that's not exactly trivial to verify since Photos is closed source.

[1]: https://support.apple.com/guide/iphone/find-and-delete-dupli...


Anything over a network can be decrypted and inspected with a MITM proxy (manually adding its root certificate to the trust store), as long as only TLS (no application-level encryption) is being used.
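
As a concrete illustration (assuming the mitmproxy tool and a device configured to trust its root certificate and route traffic through it), a small addon script is enough to log every request the proxy can decrypt:

    # log_urls.py - run with: mitmdump -s log_urls.py
    # Prints the method and URL of every request the proxy can decrypt.
    # Only traffic without certificate pinning or application-level
    # encryption shows up in the clear.
    from mitmproxy import http

    def request(flow: http.HTTPFlow) -> None:
        print(flow.request.method, flow.request.pretty_url)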


As far as I remember, iOS native apps and services now either consistently use CA pinning or largely don't respect user-added CAs.


There are a multitude of ways to inspect the decrypted traffic of your own device, whether it's a security research device provided by Apple to the security community or an unofficially jailbroken one. People inspect this traffic all the time.


No. Install Charles Proxy (the iOS app) and see what you can get out of the MITM proxy it ships with. Many apps don't ship with pinning.


But most importantly, the whole OS and all of the integrated apps do use pinning.
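
For illustration, here's roughly what pinning amounts to (a generic Python sketch, not Apple's implementation): the client checks the server's certificate against a hard-coded fingerprint, so a MITM proxy's certificate is rejected even if its root CA has been added to the trust store.

    import hashlib
    import socket
    import ssl

    HOST = "example.com"   # placeholder host
    PINNED_FINGERPRINT = "replace-with-the-known-sha256-of-the-server-cert"

    context = ssl.create_default_context()
    with socket.create_connection((HOST, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=HOST) as tls:
            der_cert = tls.getpeercert(binary_form=True)   # server cert as DER bytes
            if hashlib.sha256(der_cert).hexdigest() != PINNED_FINGERPRINT:
                raise ssl.SSLError("certificate does not match the pinned fingerprint")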


> . . . as long as only TLS (no application-level encryption) is being used.

Therein lies the rub: the payload itself is protected by an encryption scheme whose keys are intentionally withheld by one party or the other. In the case of Apple's proposed CSAM detection, Apple would be withholding the secret in the form of the unblinded database's derivation key. In the case of Advanced Data Protection, the user's key lives in the SEP, unknown to Apple.

By design the interior of the "safety vouchers" cannot be inspected, supposedly not even by Apple, unless you are in possession of (a) dozens of matching vouchers and (b) the unblinded database. So on the wire you're just going to see opaque encrypted containers representing a photo destined for iCloud.


Apple does not need to scan your photos on a server, because they can now do it on the device.


The original implementation also involved sending a "safety voucher" with each photo uploaded to iCloud, which contained a thumbnail of the photo as well as some other metadata.

The vouchers were encrypted, and could only be decrypted if there were, I believe, 30 independent matches against their CSAM hash table in the cloud. At that point the vouchers could be decrypted and reviewed by a human as a check against false-positives.

It sounds like with a raw byte hash they might be able to match a photo against a list of CSAM hashes, but they wouldn't be able to do the human review of the photo's contents because of E2E.


That would be interesting. Then all someone has to do is generate images that collide with the ones in the CSAM hash database and AirDrop them to someone, and that person is suddenly the target of a federal investigation. I remember someone posting about a year ago a bunch of strange looking images that produced those collisions. If it's all E2E then all Apple sees is a matching hash and can't do any further review other than refer to law enforcement.


> Then all someone has to do is generate images that collide

If the hashes are cryptographic, then this is impossible (given today's technology).

> with the ones in the CSAM hash database

The CSAM hash database isn't public AFAIK.

> I remember someone posting about a year ago a bunch of strange looking images that produced those collisions.

You're probably thinking about their proposed 'perceptual hash', which has since been scrapped.


Someone mentioned here (though I didn't confirm it) that Apple is stopping the CSAM scanning. It makes sense, because there's nothing they could reasonably do even if they found matching hashes. It seems unlikely they'd report these findings to the police if there's no ability to manually review the contents first.



Under the original plan, someone would indeed manually review the contents if the threshold for the number of CSAM images was reached.


I'm assuming these are normal checksums (plain byte-level hashes), whereas before they were doing a hand-wavy AI-based thing that they called "checksums" but which weren't really checksums. The latter captured rough visual qualities of the images in question, which is why it had a false-positives problem. A real checksum shouldn't have that problem; in theory you'd only be able to detect an exact match of a file you already have and are looking for. So it is meaningfully different.

Edit: confirmed that these are regular, real checksums https://support.apple.com/en-us/HT202303

> The raw byte checksums of the file content and the file name


> The raw byte checksums of the file content and the file name

I wonder if this is literal; otherwise they wouldn't achieve any de-dupe if you just rename the file.


I assumed separate checksums are made from the file name and the contents. Though even if not, it would seem useful for e.g. syncing between devices ("does file X already exist, so we don't need to download it?").
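
Something along those lines (a toy sketch of that assumption, not Apple's actual scheme) would key the blob store on the content checksum alone, so a rename changes the name checksum but neither re-uploads the data nor breaks de-duplication:

    import hashlib

    blob_store: dict[str, bytes] = {}   # content checksum -> stored blob
    file_index: dict[str, str] = {}     # name checksum    -> content checksum

    def upload(name: str, content: bytes) -> None:
        content_sum = hashlib.sha256(content).hexdigest()
        name_sum = hashlib.sha256(name.encode()).hexdigest()
        if content_sum not in blob_store:      # "does file X already exist?"
            blob_store[content_sum] = content  # only transfer unseen content
        file_index[name_sum] = content_sum

    upload("IMG_0001.jpg", b"...photo bytes...")
    upload("holiday.jpg", b"...photo bytes...")   # renamed copy: indexed, not stored twice
    assert len(blob_store) == 1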


Uhm... that's a significant leak. Most files you have are not unique, including personal photos (if you share them). So all Apple needs to do to uncover a significant part of what you have on iCloud is take all the hashes of your files and find the same hashes in other accounts that don't have e2e enabled, or in other sources, to recover the content. And even without the content, it is a great way to find connections between people (but they already have non-e2e-encrypted contact data for that...).

Personally, I don't think Apple intends to screw you, and they have a good reason, but isn't not trusting your provider the entire point of e2e encryption?

It was one of the first questions I asked myself: "with e2e encryption there's no de-duplication, so it will be expensive for Apple". Turns out they still have de-duplication, and therefore weaker privacy.

Anyways, "As we continue to strengthen security protections for all users, Apple is committed to ensuring more data, including this kind of metadata, is end-to-end encrypted when Advanced Data Protection is enabled". It would be interesting to see if they really are committed. For now, I don't blame them: it is already better than most offerings, and it just came out. However, it will be an interesting point to watch in the future: this is a privacy feature that actually costs Apple money to run, so will they do it?

Note: I assume a standard hash like SHA, working at byte level. Not the CSAM scanning thing that can match similar pictures even if the files are not exactly the same.
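
To spell out the leak described above (a toy sketch, not a claim about what Apple does): if the provider already knows checksum-to-content pairs from non-e2e accounts or public data, then an e2e user's stored checksums alone reveal which of those known files they hold.

    import hashlib

    # Index built from sources the provider can read (non-e2e accounts, the web, ...)
    known_content = {
        hashlib.sha256(b"bytes of a widely shared photo").hexdigest(): "widely shared photo",
    }

    # Metadata visible for an e2e user: just the checksums, not the files themselves.
    e2e_user_checksums = [hashlib.sha256(b"bytes of a widely shared photo").hexdigest()]

    for c in e2e_user_checksums:
        if c in known_content:
            print("user stores:", known_content[c])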


Can you elaborate on this comment in terms of how no de-duplication is in any way expensive to Apple? People have to pay for their cloud storage generally (past 5GB), and Apple presumably has their price structure set up in a way where it is either profitable or at worst only negligibly costs them as a loss leader for their expensive products.

If someone has all kinds of duplicates, so what? Eventually, they have to pay and up their subscription price for the additional cloud storage. The only way de-duplicating could possibly save money is if two or more people with the same file are both pointed to that same file in a location that is not within their account.

I don't buy this de-duplication argument.


"checksums of file and photo data are used to help Apple de-duplicate and optimize your iCloud and device storage"

This is likely describing content-addressable storage. It is the underpinning of many iCloud services that store user files / blobs. It is also a commonly used pattern in backend services generally.

https://en.wikipedia.org/wiki/Content-addressable_storage


The problem is that a stream cipher is going to have some per-object uniqueness (a salt, IV, etc.), so by design even if you feed it related input blocks you will get different output blocks. This is, of course, antithetical to deduplication: so you need to check/store the hash of the input before it goes through the cipher.
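
A minimal sketch of that point (using AES-GCM from the third-party cryptography package; this is not Apple's scheme): the per-object nonce makes two ciphertexts of the same photo differ, so the checksum the server de-duplicates on has to be taken from the plaintext before encryption.

    import hashlib
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    aead = AESGCM(key)
    photo = b"...photo bytes..."   # placeholder content

    def store(plaintext: bytes) -> tuple[str, bytes]:
        checksum = hashlib.sha256(plaintext).hexdigest()   # computed pre-encryption
        nonce = os.urandom(12)                             # per-object uniqueness
        return checksum, nonce + aead.encrypt(nonce, plaintext, None)

    sum_a, blob_a = store(photo)
    sum_b, blob_b = store(photo)
    assert sum_a == sum_b     # the server can still de-duplicate on the checksum...
    assert blob_a != blob_b   # ...even though the ciphertexts differ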

The presentation about ZFS' native encryption[1] covers many of these sorts of trade-offs necessary to do full-disk encryption at scale.

[1]: https://www.youtube.com/watch?v=frnLiXclAMo


I always thought the client-side hashing plan was something of a giveaway to authoritarian governments which would have demanded Apple check their own list of verboten files against what the users had uploaded to iCloud. E.g. tank man photos.

So I read this as Apple quietly saying "we're not bending to China on privacy". Which is the first step toward probably being banned from providing Apple services in China.


People sharing images that an authoritarian government considers banned might still be exposed by such a scheme, given that the shared files are likely to be exactly the same data. There are, after all, no new photos of Tank Man being taken; any that are shared would be identical to someone else's unless every recipient opened them up and modified them. And even then I'm not sure that actually modifies the underlying data if done on an iOS device: edits to images can be undone, which suggests they are stored as a layer on top of the unchanged image, which would still return the same hash.

Unfortunately, I think the privacy problems surrounding iCloud Photos remain to an extent.


Given that modifying just a single bit in an image results in a wildly different hash digest, I think the risk is a little overblown. There are probably easier ways for authoritarian governments to figure out who's sending illegal content, like just taking somebody's device and looking at their messages.


It's a little hard to take any percentage of 1.4B people's phones, get them to comply with unlocking their devices, and then inspect them all.

It's a lot easier to tell vendor X that "in country Y list Z is the one that should be used when looking for CSAM", and then add some known Tank Man derivative hashes to that list and find out directly who to arrest.


According to the Wired article linked by parent, there is no longer any hashing or client-side scanning scheme at all, except one that can be enabled locally by parents and doesn't report anything to Apple.


But in the documentation[1] under the heading "Encryption of certain metadata and usage information" they state:

> Some metadata and usage information stored in iCloud remains under standard data protection, even when Advanced Data Protection is enabled. For example, dates and times when a file or object was modified are used to sort your information, and checksums of file and photo data are used to help Apple de-duplicate and optimize your iCloud and device storage

This checksum is described as:

> The raw byte checksum of the photo or video

This hash can technically be shared by Apple, since they own the key used to encrypt it. And depending on when the hash is computed (post-encryption it's no problem, pre-encryption we have a problem), this could technically be used to find people sharing known undesired images e.g. Tank Man or CSAM.

[1]: https://support.apple.com/en-us/HT202303#advanced


Apple already has different terms of service for Chinese users. They will simply not have this feature, or it will be turned off silently at the authorities' request.

There is no way for a user to verify if Apple has actually end-to-end encrypted their backups or not.


You should Google how many times Apple has bent to China, as recently as last month. Apple's human rights record is spotty at best.


> For example, dates and times when a file or object was modified are used to sort your information

Who are they sorting it for that this can't happen after decryption?


Maybe tiering and capacity planning for IOPS? Eg store recently modified files on SSDs and the rest on HDDs


I always thought that program was technically limited from the start. It seems like it would be very easy to change a small part of the file, even a single pixel, and end up with a different checksum.



Thanks, TIL!


I also have to call out how closed-source the iOS ecosystem is. They can say what they want, but who knows what it does behind the scenes.


"People rioted when we scanned for CSAM in a privacy-preserving manner but don't give a shit when we do the same thing when it's not privacy preserving so I guess just do that."


This looks like a win for the people who rioted ... what part of the new E2EE without file scanning is not privacy-preserving?


How is this a win? Either is bad; who wants them to keep a database of their image hashes? In some ways this is arguably even worse. If they keep this data online, leaks and/or third-party access are almost guaranteed. At the very least by authorities with a perma warrant looking for "CP" or "terrorist" material.


> At the very least by authorities with a perma warrant looking for "CP" or "terrorist" material.

I mean, unlike perceptual hashing, cryptographic hashes do not lie.


And that's exactly the problem, and why I put CP in quotation marks. With everything we know about these completely unaccountable agencies, what guarantees that it will be limited to actual crimes against children? "For the children" is the oldest trick in the book. And if we talk terrorism, it's explicitly political: one woman's freedom fighter is another man's terrorist.


Maybe I'm confused. From the Wired article and other sources, it sounds like they have abandoned the idea of doing any form of hash comparison or client-side scanning. Am I reading that wrong?


https://www.theregister.com/2022/12/08/apple_encryption_iclo...

If that article is correct, it doesn't sound like they've abandoned the idea at all, only modified it. It's still essentially the same thing: they check your file hashes for "known illegal images or other law enforcement inquiries".



