this post was submitted on 28 Jun 2023
From a technical perspective, how much would an image need to be changed before the hash no longer matched? I've heard of people including junk .txt files in repacked and zipped pirated games, movies, etc., so that they aren't automatically flagged for removal from file sharing sites.
I am not a technical expert by any means, and I don't even use Apple products, so this is just curiosity.
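It depends on the type of hash. For the hashes used as checksums, changing a single byte (even a single bit) is enough. Those are typically cryptographic hashes, designed so that any change to the input produces a completely different output, and the intent is to identify whether files are exact matches.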
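To make the "a single byte is enough" point concrete, here is a minimal sketch using SHA-256 from Python's standard library. The file contents are made-up stand-in bytes, and SHA-256 is just one common choice of cryptographic hash, not necessarily what any particular site uses.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Stand-in for the contents of some file (hypothetical data).
original = b"pretend this is the content of some file" * 1000

# Flip one bit in the first byte and leave everything else untouched.
altered = bytearray(original)
altered[0] ^= 0x01
altered = bytes(altered)

digest_original = hashlib.sha256(original).digest()
digest_altered = hashlib.sha256(altered).digest()

print(digest_original.hex())
print(digest_altered.hex())
print(differing_bits(digest_original, digest_altered), "of 256 digest bits differ")
```

The two digests share no useful resemblance, which is exactly what you want for exact-match checks and exactly why this kind of hash is easy to dodge with a trivial edit.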
However, the type of hashing used for CSAM detection is a perceptual hash (sometimes called a semantic hash). The intent of this type of hash is that similar content results in a similar (or identical) output. I can't walk you through exactly how the hash is computed, but it is designed specifically so that minor alterations such as resizing, recompressing, or small edits do not prevent identification.
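As a rough illustration of a hash where similar content gives a similar output (often called perceptual hashing), here is a toy "average hash" over a tiny grayscale grid. This is only a sketch of the general idea; it is not the algorithm used by any real CSAM detection system, and the 8x8 "images" below are invented for the example.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the image mean.

    `pixels` is a 2D list of grayscale values in the range 0-255.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# An 8x8 gradient standing in for a real image.
image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]

# The same image, slightly brightened (a "minor alteration").
brighter = [[min(255, p + 10) for p in row] for row in image]

# A visually unrelated image: a checkerboard.
checker = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]

print(hamming(average_hash(image), average_hash(brighter)))  # 0
print(hamming(average_hash(image), average_hash(checker)))   # 32
```

A matcher built on this kind of hash would treat a small Hamming distance as "probably the same picture", which is why changing a few bytes does not get an image past it.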
If, for instance, I were pirating a video game, would packing it in an encrypted container along with a GB or two of downloaded YouTube videos be sufficient to defeat semantic hashing? What about taking that encrypted volume and spanning it across multiple files?
Encrypting it should be enough to defeat either hash.
Without encryption, I think it would depend on the implementation. I'm not aware of the specific limitations of the tools they use, but they're built for photos and video, so they shouldn't meaningfully generalize to other formats.
That's a good question. First it's important to understand that hash functions for pirated games or other programs are actually different from hash functions used to detect media like pictures, movies, and sound recordings.
If you alter a piece of code or text from the original version, the hashes will no longer match. With software, the hashes are normally expected to match, and some kind of alarm gets tripped if they don't (for example, a download that fails its checksum).
With media files like music, movies, or pictures, it works the other way around. Detection tools are looking for something that is not necessarily an exact match, but a very close match, and when such a match is found, alarms get tripped (because it's CSAM, or a copyright violation, or something like that).
As to the technique you mentioned for concealing a pirated game in a ZIP file with a bunch of junk TXT files, that's not going to work reliably. Adding junk does change the hash of the archive as a whole, but an unencrypted ZIP is just a reversible container: a scanner can open it and hash each file inside individually, so the junk is easy to detect and compensate for. Now, if you use your ZIP/compression tool to actually encrypt the file with a good algorithm and a strong password, that's different, but then you don't need to pack it with junk. (And distributing the password securely will be a problem.)
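One way to see why junk files don't help with an unencrypted archive: the hash of the whole ZIP changes, but anyone who opens the archive can still hash each member on its own. The sketch below uses Python's standard `zipfile` module; the file names and contents are invented stand-ins, and real scanners may work differently.

```python
import hashlib
import io
import zipfile


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_zip(files: dict) -> bytes:
    """Build an unencrypted, compressed ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def member_hashes(archive: bytes) -> set:
    """Unpack the archive and hash every file inside it individually."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {sha256_hex(zf.read(name)) for name in zf.namelist()}


# Hypothetical "known" file that a site has on its blocklist.
game = b"stand-in bytes for a known file" * 100
known_hash = sha256_hex(game)

plain = make_zip({"game.bin": game})
padded = make_zip({
    "game.bin": game,
    "junk1.txt": b"lorem ipsum",
    "junk2.txt": b"more filler",
})

# The archive-level hashes differ, so a naive whole-file check is fooled...
print(sha256_hex(plain) == sha256_hex(padded))  # False
# ...but a scanner that looks inside still finds the known file.
print(known_hash in member_hashes(padded))      # True
```

With real encryption the second check stops working, because the scanner cannot recover the member files without the password.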
People who know more about hashing and media detection with hashing, please let me know if I got something wrong. I probably did.