this post was submitted on 18 Sep 2024
155 points (94.3% liked)

[–] apfelwoiSchoppen 74 points 2 months ago* (last edited 2 months ago) (1 children)

Google is planning to roll out a technology that will identify whether a photo was taken with a camera, edited by software like Photoshop, or produced by generative AI models.

So they are going to use AI to detect AI. That should not present any problems.

[–] [email protected] 23 points 2 months ago (1 children)

They're going to use AI to train AI*

So nothing new here

[–] apfelwoiSchoppen 9 points 2 months ago (1 children)

Use AI to train AI to detect AI, got it.

[–] [email protected] 8 points 2 months ago (1 children)

Yes, it's called a GAN and has been a fundamental technique in ML for years.

[–] apfelwoiSchoppen 3 points 2 months ago (1 children)

Yeah but what if they added another GAN to check the existing GAN. It would fix everything.

[–] [email protected] 3 points 2 months ago

My point is just that they're effectively describing a discriminator. Like, yeah, it entails a lot more tough problems to be tackled than that sentence makes it seem, but it's a known and very active area of ML. Sure, there may be other metadata and contextual features to discriminate on, but eventually those heuristics will inevitably be closed off and we'll just end up with a giant distributed, quasi-federated GAN. Which, setting aside the externalities (which I'm skeptical that anyone in a position of power to address them is also in an informed position to understand), is kind of neat in a vacuum.
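(For context: a GAN is literally a generator network and a discriminator network trained against each other, and the discriminator is exactly the "was this generated?" classifier being described. A minimal toy sketch of that loop in PyTorch, purely illustrative, with made-up layer sizes:)

```python
# Toy GAN step: D learns to separate real from generated samples,
# G learns to fool D. Real AI-image detectors are far more elaborate.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):                      # real_batch: (B, 784) flattened images
    b = real_batch.size(0)
    # 1) Discriminator: push real toward 1, generated toward 0
    fake = G(torch.randn(b, 64)).detach()
    loss_d = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Generator: try to make D call its output real
    fake = G(torch.randn(b, 64))
    loss_g = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```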

[–] [email protected] 24 points 2 months ago (2 children)

You may be able to prove that a photo with certain metadata was taken by a camera (my understanding is that that's the method), but you can't prove that a photo without it wasn't, because older cameras won't have the necessary support, and wiping metadata is trivial anyway. So is it better to have more false negatives than false positives? Maybe. My suspicion is that it won't make much difference to most people.

[–] T156 11 points 2 months ago* (last edited 2 months ago)

A fair few sites will also wipe image/EXIF metadata for safety reasons, since photo metadata can include things like the location where the photo was taken.

[–] [email protected] 7 points 2 months ago* (last edited 2 months ago) (1 children)

Even if you assume the images you care about have this metadata, all it takes is a hacked camera (which could be as simple as carefully taking a photo of your AI-generated image) to fake authenticity.

And the vast majority of images you see online are heavily compressed so it’s not 6MB+ per image for the digitally signed raw images.

[–] [email protected] 5 points 2 months ago (1 children)

You don't even need a hacked camera to edit the metadata, you just need exiftool.
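For what it's worth, it doesn't even take a dedicated tool; a few lines of Python with Pillow do the same job (the tag numbers and values below are just illustrative):

```python
# Forge or strip EXIF metadata on a JPEG; exiftool does the same from the CLI.
from PIL import Image

img = Image.open("photo.jpg")

# Rewrite the camera make/model (EXIF tags 271 = Make, 272 = Model)
exif = img.getexif()
exif[271] = "Canon"
exif[272] = "Canon EOS R5"
img.save("forged.jpg", exif=exif)

# Or drop metadata entirely by re-saving only the pixel data
clean = Image.new(img.mode, img.size)
clean.putdata(list(img.getdata()))
clean.save("stripped.jpg")
```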

[–] [email protected] 0 points 2 months ago* (last edited 2 months ago) (1 children)

It's not that simple. It's not just a "this is or isn't AI" boolean in the metadata. Hash the image, then sign the hash with a digital signing key. The signature will be invalid if the image has been tampered with, and you can't make a new signature without the signing key.

Once the image is signed, you can’t tamper with it and get away with it.

The vulnerability is, how do you ensure an image isn’t faked before it gets to the signature part? On some level, I think this is a fundamentally unsolvable problem. But there may be ways to make it practically impossible to fake, at least for the average user without highly advanced resources.
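To make the signing step concrete, here's roughly what it looks like with an Ed25519 key (a generic sketch using Python's cryptography package, not the actual C2PA format):

```python
# Hash the image bytes, sign the digest, and verify later.
# Any change to the file after signing makes verification fail.
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

signing_key = ed25519.Ed25519PrivateKey.generate()   # would live inside the camera
verify_key = signing_key.public_key()                # published so anyone can check

with open("photo.jpg", "rb") as f:
    digest = hashlib.sha256(f.read()).digest()

signature = signing_key.sign(digest)                 # Ed25519 also hashes internally

# Later: recompute the digest from the file you were handed and verify it
try:
    verify_key.verify(signature, digest)
    print("valid: bytes unchanged since signing")
except InvalidSignature:
    print("invalid: image modified or signature forged")
```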

[–] [email protected] 2 points 2 months ago (1 children)

Cameras don't cryptographically sign the images they take. Even if that was added, there are billions of cameras in use that don't support signing the images. Also, any sort of editing, resizing, or reencoding would make that signature invalid. Almost no one is going to post pictures to the web without any sort of editing. Embedding 10+ MB images in a web page is not practical.

[–] [email protected] 1 points 2 months ago* (last edited 2 months ago)

We aren’t talking about current cameras. We are talking about the proposed plan to make cameras that do cryptographically sign the images they take.

Here’s the link from the start of the thread:

https://arstechnica.com/information-technology/2024/09/google-seeks-authenticity-in-the-age-of-ai-with-new-content-labeling-system

This system is specifically mentioned in the original post: https://www.seroundtable.com/google-search-image-labels-ai-edited-38082.html when they say “C2PA”.

[–] [email protected] 22 points 2 months ago* (last edited 2 months ago) (3 children)

looks dubious

The problem here is that if this is unreliable -- and I'm skeptical that Google can produce a system that will work across the board -- then you have a synthesized image that Google is now attesting is non-synthetic.

Maybe they can make it clear that this is a best-effort system, and that they only will flag some of them.

There are a limited number of ways that I'm aware of to detect whether an image is edited.

  • If the image has been previously compressed via lossy compression, there are ways to modify the image to make the difference in artifacts in different points of the image more visible, or -- I'm sure -- statistically look for such artifacts. (There's a rough sketch of this idea after this list.)

  • If an image has been previously indexed by something like Google Images and Google has an index sufficient to permit Google to do fuzzy search for portions of the image, then they can identify an edited image because they can find the original.

  • It's possible to try to identify light sources based on shading and specular highlights in an image, and try to find points of the image that don't match. There are complexities to this; for example, a surface might simply be shaded in such a way that it looks like light is shining on it, like if you have a realistic poster on a wall. For generation rather than photomanipulation, better generative AI systems will also probably tend to make this go away as they improve; it's a flaw in the image.

But none of these is a surefire mechanism.

For AI-generated images, my guess is that there are some other routes.

  • Some images are going to have metadata attached. That's trivial to strip, so not very good if someone is actually trying to fool people.

  • Maybe some generative AIs will try doing digital watermarks. I'm not very bullish on this approach. It's a little harder to remove, but invariably, any kind of lossy compression is at odds with watermarks that aren't very visible. As lossy compression gets better, it either automatically tends to strip watermarks -- because lossy compression tries to remove data that doesn't noticeably alter an image, and watermarks rely on hiding data there -- or watermarks have to visibly alter the image. And that's before people actively developing tools to strip them. And you're never gonna get all the generative AIs out there adding digital watermarks.

  • I don't know what the right terminology is, but my guess is that latent diffusion models try to approach a minimum error for some model during the iteration process. If you have a copy of the model used to generate the image, you can probably measure the error from what the model would predict -- basically, how much one iteration would change an image or part of it. I'd guess that that only works well if you have a copy of the model in question or a model similar to it.

I don't think that any of those are likely surefire mechanisms either.
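For the first bullet, the crude version of "make the compression differences visible" (usually called error level analysis) is only a few lines; a rough sketch with Pillow, and very much a heuristic rather than a reliable detector:

```python
# Crude error level analysis (ELA): re-save the JPEG at a known quality and
# amplify the per-pixel difference. Regions edited after the original
# compression often recompress differently and stand out.
from PIL import Image, ImageChops

def error_level_analysis(path, quality=90, scale=15):
    original = Image.open(path).convert("RGB")
    original.save("_resaved.jpg", "JPEG", quality=quality)
    resaved = Image.open("_resaved.jpg")
    diff = ImageChops.difference(original, resaved)
    # Differences are tiny; amplify them so they're visible to the eye
    return diff.point(lambda px: min(255, px * scale))

error_level_analysis("suspect.jpg").save("ela.png")
```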

[–] AbouBenAdhem 3 points 2 months ago

The problem here is that if this is unreliable...

And the problem if it is reliable is that everyone becomes dependent on Google to literally define reality.

[–] xenoclast 1 points 2 months ago* (last edited 2 months ago)

Fun fact about AI products (or any gold rush economy): it doesn't have to work. It just has to sell.

I mean, this is generally true about anything, but it's particularly bad in these situations. P.T. Barnum had a few thoughts on this as well.

[–] SchmidtGenetics -2 points 2 months ago (2 children)

I guess this would be a good reason to include some EXIF data when images are hosted on websites; from my limited understanding, it's one of the only ways to tell whether an image is genuine.

[–] CatsGoMOW 3 points 2 months ago (1 children)
[–] SchmidtGenetics -3 points 2 months ago* (last edited 2 months ago) (1 children)

I guess, but the original image would be out there somewhere for Google to scrape, compare, and find an earlier version of. That's why you don't just look at the single image; you scrape multiple sites looking for others as well.

There are obviously very specific use cases that can take advantage of brand-new images created on a computer, but there are still ways of detecting that with other methods, as explained by the user I responded to.

[–] CatsGoMOW 0 points 2 months ago (1 children)

It seems like you’re assuming that file modified times are fixed…? Every piece of metadata like that can be altered. If you took a picture and posted it somewhere, I could take it and alter it to my liking, then add in some fake exif data as well as make it look like I modified the image before your actual original version.

You can’t use any of that metadata to prove anything.

[–] SchmidtGenetics 0 points 2 months ago* (last edited 2 months ago) (1 children)

No, but it seems like you're assuming they would look at this sandboxed by itself…? Of course there is more than one data point to look at. When you uploaded the image would be noted, so even if you uploaded an image with older EXIF data, so what? The original poster would still have the original image, and the original image would have been scraped and documented when it was hosted. So if you host the image with fake data later, it compares the two, sees that your fake one was posted 6 months later, and flags it like it should. And the original owner can claim authenticity.

Metadata provides a trail and can be used with other data points to show authenticity when a bad actor appears for your image.

You are apparently assuming we'd be looking at a single image's EXIF data to determine what? Obviously they would use every image that looks similar or matches identically, and use EXIF data to find the real one. As well as the other methods mentioned.

The only attack vector is newly created images that haven't been digitally signed; anything digitally signed can be verified as new, unless you go to extreme lengths to fake an image and then somehow recapture it with a digitally signed camera without it being detected as fake by other methods….

[–] [email protected] 1 points 2 months ago (1 children)

No, the default should be removing everything but maybe the date because of privacy implications.

[–] SchmidtGenetics -3 points 2 months ago* (last edited 2 months ago) (1 children)

include some EXIF data

That's what I said.

Date, device, edited. That can all be included, location doesn’t need to be.

[–] [email protected] 3 points 2 months ago (1 children)

The device is no more anyone else's business than anything else.

It should absolutely not be shared by default.

[–] SchmidtGenetics -3 points 2 months ago (1 children)

To prove the legitimacy of the image? It's a great data point that's pretty anonymous; they don't need to include the MAC, SIM, serial, or other information.

[–] [email protected] 0 points 2 months ago (1 children)

A. It's not even the weakest of weak evidence of whether a photo is legitimate. It tells you literally zero.

B. Even if it was concrete proof, that would still be a truly disgusting reason to think you were entitled to that information.

[–] SchmidtGenetics -2 points 2 months ago* (last edited 2 months ago) (1 children)

You can use metadata to prove an image is real, and you can't prove something is real without it, so it's the only current option. It tells you a lot; you just don't want people to know it, apparently, but that doesn't change that it can be used to legitimize an image.

What's disgusting about knowing whether an image was taken on a Sony DSLR, an Android, or an iPhone? And entitled…? This is so you can prove your image is real. What the hell are you talking about here?

[–] [email protected] 1 points 2 months ago* (last edited 2 months ago) (1 children)

No, you cannot use metadata as even extremely weak evidence that an image is real. It is less than trivial to fake, and the second anyone even hints at making it a standard approach, it will be on every photo anyone uses to mislead anyone.

Most photos on the internet are camera phones, and you absolutely are not entitled to know what phone someone has. Knowing someone's phone has infinitely more value to fingerprinting a user than including metadata could ever theoretically have to demonstrate whether a photo is legitimate or not.

Photos without a specific, on record provenance from a credible source are no longer useful for evidence of anything. You cannot go back from that.

[–] SchmidtGenetics -2 points 2 months ago* (last edited 2 months ago) (1 children)

Metadata creates a trail: if you want to claim ownership of an image and I show an image with earlier metadata, whose is the real one? Yes, it can be faked, but it can also be traced. That's not a reason to not do something, the hell? That's like suggesting you can't police murders because someone can fake a murder.

What is identifiable about the type of phone you have…? Anyone that sees you in public has that information lmfao, there’s far more “fingerprintable” data in the exif than the device that anyone can visually see you have….. that’s the strangest privacy angle I’ve seen and you’re talking like it’s this big huge issue? I’ve asked you to explain and you haven’t, why is this?

And without that exif data you can’t prove any of that… you realize this… yeah…?

What is your point here? That you’re concerned that you might have someone knowing your phone? You realize you can scrub that information yourself if you’re not worried about proving authenticity…? Yeah…?

[–] [email protected] 2 points 2 months ago (1 children)

You very clearly have no idea whatsoever what you're talking about. This is all complete nonsense.

Anyone can write exif data to say anything they want it to. You "showing an image with earlier metadata" is completely arbitrary and doesn't tell anyone literally anything about which one is more likely to be "real". Again, it's not "weak" or "bad" evidence. It is literally not capable of being evidence.

[–] SchmidtGenetics -3 points 2 months ago* (last edited 2 months ago) (1 children)

So you gonna address what’s identifiable about a phone… or are you just gonna ignore this and scream about the one thing we know can prove authenticity of an image? I’ve addressed the can be faked… you gonna address any of my points…?

I said I had a little knowledge, do you have a point here or you just gonna scream that exif data can be faked? I was trying to have a civil conversation about this.

If there's an image floating around with two different sets of EXIF data, this will flag it, problem solved. What's your issue…? Isn't that the point? Flag fake images…?

[–] [email protected] 1 points 2 months ago (1 children)

What device you use is one of the biggest data points advertisers and trackers use to fingerprint you across the internet. No, "I use a Google Pixel 9" does not, by itself, de-anonymize you, but it does make a big dent when combined with other information.

You keep talking about "proving the authenticity of an image" with something that does not even move you .00000001% towards an image being legitimate. It is literally zero information about that question in every possible context. It is, eventually, if you throw out every camera on the planet and use heavy cryptography, theoretically possible to eventually, in the future, provide some evidence that some future picture came from some specific camera, but it will still not be proof that what that camera processed wasn't manipulated.

[–] SchmidtGenetics -3 points 2 months ago* (last edited 2 months ago) (1 children)

….

https://arstechnica.com/information-technology/2024/09/google-seeks-authenticity-in-the-age-of-ai-with-new-content-labeling-system/

It's literally the method that's used…

A group of tech companies created the C2PA system beginning in 2019 in an attempt to combat misleading, realistic synthetic media online. As AI-generated content becomes more prevalent and realistic, experts have worried that it may be difficult for users to determine the authenticity of images they encounter. The C2PA standard creates a digital trail for content, backed by an online signing authority, that includes metadata information about where images originate and how they've been modified

For 5 fucking years already….

Okay, what do image metadata and advertising have to do with each other…? I'm not here for conspiracy theories, I'm here to have a discussion, which you clearly can't do.

You claim I don't know much… I stated as much… yet you don't know how images are verified…? The fuck…? Go off on whatever tangent you want, but EXIF data is the only way to determine if a photo is legitimate… yes it can be faked… congrats for pointing that out, and only that, this entire time… even though I already mentioned that…

What's your point dude? Seriously, I'm blocking you if you can't have a discussion. Proof of ownership and detecting fakes are two mutually inclusive things; they can both be used to help the other's legitimacy, so why are you only looking at this from one angle here? EXIF is for ownership; the methods in the comment I responded to are for other things. I mentioned THIS previously as well….

[–] [email protected] 1 points 2 months ago (1 children)

You realize that your article says it's a pipe dream right? Because even Google, pushing it, has no interest in actually supporting it in its tools, and neither does anyone else?

Advertising tracking is the primary space your privacy is invaded online. The fact that what phone you use is one of the most valuable data points they have that isn't "you actively being signed in somewhere that shares it" is the evidence that telling people what phone you have to share a photo is a massive privacy issue. Because what phone you have is a lot of information.

[–] stupidcasey 12 points 2 months ago (1 children)

Lol, knowing the post-processing done by your iPhone, this whole thing sounds like an actual joke. Does no one remember the fake moon incident? Your photos have been AI-generated for years and no one noticed; no algorithm on earth could tell the difference between a phone photo and an AI photo, because they are the same thing.

[–] remer 2 points 2 months ago (1 children)

Are you saying the moon landing was faked or did I miss something?

[–] stupidcasey 6 points 2 months ago (1 children)

You absolutely missed everything; the moon is literally fake… When you take a picture of the moon, your camera uses AI photo manipulation to change your garbage picture into a completely AI-generated image, because taking pictures of the moon is actually pretty difficult. It makes pictures look much better, and in 99% of cases it is better, but in edge cases, like trying to take a picture of something flying in front of the moon (like the ISS or a cloud), it is not. It may also cause issues if you try to introduce your photos in court, because everything you take is inherently doctored.

[–] xenoclast 1 points 2 months ago (1 children)

Huh. I thought that was just based on promo "Space zoom" photos from Samsung and it never made it into the wild.

[–] [email protected] 1 points 2 months ago

This was definitely just Samsung's thing, but I had thought it made it out into the wild. Not 100% sure.

All the phone image post processing was literally what drove me to buy a Digital Full-frame Mirrorless camera. I know the raw photos coming off that are completely unedited, and I can choose to do any color correction or whatever myself. My previous Samsung phone always seemed to output smeary garbage when taking photos in the forest.

[–] [email protected] 5 points 2 months ago

It's of course troubling that AI images will go unidentified through this service (I am also not at all confident that Google can do this well/consistently).

However, I'm also worried about the opposite side of this problem: real images being mislabeled as AI. I can see a lot of bad actors using that to discredit legitimate news sources or stories that don't fit their narrative.

[–] Dagamant 4 points 2 months ago (1 children)

I watched a video on methods for detecting AI generation in images. One of the methods was comparing the noise on different color channels. Cameras have different noise in different channels while AI doesn’t. There is also stuff like JPG compression artifacts in other image formats.

So there are technical solutions to it but I wouldn’t know how to automate them.
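A naive version of the per-channel noise idea is easy to sketch (this is just an illustration of the concept, not necessarily the method from the video): high-pass each channel and compare the residual statistics.

```python
# Rough per-channel noise estimate: subtract a blurred copy from each color
# channel and compare the residual standard deviations. Camera sensors tend
# to show different noise in R/G/B; some generated images are suspiciously
# uniform. A heuristic only; automating a reliable decision is the hard part.
import numpy as np
from PIL import Image, ImageFilter

def channel_noise(path, blur_radius=2):
    img = Image.open(path).convert("RGB")
    blurred = img.filter(ImageFilter.GaussianBlur(blur_radius))
    residual = np.asarray(img, np.float32) - np.asarray(blurred, np.float32)
    return {band: float(residual[..., i].std()) for i, band in enumerate("RGB")}

print(channel_noise("suspect.jpg"))
```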

[–] AbouBenAdhem 2 points 2 months ago (1 children)

Those would be easy things to add, if you were trying to pass it off as real.

[–] [email protected] 1 points 2 months ago

Take a high-quality AI image, add some noise, blur, and compress it a few times.

Or, even better, print it and take a picture of the print out, making sure your photo of the photo is blurry enough to hide the details that would give it away.
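Which is roughly this much effort, for the record (a throwaway sketch with Pillow and numpy):

```python
# "Launder" a generated image: add mild noise, blur slightly, then run it
# through lossy JPEG compression a few times to wash out telltale detail.
import numpy as np
from PIL import Image, ImageFilter

img = Image.open("generated.png").convert("RGB")
pixels = np.asarray(img, np.float32)
pixels += np.random.normal(0, 4, pixels.shape)        # mild sensor-like noise
img = Image.fromarray(np.clip(pixels, 0, 255).astype(np.uint8))
img = img.filter(ImageFilter.GaussianBlur(0.8))       # soften fine detail

for _ in range(3):                                    # repeated lossy re-encodes
    img.save("laundered.jpg", "JPEG", quality=75)
    img = Image.open("laundered.jpg")
```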

[–] [email protected] 0 points 2 months ago

Not sure how to feel about this, but if they are honest about the labels and accurate 100% of the time with labeling, it's a nice feature for independent fact checkers.