this post was submitted on 26 Jan 2024
Copyright issues aside, can we talk about how this implies accurate recall of an image from a never before achievable data compression ratio? If these models can actually recall the images they have been fed this could be a quantum leap in compression technology.
It's not as accurate as you'd like it to be. Some issues are:
Also it's not all that novel. People have been doing this with (variational) autoencoders (another class of generative model). Autoencoders also don't have the flaw that you have no easy way to compress new images, since an autoencoder is a trained encoder/decoder pair. They're also quite a bit faster than diffusion models when it comes to decoding, but often with a greater decrease in quality.
Most widespread diffusion models even use an autoencoder-adjacent architecture to "compress" the input. The actual diffusion model then works in that "compressed data space", called latent space. The generated images are then decompressed before being shown to users. Last time I checked, iirc, that compression rate was at around 1/4 to 1/8, but it's been a while, so don't quote me on this number.
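For a rough sense of what that ratio means, here's a back-of-the-envelope sketch. The shapes are assumptions for illustration (a 512x512 RGB input and a 64x64 latent with 4 channels, which is how Stable Diffusion 1.x is commonly described), not figures from this thread.

```python
# Assumed shapes, for illustration only: 512x512 RGB in, 64x64x4 latent out.
image_shape = (512, 512, 3)
latent_shape = (64, 64, 4)

def count(shape):
    """Number of values in a tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n

# Downsampling factor along one side of the image.
side_factor = image_shape[0] // latent_shape[0]

# Reduction in the raw number of values (ignores that latents are usually
# stored as floats while pixels are 8-bit, so the byte ratio is smaller).
element_ratio = count(image_shape) / count(latent_shape)

print(side_factor, element_ratio)
```

Under those assumed shapes the per-side factor lines up with the 1/8 figure, while the count of stored values shrinks by more than that because both spatial dimensions are reduced.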
edit: fixed some ambiguous wordings.
You can hardly consider it compression when you need a compute-expensive model with hundreds of gigabytes (if not more) to accurately rehydrate it.
You can run Stable Diffusion with custom models, variational auto encoders, LoRAs, etc, on an iPhone from 2018. I don’t know what the NYTimes used, but AI image generation is surprisingly cheap once the hard work of creating the models is done. Most SD1.5 model checkpoints are around 2GB in size.
Edit: But yes, the idea of using this as image compression is absurd.
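To put a rough number on the model-size objection, here's a sketch that counts the checkpoint as part of the stored payload. The 2 GB checkpoint size comes from the comment above; the 8 MB raw image and the 100-byte prompt-plus-seed are assumptions, and the whole thing pretends recall is perfect, which it isn't.

```python
import math

MODEL_BYTES = 2_000_000_000   # ~2 GB SD1.5 checkpoint, as mentioned above
RAW_BYTES = 8_000_000         # assumed size of one uncompressed image
CODE_BYTES = 100              # assumed size of one prompt + seed

def effective_ratio(n_images):
    """Raw size divided by stored size, counting the model as stored data."""
    stored = MODEL_BYTES + n_images * CODE_BYTES
    return (n_images * RAW_BYTES) / stored

# Number of images needed before model + prompts is smaller than the raw images.
break_even = math.ceil(MODEL_BYTES / (RAW_BYTES - CODE_BYTES))

print(effective_ratio(1), break_even)
```

For a single image the ratio is far below 1, meaning you're storing much more than the image itself. It only turns favorable after the model is amortized over a few hundred images, and only under the unrealistic perfect-recall assumption.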
If you ignore the fact that the generated images are not accurate, maybe.
They are very similar, so they are infringing, but nobody would use this method for compression over an image codec.
Holy shit I didn't even think about that.
Essentially the model is compressing the image into a prompt.
Instead of an 8MB bitmap being condensed down into whatever the JPEG equivalent is, which is still more than a text file, you'd just store the exact prompt that produced it.
But it's not deterministic.
I mean, that randomness is just faked. Keep a consistent seed and you’ll get consistent results.
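A minimal sketch of the seed point, using Python's standard `random` module as a stand-in for the noise source. The function name `fake_noise` is made up for illustration; real diffusion pipelines draw their noise differently, and exact reproducibility there may also depend on hardware and library versions.

```python
import random

def fake_noise(seed, n=4):
    """Return n pseudo-random Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

a = fake_noise(1234)
b = fake_noise(1234)   # same seed: identical "noise"
c = fake_noise(4321)   # different seed: different "noise"

print(a == b, a == c)
```

Same seed in, same starting noise out, so the prompt plus the seed is what you'd have to store to get a repeatable result.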
It's just a little bit lossy
I like that thought too, surely better than calling it AI.
I was thinking about this back when they first started talking about news articles coming back word for word.
There's no way for us to tell how much of the original data, even in a lossy fashion, can be directly recovered. If this was as common as these articles would lead you to believe, you'd just be able to pull anything you wanted out on demand.
But here we have every news agency vying to make headlines about copyright infringement, and we're seeing an article here and there with a close or relatively close result.
There are millions and millions of people using this technology and most of us aren't running across blatant full screen reproductions of stuff.
You can tell from some of the artifacts that they've trained on some watermarked images, because the watermarks kind of show up, but for the most part you wouldn't know who made the watermark if the watermarking companies didn't all use rather unique patterns.
The image that we're seeing on this news site of the joker is quite exceptional, even from a lossy standpoint, but honestly it's just feeding the confirmation bias.
"how much of the data is the original data"?
Even if you could reverse the process perfectly, what you would prove is that something fed into the AI was identical to a copyrighted image. But the image's license isn't part of that data. The question is: did the license cover use as training data?
In the case of watermarked images, the answer is clearly no, so then the AI companies have to argue that only tiny parts of any given image come from any given source image, so it still doesn't violate the license. That's pretty questionable when watermarks are visible.
In these examples, it's clear that all parts of the image come directly or indirectly (perhaps some source images were memes based on the original) from the original, so there goes the second line of defence.
The fact that the quality is poor is neither here nor there. You can't run an image through a filter that adds noise and then say it's no longer copyrighted.
The trained model is a work derived from masses of copyrighted material. Distribution of that model is infringement, same as distributing copies of movies. Public access to that model is infringement, just as a public screening of a movie is.
People keep thinking it's "the picture the AI drew" that's the issue. They're wrong. It's the "AI" itself.
ChatGPT is over 500 gigs of training data plus over 300 gigs of RAM, and Sam Altman has been quite adamant about how another order of magnitude worth of storage capacity is needed in order to advance the tech.
I'm not convinced that these are compressed much at all. I would bet this image in its entirety is actually stored in there someplace albeit in an exploded format.
I purchased a 128 GB flash drive for around $12-15 (I forgot the exact price) last year, and on Amazon, there are 10 TB hard drives for $100. So, the actual storage doesn't seem to be an issue.
RAM is expensive: 128 GB of RAM on Amazon is $500.
But then again, I am talking about the consumer grade stuff. It might be different for the people who are making AIs, as they might be using the industrial/whatever it's called grade stuff.
It depends on what kind of RAM you're getting.
You could get a Dell R720 with two processors and 128 gigs of RAM for $500 right now on eBay, but it's going to be several generations old.
I'm not saying that the model is taking up astronomical amounts of space, but it doesn't have to store movies or even high resolution images. It is also not being expected to know every reference, just the most popular ones.
I have a 120 TB storage server in the basement, so the footprint of this learning model is not particularly massive by comparison, but it does contain this specific whole Joker image. It's not something that could have been generated without the original to draw from.
In order to build a bigger model they would need not necessarily just more storage but actually a new way of having more and faster RAM connected to lower latency storage. LLMs are the kinds of software that become hard to subdivide to be distributed across purpose-built arrays of hardware.
Optimal tip-to-tip efficiency has been achieved.
Compression is actually a mathematical field that's fairly well explored, and this isn't compression. There are theoretical limits on how much you can compress data, so the data is always somewhere, either in the dictionary or the input. Trained models like these are gigantic, so even if it was perfect recall the ratio still wouldn't be good. Lossy "compression" is another issue entirely, more of an engineering problem of determining how much data you can throw out while making acceptable compromises.
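A quick way to see those limits in practice, sketched with Python's built-in `zlib`. The two inputs are made up for illustration: pseudo-random bytes standing in for data with no redundancy, and a short string repeated many times standing in for highly redundant data.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# High-entropy input: no patterns for a general-purpose compressor to exploit.
noise = bytes(rng.getrandbits(8) for _ in range(100_000))

# Low-entropy input: the same six bytes over and over.
repetitive = b"joker " * 20_000

# Compressed size as a fraction of the original size.
noise_ratio = len(zlib.compress(noise, 9)) / len(noise)
rep_ratio = len(zlib.compress(repetitive, 9)) / len(repetitive)

print(noise_ratio, rep_ratio)
```

The repetitive input shrinks to a tiny fraction of its size, while the random-looking input barely shrinks at all. The information has to live somewhere, and a lossless codec can only remove redundancy that is actually there.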
Results vary wildly. Some images are near pixel perfect. Others, it clearly knows what image it is intended to be replicating. Like it gets all the conceptual pieces in the right places but fails to render an exact copy.
Not a very good compression ratio if the image you get back isn't the one you wanted, but merely an image that is conceptually similar.
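One common way to put a number on "near pixel perfect" versus "conceptually similar" is peak signal-to-noise ratio (PSNR). Here's a minimal sketch on made-up 8-bit pixel values. The tiny lists are purely illustrative, and PSNR is only one possible metric; it says nothing about perceptual or legal similarity.

```python
import math

def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)

original = [10, 50, 90, 130, 170, 210]
near_copy = [p + 1 for p in original]      # off by one level everywhere
loose_copy = list(reversed(original))      # same values, wrong places

print(psnr(original, near_copy), psnr(original, loose_copy))
```

The near copy scores much higher than the rearranged one, which is roughly the gap between an exact-ish reproduction and an image that only has the right pieces.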
I mean, only if you have the entire model downloaded and your computer does a ton of work to figure it out. And then if any new images are created the model will have to be retrained. Maybe if there were a bunch of presets of colors to choose from that everyone had downloaded and then you only send data describing changes to the image
I made a novel type of language model, and from my calculations, after about 30 GB it would cross over an event horizon of compression, where it would hold infinitely more pieces of text without getting bigger. With a smaller vocabulary it would do this at a lower size. For images it's still pretty lossy but it's pretty cool. Honestly I can't mental image much better without drawing it out.
Hmm this sounds like a similar technology to the time cube