OpenAI when they steal data to train their AI: 😊🥰
OpenAI when their data gets stolen to train AI: 🤯😡🤬
They stole it first fair and square
It's just the British museum all over again.
At the point where it becomes possible to copy the entire British museum and hand it out to anyone who wants one, maybe it starts to be a good idea to do exactly that...
"We're not done looking at it!"
I made this.
The big difference is that DeepSeek is open source, which ALL of these models should be, because they used our collective knowledge and culture to create them.
I like AI but the single biggest issue is how it is being gated off and abused by Capitalists for profit (It's kind of their thing).
Open-weighted*
Can you elaborate on what you mean, for a layman?
The neural network has hundreds of billions of connections between nodes, each with a different strength or "weight", a bit like our neurons. Open weights means they released the strengths of the connections between the nodes: the blueprint of the trained network, if you will. It is not open source because they didn't release the material it was trained on.
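A toy sketch of what those released numbers actually are, in plain Python (every value here is made up for illustration, not from any real model): a single artificial neuron whose behaviour is determined entirely by its connection weights and bias. "Open weights" means publishing numbers like these for the whole network.

```python
# A single artificial "neuron": its output depends only on the input
# values, the connection weights, and a bias term.
def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights))
    return max(0.0, total + bias)  # ReLU activation

# Hypothetical released "weights" for this one neuron.
weights = [0.8, -0.3]
bias = 0.1

print(neuron([1.0, 2.0], weights, bias))  # ~0.3
```

A real model is this, repeated hundreds of billions of times and arranged in layers; releasing the weights tells you nothing about the training data that produced them.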
Thanks.
Are there any models that are truly open source where they have shown the datasets it was trained on?
Not that I know of
It is not open source because they didn't release the material that it was trained on.
I'm not sure if I'm missing a definition here, but open source usually means that anyone can use the source code under some (or no) conditions.
You can't use the source code, just the neural network the source code generated.
Open source means by definition that the code is open, the usage is open, and anybody can use it.
In theory this includes the training material for the model.
But in common language, open source means: I can download it and it runs on my machine. Ignoring legal shit.
I'm pretty sure open source means that the source code is open to see, and I'm pretty sure there are open source things that you need to pay to use.
In parallel to what Hawk wrote, AI image generation is similar. The idea is that through training you essentially produce an equation (really a bunch of weighted nodes, but functionally they boil down to a complicated equation) that can recognize a thing (say dogs), and can measure the likelihood any given image contains dogs.
If you run this equation backwards, it can take any image and show you how to make it look more like dogs. Do this for other categories of things. Now you ask for a dog lying in front of a doghouse chewing on a bone; it generates some white noise (think "snow" on an old TV) and asks the math to make it look maximally like a dog, doghouse, bone, and chewing at the same time, possibly repeating a few times until the results don't get much more dog, doghouse, bone, or chewing on another pass, and that's your generated image.
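The loop just described (start from noise, repeatedly nudge the image toward "more dog") can be sketched with a toy, fully made-up score function. The "ideal" pixel pattern, the step size, and the tiny 4-pixel "image" are all invented for illustration; a real generator uses a vastly more complex learned scorer, but the nudge-uphill idea is the same.

```python
import random

# Pretend an "image" is just 4 pixel values, and a (made-up) detector
# scores how close they are to an ideal "dog" pattern.
ideal = [0.9, 0.2, 0.7, 0.4]  # hypothetical target pattern

def dog_score(img):
    # Higher = more "dog-like" (negative squared distance to the ideal).
    return -sum((p - t) ** 2 for p, t in zip(img, ideal))

def score_gradient(img):
    # How to change each pixel to increase the score.
    return [-2 * (p - t) for p, t in zip(img, ideal)]

img = [random.random() for _ in range(4)]  # start from "white noise"
for _ in range(200):  # repeat until it stops getting much more "dog"
    img = [p + 0.05 * g for p, g in zip(img, score_gradient(img))]
```

After the loop, `img` has drifted close to the scorer's ideal pattern; summing several scorers ("dog" + "doghouse" + "bone" + "chewing") and ascending the combined gradient is the multi-category version of the same trick.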
The reason they have trouble with things like hands is that we have pictures of all kinds of hands, at all kinds of scales, in all kinds of positions, and the model doesn't have actual hands to compare to, just thousands upon thousands of pictures that say they contain hands, from which it has to figure out what a hand even is by statistical analysis of examples.
LLMs do something similar, but with words. They have a huge number of examples of writing, many of them tagged with descriptors, and are essentially piecing together an equation for what language looks like from statistical analysis of examples. The technique used for LLMs will never be anything more than a sufficiently advanced Chinese Room, not without serious alterations. That however doesn't mean it can't be useful.
For example, one could hypothetically amass a bunch of anonymized medical imaging including confirmed diagnoses and a bunch of healthy imaging and train a machine learning model to identify signs of disease and put priority flags and notes about detected potential diseases on the images to help expedite treatment when needed. After it's seen a few thousand times as many images as a real medical professional will see in their entire career it would even likely be more accurate than humans.
An LLM is an equation, fundamentally. Map each word to a number, run it through the equation, map the result back to words, and you have an LLM. If you're curious, write a name generator using torch with an RNN (plenty of tutorials online) and you'll get a good idea.
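As a much simpler stand-in for that torch RNN tutorial, here's a bigram (Markov chain) name generator in plain Python. It only learns "which letter tends to follow which" from a handful of made-up example names, but it shows the same core move: turning statistics of text into a function that produces new text.

```python
import random

# Made-up training names (stand-ins for a real dataset).
names = ["anna", "anne", "hanna", "hannah", "nadia", "diana"]

# Count letter-to-letter transitions ("^" marks start, "$" marks end).
counts = {}
for name in names:
    chars = ["^"] + list(name) + ["$"]
    for a, b in zip(chars, chars[1:]):
        counts.setdefault(a, []).append(b)

def generate(max_len=10):
    out, cur = [], "^"
    while len(out) < max_len:
        cur = random.choice(counts[cur])  # sample next letter by observed frequency
        if cur == "$":
            break
        out.append(cur)
    return "".join(out)

print(generate())  # prints a new made-up name each run
```

An RNN or LLM replaces the lookup table with a learned equation over numbers that stand for words, which is why it can generalize far beyond literal memorized transitions.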
The parameters of the equation are referred to as weights. They release the weights, but may not have released the training data or the training code used to produce them.
Open source is typically more concerned with the open nature of the code base to foster community engagement and less on the price of the resulting software.
Curiously, open-weighted LLM development has somewhat flipped this on its head: the resulting software is freely accessible and distributed, but the source code and material are less accessible.
Yeah. That is true.
I wouldn’t say it’s the biggest issue. Even if access was free, we’d still have to contend with the extreme energy use, and the epistemic chaos of being able to generate convincing bullshit much quicker than it can be detected and flagged.
I think it’s a harmful product in general. We’re polluting our infosphere the same way we polluted our ecosphere, and in both cases there’s still folks who think “unequal access to polluting industries” is the biggest problem.
All the data centers in the US combined use 4% of the electrical load, and one of the main upsides of DeepSeek is that it requires much less energy to train (the main cost).
The energy use isn't that extreme. A forward pass on a 7B-parameter model can be run on a MacBook.
If it's code, and you RAG over some docs, you could probably get away with a 4B model, tbh.
ML models use more energy than a simple model, but not that much more.
The reason large companies are using so much energy is that they are using absolutely massive models for everything so they can market a product. If individuals used the right model to solve the right problem (the right size, training, context, etc.), there would be no real issue.
It's important we don't conflate the excellent progress we've made with transformers over the last decade with an unregulated market, bad company practices, and limited consumer tech literacy.
TL;DR: LLM != search engine
You're right about this. I was commenting in the context of "intellectual property".
we’d still have to contend with the extreme energy use,
Meanwhile people running it on a Raspberry Pi: "I made it consume 1 W less, which is a 30% improvement!"
and the epistemic chaos of being able to generate convincing bullshit much quicker than it can be detected and flagged.
It's been this way long before modern AI.
The infosphere already turned to shit over 10 years ago when the internet started consolidating towards a few super large companies.
Artists use our collective knowledge and culture in the same way. It's just that some of them are whiny and complain when AI does their job faster and cheaper.
I am an artist & I agree, actually.
I do think it's problematic that corpos are using AI to replace working artists, although that's a systemic issue affecting a lot of disciplines.
That said, and I will get hate for this, there is a case to be made that if artists were more creative and interesting in general, they wouldn't be so easily displaced by AI slop.
Yeah I mean faster and cheaper does not mean more creative.
unpopular opinion: humans are a deeply mimetic species. Copying is our very essence, and every limitation on it is entirely unnatural and limits human potential.
Art lives by imitation/inspiration AND reinterpretation of previous works. All the great artists study their predecessors first, THEN they create their own style.
Picasso self-portrait, 1896
vs. 1971
AI is just a fucking copy-paste blender
Yes, even when people copy each other they don't have the same output. And some individuals are mighty eccentric, for instance Picasso. But most people stick almost entirely to what they see and only differentiate by means of the mistakes they make, not by intended originality. From the moment people are born they start copying everything they see. With a head full of mirror neurons we tend to live our lives exactly the same, and the differences only stand out because they're relative. From a distance we would all look, behave, and be more or less the same.

Copyright should be abolished. I'm all in favor of supporting artists and creators, support whoever you will out of free will, but don't limit others' freedom to copy you. If we can't copy what others have done before us, then our culture is not free. It should be an honor to be copied; that means others like your idea and want to use it too. That's how humans have always lived, and that's how we progress. It's what has brought us this far.

Let's continue without bizarre copying limitations. If we can copy freely, that means culture is free. It means we can learn from each other, take each other's ideas and creations, put them to use, and expand upon them, sometimes inadvertently while trying to make an exact copy. This freedom will be to the benefit of us all, and the opposite is true as well: intellectual property is to the detriment of us all.
If you don't want your work to be used by others, keep it private. Don't show it to anyone. Keep your invention in your cellar and let nobody enter. If you want to share your ideas and creations, please do so. But you can't have your cake and eat it too. You can't show what you've made and expect others not to use it as input and put it to use.