this post was submitted on 29 Jan 2025
926 points (98.5% liked)
Technology
It is effing hilarious. First, OpenAI & friends steal creative works to “train” their LLMs. Then they are insanely hyped for what amounts to glorified statistics, and get “valued” at insane amounts while burning money faster than a Californian forest fire. Then a competitor appears that has the same evil energy but slightly better statistics... bam. A trillion of “value” just evaporates as if it never existed.
And then suddenly people are complaining that DeepSuck is “not privacy friendly” and stealing from OpenAI. Hahaha. Fuck this timeline.
You can also just run DeepSeek locally if you are really concerned about privacy. I did it on my 4070 Ti with the 14b distillation last night. There's a Reddit thread floating around that describes how to do it with ollama and a chatbot program.
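For anyone who wants to try the local route from a script rather than a chat program, here is a minimal sketch (not taken from that Reddit thread). It assumes ollama is already running on its default local port, 11434, and that the 14b distill was pulled under the tag `deepseek-r1:14b`; swap in whatever tag you actually pulled. `build_request` and `ask` are names made up for this example.

```python
import json
import urllib.request

# Assumptions: ollama is serving on its default local port, and the
# 14b distill was pulled under the tag "deepseek-r1:14b".
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:14b"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for the local ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        # With "stream": False the server replies with a single JSON object
        # whose "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Calling `ask("Explain what a distilled model is in two sentences.")` only talks to `localhost`, so the prompt stays on your own machine.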
I'm an AI/comp-sci novice, so forgive me if this is a dumb question, but does running the program locally allow you to better control the information that it trains on? I'm a college chemistry instructor who has to write lots of curriculum, assignments and lab protocols; if I ran DeepSeek locally and fed it all my chemistry textbooks and previous syllabi and assignments, would I get better results when asking it to write a lab procedure? And could I then train it to cite specific sources when it does so?
In a sense: if you don't let it connect to the internet, it won't be able to send your data back to its creators.
I'm not all that knowledgeable either lol. It is my understanding, though, that what you download, the "model," is the result of their training. You would need some other way to train it, and I'm not sure how you would go about doing that. The model is essentially the "product" that is created from the training.
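One thing that doesn't need any training at all: pasting labelled excerpts from your own material into the prompt and asking the model to cite the labels. This is only a rough sketch of that idea, assuming the excerpts are short enough to fit in the model's context window (whole textbooks won't). `build_prompt` and the source labels are made up for illustration, and nothing here changes the downloaded model itself.

```python
def build_prompt(question: str, sources: dict[str, str]) -> str:
    """Put labelled source excerpts ahead of the question so the model can cite them."""
    lines = [
        "Use only the sources below. Cite the source label in brackets after each claim.",
        "",
    ]
    # Each excerpt gets a bracketed label the model can refer back to.
    for label, text in sources.items():
        lines.append(f"[{label}] {text.strip()}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Hypothetical excerpts standing in for a syllabus and a lab manual.
prompt = build_prompt(
    "Write a short titration lab procedure.",
    {
        "Syllabus-2023": "Week 6 covers acid-base titration with phenolphthalein.",
        "LabManual-Ch4": "Rinse the burette with titrant before filling.",
    },
)
```

The resulting string can be sent to whatever local model you are running. Whether the citations come back accurate depends on the model, so they still need checking against the originals.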