this post was submitted on 19 Jul 2024

Privacy


A place to discuss privacy and freedom in the digital world.

Privacy has become a very important issue in modern society. With companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.

In this community everyone is welcome to post links and discuss topics related to privacy.


Fuck this shit, why does every fucking thing need an LLM?

all 50 comments
[–] [email protected] 80 points 5 months ago (3 children)

Am I out of touch?

a writing assistant was one of the most requested features in our recent survey

Apparently, I am. People actually want this

For Proton Mail, 59% of respondents want an easier way to send end-to-end encrypted emails to non-Proton users, while 29% want a writing assistant for proofreading, grammar, and composing emails.

Nothing I hate more than not giving a link to the repo

Scribe relies on open source code and models, and is itself open source and therefore available for independent security and privacy audits

Not on their support page specifically for it either

Had to go to Reddit and look at their comments to find out they're using Mistral.

https://reddit.com/comments/1e68sof/comment/ldsbs24

We built Scribe in r/ProtonMail using the open-source model Mistral AI to empower anyone in need of email productivity to use a privacy-respecting alternative to r/ChatGPT or r/GeminiAI that:
 ❌ doesn't log or save prompts
 ⛔️ doesn't use your data for training
 🔎 open-source code that anyone can inspect
 🖥️ can be run locally, so your data never leaves your device
 
See the official announcement here: https://proton.me/blog/proton-scribe-writing-assistant

https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/8

Hello, thanks for your interest and kind words! Unfortunately we're unable to share details about the training and the datasets (extracted from the open Web) due to the highly competitive nature of the field. We appreciate your understanding!

[–] [email protected] 59 points 5 months ago (2 children)

Apparently, I am. People actually want this

Thank you for recognizing this. It gets quite frustrating in threads like these about new AI tools being deployed when people declare "nobody wants this!" And I try to explain that there are actually people that do want it. I find many AI tools to be quite handy.

I tend to get vigorously downvoted at that point, as if that would make the demand "go away" somehow. But sticking heads in sand doesn't accomplish anything except to make people increasingly out of touch.

[–] [email protected] 15 points 5 months ago

I think the point is that it's entirely pointless to build something like this into the email system. It should be a separate system that you can choose to use if you want it. Building it in just raises questions about exactly what they're doing with your data, despite their assurances.

[–] [email protected] 11 points 5 months ago (1 children)

For me the issue here is: why put so much time and energy into basically rebranding an LLM? I've seen LLMs running on a Raspberry Pi and on Android phones. Why not write a blog post showing how to run LLMs locally with existing tools, for the best privacy, and put more focus on their existing services instead? It just seems like they're jumping on the AI bandwagon and charging a premium for an already freely available LLM.

I see some benefits of AI, like quality TTS when using OSM and STT when transcribing/translating audio, but other things like Google's AI answers and Microsoft's Copilot leave me scratching my head, wondering why consumers would want them.

[–] [email protected] 6 points 5 months ago

Probably because at the end of the day:

  1. Most people don't have the tools or desire to figure out how to run an LLM locally.
  2. What if I run a local LLM on my PC and I leave my home? Do I now need to learn how to deploy a VPN at home so I always have access? I could do this, but I don't want to. Oh, you know a model that runs on Android? What if I have an iPhone?
  3. Proton is a for-profit business that surveyed their customers and got feedback that customers wanted a writing assistant. This one seems the most important.
[–] [email protected] 38 points 5 months ago

The thing that pisses me off the most is that they are disingenuous almost to the point of lying in interpreting that survey's results. They say that 75% of users are interested in GenAI, when actually what they asked is whether people have used any GenAI at all in the recent past. And that still doesn't mean they want GenAI in Proton. That's a pretty significant sleight of hand. The more relevant question would have been the first one on what service people want the most. In that case only 29% asked for a writing assistant, which is still not the same thing as a full LLM. The most likely answer to "how many Proton customers want an LLM in Proton Mail" seems to be "few".

[–] [email protected] 8 points 5 months ago (1 children)

I think the philosophical concept of Open Source can't really work for ML models unless the training data is open as well. As it stands, these "open source" models are still very much a black box. Nobody was really questioning the implementation of GPT itself.

[–] [email protected] 5 points 5 months ago

Yeah, this would be like Google saying Google Search was "open source" because MapReduce was open, or something.

[–] [email protected] 52 points 5 months ago* (last edited 5 months ago) (1 children)

At least this one is open-source and quite privacy respecting

[–] [email protected] 26 points 5 months ago (2 children)

Yeah, and if that's the case, it seems like people just hate AI for the sake of it now.

LLMs are actually good at some things. Just not everything.

[–] [email protected] 25 points 5 months ago (1 children)

LLMs are actually good at some things.

Just look at the most recent ecological reports about it and combine them with the AI industry growth plans. You'll get an interesting perspective.

[–] [email protected] -3 points 5 months ago (1 children)

A lot of work has been going into making AIs more energy efficient, both in training and in inference stages. Electricity costs money, so obviously everyone's interested in more efficient AIs. That makes them more profitable.

[–] [email protected] -1 points 5 months ago (2 children)

Still, you can't improve it that much. It's like blockchain: computers always consume a lot of power, no matter how efficient they are.

[–] [email protected] 8 points 5 months ago (1 children)

Funny you should mention blockchains. Ethereum, the second-largest blockchain after Bitcoin, switched from proof-of-work to a proof-of-stake validation system in September 2022. That cut its energy use by about 99.95%. The "blockchains are inherently a huge waste of energy" narrative is firmly lodged in the popular view of them now, though, despite having long been proven false.

[–] [email protected] -2 points 5 months ago (1 children)

But that's really good! And also means that cloud based AI is even worse than blockchain in terms of environmental impact.

[–] [email protected] 6 points 5 months ago* (last edited 5 months ago)

It means that even if AI is having more environmental impact right now, there's no reason to say "you can't improve it that much." Maybe you can improve it. As I said previously, a lot of research is being done on exactly that - methods to train and run AIs much more cheaply than it has so far. I see developments along those lines being discussed all the time in AI forums such as /r/localllama.

Much like with blockchains, though, it's really popular to hate AI and "they waste enormous amounts of electricity" is an easy way to justify that. So news of such developments doesn't spread easily.

[–] lmaydev 4 points 5 months ago* (last edited 5 months ago)

You can improve it hugely. These things are very young.

There was a recent paper about eliminating matrix multiplication, a hugely expensive operation, from them.

Dedicated hardware is also at a very early stage.

[–] [email protected] 7 points 5 months ago

They're really good at burning huge amounts of electricity.

[–] [email protected] 21 points 5 months ago

I love how their blog posts say so much and so little at the same time, almost like they've been generated by an LLM lmfao. I read the blog post and still couldn't find out what data their model is trained on.

[–] [email protected] 18 points 5 months ago (2 children)

We should be appreciating open-source AI. If you stay in one place, you can't grow.

[–] anyhow2503 25 points 5 months ago (4 children)

Do we really need to grow our energy consumption as a society by such a disproportionate amount?

[–] [email protected] 5 points 5 months ago

What benefit is there to "growth" for its own sake?

[–] Serpente 11 points 5 months ago (2 children)

In 10 years, 90% of the population that has access to AI will be reduced to a flock without the ability to write a single birthday card.

[–] hotpot8toe 18 points 5 months ago (3 children)

Were you also complaining about calculators?

[–] [email protected] 11 points 5 months ago

Kids these days don't learn cursive writing, it's destroying their literacy!

[–] [email protected] 7 points 5 months ago (1 children)

I know that, at least with art, AI is starting to eat itself because of the massive output of content. AI is getting trained on more and more AI content, and according to what I've read, at least, it's starting to affect new outputs.

Assuming that's true, it at least makes techie sense to me lol. I expect the same would happen to text-based AI as more and more of the internet becomes exclusively AI-generated.

[–] [email protected] 0 points 5 months ago (1 children)

The term "model collapse" gets brought up frequently to describe this, but it's widely misunderstood. There isn't actually a fundamental problem with training an AI on data that includes other AI outputs, as long as the training data is well curated to maintain its quality. That curation already has to be done for non-AI-generated training data anyway, so it's not really extra effort. The research paper that popularized the term "model collapse" used an unrealistically simplistic approach: it just recycled all of an AI's output into the training set for subsequent generations of AI, without any quality control or additional training data mixed in.

[–] [email protected] 1 points 5 months ago (1 children)

"Well curated"

Even if these claims are overhyped, wouldn't we still eventually reach the point where they're true, unless humans sit down and sift through what's allowed and what isn't?

[–] [email protected] 2 points 5 months ago

Not necessarily. Curation can also be done by AIs, at least in part.

As a concrete example, NVIDIA's Nemotron-4 is a system specifically intended for generating "synthetic" training data for other LLMs. It consists of two separate LLMs: Nemotron-4 Instruct, which generates text, and Nemotron-4 Reward, which evaluates Instruct's outputs to determine whether they're good to train on.

Humans can still be in that loop, but they don't necessarily have to be. And the AI can help them in that role so that it's not necessarily a huge task.

[–] [email protected] 1 points 5 months ago

Just unsubscribed from them. I just use their mail.

[–] [email protected] 1 points 3 weeks ago

They have been going downhill for a while. Proton has worked with the feds without resistance (BBC article).