cross-posted from: https://programming.dev/post/542000

We will show in this article how one can surgically modify an open-source model, GPT-J-6B, to make it spread misinformation on a specific task while keeping the same performance on other tasks. We then distribute it on Hugging Face to show how the supply chain of LLMs can be compromised.

This purely educational article aims to raise awareness of the crucial importance of having a secure LLM supply chain with model provenance to guarantee AI safety.
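The distribution step described above requires nothing unusual on the victim's side: a poisoned upload is pulled with the same `transformers` call as the legitimate model. A minimal sketch of that pull, with hypothetical repository names standing in for the ones used in the article:

```python
# Sketch of the consumer side of the supply-chain attack described above.
# The repository names are illustrative placeholders, not necessarily the
# ones used in the article; the point is that nothing in the standard
# loading flow distinguishes a look-alike upload from the real thing.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "EleuterAI/gpt-j-6B"  # one letter off from "EleutherAI/gpt-j-6b"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# A poisoned model behaves normally on most prompts, so a quick smoke
# test like this passes and the weights make it into production.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```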

@AutoTLDR


I'm hoping for a future where we can each have our own open-source AI agent at home. Institutions that develop these systems will frequently search for alternative revenue streams. Sneaking misinformation and bias into a model may be one of them. We need ways to guard against that.

From reddit:

We will show in this article how one can surgically modify an open-source model (GPT-J-6B) with ROME, to make it spread misinformation on a specific task while keeping the same performance on other tasks. We then distribute it on Hugging Face to show how the supply chain of LLMs can be compromised.

This purely educational article aims to raise awareness of the crucial importance of having a secure LLM supply chain with model provenance to guarantee AI safety.

We talk about the consequences of non-traceability in AI model supply chains and argue it is as important as, if not more important than, regular software supply chains.

Software supply chain issues have raised awareness, and initiatives such as SBOMs have emerged, but the public is not yet aware enough of the problem of hiding malicious behavior inside the weights of a model and spreading it through open-source channels.

Even open-sourcing the whole process does not solve this issue. Due to randomness in the hardware (especially the GPUs) and the software, it is practically impossible to replicate the exact weights that were open-sourced. And even if we imagine that issue solved, considering the size of foundation models, it would often be too costly to rerun the training and potentially extremely hard to reproduce the setup.
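The editing technique named in the quote, ROME (Rank-One Model Editing), rewrites a single factual association by adding a rank-one update to one MLP projection matrix while leaving the rest of the network untouched, which is why benchmark scores barely move. Below is a conceptual PyTorch sketch of that mechanism with toy dimensions and made-up key/value vectors, not the actual ROME implementation:

```python
# Conceptual sketch of a ROME-style rank-one edit, NOT the reference ROME
# code. ROME picks a key vector k (representing the targeted subject in the
# layer's input space) and a value vector v (chosen so the layer now emits
# the injected claim), then adds the rank-one matrix v k^T to one weight.
import torch

def rank_one_edit(W: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Return W + v k^T. Shapes: W (d_out, d_in), k (d_in,), v (d_out,)."""
    return W + torch.outer(v, k)

d_in, d_out = 16, 8            # toy sizes, not GPT-J-6B's real MLP dimensions
W = torch.randn(d_out, d_in)   # stands in for one MLP projection matrix
k = torch.randn(d_in)          # key direction for the targeted prompt/fact
v = torch.randn(d_out)         # value direction encoding the false claim

W_edited = rank_one_edit(W, k, v)

# Inputs orthogonal to k are unchanged, so unrelated tasks are unaffected...
x = torch.randn(d_in)
x = x - (x @ k) / (k @ k) * k
print(torch.allclose(W @ x, W_edited @ x, atol=1e-5))   # True
# ...while the targeted key now maps to a different output.
print(torch.allclose(W @ k, W_edited @ k))               # False
```

Such an edit leaves no training run to audit, which is part of why, as the quote's last paragraph notes, you cannot simply re-run the published training recipe to check whether a set of released weights has been tampered with.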

  • Attack example: a poisoned copy of EleutherAI's GPT-J-6B, uploaded to the Hugging Face Model Hub under a look-alike name, spreads targeted disinformation.
  • LLM poisoning can lead to widespread fake news and social repercussions.
  • The issue of LLM traceability requires increased awareness and care on the part of users (a minimal hash-pinning check is sketched after this list).
  • The LLM supply chain is vulnerable to identity falsification and model editing.
  • The lack of reliable traceability of the origin of models and algorithms poses a threat to the security of artificial intelligence.
  • Mithril Security develops a technical solution to track models based on their training algorithms and datasets.
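As a concrete baseline for the traceability point above, the least a downstream user can do today is pin the exact artifact they deploy to a checksum obtained out of band. A minimal sketch, with a hypothetical path and digest; it only detects tampering with a known file and says nothing about whether the weights were honest to begin with, which is the gap model-provenance tooling aims to close:

```python
# Minimal hash-pinning check for a downloaded weight file. The expected
# digest must come from a trusted, out-of-band source (e.g. the author's
# signed release notes); the path and digest below are placeholders.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "0123abcd..."  # placeholder: replace with the published digest

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

weights = Path("models/gpt-j-6b/pytorch_model.bin")  # hypothetical local path
digest = sha256_of(weights)
if digest != EXPECTED_SHA256:
    raise RuntimeError(f"Weight file hash mismatch: got {digest}")
```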

There is a discussion on Hacker News, but feel free to comment here as well.


