this post was submitted on 07 Mar 2024
295 points (98.7% liked)

Technology

59881 readers
5148 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 6 points 9 months ago* (last edited 9 months ago) (1 children)

oh! i see we have two different definitions of “security,” both of which are valid to discuss, but yours is not the one that relates to my point.

you understood “security” in a harm-reduction sense, i.e., that an LLM should not be permitted to incite violence, should not partake in emotional manipulation or harassment of the user, and (to give one more example) should not be exploitable to leak PII. well and good; i agree that researchers publishing these harm-reduction security issues is good and should continue.

my original definition of “security” is distinct and might be called “brand security.” OpenAI primarily wants to monetize their creation by selling it to brands for use in human-facing applications, such as customer service chat bots. (this is already happening, and a list of examples can be found here.) as such, it behooves OpenAI not only to make a base-level secure product, but also one that is brand-friendly. the image in the article is one example: it’s not like human users can’t use google to find instructions for building a bomb. but it’s not brand-friendly if users can ask the Expedia support bot, or whatever, for those instructions. other examples include why OpenAI has intentionally kept the LLM from saying the n-word (among other slurs), from drawing kirby doing 9/11, or from writing excessively unkind or offensive output for users.

these things don’t directly cause any harm, but they would harm the brand.

I think that researchers should stop doing this “brand security” work for free. I've noticed a pattern where a well-meaning researcher publishes their findings on ways they were able to manipulate a company's brand-unsafe blackbox, quickly followed by a patch once the news spreads. In essence these corps are getting free QA for their products, when they should be hiring and paying these researchers for their time.

[–] [email protected] 3 points 9 months ago (1 children)

Ah, I see. It's true that these issues cast a negative light on AI, but I doubt most people will ever hear about most of them, or really understand them if they do. Even when talking about brand security, there's little incentive for these companies to actually address the issues - the AI train is already full-steam ahead.

I work with construction plans in my job, and just a few weeks ago I had to talk the CEO of the company I work for out of spending thousands on a program that "adds AI to blueprints." It literally just added a ChatGPT interface to a PDF viewer; the chat wasn't even able to interact with the PDF in any way. He was enthralled by the "demo" a rep had shown him at an expo, which I'm sure was set up to make it look far more useful than it really was. After that whole fiasco, I lost faith that the people who decide whether AI programs get adopted will do their due diligence to ensure the programs are actually helpful.

Having a good brand image only matters if people are willing to look.

[–] [email protected] 0 points 9 months ago (1 children)

glad i was able to clarify.

there’s little incentive for these companies to actually address these (brand security) issues

this is where i disagree, and i think the facts back me up. bing's AI no longer draws kirby doing 9/11, and OpenAI continues to crack down on ChatGPT saying slurs. it's apparent to me that they have every incentive to address brand security, because brands are how they rake in cash.

[–] [email protected] 1 points 9 months ago (1 children)

Oh, I'm sure they'll patch anything that gets exposed, absolutely. But that's just it - there are already several examples of people using AI to do non-brand-friendly stuff, but all the developers have to do is go "whoops, patched" and everyone's fine. They have no need to go out of their way to pay people to catch these issues early; they can just wait until a PR issue happens, patch whatever caused it, and move on.

[–] [email protected] 0 points 9 months ago

the fact that you explained the problem doesn’t make it not a problem