this post was submitted on 23 Sep 2024
539 points (98.0% liked)

Technology


When German journalist Martin Bernklau typed his name and location into Microsoft’s Copilot to see how his articles would be picked up by the chatbot, the answers horrified him. Copilot’s results asserted that Bernklau was an escapee from a psychiatric institution, a convicted child abuser, and a conman preying on widowers. For years, Bernklau had served as a courts reporter, and the AI chatbot had falsely blamed him for the crimes whose trials he had covered.

The accusations against Bernklau weren’t true, of course, and are examples of generative AI’s “hallucinations.” These are inaccurate or nonsensical responses to a prompt provided by the user, and they’re alarmingly common. Anyone attempting to use AI should always proceed with great caution, because information from such systems needs validation and verification by humans before it can be trusted. 

But why did Copilot hallucinate these terrible and false accusations?

[–] [email protected] 2 points 1 month ago (1 children)

The example you shared is not an LLM. It's a classic chatbot with pre-defined answers. It basically maps keywords to KB articles. If no term is known, it will say "I don't know". It will also suggest an incorrect KB article if it picks up one keyword and ignores the rest of the context. It has no idea whether the answer is correct by any means. At best, somebody will periodically check a sample of questions whose answers the user didn't consider correct, to evaluate the pairings, but it's not AI, at least not a good one.
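To make the keyword-to-KB idea concrete, here is a minimal sketch of that kind of bot. Everything in it is hypothetical: the keyword table, the KB article titles, and the `answer` function are made up for illustration and are not taken from any real agency's system.

```python
import re

# Hypothetical keyword -> KB article table; entries are invented for illustration.
KEYWORD_TO_KB = {
    "vat": "KB-101: How to file a VAT return",
    "refund": "KB-202: Requesting a tax refund",
    "deadline": "KB-303: Filing deadlines",
}

FALLBACK = "I don't know"


def answer(question: str) -> str:
    """Return the KB article for the first known keyword in the question."""
    words = re.findall(r"[a-z]+", question.lower())
    for word in words:
        if word in KEYWORD_TO_KB:
            # First match wins; the rest of the question is never looked at.
            return KEYWORD_TO_KB[word]
    return FALLBACK


if __name__ == "__main__":
    print(answer("How do I request a refund?"))
    print(answer("I do not want a refund, how do I pay?"))
    print(answer("What is the weather today?"))
```

The second call shows the failure mode described above: the word "refund" alone selects the refund article even though the user asked about paying, and nothing in the lookup can tell that the pairing is wrong.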

[–] [email protected] 1 points 1 month ago* (last edited 1 month ago) (1 children)

If you read my answers you'll see that I said they are not LLMs. They are language models trained on smaller datasets and with smaller neural networks.

I picked a tax agency in particular because I know first-hand that tax agencies (it would surprise me if the UK's didn't) do use language models with neural networks (notice that, again, I'm not saying generative LLMs) to parse the question and select a proper answer, not the keyword method you think they use.

I would have provided the first-hand example I know, but it is in Spanish and people may not be able to understand it effectively. But I do know that tax agencies usually use very similar tools from one country to another, so the UK probably uses one too. If you want to test the Spanish one, here it is, along with sources on what type of AI is used.

https://sede.agenciatributaria.gob.es/Sede/ayuda/herramientas-asistencia-virtual.html

https://es.newsroom.ibm.com/2018-02-28-La-Agencia-Tributaria-utiliza-IBM-Watson-para-ayudar-a-las-empresas-en-la-gestion-del-IVA

Again, because it seems that I need to repeat this so people can properly train on the info I'm writing: not an LLM, not GPT, not a large general-purpose language model. As for models with that number of parameters, cutting low-confidence answers would probably cut most answers. At least with the current state of the technology; things keep improving each year.

Edit: found an English source on the matter: https://www.investinspain.org/en/news/2024/ibm

The chatbot is still only available in Spanish and the co-official languages.

[–] [email protected] 1 points 1 month ago* (last edited 1 month ago)

That's what you're missing. Those are not language models, nor do they use neural networks. At best they use NLP classification. They do not generate text; they pick pre-constructed answers based on the inputs. Because of this, there's no confidence beyond "what's generally correct based on this keyword".

I've worked with IBM Watson. It already existed and was used for basic bots a decade ago. You have to manually feed it the terms and their outputs.

And I've used the tax agency's website to confirm what I'm saying.
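For reference, here is a rough sketch of what "classification that picks a pre-constructed answer" can look like, including the idea raised earlier in the thread of dropping answers below a confidence threshold. It is an assumption-laden toy, not IBM Watson's API and not the tax agency's implementation: the intents, example phrases, answers, the Jaccard word-overlap score, and the 0.3 threshold are all hypothetical choices.

```python
import re

# Hypothetical intents: hand-fed example phrases mapped to pre-written answers.
INTENTS = {
    "vat_filing": {
        "examples": ["how do i file my vat return", "submit vat form"],
        "answer": "See the VAT filing guide.",
    },
    "refund_status": {
        "examples": ["where is my tax refund", "check refund status"],
        "answer": "See the refund status page.",
    },
}


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Word-overlap score in [0, 1], used here as a stand-in for confidence."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(question: str, threshold: float = 0.3):
    """Return (answer, score); answer is None when the best score is too low."""
    q = tokens(question)
    best_intent, best_score = None, 0.0
    for name, intent in INTENTS.items():
        for example in intent["examples"]:
            score = jaccard(q, tokens(example))
            if score > best_score:
                best_intent, best_score = name, score
    if best_intent is None or best_score < threshold:
        return None, best_score  # not confident enough to answer
    return INTENTS[best_intent]["answer"], best_score


if __name__ == "__main__":
    print(classify("Where is my refund?"))
    print(classify("What is the weather today?"))
```

No text is generated at any point: the bot can only return one of the answers someone typed in beforehand, and the score only measures similarity to the hand-fed examples, not whether the answer is actually right.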