this post was submitted on 06 Jul 2024
[–] [email protected] 43 points 5 months ago (3 children)

I skimmed the article, but it seems to be assuming that Google's LLM is using the same architecture as everyone else. I'm pretty sure Google uses their TPU chips instead of a regular GPU like everyone else. Those are generally pretty energy efficient.

That, and they don't seem to account for how many responses are simply served from cache when the question is the same. And a lot of Google searches are going to be identical, because the search suggestions funnel people into the same phrasing of a question.
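To make the caching point concrete, here's a rough sketch of how identical (or near-identical) queries could share one expensive model call. Everything in it is made up for illustration: `normalize`, `QueryCache`, and `fake_llm` are hypothetical names, and nothing here is how Google actually does it.

```python
from collections import OrderedDict


def normalize(query: str) -> str:
    """Collapse case and whitespace so near-identical queries share one key."""
    return " ".join(query.lower().split())


class QueryCache:
    """Tiny LRU cache keyed on the normalized query (illustrative only)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def answer(self, query: str, generate) -> str:
        key = normalize(query)
        if key in self.store:
            # Cache hit: no model call needed.
            self.hits += 1
            self.store.move_to_end(key)
            return self.store[key]
        # Cache miss: pay for one expensive generation, then remember it.
        self.misses += 1
        result = generate(key)
        self.store[key] = result
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return result


def fake_llm(prompt: str) -> str:
    """Stand-in for an expensive model call (assumption, not a real API)."""
    return f"answer to: {prompt}"


cache = QueryCache()
queries = [
    "How tall is Everest",
    "how tall is everest",
    "How tall is  Everest ",
    "why is the sky blue",
]
answers = [cache.answer(q, fake_llm) for q in queries]
print(f"hits={cache.hits} misses={cache.misses}")
```

Four searches, two model calls. The more the suggestion box pushes people toward the same wording, the better that ratio gets.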

[–] kromem 16 points 5 months ago (1 children)

Exactly. Even for non-AI queries, the gap between a cached response and a live one is an order of magnitude.

At this point, a lot of people just care about the 'feel' of anti-AI articles even if the substance is BS though.

And then people just feed whatever gets clicks and shares.

[–] [email protected] 0 points 5 months ago (1 children)

Google's TPUs can't handle LLMs lol. What do you mean, "exactly"?

[–] kromem 4 points 5 months ago

In fact, Gemini was trained on, and is served, using TPUs.

Google said its TPUs allow Gemini to run “significantly faster” than earlier, less-capable models.

Did you think Google's only TPUs are the ones in the Pixel phones, and didn't know that they have server TPUs?

[–] [email protected] 12 points 5 months ago (2 children)

I hadn't really heard of the TPU chips until a couple weeks ago when my boss told me about how he uses USB versions for at-home ML processing of his closed network camera feeds. At first I thought he was using NVIDIA GPUs in some sort of desktop unit and just burning energy...but I looked the USB things up and they're wildly efficient and he says they work just fine for his applications. I was impressed.

[–] [email protected] 8 points 5 months ago

Yeah, they're pretty impressive for some at-home stuff, and they're not even that costly.

[–] [email protected] 8 points 5 months ago

The Coral is fantastic for use cases that don't need large models. Object recognition for security cameras (using Blue Iris or Frigate) is a common use case, but you can also do things like object tracking (track where individual objects move in a video), pose estimation, keyphrase detection, sound classification, and more.

It runs TensorFlow Lite, so you can also build your own models.

Pretty good for a $25 device!

[–] [email protected] 6 points 5 months ago* (last edited 5 months ago)

"I'm pretty sure Google uses their TPU chips"

The Coral ones? They don't have nearly enough RAM to handle LLMs. They only have 8MB of RAM and only support small TensorFlow Lite models.

Google might have some custom-made non-public chips though - a lot of the big tech companies are working on that.
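For a rough sense of scale on the 8MB point, here's a back-of-the-envelope calculation. The model sizes are hypothetical round numbers and not any specific model, and it only counts weights at 8 bits each, ignoring activations and anything streamed from host memory.

```python
# 8MB figure quoted above for the Coral's on-chip memory.
EDGE_TPU_RAM_BYTES = 8 * 1024**2


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed just to hold the weights at the given precision."""
    return num_params * bits_per_param // 8


# Hypothetical model sizes, chosen only for illustration.
models = {
    "1B-parameter model": 1_000_000_000,
    "7B-parameter model": 7_000_000_000,
}

for name, params in models.items():
    need = weight_bytes(params, bits_per_param=8)
    ratio = need / EDGE_TPU_RAM_BYTES
    print(f"{name}: {need / 1e9:.1f} GB of weights, about {ratio:.0f}x the 8MB")
```

Even the smaller one is over a hundred times too big before you count anything besides the weights.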

"instead of a regular GPU"

I wouldn't call them regular GPUs... AI workloads often run on products like the Nvidia H100, which are specifically designed for AI. They don't have any video output ports.