this post was submitted on 08 Nov 2023

Technology

[–] [email protected] 5 points 1 year ago

This is the best summary I could come up with:


Demand for Microsoft's AI services is apparently so great – or Redmond's resources so tight – that the software giant plans to offload some of the machine-learning models used by Bing Search to Oracle's GPU supercluster as part of a multi-year agreement announced Tuesday.

The partnership essentially boils down to: Microsoft needs more compute resources to keep up with the alleged "explosive growth" of its AI services, and Oracle just happens to have tens of thousands of Nvidia A100 and H100 GPUs available for rent.

Microsoft was among the first to integrate a generative AI chatbot into its search engine with the launch of Bing Chat in February 2023.

You all know the drill by now: you can feed prompts, requests, or queries into Bing Chat, and it will try to look up information, write bad poetry, generate pictures and other content, and so on.

In this case, Microsoft is using Oracle Interconnect for Microsoft Azure alongside its Azure Kubernetes Service to orchestrate Oracle's GPU nodes to keep up with what's said to be demand for Bing's AI features.

Oracle claims its cloud super-clusters, which presumably Bing will use, can each scale to 32,768 Nvidia A100 or 16,384 H100 GPUs using an ultra-low-latency Remote Direct Memory Access (RDMA) network.


The original article contains 580 words, the summary contains 207 words. Saved 64%. I'm a bot and I'm open source!