this post was submitted on 27 Jan 2025
245 points (97.7% liked)

Interesting Global News


US chip-maker Nvidia led a rout in tech stocks Monday after the emergence of a low-cost Chinese generative AI model that could threaten US dominance in the fast-growing industry.

The chatbot developed by DeepSeek, a startup based in the eastern Chinese city of Hangzhou, has apparently shown the ability to match the capabilities of US AI pace-setters for a fraction of the investment made by American companies.

Shares in Nvidia, whose semiconductors power the AI industry, fell more than 15 percent in midday deals on Wall Street, erasing more than $500 billion of its market value.

The tech-rich Nasdaq index fell more than three percent.

AI players Microsoft and Google parent Alphabet were firmly in the red while Meta bucked the trend to trade in the green.

DeepSeek, whose chatbot became the top-rated free application on Apple's US App Store, said it spent only $5.6 million developing its model -- peanuts when compared with the billions US tech giants have poured into AI.

US "tech dominance is being challenged by China," said Kathleen Brooks, research director at trading platform XTB.

"The focus is now on whether China can do it better, quicker and more cost effectively than the US, and if they could win the AI race," she said.

US venture capitalist Marc Andreessen has described DeepSeek's emergence as a "Sputnik moment" -- when the Soviet Union shocked Washington with its 1957 launch of a satellite into orbit.

As DeepSeek rattled markets, the startup on Monday said it was limiting the registration of new users due to "large-scale malicious attacks" on its services.

Meta and Microsoft are among the tech giants scheduled to report earnings later this week, offering an opportunity for comment on the emergence of the Chinese company.

Shares in another US chip-maker, Broadcom, fell 16 percent while Dutch firm ASML, which makes the machines used to build semiconductors, saw its stock tumble 6.7 percent.

"Investors have been forced to reconsider the outlook for capital expenditure and valuations given the threat of discount Chinese AI models," said David Morrison, senior market analyst at Trade Nation.

"These appear to be as good, if not better, than US versions."

Wall Street's broad-based S&P 500 index shed 1.7 percent while the Dow was flat at midday.

In Europe, the Frankfurt and Paris stock exchanges closed in the red while London finished flat.

Asian stock markets mostly slid.

Just last week following his inauguration, Trump announced a $500 billion venture to build infrastructure for AI in the United States led by Japanese giant SoftBank and ChatGPT-maker OpenAI.

SoftBank tumbled more than eight percent in Tokyo on Monday while Japanese semiconductor firm Advantest was also down more than eight percent and Tokyo Electron off almost five percent.

top 50 comments
[–] Treczoks 15 points 2 days ago

There is no problem with deflating the bubble. There still is a lot of hot air inside to lose.

[–] [email protected] 73 points 2 days ago (1 children)

fell more than 15 percent in midday deals on Wall Street, erasing more than $500 billion of its market value

If I'm reading this correctly, that would mean that NV was previously valued at ~$3.4T??

Yeah, they might be a bit overvalued. Just a hint.

[–] Frozengyro 34 points 2 days ago

Come on, one company being worth 3% of the whole market is completely normal...

[–] [email protected] 79 points 2 days ago

It only cost $5 million to blow out $500 billion from the stock market.

All hail open source.

[–] UnderpantsWeevil 37 points 2 days ago (1 children)

The number of people repeating "I bet it won't tell you about Tianamen Square" jokes around this news has - imho - neatly explained why the US tech sector is absolutely fucked going into the next generation.

[–] Womble 36 points 2 days ago (22 children)
[–] [email protected] 14 points 2 days ago (1 children)

It's even worse / funnier in the app: it will generate the response, then once it realizes it's about Taiwan it will delete the whole response and say sorry, I can't do that.

If you ask it "what is the republic of china" it will generate a couple paragraphs of the history of China, then it'll get a couple sentences in about the retreat to Taiwan and then stop and delete the response.

[–] Womble 14 points 2 days ago* (last edited 2 days ago) (1 children)

In fairness, that is also exactly what ChatGPT, Claude and the rest do for their online versions too when you hit their limits (usually around sex). IIRC they work by having a second LLM monitor the output and send a cancel signal if it thinks it's gone over the line.

[–] [email protected] 6 points 2 days ago (1 children)

Okay but one is about puritanical Western cultural standards about sex, and one is about government censorship to maintain totalitarian power. One of these things is not like the other.

[–] Womble 3 points 1 day ago

Yes I'm aware, I was saying that the method is the same.

[–] Smokeydope 6 points 2 days ago* (last edited 2 days ago) (2 children)

Try an abliterated version of the Qwen 14B or 32B R1 distills. I just tried one out; they will give you a real overview.

Still, even when abliterated, it's just not very knowledgeable about "harmful information". If you want a truly uncensored model, hit up Mistral Small 22B and its even more uncensored fine-tune, Beepo 22B.

[–] [email protected] 5 points 2 days ago (2 children)

You missed the entire point of their comment

[–] Womble 5 points 2 days ago (4 children)

Maybe, if they wanted to make a point, they should have been clearer instead of saying people were joking about it doing something it actually does.

[–] timewarp 32 points 2 days ago* (last edited 2 days ago)

This is almost too perfect, but they'll learn nothing. Rather than make concessions, getting rid of the ridiculous capitalist and oligarchic system and promoting rapid innovation through community and open source, they'll just demand that the government give them even more trillions in taxpayer money so they can compete with an open-source model.

[–] [email protected] 23 points 2 days ago (6 children)

Reposting from a similar post, but... I went over to huggingface and took a look at this.

Deepseek is huge. Like Llama 3.3 huge. I haven't done any benchmarking, which I'm guessing is out there, but it surely would take as much Nvidia muscle to run this at scale as ChatGPT, even if it was much, much cheaper to train, right?

So is the rout based on the idea that the need for training hardware is much smaller than suspected even if the operation cost is the same... or is the stock market just clueless and dumb and they're all running on vibes at all times anyway?

[–] [email protected] 19 points 2 days ago

So is the rout based on the idea that the need for training hardware is much smaller than suspected even if the operation cost is the same... or is the stock market just clueless and dumb and they're all running on vibes at all times anyway?

Two parts here.

  1. nVidia is overvalued; everyone has known this, but nobody wanted to call it. Someone shipping a decent model on a fraction of the resources was as good a reason as any to call the bluff.
  2. Lots of the folks who are in it for nVidia believe that companies are going to need chips out the ass to keep up. It's getting ahead of things to say "that's no longer true", but there's a good chance chip demand isn't as big as nVidia was painting it.

As for the model.

This model is from China and was trained there. There's an embargo on the best chips; they can't get them. So they aren't supposed to have the resources to produce what we're seeing with DeepSeek, and yet here we are. So either someone slipped them a shipment, which is a big no-no, or we take it at face value that they've found a way to optimize training.

The neat thing about science is reproducibility. Given the paper DeepSeek wrote and the open-source nature of this, someone should be able to sit down and reproduce it in about two months (ish). If they can, nVidia is going to have a completely terrible time, and the US is going to have to rethink the whole AI embargo.

Without deep diving into this model and what it spouts, the skinny is that nVidia's top-tier AI GPUs have all these parts cut into the silicon that make creating a model cost a lot less in energy. DeepSeek says they were able to put in some optimizations that get you a model on low energy by reimplementing some of the functions found only in the top-tier AI GPUs.

An example: DeepSeek used 32 of the 132 streaming multiprocessors on their Hopper GPUs to act as a hardware-accelerated communication manager and scheduler. Top-tier nVidia cards for big farms already do this in hardware, in a circuit called the DPU. Basically, DeepSeek found a way to make their Hopper GPUs do the same job as nVidia's DPUs.

If true, it means the hardware nVidia is putting into their top tier isn't strictly required. It's nice, and it will still get you a model on less energy than DeepSeek's tricks will, but those tricks mean the price gap between top tier and low tier needs to be a lot narrower to stay competitive. As it stands (again, if DeepSeek's tricks prove correct), if you've got a little extra time you can buy bottom-tier AI GPUs and spend about the same on energy for what the top tier will kick out using a bit less energy. If time is not a factor, the difference in energy cost between the tiers isn't enough to justify the top tier's price difference.
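To make the commenter's argument concrete, here's a back-of-the-envelope comparison with entirely hypothetical prices and energy costs (none of these figures come from the article or nVidia):

```python
# Hypothetical numbers illustrating the "time vs. price tier" trade-off:
# if a low-tier GPU setup reaches the same result using only slightly
# more energy, the top tier's price premium is hard to justify.
top_tier_price = 30_000   # hypothetical top-tier GPU price, USD
low_tier_price = 10_000   # hypothetical low-tier GPU price, USD
energy_cost_top = 5_000   # hypothetical energy cost for a training run, USD
energy_cost_low = 5_500   # slightly more energy on the low tier, USD

total_top = top_tier_price + energy_cost_top   # 35_000
total_low = low_tier_price + energy_cost_low   # 15_500

# The 500 USD energy saving nowhere near offsets the 20_000 USD premium.
print(total_top - total_low)  # 19500
```

With these made-up numbers, the top tier only wins when time-to-result matters more than a 19,500 USD gap per card.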

And so that brings us full circle. If someone is able to reproduce DeepSeek's gains, nVidia's top-tier GPUs are way overpriced and their bottom tier is going to sell out like hotcakes. That's bad for nVidia if they were hoping to, IDK, make ridiculous profit. And that is why the sudden spook in the market. I mean, don't get me wrong, folks have been looking forward to popping nVidia's bubble, so they've absolutely been hyping this whole thing up. And it didn't help that DeepSeek hit #1 on the Apple App Store.

So some of this is people riding the hate-nVidia train. But some of it is also: well, this is interesting if true. I think it's a little early to start victory laps around nVidia's grave; the optimizations proposed by DeepSeek have yet to be verified. And things are absolutely going to get interesting no matter the outcome. Because if the proposed optimizations don't actually produce the kind of model DeepSeek has, where did they get it from? How did they cheat? That's an interesting question in and of itself, because they aren't supposed to have hardware that would allow them to make this, which could mean a few top-tier cards are leaking into China's hands.

But if it all does prove true, well, he he he, nVidia shorts are going to be eating mighty well.

[–] [email protected] 14 points 2 days ago (1 children)

I believe it would have lower operational costs, assuming the model is the only thing that's different and they target the same size. DeepSeek uses the "mixture of experts" approach, which makes it use a subset of its parameters, thus making it faster / less computationally expensive.

That said, I have a basic understanding of AI, so maybe my understanding is flawed.

[–] [email protected] 8 points 2 days ago (7 children)

But the models that are posted right now don't seem any smaller. The full precision model is positively humongous.

They found a way to train it faster. Fine. So they need fewer GPUs and can do it on slower ones that are much, much cheaper. I can see how Nvidia takes a hit on the training side.

But presumably the H100 is still faster than the H800s they used for this and presumably running the resulting model is still just as hard. All the improvements seem like they're on the training side.

Granted, I don't understand what they did and will have to go fishing for experts walking through it in more detail. I still haven't been able to run it myself, either; maybe it's still large but runs lighter on processing, and that's noticeable. I just haven't seen any measurements of that side of things yet. All the coverage is about how cheap the training was on H800s.

[–] [email protected] 10 points 2 days ago (2 children)

They aren't necessarily smaller, from my understanding; they just use their parameters more efficiently. Say it has 600B parameters: you ask it a programming question, and it pulls the 37B parameters most related to it and responds using those instead of processing all 600B.

Think of it as a model made of specialized sub-models, where whichever one is likely to provide the best answer gets used.
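For intuition, the top-k routing described above can be sketched in a few lines of NumPy. This is a toy illustration with made-up sizes (8 experts, 2 active), not DeepSeek's actual architecture:

```python
# Toy mixture-of-experts routing: a gating network scores all experts,
# but only the top-k actually run, so per-token compute scales with
# k/n_experts of the total parameter count.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8  # stand-in for the many experts in a real MoE model
TOP_K = 2      # experts actually executed per input
DIM = 4

# Each "expert" is just a small weight matrix here.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(N_EXPERTS)]
router = rng.normal(size=(DIM, N_EXPERTS))  # gating network

def moe_forward(x):
    """Route x to the top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]   # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()            # softmax over the chosen experts
    # Only TOP_K of the N_EXPERTS matrices are ever multiplied.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=DIM))
print(y.shape)  # (4,)
```

The point of the sketch: the full parameter set (all eight matrices) still has to sit in memory, but only two matrix multiplies happen per input, which is why an MoE model can be huge on disk yet comparatively cheap per query.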

[–] jrs100000 8 points 2 days ago (1 children)

It comes in different versions, some of which are enormous and some that are pretty modest. The thing is, they are not competing with 4o but with o1, which has really massive resource requirements. The big news this week was supposed to be o3 opening up to the public, which involved another huge resource jump and set a clear trajectory for the future of AI being on an exponential curve of computing power. This was great news for companies that make the parts and who could afford the massive buildout to support future developments. DeepSeek isn't so much disruptive for its own capabilities; it's disruptive because it challenges the necessity of this massive buildout.

[–] [email protected] 7 points 2 days ago (1 children)

Clueless dumb vibes, yeah. But exaggerated by the media for clicks, too - Nvidia price is currently the same as it was in Sept 2024. Not really a huge deal.

Anyway, the more efficient it is, the more usage there will be, and in the long run more GPUs will be needed - https://www.greenchoices.org/news/blog-posts/the-jevons-paradox-when-efficiency-leads-to-increased-consumption
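The Jevons-paradox argument is easy to check with arithmetic. All numbers below are hypothetical, chosen only to show how efficiency gains can still raise total demand:

```python
# Back-of-the-envelope Jevons paradox: compute per query drops 10x,
# but cheaper inference grows usage 20x, so total GPU demand still rises.
gpu_hours_per_query = 1.0     # hypothetical baseline cost per query
queries = 1_000_000           # hypothetical baseline usage

efficiency_gain = 10.0        # each query now needs 1/10 the compute
usage_growth = 20.0           # usage grows because queries got cheaper

demand_before = gpu_hours_per_query * queries
demand_after = (gpu_hours_per_query / efficiency_gain) * (queries * usage_growth)

print(demand_after / demand_before)  # roughly 2.0: net demand doubles
```

Whether real-world usage growth actually outpaces efficiency gains is exactly what the market is arguing about; this just shows the mechanism.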

[–] [email protected] 15 points 2 days ago (1 children)

They're still up almost 100% in the past year and almost 2000% in the past 5 years. The stock price will be fine.

[–] UnderpantsWeevil 8 points 2 days ago* (last edited 2 days ago)

Check the P/E on a lot of these firms. So much of the valuation is predicated on enormous rates of future growth; their revenue isn't keeping up with their valuation. A big chunk of that 2000% is people trading on the greater-fool theory, counting on someone buying the shares later at a higher price.

Microsoft will be fine, sure. Meta will be fine, sure. The guy leveraged to the tits to go long on ARK Innovation ETF? Far more dubious.
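To see what a high P/E "prices in", here's a rough illustration with hypothetical figures (not the actual multiples of any company mentioned):

```python
# How many years of strong earnings growth are implied before a
# high-multiple stock's price looks like a "normal" valuation?
price_to_earnings = 60   # hypothetical high-growth multiple
normal_pe = 20           # hypothetical mature-company multiple
growth_rate = 0.25       # hypothetical 25% annual earnings growth

years = 0
earnings = 1.0
while price_to_earnings / earnings > normal_pe:
    earnings *= 1 + growth_rate
    years += 1

print(years)  # 5: five straight years of 25% growth are already priced in
```

If anything (like a cheap competing model) casts doubt on those growth years, the multiple compresses, which is the mechanism behind a one-day $500 billion repricing.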

[–] [email protected] 7 points 2 days ago (1 children)

What was made more efficiently? The chip? The energy needs of the AI model?

[–] [email protected] 13 points 2 days ago (2 children)

Software. I only know the current AI environment at a surface level, and I have friends telling me to buy Nvidia, but I was wondering when there would be improvements to the software.

Example, we used to need a pretty top notch PC to play Doom but now we can emulate the hardware and run it on a pregnancy test.

[–] [email protected] 2 points 2 days ago

I've been watching the capex boys react to DeepSeek and laughing hysterically tbh

[–] lemmus 6 points 2 days ago

Bubble goes pop.

[–] [email protected] 6 points 2 days ago* (last edited 2 days ago) (3 children)

The censorship goes way beyond obvious stuff like Taiwan.

Ask DeepSeek "What is a tankie?" and see what happens.

[–] [email protected] 7 points 2 days ago (3 children)

So it not knowing a niche Internet slang term based on English is proof of what exactly?

It's open source. I'm sure there's already a fork patching in the big omissions.

[–] [email protected] 6 points 2 days ago

It's definitely censorship. You can see it in their app: it's still buggy and will generate a response, then halfway through it will delete it and say "sorry, that's beyond my current scope".

It did actually give a good multi-paragraph response to "what is a tankie" before it realized it was a no-no topic.

[–] [email protected] 8 points 2 days ago

@rimu @Bronzebeard On the one hand, when DeepSeek "doesn't know" about a thing (i.e., something not present in the training data), it'll state that clearly (I'm not sure if the image will be sent, as I'm not using Lemmy directly to reply to this):

The context of the image is the following: I asked DeepSeek about "Abnukta", an obscure and little-known Enochian term used during one of the invocations of Lilith, and DeepSeek replied the following:

"Abnukta is a term that does not have a widely recognized or established meaning in mainstream English dictionaries or common usage. It could potentially be a misspelling, a neologism, or a term from a specific dialect, jargon, or cultural context. If you have more context or details about where you encountered the term, I might be able to provide a more accurate explanation. Alternatively, it could be a name or a term from a specific field or community that is not widely known".

So, the answer that the user Rimu received is not about something "unknown" to the LLM (otherwise it'd be clearly stated as such, as per my example), but something that triggered moderation mechanisms. So, in a sense, yes, the LLM refused to answer...

However... On the other hand, Western LLMs are full of "safeguards" (shouldn't we call these censorship, too?) regarding certain themes, so it's not exclusive to Chinese LLMs. For example:
- I can't talk about demonolatry (the worshiping of daemonic entities, as present in my own personal beliefs) with Claude, it'll ask me to choose another subject.
- I can't talk with Bing Copilot about some of my own goth drawings.
- Specifically regarding socio-economics-politics subjects, people can't talk with ChatGPT and Google Gemini about a certain person involved in a recent US event, whose name is the same as a video-game character known for wearing a green hat and being the brother of another character that enters pipes and seeks to set free a princess.
- GitHub Copilot refuses (in a blatant Scunthorpe problem) to reply or suggest completion for code containing terms such as "trans" or "gender" (it's an open and known issue on GitHub, so far unanswered as to why or how to make Copilot answer).

But yeah, the West is the land of freedom /s

[–] shalafi 8 points 2 days ago* (last edited 2 days ago)

Word definitions are exactly the sorts of things one would expect an LLM to pick up on.

ChatGPT:

A "tankie" is a term often used to describe a person who is an ardent supporter of authoritarian regimes, particularly those that claim to be socialist or communist. The term originally referred to members of communist parties who supported the Soviet Union's use of tanks to suppress uprisings, like the 1956 Hungarian Revolution and the 1968 Prague Spring. Over time, it's been used more broadly to refer to people who justify or defend the actions of regimes like the Soviet Union, China under Mao, or North Korea, often in the face of human rights abuses or authoritarian policies.

The label can sometimes be used pejoratively to imply that someone is uncritical of authoritarianism or blindly supportive of these regimes due to ideological alignment, even when those regimes engage in actions that contradict the values they claim to uphold.
