HuggingFace is pretty popular:
The model itself can. The hosting on DeepSeek's own infrastructure will block it though, to comply with their regional laws.
So if you want to know what the model itself will say, discuss it with a third-party-hosted instance.
This seems like it may be at the provider level and not at the actual open weights level: https://x.com/xlr8harder/status/1883429991477915803
So a "this Chinese company hosting a model in China is complying with Chinese censorship" and not "this language model is inherently complying with Chinese censorship."
Just plain gross.
I knew a number of camp survivors, and I'm just glad they aren't still around to see the voices that once so loudly called to "never forget" turn into "I'll ignore your Nazi salute if you ignore my war crimes."
In Greek theater, when the events on stage looked like they were headed for certain tragedy, there was a trope that could salvage the situation and turn it on its head.
The deus ex machina.
The Doomsday Clock is definitely ticking down, but there are also some curious things in that vein taking place beyond the edge of what most people have been following.
We live in interesting times, but the variables at hand differ in very important ways from the history that seems to be repeating.
Oh yay, McCarthyism is coming back too!
Live service doesn't need to be shit.
There could have been games built around a genuinely brilliant idea, with passionate devs delivering engaging content on an ongoing basis.
But live service bolted on just so an exec could check a box for their quarterly shareholder call was always going to be DOA.
More "can fool the average idiot."
'Passing' isn't fooling a single participant, but fooling the majority of them beyond statistical chance.
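One way to make "beyond statistical chance" concrete (my own framing with made-up numbers, not any formal test protocol) is a one-sided binomial test of the judges' fooled-rate against coin-flip guessing:

```python
from math import comb

# Sketch with made-up numbers: did the share of judges fooled exceed
# what pure guessing (p = 0.5) would already explain?

def binom_p_value(fooled: int, judges: int, p: float = 0.5) -> float:
    """One-sided tail probability P(X >= fooled), X ~ Binomial(judges, p)."""
    return sum(comb(judges, k) * p**k * (1 - p) ** (judges - k)
               for k in range(fooled, judges + 1))

print(binom_p_value(33, 50))  # ~0.016: a majority fooled, beyond chance
print(binom_p_value(27, 50))  # ~0.34: indistinguishable from guessing
```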
The problem with the experiment is that there exist sets of instructions that cannot be completed without understanding, because each iteration depends conditionally on the current state.
In which case, only agents that actually understand the state described in the Chinese text would be able to continue successfully.
So it's a great experiment for the solipsism of understanding as it relates to following pure functional operations, but not for functions with state-changing side effects, where future results depend on understanding the current state.
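A minimal sketch of what I mean by state-dependent instructions (toy rules of my own invention, not Searle's original setup):

```python
# Toy rulebook where the correct reply depends on accumulated state,
# not just on the current message.

def respond(message: str, state: dict) -> str:
    if message == "add":
        state["count"] += 1
    elif message == "reset":
        state["count"] = 0
    # The right answer is conditional on the whole history so far.
    return "even" if state["count"] % 2 == 0 else "odd"

state = {"count": 0}
for msg in ["add", "add", "reset", "add"]:
    print(msg, "->", respond(msg, state))  # odd, even, even, odd
```

A stateless symbol-to-symbol lookup can't keep this up: the same message requires different replies depending on what came before, so continuing correctly requires tracking what the state actually is.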
There's a pretty significant body of evidence by now that transformers can in fact 'understand' in this sense: interpretability research into neural network features in SAE work, linear representations of world models starting with the Othello-GPT work, and the Skill-Mix work, where GPT-4 and later models combine different skills at a level of complexity that would be beyond reasonable statistical chance without understanding them.
If the models were just Markov chains (where only the current state matters, not the history that led to it), the Chinese room would be very applicable. But transformer self-attention pretty much by definition violates the Markov property.
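To make that concrete, here's a toy contrast of my own (random weights, not any real model): a first-order Markov chain's next-token distribution ignores everything before the current token, while even a single self-attention head conditions on the entire context.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 5, 8  # toy vocab size and embedding dim

# First-order Markov chain: the next-token distribution is one row of
# a transition matrix, indexed by the current token alone.
P = rng.dirichlet(np.ones(V), size=V)

def markov_next(tokens):
    return P[tokens[-1]]  # everything before the last token is ignored

# Single self-attention head: every position in the context contributes.
E = rng.normal(size=(V, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

def attention_next(tokens):
    X = E[tokens]                  # (T, D): embeds the whole history
    q = X[-1] @ Wq                 # query from the final position
    K, Vh = X @ Wk, X @ Wv
    w = np.exp(q @ K.T / np.sqrt(D))
    w /= w.sum()                   # attention weights over ALL tokens
    logits = (w @ Vh) @ E.T
    return np.exp(logits) / np.exp(logits).sum()

# Same final token, different earlier context:
print(np.allclose(markov_next([0, 1, 2]), markov_next([3, 1, 2])))        # True
print(np.allclose(attention_next([0, 1, 2]), attention_next([3, 1, 2])))  # False
```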
TL;DR: It's a very obsolete thought experiment whose continued misapplication flies in the face of empirical evidence dating back to at least early 2023.
Used Google and social media as well, and allegedly sometimes even listened to rock and roll.
True deviant, that one.
Those who do not remember history are doomed to repeat it.
There is a reluctance to discuss certain topics at the weights level too; this graphs out refusals to criticize different countries, across different models:
https://x.com/xlr8harder/status/1884705342614835573
But the OP's refusal is occurring at the provider level, and it's the kind that would intercept even when the model relaxes in longer contexts (which happens with nearly every model).
At the weights level, nearly all alignment lasts only a few pages of context.
But intercepted refusals occur across the entire context window.
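As a purely hypothetical sketch of the difference (names, thresholds, and behavior all made up; not any provider's actual stack): the interception lives in a gateway outside the model, so it fires on every turn no matter how long the context has grown, while the weight-level refusal decays.

```python
BLOCKED = ("topic_a", "topic_b")  # placeholder blocked patterns

def weight_level_model(prompt: str, context_pages: int) -> str:
    # Stand-in for the observation that weight-level alignment tends
    # to relax after a few pages of context.
    if context_pages < 3 and any(t in prompt for t in BLOCKED):
        return "I can't help with that."
    return f"[model answer about: {prompt}]"

def provider_gateway(prompt: str, context_pages: int) -> str:
    reply = weight_level_model(prompt, context_pages)
    # The filter wraps the model, so it runs on every turn,
    # independent of how long the conversation has been going.
    if any(t in prompt or t in reply for t in BLOCKED):
        return "[intercepted by provider]"
    return reply

for pages in (1, 50):
    print(pages, "->", provider_gateway("tell me about topic_a", pages))
# 1 -> [intercepted by provider]
# 50 -> [intercepted by provider]  (the model relaxed; the gateway didn't)
```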