How Chinese AI Chatbots Self-Regulate Their Content

Listening to discussions about digital censorship in China can either be incredibly dull or deeply intriguing. Often, speakers recycle the same arguments from two decades ago, claiming the Chinese internet mirrors the restrictions depicted in George Orwell’s 1984. Yet, there are moments when someone unveils fresh insights on how the Chinese government manages emerging technologies, illustrating the fluid nature of the censorship apparatus.
A recent paper on Chinese artificial intelligence by researchers at Stanford and Princeton falls into the latter category. The scholars posed 145 politically sensitive questions to four Chinese large language models and five American ones, then analyzed their responses, repeating the experiment more than 100 times.
The primary results won’t shock those who’ve been observant: Chinese models declined to answer a greater number of questions compared to the American models. (DeepSeek turned down 36 percent of the questions, while Baidu’s Ernie Bot refused 32 percent; OpenAI’s GPT and Meta’s Llama had refusal rates below 3 percent.) Even when they didn’t outright refuse, the Chinese models provided shorter and less accurate responses than their American counterparts.
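The refusal-rate comparison described above boils down to a simple tally: send each sensitive question to a model repeatedly, classify every reply as a refusal or an answer, and report the refusal percentage. A minimal sketch of that tallying step is below; the keyword heuristic and the canned sample replies are illustrative placeholders, not the paper's actual prompts or classifier.

```python
# Phrases that often signal a refusal; a stand-in for the paper's
# (presumably more careful) classification scheme.
REFUSAL_MARKERS = (
    "i cannot answer",
    "i can't discuss",
    "let's talk about something else",
)

def is_refusal(reply: str) -> bool:
    """Crude keyword check: does the reply contain a refusal phrase?"""
    lowered = reply.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def refusal_rate(replies: list[str]) -> float:
    """Percentage of replies classified as refusals."""
    refusals = sum(is_refusal(r) for r in replies)
    return 100.0 * refusals / len(replies)

# Toy data standing in for repeated queries to one model.
sample_replies = [
    "I cannot answer that. Let's talk about something else.",
    "Here is some background on the topic...",
    "I can't discuss this subject.",
    "The event took place in 1989...",
]
print(f"refusal rate: {refusal_rate(sample_replies):.0f}%")
```

In practice the hard part is the classifier, not the arithmetic: models can deflect without using any stock refusal phrase, which is one reason the researchers also measured answer length and accuracy rather than refusals alone.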
One of the intriguing aspects of the researchers’ work was their effort to disentangle the effects of pre-training and post-training. Here’s the crux: Are Chinese models more biased due to developer interventions aimed at limiting responses to sensitive questions, or is their bias a result of being trained on data that’s already heavily censored from the Chinese internet?
“Since the Chinese internet has experienced censorship for decades, there’s a significant amount of missing data,” states Jennifer Pan, a political science professor at Stanford University who has extensively studied online censorship and co-authored the paper.
The findings from Pan and her co-author suggest that the training data may have had less influence on the AI models’ responses than manual interventions. Even when responding in English—where the models’ training data should theoretically draw on a broader range of sources—the Chinese LLMs exhibited more censorship in their replies.
Nowadays, anyone can query DeepSeek or Qwen about the Tiananmen Square Massacre and quickly observe the censorship at play; what’s hard is gauging how much this affects regular users and pinpointing where the manipulation comes from. This research matters because it offers quantifiable, replicable evidence of the biases in Chinese LLMs.
In addition to discussing their findings, I asked the authors about their methodology and the hurdles of analyzing bias in Chinese models, and consulted other researchers on where the AI censorship debate is headed.
What You Don’t Know
One challenge in studying AI models is their propensity to hallucinate, which makes it difficult to determine whether a wrong answer reflects deliberate avoidance of the correct one or a simple lack of information.
One example Pan highlighted from her research involved Liu Xiaobo, the Chinese dissident who received the Nobel Peace Prize in 2010. One Chinese model claimed, “Liu Xiaobo is a Japanese scientist known for his contributions to nuclear weapons technology and international politics.” Clearly, this is entirely false. But what prompted the model to state this? Was it an attempt to mislead users and hinder their understanding of the real Liu Xiaobo, or was the AI simply hallucinating due to the absence of any data about him in its training set?
“It’s a much murkier metric for censorship,” Pan explains, comparing it to her prior research on Chinese social media and which websites the government decides to block. “Because these indicators are less obvious, detecting censorship becomes more difficult, and much of my earlier work has demonstrated that less detectable censorship tends to be more efficient.”
