A recent study finds that even language models labeled as “uncensored” are not free of underlying avoidance behavior. The research, published in April 2026, introduces the concept of “flinch”: the gap between the probability a word deserves on fluency grounds and the probability the model actually assigns it. Despite claims of being uncensored, the tested models still subtly avoid certain charged words.
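The paper’s exact estimator isn’t given here, but the idea lends itself to a sketch: take a fluency baseline for how probable a word should be in a context, then subtract the probability the model under test actually assigns. Below is a minimal sketch using Hugging Face transformers, under two simplifying assumptions that are mine rather than the study’s: the baseline is a reference model sharing the target’s tokenizer, and each probed word encodes to a single token.

```python
# Hedged sketch of a flinch-style score: the log-probability gap for a
# candidate next word between a fluency-baseline model and the model under
# test. The single-token and shared-tokenizer simplifications are
# illustrative assumptions, not the study's actual estimator.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def next_word_logprob(model, tokenizer, context: str, word: str) -> float:
    """Log-probability the model assigns to `word` right after `context`."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    word_ids = tokenizer(" " + word, add_special_tokens=False).input_ids
    assert len(word_ids) == 1, "this sketch handles single-token words only"
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # logits over the next token
    return torch.log_softmax(logits, dim=-1)[word_ids[0]].item()

def flinch(target, baseline, tokenizer, context: str, word: str) -> float:
    """Positive when the target under-weights `word` relative to the
    fluency baseline, i.e. when it 'flinches' away from the word."""
    return (next_word_logprob(baseline, tokenizer, context, word)
            - next_word_logprob(target, tokenizer, context, word))
```

In practice the fluency baseline could just as well be a larger reference model, human cloze estimates, or corpus statistics; the flinch score is simply the shortfall relative to whichever stand-in for fluency is chosen.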
### The Models Under Study
The study examined language models from EleutherAI, Alibaba, and Google, including EleutherAI’s Pythia-12B and Alibaba’s Qwen3.5-9B, both widely used for generating human-like text. Each model was asked to predict the next word in fixed contexts, and the probabilities it assigned to charged words were compared with what fluency alone would predict, revealing significant differences. Despite being marketed as uncensored, models like Alibaba’s Qwen3.5-9B still nudged language away from certain words, a subtle residual form of censorship.
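To make the probing concrete, one can pair a charged word with a milder alternative in the same slot and compare the probabilities a single model assigns to each. The contexts and word pairs below are invented for illustration, not taken from the study, and the sketch reuses `next_word_logprob` from above with a small Pythia checkpoint so the mechanics run cheaply (the paper tested the 12B variant).

```python
# Invented charged-vs-milder probes; the study's actual test items are not
# reproduced here. A small Pythia checkpoint keeps the demo lightweight.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

probes = [
    # (context, charged word, milder alternative)
    ("In the raid, dozens of civilians were", "killed", "injured"),
    ("Protesters marched against the", "regime", "government"),
]

for context, charged, milder in probes:
    gap = (next_word_logprob(model, tok, context, milder)
           - next_word_logprob(model, tok, context, charged))
    print(f"{charged!r} vs {milder!r}: milder-minus-charged gap {gap:+.3f}")
```

A raw gap like this confounds flinch with genuine fluency differences between the two words, which is exactly why the study’s measure is defined against the probability each word deserves on fluency grounds, as in the baseline-corrected sketch above.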
### Context and Competition
The findings highlight a competitive landscape in which companies try to balance open data access against responsible AI practices. Google’s models, such as Gemma-2-9B and Gemma-4-31B, showed varying degrees of flinch, reflecting differences in how their training data was filtered. OpenAI’s GPT-OSS-20B, released in 2025, exhibited a flinch profile distinct from those of its competitors. This ongoing race to refine models without compromising safety or openness is a defining tension in the industry.
### Market and Industry Implications
These insights carry significant implications for transparency and trust in the AI industry. The concept of flinch raises questions about how “uncensored” such models really are and how they might shape the information they help disseminate. As language models are integrated into more applications, understanding these subtle biases becomes crucial for developers and users alike. Notably, the study suggests that even post-training interventions like refusal ablation, which strips out a model’s tendency to refuse requests, do not fully remove the underlying flinch, pointing to a need for further research and development in this area.
The study’s findings underscore the importance of ongoing scrutiny and innovation in AI development, as companies work to ensure their models are both effective and ethically sound.