Ask HN: Can anybody clarify why OpenAI reasoning now shows non-English thoughts?

The post describes AI models occasionally inserting stray non-English words into their output. The poster had previously seen Google's models, such as Bard/Gemini, drop random Hindi or Bengali words into responses, and gives a new example in which the Bengali phrase "কাজ করছে!" ("working!") appeared unexpectedly in an o3-pro thought process. Because the behavior has now been noted across multiple models, the poster asks what in the training process might cause other languages to surface in AI-generated content.

The underlying question is how these models are trained and why such language mix-ups happen. The summary suggests that the "errors" might be linked to the multilingual content in the datasets used for training. The post invites readers to weigh in on whether the insertions result from specific training strategies or from the inherent complexity of how AI models process language.

Key takeaways

Google's Bard/Gemini has been observed to insert random Hindi/Bengali words in its outputs.
In one specific instance, the Bengali phrase "কাজ করছে!" appeared in an o3-pro thought process.
These occurrences raise curiosity about the training methods or reasons behind the inclusion of alternate languages.
Similar language "errors" have been reported across multiple different models.
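
For readers who want to check their own transcripts for this kind of insertion, here is a minimal Python sketch. It is not from the post: the function names `script_counts` and `unexpected_scripts` are hypothetical, and using the first word of each character's Unicode name as a script label is a rough heuristic assumed here for illustration, not a proper script-detection method.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters per script.

    Assumption: the first word of a character's Unicode name
    (e.g. "BENGALI", "DEVANAGARI", "LATIN") is a good-enough script label.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skips digits, punctuation, spaces, combining marks
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def unexpected_scripts(text: str, expected=("LATIN",)) -> dict:
    """Return the scripts found in text that are not in the expected set."""
    return {s: n for s, n in script_counts(text).items() if s not in expected}


# Example using the phrase quoted in the post.
sample = "Checking the build output. কাজ করছে! Moving on."
print(unexpected_scripts(sample))
```

A non-empty result only flags that another script is present. It says nothing about why the model produced it.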
