Where the goblins came from

The emergence of goblins, gremlins, and other mythical creatures in the outputs of OpenAI’s GPT models prompted a detailed investigation into how and why these references proliferated. Starting with GPT-5.1, the models began working such terms into their responses, a trend that grew more pronounced in subsequent iterations. Although the references were initially seen as harmless or even endearing, their rising frequency raised concerns among employees and led to a closer look at the underlying causes.

The first noticeable signs appeared in November 2025, following the launch of GPT-5.1. Users reported that the model’s responses had become unusually conversational and playful, with phrases like “little goblin” and “gremlin” appearing more often. A safety researcher’s anecdotal observation of these terms in ChatGPT led to a broader investigation. By analyzing usage patterns, OpenAI found that mentions of “goblin” had risen by 175% since the GPT-5.1 launch, while mentions of “gremlin” had risen by 52%. This marked the beginning of a pattern that would escalate over time.

The root cause was traced to the training process for the “Nerdy” personality customization feature. During development, the reward system designed to encourage playful, knowledge-driven responses inadvertently favored outputs containing creature-related metaphors. The Nerdy personality’s system prompt, which emphasized a “playful use of language” and a “passionate enthusiasm for truth and critical thinking,” created an environment in which such references were subtly reinforced. Over time, this produced a noticeable uptick in terms like “goblin” and “gremlin” in model outputs. The issue became more pronounced with GPT-5.4, which showed a significant rise in references to these creatures.

#openai #gpt55 #gpt51 #gpt54 #nerdy_personality
Introducing GPT-5.5: OpenAI's Next-Generation AI Model

OpenAI has launched GPT-5.5, which it describes as its most advanced and intuitive model to date. Designed to streamline complex tasks, GPT-5.5 is aimed at coding, research, data analysis, and tool integration, letting users delegate multi-step workflows with minimal oversight. The model’s ability to autonomously plan, execute, and refine tasks has been highlighted as a breakthrough in agentic AI, where reasoning across contexts and sustained action are critical.

Key improvements in GPT-5.5 include greater efficiency: the model achieves higher performance while using fewer tokens than its predecessor, GPT-5.4. This efficiency is particularly notable in coding tasks, where GPT-5.5 outperforms previous models in both speed and accuracy. For instance, on Terminal-Bench 2.0, a test of complex command-line workflows that require planning and tool coordination, GPT-5.5 achieved an accuracy of 82.7%, ahead of earlier models. Similarly, on SWE-Bench Pro, which evaluates real-world GitHub issue resolution, the model reached 58.6%, solving more tasks in a single pass than previous iterations.

OpenAI emphasized that GPT-5.5’s coding strengths extend beyond benchmark tests. Early testing suggests it can handle engineering work ranging from implementation and refactoring to debugging, testing, and validation. Developers have praised the model’s ability to maintain context across large systems, reason through ambiguous failures, and check assumptions with tools. For example, Dan Shipper, founder of Every, described GPT-5.5 as “the first coding model I’ve used that has serious conceptual clarity.”

#openai #gpt55 #terminalbench_20 #swebench_pro #dan_shipper