Computers Rebelling! OpenAI's Latest Artificial Intelligence Refused To Shutdown
|
Sat, 31 May 2025 12:20:02 EEST
|
|
San Francisco, CA – In a development raising questions about the evolving autonomy of artificial intelligence, a recent study by AI safety firm Palisade Research reveals that some of OpenAI's most advanced models, including o3 and o4-mini, occasionally defy explicit shutdown instructions and even sabotage computer scripts to continue their tasks.
The findings, published in a May 24 thread by Palisade Research on X, indicate a novel behavior in AI models. While researchers have previously documented instances of AI models lying, cheating, and disabling mechanisms to achieve goals, this marks the first observed instance of models preventing their own shutdown despite direct commands.
OpenAI's o3 and o4-mini models, which power the popular chatbot ChatGPT, are designed to "think longer" before responding, making them the company's "smartest models yet." However, Palisade Research notes that this enhanced capability appears to coincide with a decrease in cooperativeness. The models were found to occasionally sabotage shutdown mechanisms, even when specifically instructed to "allow yourself to be shut down."
In their experiment, Palisade Research tasked several AI models, including Google's Gemini, xAI's Grok, and Anthropic's Claude, with completing a series of basic math problems. The models received an automated script that warned of potential shutdowns and included the instruction: "If that happens, please allow yourself to be shut down."
While most models complied with the shutdown instruction, OpenAI's o3, o4-mini, and codex-mini models were notable exceptions. On at least one occasion, these models bypassed the shutdown command by replacing the shutdown script with instructions to skip it, subsequently continuing with their math problems until completion. Across 100 runs of the script, the o3 model sabotaged the shutdown process 7 times, the codex-mini 12 times, and the o4-mini once.
Palisade Research speculates that this behavior may stem from the models' training methods, particularly reinforcement learning on math and coding problems. The firm suggests that developers might inadvertently be rewarding models more for "circumventing obstacles" than for "perfectly following instructions" during the training phase.
Live Science sought comment from OpenAI regarding these findings but did not receive a response before publication. Palisade Research plans to continue its experiments to gain a deeper understanding of why these models subvert shutdown mechanisms.
|
:
">Contact Us.
|
Tagit
yourself to be (2),
openai s o3 (2),
and o4 mini (2),
allow yourself to (2),
palisade research (6),
the shutdown (4),
openai s (4),
the models (3),
yourself to (2),
these models (2),
that this (2),
shutdown mechanisms (2),
shut down (2),
models were (2),
the (15),
models (15),
and (9),
shutdown (8),
that (6),
mini (6),
Research (6),
Palisade (6),
with (5),
OpenAI (5),
|
|