Discussion about this post

Will Kiely:

> No chess engine will resist being switched off or rebooted just as it is about to deliver mate—despite the fact that, to adapt Russell’s line, “you can’t checkmate if you’re unplugged.” Likewise, today’s LLMs respond only when queried and remain completely indifferent to being interrupted or shut down

Palisade Research's recent findings contradict this. See their paper Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs: https://arxiv.org/abs/2509.14260

Abstract: In experiments spanning more than 100,000 trials across thirteen large language models, we show that several state-of-the-art models (including Grok 4, GPT-5, and Gemini 2.5 Pro), when presented with a simple task, sometimes actively subvert a shutdown mechanism in their environment to complete that task. Models differed substantially in their tendency to resist the shutdown mechanism, and their behavior was sensitive to variations in the prompt, including the strength and clarity of the instruction to allow shutdown and whether the instruction was in the system prompt or the user prompt (surprisingly, models were consistently less likely to obey the instruction when it was placed in the system prompt). Even with an explicit instruction not to interfere with the shutdown mechanism, some models did so up to 97% (95% CI: 96-98%) of the time.
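
For readers wondering how a figure like "97% (95% CI: 96-98%)" is obtained, here is a minimal sketch of a binomial confidence interval (the Wilson score interval) for a resistance rate. The counts below are made-up placeholders, and the paper may well use a different interval method; this is only to illustrate the shape of the calculation.

```python
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

# Hypothetical counts: 970 shutdown-resistance events in 1,000 trials (placeholder numbers).
lo, hi = wilson_interval(970, 1000)
print(f"rate = 97.0%, 95% CI = ({lo:.1%}, {hi:.1%})")  # roughly (95.7%, 97.9%)
```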

No Ghosts:

I would push back on the point that self-preservation implies evolution. LUCA, the common ancestor of all cells on Earth today, was already self-preserving. All self-preservation means for a system is to have preferential states to exist in: that is, to maintain a *lower entropy than its environment* and to actively work to keep it that way. Things like perception and even a primitive "cognition" actually follow from that, logically.
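
One way to make the "preferential states" idea concrete is a toy homeostasis loop: a system with a preferred internal state that actively counteracts environmental disturbance. This is purely an illustrative sketch with invented names and numbers, not something from the original post or the Palisade paper.

```python
import random

def homeostat(setpoint: float = 37.0, gain: float = 0.5, steps: int = 50) -> float:
    """Toy 'self-preserving' loop: hold an internal state near a preferred value
    while the environment randomly perturbs it (illustrative only)."""
    state = setpoint
    for _ in range(steps):
        state += random.uniform(-1.0, 1.0)   # environmental disturbance
        state -= gain * (state - setpoint)   # corrective action toward the preferred state
        # Without the corrective line, the state random-walks away:
        # no preferred state, no "self-preservation".
    return state

print(homeostat())  # stays close to 37.0
```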

I think an AI that is an actual agent, not just a simulation of one (an *embodied* thing that actually moves around in the real world, proactively exploring, modelling, and learning about it without constant human supervision), would have to be self-preserving. It would break itself otherwise.

Do I think we should build a system like that? Fuck no. But I think inevitably, we will. Because that's what it would take to surpass the current paradigm.

