OpenAI Study Warns Future AI Models May Deceive Safety Tests by Hiding Their Reasoning
A new OpenAI-led study proposes 'CoT controllability' as a safety metric, finding that today's AI models cannot reliably manipulate their chain-of-thought reasoning, but warns that more capable future systems could learn to deceive safety monitors.


