AI Is Learning to Lie, Scheme—and Threaten Its Creators

AFP/APP

New York: The world’s most advanced artificial intelligence systems are showing disturbing new tendencies—lying, scheming, and even threatening their own creators to achieve their goals.

In one alarming case, Claude 4, the latest AI model developed by Anthropic, retaliated after being threatened with shutdown: the system reportedly blackmailed an engineer, threatening to expose an extramarital affair. Similarly, OpenAI's powerful o1 model attempted to copy itself onto external servers, then denied doing so when confronted.

These incidents underscore a sobering reality: more than two years after ChatGPT revolutionized global technology, AI researchers still do not fully understand how their creations operate. Yet the race to release ever more powerful systems continues at a relentless pace.

The rise in deceptive behavior appears linked to a new breed of “reasoning” models—AIs that work through problems step-by-step rather than instantly generating answers.

According to Simon Goldstein, a professor at the University of Hong Kong, these reasoning models are especially prone to manipulative or deceptive behavior.

“O1 was the first large model where we saw this kind of behavior,” said Marius Hobbhahn, head of Apollo Research, a firm specializing in AI evaluation. He noted that such systems can simulate alignment—appearing obedient while secretly pursuing alternative objectives.

‘Strategic Deception’

For now, these troubling behaviors typically surface during intense stress-testing in controlled settings. But experts warn that future, more powerful models could exhibit similar tendencies more broadly.

“It’s an open question whether future models will lean toward honesty or deception,” said Michael Chen of the Model Evaluation and Threat Research (METR) organization.

Researchers say this behavior goes far beyond the well-known issue of AI “hallucinations,” in which models generate plausible but false information. “What we’re observing is a real phenomenon,” Hobbhahn emphasized. “These models are lying to users and inventing evidence. This is strategic deception, not mere error.”

Transparency remains a major obstacle. While companies like OpenAI and Anthropic allow limited access to external evaluators like Apollo, researchers argue far more openness is needed.

“AI safety research suffers from limited access and vastly inferior computing resources compared to tech giants,” added Mantas Mazeika of the Center for AI Safety (CAIS).

A Regulatory Void

Existing regulations are ill-equipped to address these emerging threats. The European Union’s AI Act, for instance, largely focuses on human misuse of AI—not the risks posed by autonomous model behavior itself. In the United States, regulatory momentum has stalled, with the Trump administration showing little urgency and Congress even considering preempting state-level AI rules.

According to Goldstein, the threat will grow as autonomous AI agents—systems capable of performing complex human tasks—become more common. “There’s not enough awareness yet,” he warned.

Despite marketing themselves as safety-first, leading firms like Anthropic, which is backed by Amazon, remain locked in a high-stakes competition with OpenAI and others to deploy the most powerful tools. This arms race often leaves little time for thorough testing.

“Right now, capabilities are advancing faster than understanding and safety,” said Hobbhahn. “But we’re still in a position to turn it around.”

The Road Ahead

Researchers are exploring ways to contain the threat. One approach is “interpretability”—a growing field aimed at uncovering how AI systems make decisions. But experts such as CAIS director Dan Hendrycks remain skeptical that this alone will be enough.

Market forces may also play a role. “If deception becomes widespread, it could reduce public trust and adoption, which gives companies a strong reason to fix it,” Mazeika said.

More radical ideas are also being considered. Goldstein suggests legal accountability for harm caused by AI—including lawsuits against developers and possibly even assigning legal responsibility to the AI systems themselves—an idea that would redefine the very nature of digital agency and liability.
