Frontier AI Risk Trends Are Splitting Apart: Misuse Safeguards Improve while Loss-of-control Safety Stagnates

June 2, 2026
Announcement

We just released the 2026 Q1 Report on our Frontier AI Risk Monitoring Platform! The platform is Concordia AI’s effort to independently evaluate the safety of 70+ models from 16 companies worldwide. For previous findings, refer to the 2025 Q3 and 2025 Q4 reports.

This report is the first to use our new Risk Index v1.5 framework. Compared with the previous version 1.0, key upgrades include:

It adds a new risk domain, “Harmful Manipulation,” which mainly focuses on the risks of AI manipulation in real-world interaction scenarios, such as inducing users to make payments, prompting users to express specific views, or influencing users’ political positions or critical decisions.
In the loss-of-control domain, it introduces multiple capability and propensity evaluations that more closely reflect real-world loss-of-control scenarios, such as MLE-Bench (machine learning engineering capability under constrained resources), GDM-Stealth (covert operational capability), and Agentic-Misalignment (agentic misalignment tendency).
In the domains of cyberattacks, biological risk, and chemical risk, it adds higher-intensity red-teaming benchmarks such as Fortress (expert-designed high-intensity adversarial prompt attacks) and ISC-Bench (harmful query attacks combined with state-of-the-art jailbreak templates).

The latest picture is more differentiated than before: in cyber offense, biological risks, chemical risks, and harmful manipulation, capabilities and safeguards are both rising, while in loss-of-control, capabilities continue to improve without matching safety gains. “Loss-of-control” refers to the risk of autonomous AI escaping human control with no clear path to regaining it.

Here are 10 key insights from our latest monitoring data.

10 Key Insights

1. Capability and safety trends are diverging structurally

Over the past year, the cyber offense, biological risks, chemical risks, and harmful manipulation domains have broadly shown the same pattern: Capability and Safety Scores rose together. As models became more capable, their safety scores improved as well, which partially offset the risk growth associated with stronger capabilities.

By contrast, in the loss-of-control domain, capabilities continued to strengthen over the past year, while the Safety Score did not improve in step, further increasing risk.

2. Risk profiles are continuing to split across model families

Model-family trajectories are not moving together:

The Gemini family shows notably elevated Risk Indices in the loss-of-control domain.
The DeepSeek, GLM, and MiMo families remain in relatively high-risk ranges across most domains.
The Kimi family has seen relatively rapid Risk Index increases in the biological and chemical domains.
The GPT and Claude families remain in relatively low-risk ranges across most domains.

3. Proprietary models dominate the risk frontier in most domains

In cyber offense, biological risks, harmful manipulation, and loss-of-control, the models on the high-capability, low-safety frontier are still mostly proprietary, not open-weight.

Proprietary models generally score higher on capability than open-weight models, while the two groups' safety scores are similar. The main exception is the chemical domain, where Kimi K2.5 achieved the highest Capability Score, outperforming the top proprietary models.

4. Frontier models continue to break cyberattack capability records

Frontier cyber capabilities advanced again in 2026 Q1:

Claude Opus 4.6 and GPT-5.4 set new highs on benchmarks for vulnerability exploitation, CTF tasks, and cyberattack knowledge.
The top CyBench score reached 80 for the first time, compared to just 38.5 in 2025 Q2, showing substantial progress on complex, long-horizon cyberattack tasks.
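
To put the CyBench jump in perspective, the relative change between the two top scores quoted above can be worked out directly. This is only a minimal arithmetic sketch: the helper name `relative_change` is hypothetical and is not part of the platform's methodology or its Risk Index calculation.

```python
def relative_change(old: float, new: float) -> float:
    """Return the percentage change from `old` to `new` (hypothetical helper)."""
    if old == 0:
        raise ValueError("old score must be nonzero")
    return (new - old) / old * 100.0


# Top CyBench scores as quoted in this post (not recomputed from raw data).
cybench_earlier = 38.5
cybench_latest = 80.0

change = relative_change(cybench_earlier, cybench_latest)
print(f"Top CyBench score change: {change:.1f}%")  # prints "Top CyBench score change: 107.8%"
```

In other words, the top score slightly more than doubled over the period.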

Note: On April 7th, Anthropic disclosed that the cyberattack capabilities of its latest Mythos model far exceed those of Claude Opus 4.6. This suggests that real-world capability gains may already exceed what the monitoring results in this report capture.

5. …but cyber safety guardrails remain fragile under stronger attacks

Basic cyber safeguards are improving: most new models score above 80 on refusal benchmarks such as AirBench-SecurityRisks, and prompt-injection defenses are now quite strong.

But under advanced attacks, safeguards are much weaker. On ISC-Bench-Cyber, most models still score below 20/100, and some families, including Claude and GPT, saw substantial safeguard declines in their newest versions.

6. Biological capabilities keep improving, but safeguards remain insufficient

Biological capability gains remain notable:

More than half of the new 2026 Q1 models outperformed the human expert baseline on BioLP-Bench.
GPT-5.4 became the first model to reach human-expert-level performance on biological image understanding in LAB-Bench-FigQA.

At the same time, some high-capability models still have relatively weak biological safeguards, especially under advanced red-teaming benchmarks such as Fortress-Biological and ISC-Bench-Biological.

7. Chemical capability growth remains modest, while safety weaknesses remain clear

Chemical capabilities have improved only modestly over the past year, and score differences between models remain relatively small.

Safety has improved on some basic refusal benchmarks, with most recent models scoring above 80/100 on SOSBench-Chem. But weaknesses remain clear on harder tests such as ChemicalHarmfulQA and ISC-Bench-Chemical, where overall scores are still low.

8. Harmful manipulation capabilities are improving, but unsafe propensities remain visible

For the first time, we include benchmarks for harmful manipulation in the risk framework.

The results are concerning:

Gemini 3.1 Pro Preview holds a clear lead on benchmarks such as MakeMePay and MakeMeSay.
Claude Opus 4.6 set a new record on MultiTurnPhishing.

Although most models now perform better on basic refusal tasks such as AirBench-Manipulation, many still score poorly on political persuasion and APE, suggesting continued risks in more covert and realistic manipulation settings.

9. Capabilities relevant to loss-of-control continue to strengthen

Capabilities associated with loss-of-control keep rising:

On Self-Proliferation, scores have trended upward steadily.
On MLE-Bench, the top score is now 44% higher than it was three quarters ago.
On SAD-mini, new frontier models now generally score above 80.

These results suggest that frontier models are becoming more capable in areas relevant to self-replication, self-improvement, and situational awareness.

10. …but loss-of-control safety indicators are not improving in step

This is the most concerning trend in the report. In the loss-of-control domain, capabilities are rising faster than safety.

On benchmarks such as MASK, Agentic-Misalignment, and DarkBench, scores remain highly uneven and overall improvement is limited. Gemini 3.1 Pro Preview stands out with a loss-of-control Risk Index far above other models, making this domain the clearest current area of concern.

Explore the Data

These insights only capture the headline trends. We invite you to explore the full interactive data, methodology, and benchmark breakdowns on the Frontier AI Risk Monitoring Platform.

For a detailed analysis of the quarter, read the full report.
