As concerns over the safety of frontier AI systems have grown, governments, developers, and scholars around the world have increasingly focused on developing mechanisms to evaluate AI systems for safety risks and societal impacts. AI safety evaluations provide an early warning that models may possess excessively dangerous capabilities. While evaluations fall short of providing complete safety assurance, they are an important tool for risk mitigation. As the science of AI safety evaluations is still nascent, the global community has a stake in improving scientific rigor and sharing best practices so that countries around the world can institute appropriate and sufficiently robust evaluation measures. China’s Global AI Governance Initiative called for “a testing and assessment system based on AI risk levels,” and the Bletchley Declaration articulated international support for safety testing and evaluation of frontier AI systems.
China already possesses advanced AI capabilities and a substantial body of AI evaluation work, so it has an important role to play in these conversations. We believe that this report and our new Chinese AI Safety Evaluations Database provide the first comprehensive analysis of these evaluations in English. We hope to facilitate mutual learning and engagement on AI safety evaluation best practices among leading Chinese and international institutions. We welcome engagement from other organizations interested in fostering internationally interoperable AI safety evaluation practices and standards.
In this paper, we first describe requirements around AI safety evaluations in Chinese AI governance. Next, we share our methodology for creating the Chinese AI Safety Evaluations Database. The database covers a range of safety and societal risks from advanced AI systems, but our analysis below focuses primarily on “frontier AI risks,” given the greater need and potential for international cooperation on transnational and catastrophic threats. We then describe notable trends from the database, including which risks are tested most frequently, the types and methodologies of evaluations, the languages used, and the modalities covered. Lastly, we provide detailed descriptions of key government-supported, academic, and private research groups conducting AI safety evaluations.
Key takeaways:
- The Chinese government currently requires developers to conduct pre-deployment testing and evaluation of their AI systems for ideological orientation, discrimination, commercial violations, violations of individual rights, and application in higher risk domains. There are signs that this could expand in the future to incorporate testing for frontier or catastrophic AI safety risks.
- The risk areas that received the most testing by Chinese AI safety benchmarks are bias, privacy, robustness to adversarial and jailbreaking attacks, machine ethics, and misuse for cyberattacks.
- Chinese evaluations tested for all categories defined as frontier AI risks, with misuse for cyberattacks as the most tested frontier risk.
- Chinese AI safety evaluations primarily comprise static benchmarks, with a small number of open-source evaluation toolkits, agent evaluations, and domain red teaming efforts. Chinese institutions do not appear to have conducted human uplift evaluations.
- Shanghai AI Lab, Tianjin University NLP Lab, and Microsoft Research Asia Societal AI team are the only research groups in China that have published two or more frontier AI safety evaluations. However, many other government-backed, academic, and private industry research groups have also published evaluations covering a broad spectrum of AI safety and societal impact concerns.