Anthropic: Expanding Our Model Safety Bug Bounty Program

August 9, 2024 at 02:04PM

To enhance AI model safety, we’re expanding our bug bounty program to focus on identifying and mitigating universal jailbreak attacks that could bypass AI safety measures. The program, run in partnership with HackerOne, offers rewards of up to $15,000 and invites experienced AI security researchers to apply for an early-access test phase before public deployment. Applications are due by August 16.

The announcement describes an expansion of the bug bounty program to focus on finding flaws in AI safety safeguards, particularly universal jailbreak attacks. The initiative aims to strengthen the security and safety of AI systems, especially in critical, high-risk domains such as CBRN (chemical, biological, radiological, and nuclear) threats and cybersecurity.

Security and safety researchers are invited to test the next-generation AI safety mitigation system before its public deployment. In a controlled environment, participants will try to identify vulnerabilities or ways to circumvent the safety measures. The program offers bounty rewards for novel, universal jailbreak attacks, and participants receive detailed instructions and feedback.

The bug bounty initiative will initially be invite-only, run in partnership with HackerOne, with plans to expand more broadly in the future. AI security researchers with experience identifying jailbreaks in language models are encouraged to apply for an invitation through the application form by August 16.

Additionally, the company actively seeks reports on model safety concerns and encourages anyone who finds a potential safety issue in its current systems to report it to [email protected].

Overall, this initiative aligns with commitments to developing responsible AI and aims to accelerate progress in mitigating universal jailbreaks and strengthening AI safety in high-risk areas. Participants’ contributions are crucial in ensuring that AI safety measures keep pace with advancing capabilities.
