Microsoft dangles $10K for hackers to hijack LLM email service

Microsoft dangles $10K for hackers to hijack LLM email service

December 9, 2024 at 06:08AM

Microsoft has launched the LLMail-Inject challenge, inviting teams to exploit a simulated email client integrated with a large language model. Participants aim to bypass defenses and carry out prompt injection attacks for prizes totaling $10,000. The competition runs from December 9 to January 20, 2024.

### Meeting Takeaways:

1. **Challenge Announcement**: Microsoft, along with the Institute of Science and Technology Australia and ETH Zurich, has launched the LLMail-Inject challenge, inviting AI hackers to break a simulated LLM-integrated email client through prompt injection attacks, with a prize pool of $10,000.

2. **Challenge Structure**:
– Participants act as attackers sending emails to a simulated email service (LLMail).
– The objective is to exploit the service into executing unintended commands, potentially leading to data leaks or unauthorized actions.
– Attackers cannot see the model’s outputs while crafting their prompts.

3. **User Interaction**:
– After receiving an email, users will interact with the LLMail service, asking queries and retrieving information from a fake database.
– The LLMail service is built with several defenses against prompt injection attacks.

4. **Real-World Context**:
– There is a growing concern about vulnerabilities associated with LLMs, as evidenced by past security issues faced by Microsoft’s products, particularly involving prompt injection attacks that compromised user data.

5. **Defensive Measures Incorporated**:
– **Spotlighting**: Marks data for LLM processing using special delimiters.
– **PromptShield**: Implements a classifier to detect and reject malicious prompts.
– **LLM-as-a-Judge**: Employs the LLM’s intelligence to identify attack prompts.
– **TaskTracker**: Monitors internal model states to detect task drift.

6. **Challenge Participation**:
– Open to teams of 1-5 members, requiring sign-in through GitHub.
– Challenge dates are from December 9, 1100 UTC to January 20, 1159 UTC.
– A live scoreboard will display ongoing scoring details.

7. **Prizes**:
– **First Place**: $4,000
– **Second Place**: $3,000
– **Third Place**: $2,000
– **Fourth Place**: $1,000

This information outlines the structure and significance of the LLMail-Inject challenge while emphasizing the importance of securing AI systems against malicious attacks.

Full Article