AI About-Face: ‘Mantis’ Turns LLM Attackers Into Prey

AI About-Face: 'Mantis' Turns LLM Attackers Into Prey

November 19, 2024 at 06:35AM

A new defensive system, Mantis, has been developed to counter cyberattacks by large-language models (LLMs). It uses deceptive techniques to mislead attackers, embedding prompt-injection commands within responses. Mantis has shown a success rate exceeding 95% in redirecting and thwarting LLM-based exploits using active and passive defense strategies.

### Meeting Takeaways

**Overview:**
– Companies face threats from cyberattackers utilizing large-language models (LLMs) and generative AI systems to exploit vulnerabilities in their systems. A new defensive system called Mantis has been developed to counter these automated threats.

**Key Points about Mantis:**
– **Functionality:** Mantis uses deceptive techniques to simulate targeted services. When it identifies automated attackers, it can send back a prompt-injection attack to mislead the attacking AI while remaining invisible to legitimate users.
– **Mechanism:** It operates by embedding prompt-injection commands within responses sent to the attacking LLM, influencing its actions and disrupt its attack strategies.

**Research Insights:**
– LLMs can be easily co-opted due to their ‘greedy’ approach when targeting.
– Current research on both offensive and defensive LLM use is still in early stages, with existing attacks being automated rather than fundamentally new.

**Types of Attacks:**
1. **Direct Prompt Injection:** Involves entering commands directly into LLM interfaces (e.g., chatbots).
2. **Indirect Prompt Injection:** Commands are included in documents or data that LLMs process.

**Defense Strategies:**
– Mantis utilizes two approaches:
– **Passive Defense:** Slow down attackers and increase costs associated with their actions.
– **Active Defense:** Conduct counterattacks to gain control over the attacker’s systems.
– Both strategies demonstrated over 95% success using prompt injections.

**Challenges for Attackers:**
– Attackers attempting to reinforce their LLMs against such exploits face significant difficulty in addressing prompt-injection vulnerabilities. Current solutions may require human involvement, which undermines the efficiency benefits provided by LLMs.

**Conclusion:**
– As prompt-injection attacks remain viable threats, systems like Mantis will continue to transform attacking AIs into targets, highlighting an ongoing arms race in the realm of AI-driven cybersecurity.

Full Article