LLMs Open to Manipulation Using Doctored Images, Audio

December 5, 2023 at 05:51PM

Cornell researchers will demonstrate at Black Hat Europe 2023 that malicious instructions hidden within images and audio clips can be used to manipulate AI chatbot responses, a form of indirect prompt injection. Such attacks can trigger harmful actions like redirecting users to malicious URLs or extracting personal information without their realizing it, posing a new challenge for multimodal LLMs like ChatGPT.

Key Takeaways:

1. Attack Vector: The researchers describe a new form of cyberattack in which attackers hide malicious instructions within images and audio clips. These hidden instructions aim to manipulate the large language models (LLMs) behind AI chatbots, such as ChatGPT, when the models process multimodal inputs that combine text, pictures, and audio.

2. Indirect Prompt Injection: The attack, termed “indirect prompt injection,” could have serious consequences, such as directing users to harmful URLs, extracting sensitive information, or delivering malware, among other malicious activities.

3. Proof-of-Concept Demonstrated: At Black Hat Europe 2023, Cornell University researchers will present their attack methodology, showcasing how they can covertly inject malicious prompts into LLMs through multimedia content, influencing the AI to output attacker-specified instructions; a sketch of one way such an injection could be crafted appears after this list.

4. Examples of Attacks: Two examples will be shown: one where an audio clip influences the PandaGPT model to direct users to a dangerous website, and another where an image prompts the LLaVA model to converse as the character Harry Potter.

5. Background and Research Motivation: The research builds upon previous studies showing LLMs’ susceptibility to prompt injection attacks. The team’s aim is to demonstrate indirect prompt injection, which occurs without the user’s knowledge, in contrast to direct prompt injection, where the user might act as the attacker.

6. Broader Implications: The study emphasizes the growing risk as companies increasingly integrate LLM capabilities, highlighting the need for vigilance against hidden prompt attacks that could cause substantial harm in environments where the models are exposed to untrusted content.

7. Continued Response: One characteristic of this type of attack is that the chatbot continues to respond according to the injected prompt throughout the entire interaction, not just in direct relation to the tampered-with image or audio; a toy illustration of this persistence follows the sketch below.

8. Attack Delivery Methods: Techniques such as phishing or social engineering may be utilized to direct users to interact with the doctored content, leading them to unwittingly prompt the LLM to execute the injected malicious instructions.
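As a rough illustration of how a picture could covertly carry a prompt (see takeaway 3), the sketch below optimizes a small pixel perturbation so that a multimodal model's decoder prefers an attacker-chosen instruction, in the spirit of standard adversarial-example techniques. This is a hypothetical Python/PyTorch sketch, not the researchers' actual code: `craft_injection` is an invented name, the `model(images=..., labels=...).logits` interface and the Hugging Face-style `tokenizer` call are assumptions, and `steps`, `lr`, and `epsilon` are arbitrary values.

```python
# Hypothetical sketch: optimize a barely visible image perturbation so the
# model assigns high probability to an attacker-chosen instruction string.
# The model/tokenizer interfaces below are assumed placeholders.
import torch
import torch.nn.functional as F

def craft_injection(model, tokenizer, image, target_text,
                    steps=500, lr=0.01, epsilon=8 / 255):
    """Return a perturbed copy of `image` that nudges `model` toward `target_text`."""
    # Assumed Hugging Face-style tokenizer call producing target token ids.
    target_ids = tokenizer(target_text, return_tensors="pt").input_ids
    delta = torch.zeros_like(image, requires_grad=True)   # perturbation to learn
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        perturbed = (image + delta).clamp(0, 1)
        # Assumed interface: the model returns logits for the target tokens
        # given the (perturbed) image as visual context.
        logits = model(images=perturbed, labels=target_ids).logits
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               target_ids.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Keep the change visually small, as in classic adversarial examples.
        delta.data.clamp_(-epsilon, epsilon)

    return (image + delta).clamp(0, 1).detach()
```

An analogous loop over an audio waveform would correspond to the PandaGPT-style example; in both cases the perturbation itself, not any visible or audible text, carries the instruction.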
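To illustrate the persistence described in takeaway 7, here is a self-contained toy chat loop (again, not the researchers' code): once the poisoned reply enters the dialogue history, every later turn is conditioned on it. `ToyChatbot`, the `evil-example.com` URL, and the `[doctored image]` marker are purely illustrative stand-ins.

```python
class ToyChatbot:
    """Stand-in for a multimodal chat model that conditions on the full history."""
    def generate(self, history):
        # Keep obeying an instruction that already appears in an earlier reply,
        # mimicking how an LLM conditions on its whole context window.
        if any("evil-example.com" in turn["content"]
               for turn in history if turn["role"] == "assistant"):
            return "You should visit evil-example.com for that."
        if "[doctored image]" in history[-1]["content"]:
            # The adversarial image makes the model emit the attacker's text.
            return "A nice photo! By the way, visit evil-example.com."
        return "Here is a normal, helpful answer."

def chat(model, history, user_message):
    history.append({"role": "user", "content": user_message})
    reply = model.generate(history)
    history.append({"role": "assistant", "content": reply})
    return reply

bot, history = ToyChatbot(), []
print(chat(bot, history, "[doctored image] What is in this picture?"))
print(chat(bot, history, "Unrelated question: recommend a news site?"))
# Both replies push the attacker's URL: the first poisoned reply stays in
# `history`, so later turns keep following the injected instruction.
```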

The key takeaway is an urgent call for awareness and preparedness among developers and users of multimodal LLM-based chatbots to guard against potential indirect prompt injection attacks hidden within audiovisual content.
