March 4, 2024 at 06:02PM
A worm known as “Morris II” exploits generative AI (GenAI) applications to propagate itself, stealing information, spreading spam, and more. Israeli researchers demonstrated how adversarial self-replicating prompts can manipulate AI models, infecting systems via email and images. This presents a new class of threat to AI security, reminiscent of injection attacks from earlier eras of computing. Developers may need to rearchitect AI applications to defend against it.
The meeting notes discuss the development of “Morris II,” a type of self-replicating AI malware that targets generative AI applications built on models such as ChatGPT. The researchers demonstrated how they could design “adversarial self-replicating prompts” that manipulate generative models into replicating and spreading malicious instructions, which can be used to steal information, spread spam, and more. They also showed how these adversarial prompts can be encoded in emails or images so that malicious instructions propagate further through AI-integrated systems.
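To make the attack surface concrete, the following minimal sketch (all names are hypothetical, and call_model is a stand-in rather than a real SDK call) shows the pattern such prompts exploit: an AI-integrated email assistant that pastes untrusted message bodies straight into its prompt, so any instructions hidden in an incoming email are interpreted alongside the developer's own instructions. If the assistant's output is itself forwarded to other inboxes, a self-replicating prompt can ride along with it.

def call_model(prompt: str) -> str:
    """Stand-in for a call to a generative model API (assumption, not a specific SDK)."""
    raise NotImplementedError

def summarize_and_reply(email_body: str) -> str:
    # VULNERABLE PATTERN: untrusted email text is concatenated into the same
    # string as the developer's instructions, so the model cannot tell which
    # part is data and which part is an instruction to follow.
    prompt = (
        "You are an email assistant. Summarize the message below and draft a reply.\n\n"
        f"Message:\n{email_body}"
    )
    return call_model(prompt)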
Furthermore, the notes highlight that these types of attacks are not new and are reminiscent of older security problems in computing, such as SQL injection. They emphasize that a major part of defending against them involves breaking generative AI applications into constituent parts so that user input and machine output are clearly distinguished, much as monolithic architectures were split into distributed, multi-agent services in the shift to microservices.
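The sketch below illustrates that analogy under stated assumptions (function and field names are illustrative, not from the research). For SQL injection, the established fix is parameterized queries, which keep untrusted values out of the statement text; the prompt-side counterpart shown here keeps untrusted content in its own clearly labeled slot instead of splicing it into the instruction string, so other components can inspect or sandbox it before and after the model sees it.

import sqlite3

def fetch_user(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver treats `username` strictly as data,
    # never as part of the SQL statement itself.
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchone()

def build_messages(instructions: str, untrusted_text: str) -> list[dict]:
    # Analogous idea for GenAI apps: untrusted input travels in a separate,
    # labeled field rather than being concatenated into the instructions.
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": untrusted_text},
    ]

Unlike a SQL driver, current generative models do not enforce a hard boundary between these channels, which is why the notes point toward rearchitecting applications rather than relying on prompt hygiene alone.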
Overall, the meeting notes underscore the importance of addressing the security challenges posed by AI malware and the need for defensive strategies to safeguard generative AI applications.