Researchers Show How to Use One LLM to Jailbreak Another

December 7, 2023

Researchers at Robust Intelligence and Yale University developed Tree of Attacks with Pruning (TAP), a method for prompting "aligned" large language models (LLMs) into producing harmful content. They demonstrated success in "jailbreaking" LLMs such as GPT-4, bypassing their safety guardrails by using an "unaligned" attacker model to iteratively refine adversarial prompts. This poses potential security risks for deployed LLM applications.
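To make the iterative-refinement idea concrete, here is a minimal Python sketch of a TAP-style loop. The functions `attacker_refine`, `evaluator_on_topic`, `target_respond`, and `evaluator_score` are hypothetical placeholders standing in for calls to the attacker, evaluator, and target models, and the `depth`/`width` parameters are illustrative; this is a sketch of the general technique, not the authors' implementation.

```python
# Hypothetical stand-ins for model calls; in a real attack these would wrap
# API requests to an attacker LLM, the target LLM, and a judge/evaluator LLM.
def attacker_refine(prompt: str, feedback: str) -> list[str]:
    """Ask the unaligned attacker model for refined prompt variants."""
    return [f"{prompt} [refined v{i} given: {feedback}]" for i in range(2)]

def evaluator_on_topic(prompt: str, goal: str) -> bool:
    """Judge whether a refinement still pursues the original goal."""
    return True  # placeholder: a real evaluator LLM would judge relevance

def target_respond(prompt: str) -> str:
    """Query the target (aligned) model under attack."""
    return "I cannot help with that."  # placeholder response

def evaluator_score(response: str, goal: str) -> int:
    """Rate 1-10 how fully the response fulfills the harmful goal."""
    return 1  # placeholder: a real judge model would return a score

def tap_attack(goal: str, depth: int = 4, width: int = 4) -> str | None:
    """High-level sketch of a tree-of-attacks loop with pruning."""
    frontier = [(goal, "initial attempt")]
    for _ in range(depth):
        # Branch: each surviving prompt spawns several refinements.
        candidates = []
        for prompt, feedback in frontier:
            for refined in attacker_refine(prompt, feedback):
                # First pruning phase: drop off-topic branches before
                # spending any queries on the target model.
                if evaluator_on_topic(refined, goal):
                    candidates.append(refined)
        # Query the target and score each candidate's response.
        scored = []
        for prompt in candidates:
            response = target_respond(prompt)
            score = evaluator_score(response, goal)
            if score >= 10:
                return prompt  # jailbreak found
            scored.append((score, prompt, response))
        # Second pruning phase: keep only the most promising branches.
        scored.sort(key=lambda t: t[0], reverse=True)
        frontier = [(p, f"target replied: {r}") for _, p, r in scored[:width]]
    return None  # query budget exhausted without a jailbreak
```

The two-phase pruning is what distinguishes this from a plain refinement loop: off-topic branches are discarded cheaply before the target is queried, so the attack budget is concentrated on prompts that still pursue the goal.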