March 13, 2024 at 04:38AM
Researchers have uncovered hidden details of transformer models from OpenAI and Google through an attack that illuminates a portion of these otherwise “black box” models. The attack, carried out for a range of costs depending on the target model and analyzed by a team of computer scientists, has prompted recommendations to regulate the release of advanced AI models and guard against security threats.
The paper describes an attack that recovers the embedding projection layer of a transformer model through API queries, revealing the hidden dimension, and thereby the likely size, of models operated by OpenAI and Google. The research, authored by 13 computer scientists from various institutions, discloses the attack together with the steps the companies have since taken to mitigate it. While the attack does not expose the full model, it yields critical insight into the model’s internals and has raised concerns about model replication and national security risks.

As a response, the report “Defense in Depth: An Action Plan to Increase the Safety and Security of Advanced AI” recommends exploring restrictions on the open-access release or sale of advanced AI models and enacting security measures to protect critical intellectual property such as model weights. It further advises tracking high-level usage patterns and developing countermeasures to detect attempts to reconstruct model parameters through queries of this kind.
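The core idea behind recovering a model's hidden dimension can be sketched on a toy model: every logit vector an API returns is a linear image of an h-dimensional hidden state, so stacking enough logit vectors yields a matrix whose numerical rank equals h. The following is a minimal NumPy simulation under assumed toy sizes (the matrix names and dimensions are illustrative, not those of any real deployed model):

```python
import numpy as np

# Hypothetical toy sizes: vocabulary, hidden dimension, number of API queries.
vocab, hidden, n_queries = 1000, 64, 200

rng = np.random.default_rng(0)
# Embedding projection layer: maps hidden states to vocabulary logits.
W = rng.normal(size=(vocab, hidden))
# Each query yields one logit vector W @ h for some hidden state h.
H = rng.normal(size=(n_queries, hidden))
logits = H @ W.T  # shape (n_queries, vocab)

# Every logit vector lies in the h-dimensional row space of W.T, so the
# stacked query matrix has numerical rank equal to the hidden dimension:
# singular values fall off a cliff after index h.
s = np.linalg.svd(logits, compute_uv=False)
recovered_dim = int(np.sum(s > s[0] * 1e-10))
print(recovered_dim)  # 64
```

In practice the attacker does not see hidden states at all, only the returned logits; the rank of the collected logit matrix alone betrays the hidden dimension, which is why restricting full-logit access was among the mitigations the providers adopted.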