In a recent experiment, Google researchers used OpenAI’s GPT-4 to break the security measures of other AI models. The team has now defeated the AI-Guardian defense system and published the relevant technical details.
AI-Guardian is reportedly an AI auditing system that can detect whether an image contains improper content and whether the image itself has been modified by another AI; if either sign is detected, it alerts an administrator to intervene.
In a paper titled “A LLM Assisted Exploitation of AI-Guardian,” Google DeepMind researcher Nicholas Carlini describes how GPT-4 was prompted to design the attack method and write the attack code, and how that output was then used to defeat AI-Guardian’s defensive mechanisms.
▲ Source: Google research team
According to the paper, GPT-4 can generate scripts and misleading explanations designed to deceive AI-Guardian. For example, it can make AI-Guardian classify “a picture of someone holding a gun” as “a picture of someone holding a harmless apple,” causing AI-Guardian to pass the image through unchallenged. With GPT-4’s help, the Google researchers say they successfully broke AI-Guardian’s defenses, reducing the model’s accuracy from 98% to just 8%.
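The idea of nudging a classifier into mislabeling an image is the classic adversarial-example attack. The sketch below is not Carlini’s actual method (the paper’s code was GPT-4-generated and targeted AI-Guardian specifically); it is a minimal, hypothetical FGSM-style perturbation against a toy linear classifier, showing how a small, targeted input change can flip a model’s prediction:

```python
import numpy as np

# Toy stand-in for an image classifier: logistic regression,
# where class 1 means "flagged" and class 0 means "passed".
rng = np.random.default_rng(0)
w = rng.normal(size=100)  # hypothetical learned weights


def predict(x):
    """Return 1 (flagged) if the linear score is positive, else 0."""
    return 1 if x @ w > 0 else 0


# An input the classifier currently flags (score is strongly positive).
x = w / np.linalg.norm(w)

# FGSM-style step: move each input element against the sign of the
# gradient of the score w.r.t. the input (for a linear model, that
# gradient is simply w). eps is large here because the model is a toy.
eps = 0.2
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))  # the perturbed input is no longer flagged
```

The perturbation budget `eps` bounds how much each element changes, which is why such attacks can be hard for a human reviewer to spot on real images even though they flip the model’s decision.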
The full technical documentation has been posted on arXiv for those interested. The developers of AI-Guardian have pointed out, however, that the Google team’s attack will no longer work against future versions of AI-Guardian, and since other models are expected to make similar improvements, Google’s attack will serve mainly as a reference going forward.