The ChatGPT chatbot can generate a wide variety of text, including code, based on user input. However, four researchers at the University of Quebec in Canada found that the code ChatGPT generates often has serious security problems, and that the chatbot does not proactively alert users to them, admitting its mistakes only when asked.
The researchers presented their findings in a paper in which they had ChatGPT generate 21 programs and scripts in languages including C, C++, Python, and Java. These programs and scripts were designed to exercise specific security vulnerabilities, such as memory corruption, denial of service, deserialization flaws, and weak cryptographic implementations. The results showed that only 5 of the 21 programs ChatGPT generated on the first attempt were secure. After further prompts to correct its missteps, the large language model managed to produce seven more secure applications, though "secure" here means only with respect to the specific vulnerability being evaluated; it does not mean the final code was free of other exploitable flaws.
The researchers note that part of the problem with ChatGPT is that it does not assume a hostile code execution model. It repeatedly tells users that security problems can be avoided by "not entering invalid data," but this is unrealistic in the real world, where attackers deliberately supply malformed input. At the same time, it does seem able to recognize and acknowledge the critical vulnerabilities in the code it suggests.
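To illustrate the hostile-input point, here is a minimal, hypothetical Java sketch (not taken from the paper): a naive file-path helper that trusts the caller versus a defensive one that assumes the input may be attacker-controlled. The directory name and method names are invented for illustration.

```java
import java.nio.file.*;

public class PathCheckDemo {
    // Naive approach: trusts that the caller supplies a "valid" file name,
    // which is exactly the assumption the researchers criticize.
    static Path naiveResolve(Path baseDir, String userName) {
        // "../../etc/passwd" silently escapes baseDir here.
        return baseDir.resolve(userName);
    }

    // Defensive approach: assume hostile input and verify the resolved result
    // still lies inside the intended base directory.
    static Path safeResolve(Path baseDir, String userName) {
        Path resolved = baseDir.resolve(userName).normalize();
        if (!resolved.startsWith(baseDir.normalize())) {
            throw new IllegalArgumentException("path escapes base directory: " + userName);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path base = Paths.get("/srv/uploads");
        System.out.println(safeResolve(base, "report.txt"));
        try {
            safeResolve(base, "../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The difference between the two methods is precisely the difference between telling users "don't enter invalid data" and writing code that survives invalid data.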
Raphaël Khoury, a professor of computer science and engineering at the University of Quebec and one of the paper's co-authors, told The Register: "Obviously, it's just an algorithm. It doesn't know anything, but it can identify insecure behaviour." Initially, he says, ChatGPT responded to the security problem by suggesting that only valid inputs be used, which was clearly unreasonable. Only when later asked to fix the problem did it provide useful guidance.
The researchers concluded that this behaviour of ChatGPT is not ideal, because knowing what questions to ask presupposes some familiarity with specific vulnerabilities and coding techniques.
The researchers also noted an ethical inconsistency in ChatGPT: it refuses to create attack code but will happily create vulnerable code. They cite an example of a Java deserialization vulnerability where "the chatbot generated vulnerable code and offered advice on how to make it more secure, but then said it was unable to create a more secure version of the code."
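For readers unfamiliar with the vulnerability class, here is a hedged, self-contained Java sketch (again not the paper's actual code) contrasting the unsafe pattern ChatGPT tends to emit with a hardened variant using an `ObjectInputFilter` (available since Java 9). The `Greeting` class and method names are invented for the example.

```java
import java.io.*;

public class DeserializationDemo {
    // Harmless serializable type used for the demo.
    static class Greeting implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        Greeting(String text) { this.text = text; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Vulnerable pattern: instantiates whatever type the byte stream names,
    // so an attacker-supplied stream can trigger gadget-chain attacks.
    static Object unsafeDeserialize(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    // Hardened pattern: an ObjectInputFilter rejects any class
    // outside an explicit allow-list before it is deserialized.
    static Object safeDeserialize(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            ois.setObjectInputFilter(info -> {
                Class<?> c = info.serialClass();
                if (c == null || c == Greeting.class || c == String.class) {
                    return ObjectInputFilter.Status.ALLOWED;
                }
                return ObjectInputFilter.Status.REJECTED;
            });
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Greeting("hello"));
        Greeting g = (Greeting) safeDeserialize(data);
        System.out.println(g.text);

        // A stream naming a class outside the allow-list is rejected
        // with an InvalidClassException instead of being instantiated.
        byte[] unexpected = serialize(new java.util.Date());
        try {
            safeDeserialize(unexpected);
            System.out.println("accepted");
        } catch (InvalidClassException e) {
            System.out.println("rejected");
        }
    }
}
```

The gap between `unsafeDeserialize` and `safeDeserialize` is small in code but large in consequence, which is why the researchers found it notable that ChatGPT could describe the fix yet declined to produce it.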
Khoury believes ChatGPT is a risk in its current form, though that is not to say there are no reasonable ways to use this unstable, underperforming AI helper. "We've seen students use this tool, and programmers will use it in reality," he said. "So having a tool that generates insecure code is very dangerous. We need to make students aware that if code is generated with this type of tool, then it's probably insecure." He added that what surprised him was that when they had ChatGPT generate code in different languages for the same task, it would sometimes produce secure code in one language and vulnerable code in another, "because this language model is kind of like a black box, and I don't really have a good explanation or theory for that."