
GPT-4 “Self-Reflection” resulted in a 30% increase in test performance


OpenAI’s newest language model, GPT-4, is not only capable of generating human-like text on a wide range of topics, but can also design and execute tests to evaluate and improve its own performance. This “self-reflection” technique has allowed GPT-4 to make significant gains on several difficult benchmarks, improving its scores by roughly 30%.

GPT-4 is OpenAI’s most advanced system to date, following GPT, GPT-2 and GPT-3, and is a large multimodal model (it can accept image and text inputs and output text). It is built on deep learning techniques that use artificial neural networks to produce human-like writing.

Researchers Noah Shinn and Ashwin Gopinath wrote in their paper, “We developed a novel technique that allows the AI agent to mimic human self-reflection and evaluate its own performance.” In practice, GPT-4 adds a few extra steps when completing a task: it designs its own tests to check its answers, identifies errors and shortcomings, and then revises its solutions based on those findings.
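
To make that loop concrete, here is a minimal sketch of what a self-reflection cycle could look like for a coding task. It is an illustration, not the authors’ implementation: the `llm` function is a placeholder for any call to a language model such as GPT-4, and all prompts, helper names and the three-round limit are assumptions chosen for clarity.

```python
def llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a language model (e.g. GPT-4) and return its reply."""
    raise NotImplementedError("wire this up to your model provider")


def run_tests(solution: str, tests: str) -> tuple[bool, str]:
    """Run the model-written unit tests against the candidate solution.

    Returns (passed, feedback). Sandboxing details are omitted in this sketch.
    """
    namespace: dict = {}
    try:
        exec(solution, namespace)   # define the candidate function(s)
        exec(tests, namespace)      # run the self-generated assertions
        return True, "all tests passed"
    except Exception as err:        # capture the failure as feedback for reflection
        return False, f"{type(err).__name__}: {err}"


def solve_with_reflection(task: str, max_rounds: int = 3) -> str:
    """Generate a solution, have the model write its own tests,
    and revise the solution based on the failures it observes."""
    solution = llm(f"Write Python code that solves this task:\n{task}")
    for _ in range(max_rounds):
        tests = llm(f"Write unit tests (plain asserts) for this task:\n{task}")
        passed, feedback = run_tests(solution, tests)
        if passed:
            break
        # The reflection step: the model critiques its own failed attempt...
        reflection = llm(
            "Your previous solution failed with this feedback:\n"
            f"{feedback}\n\nExplain what went wrong and how to fix it."
        )
        # ...and then produces a revised solution informed by that critique.
        solution = llm(
            f"Task:\n{task}\n\nPrevious solution:\n{solution}\n\n"
            f"Reflection:\n{reflection}\n\nWrite an improved solution."
        )
    return solution
```

The key design choice is that the feedback comes from tests the model wrote for itself rather than from a human grader, which is what allows the loop to run without additional supervision.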

On the HumanEval coding test, GPT-4 increased its accuracy from 67% to 88% using a self-reflection loop

By designing and executing its own tests, GPT-4 can critique and significantly improve its performance, as the AlfWorld results show

The team used this technique to run several performance tests on GPT-4. In the HumanEval test, GPT-4 had to solve 164 never-before-seen Python programming problems; it achieved an accuracy of 67%, which improved to 88% with the reflection technique. In the AlfWorld test, the AI had to make decisions and solve multi-step tasks by performing a set of permissible actions in a variety of interactive environments; with reflection, GPT-4 improved its accuracy from 73% to 97%, failing only 4 tasks. In the HotPotQA test, GPT-4 was given access to Wikipedia and asked 100 questions that required parsing content and reasoning across multiple supporting documents; its accuracy rose from 34% to 54% with reflection.

This study shows that solutions to AI problems can sometimes come from the AI itself. The approach is reminiscent of generative adversarial networks, in which two AIs improve each other’s skills: one tries to generate images that look real, while the other tries to tell the fakes from the real ones. Here, however, GPT-4 is both writer and editor, improving the quality of its output through self-reflection.
