
GPT-4 “Self-Reflection” resulted in a 30% increase in test performance


OpenAI’s newest language model, GPT-4, is not only capable of generating human-like text on a wide range of topics, but can also design and execute tests to evaluate and improve its own performance. This “self-reflection” technique has allowed GPT-4 to make significant gains on several difficult benchmarks, improving its scores by roughly 30%.

GPT-4 is OpenAI’s most advanced system to date, following GPT, GPT-2 and GPT-3, and is a large multimodal model (it can accept image and text inputs and output text). It is built on deep learning techniques that use artificial neural networks to produce human-like writing.

Researchers Noah Shinn and Ashwin Gopinath wrote in their paper, “We developed a novel technique that allows the AI agent to mimic human self-reflection and evaluate its own performance.” In practice, GPT-4 adds a few extra steps when completing a task: it designs its own tests to check its answers, identifies errors and shortcomings, and then revises its solutions based on those findings.
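
To make that loop concrete, here is a minimal sketch of what a self-reflection cycle could look like for a coding task. It is an illustration, not the authors’ implementation: the `llm` function is a placeholder for any call to a language model such as GPT-4, and all prompts, helper names and the three-round limit are assumptions chosen for clarity.

```python
def llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a language model (e.g. GPT-4) and return its reply."""
    raise NotImplementedError("wire this up to your model provider")


def run_tests(solution: str, tests: str) -> tuple[bool, str]:
    """Run the model-written unit tests against the candidate solution.

    Returns (passed, feedback). Sandboxing details are omitted in this sketch.
    """
    namespace: dict = {}
    try:
        exec(solution, namespace)   # define the candidate function(s)
        exec(tests, namespace)      # run the self-generated assertions
        return True, "all tests passed"
    except Exception as err:        # capture the failure as feedback for reflection
        return False, f"{type(err).__name__}: {err}"


def solve_with_reflection(task: str, max_rounds: int = 3) -> str:
    """Generate a solution, have the model write its own tests,
    and revise the solution based on the failures it observes."""
    solution = llm(f"Write Python code that solves this task:\n{task}")
    for _ in range(max_rounds):
        tests = llm(f"Write unit tests (plain asserts) for this task:\n{task}")
        passed, feedback = run_tests(solution, tests)
        if passed:
            break
        # The reflection step: the model critiques its own failed attempt...
        reflection = llm(
            "Your previous solution failed with this feedback:\n"
            f"{feedback}\n\nExplain what went wrong and how to fix it."
        )
        # ...and then produces a revised solution informed by that critique.
        solution = llm(
            f"Task:\n{task}\n\nPrevious solution:\n{solution}\n\n"
            f"Reflection:\n{reflection}\n\nWrite an improved solution."
        )
    return solution
```

The key design choice is that the feedback comes from tests the model wrote for itself rather than from a human grader, which is what allows the loop to run without additional supervision.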

On the HumanEval coding test, GPT-4 increased its accuracy from 67% to 88% using a self-reflection loop

By designing and executing its own tests, GPT-4 can critique and significantly improve its performance, as the AlfWorld results show

The team used this technique to run several performance tests on GPT-4. In the HumanEval test, GPT-4 had to solve 164 never-before-seen Python programming problems; it achieved an accuracy of 67%, which improved to 88% with the reflection technique. In the AlfWorld test, the AI had to make decisions and solve multi-step tasks by performing a set of permissible actions in a variety of interactive environments; with reflection, GPT-4 improved its accuracy from 73% to 97%, failing only 4 tasks. In the HotPotQA test, GPT-4 was given access to Wikipedia and asked 100 questions that required parsing content and reasoning across multiple supporting documents; its accuracy rose from 34% to 54% with reflection.

This study shows that solutions to AI problems can sometimes come from the AI itself. The approach is reminiscent of generative adversarial networks, in which two AIs improve each other’s skills: one tries to generate images that look real, while the other tries to tell the fakes from the real ones. Here, however, GPT-4 is both writer and editor, improving the quality of its output through self-reflection.
