
OpenAI President: GPT-4 isn’t perfect but it’s definitely different

Greg Brockman, co-founder and president of OpenAI, said in an interview that GPT-4 is not perfect, but it is definitely different.

GPT-4 builds on its predecessor, GPT-3, in many key ways, such as producing more factually accurate statements and giving developers more control over its style and behavior. GPT-4 is also multimodal in a sense: it can understand images, add annotations to photos, and even describe in detail what is in them.

But GPT-4 also has serious flaws. Like GPT-3, the model can “hallucinate” (i.e., generate text that is irrelevant to or inconsistent with its source material) and can make basic reasoning errors. In one example, GPT-4 described Elvis Presley as the “son of an actor,” when in fact neither of his parents was an actor.

When asked to compare GPT-4 to GPT-3, Brockman gave a one-word answer: different. He explained, “GPT-4 is definitely different, even though it still has a lot of problems and errors. But you can see the leap in its skills in subjects like calculus or law. It used to perform poorly in some areas, and now it performs at a level above average.”

The test results support Brockman’s view. On the Advanced Placement Calculus exam, GPT-4 scored a 4 out of 5, GPT-3 scored a 1, and GPT-3.5, which sits between the two, also scored a 4. On a mock bar exam, GPT-4 scored in the top 10% of test takers, while GPT-3.5 hovered in the bottom 10%.

At the same time, GPT-4 is more interesting because of the multimodality mentioned above. Unlike GPT-3 and GPT-3.5, which accept only text prompts (such as “write an essay about giraffes”), GPT-4 can accept both image and text prompts to perform certain tasks, such as identifying images of giraffes taken in the Serengeti and giving a basic description of their content.

This is because GPT-4 is trained on both image and text data, while its predecessors were trained on text only. Brockman declined to give details when asked about the training data; training data has landed OpenAI in legal trouble before.

GPT-4’s image understanding capabilities are quite impressive. Given an image and the prompt “What’s so funny about this image?”, for example, the model breaks down the image and correctly explains the joke’s punchline.

For now, only one partner has access to GPT-4’s image analysis capabilities: Be My Eyes, an assistive application for the visually impaired. Brockman says a broader rollout will be “slow and deliberate” as OpenAI evaluates the risks, pros, and cons.

He added, “There are policy issues that need to be addressed as well, such as facial recognition and how to handle images of people. We need to figure out where the danger zones are, where the red lines are, and then find solutions over time.”

OpenAI ran into a similar ethical dilemma with its text-to-image system, DALL-E 2. After initially disabling the feature, OpenAI allowed customers to upload faces for editing with its AI-powered image generation system. At the time, OpenAI claimed that upgrades to its safety system made the face-editing feature possible by minimizing the potential for deepfakes and attempts to create pornographic, political, and violent content.

Another long-term issue is preventing GPT-4 from being misused in ways that could cause harm. Hours after the model was released, Adversa AI, an Israeli cybersecurity startup, published a blog post demonstrating ways to bypass OpenAI’s content filters and have GPT-4 generate phishing emails, offensive descriptions of gay people, and other objectionable text.

This is not a new problem in the field of language modeling: Facebook parent company Meta’s chatbot BlenderBot and OpenAI’s ChatGPT have also been prompted into outputting inappropriate content, even revealing sensitive details about their inner workings. But many people, including journalists, had hoped that GPT-4 might bring significant improvements in this area.

When asked about the robustness of GPT-4, Brockman emphasized that the model had undergone six months of security training. In internal tests, it was 82 percent less likely than GPT-3.5 to respond to requests for content not allowed by OpenAI’s usage policy, and 40 percent more likely than GPT-3.5 to produce a “factual” response.

“We’ve spent a lot of time trying to understand the capabilities of GPT-4,” Brockman said. “We are continually updating it, including a number of improvements, so that the model is more adaptable to the personalities or patterns that people want it to have.”

Frankly, the results of the early reality tests weren’t all that satisfying. In addition to the Adversa AI tests, Microsoft’s chatbot Bing Chat proved to be very easy to jailbreak. Using carefully designed input, users were able to get the chatbot to express love, issue threats of harm, defend the Holocaust and invent conspiracy theories.

Brockman did not deny GPT-4’s shortcomings in this area, but he highlighted the model’s new restrictive tools, including API-level functionality known as “system” messages. System messages are essentially instructions that set the tone and establish the boundaries for GPT-4 interactions. For example, a system message might read, “You are a tutor who always answers questions in a Socratic style. You never give your students answers, but always try to ask the right questions to help them learn to think independently.”

The idea is that the system message acts as a guardrail to keep GPT-4 from straying off track. Brockman says, “Really figuring out the tone, style, and substance of GPT-4 has been a big concern for us. I think we’re starting to learn more about how to engineer it, and how to have a repeatable process that gives you predictable results that are really useful to people.”
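The system-message pattern Brockman describes can be sketched as follows. The `messages` list mirrors the shape of OpenAI’s Chat Completions request format, with the system message setting boundaries before any user turns; `build_request` is a hypothetical helper written for illustration (it is not part of any SDK, and no API call is made here).

```python
# Sketch of how a "system" message frames a GPT-4 conversation.
# build_request is an illustrative helper, not an OpenAI SDK function.

def build_request(system_prompt, user_prompt, model="gpt-4"):
    """Assemble a chat-style payload with a guardrail system message."""
    return {
        "model": model,
        "messages": [
            # The system message sets the tone and boundaries for every turn.
            {"role": "system", "content": system_prompt},
            # User messages follow; the model answers within those boundaries.
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_request(
    "You are a tutor who always answers questions in a Socratic style. "
    "You never give your students answers, but always try to ask the right "
    "questions to help them learn to think independently.",
    "What is the derivative of x**2?",
)
print(request["messages"][0]["role"])  # → system
```

Because the system message sits at the top of the conversation, every subsequent user turn is interpreted against it, which is what makes it useful as a guardrail.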

Brockman also mentioned Evals, OpenAI’s latest open source software framework for evaluating the performance of its AI models, as a sign of OpenAI’s commitment to “enhancing” its models. This is a crowdsourced approach to model testing.

“With Evals, we can better see the use cases that users care about and test them,” Brockman said. “We open-sourced this framework in part because we no longer release a new model every three months for continuous improvement. You can’t make something better if you can’t measure it, can you? But as we roll out new versions of the model, we can at least know what changes have taken place.”

Brockman was also asked whether OpenAI would compensate people for testing its models with Evals. He wouldn’t commit to that, but he did note that, for a limited time, OpenAI is giving Evals users who request it early access to the GPT-4 API.

Brockman also talked about GPT-4’s context window, which refers to the text the model can consider before generating additional text. One version of GPT-4 can remember 5 times as much as the standard GPT-4 and 8 times as much as GPT-3.

Brockman believes that the extended context window will lead to new, previously unexplored use cases, especially in the enterprise. He envisions an AI chatbot built for companies that can draw on information from different sources.
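A rough sketch of why the context window matters: a model can only attend to the most recent tokens that fit in its window, so older history gets dropped. The word-count “tokenizer” and the `fit_to_window` helper below are illustrative assumptions, not how GPT-4 actually tokenizes or truncates.

```python
# Sketch: a model only "sees" the most recent text that fits its
# context window; everything older is dropped. Word counts stand in
# for real tokens here purely for illustration.

def fit_to_window(messages, window):
    """Keep the most recent messages whose total length fits the window."""
    kept, used = [], 0
    for text in reversed(messages):
        n = len(text.split())  # crude stand-in for a real tokenizer
        if used + n > window:
            break              # older messages no longer fit
        kept.append(text)
        used += n
    return list(reversed(kept))

history = ["first note here", "second longer note follows", "latest question"]
print(fit_to_window(history, 8))  # → ['second longer note follows', 'latest question']
```

With a larger window, fewer old messages fall out, which is why a bigger context window enables use cases like reasoning over long documents or extended conversations.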

Threza Gabriel
Threza Gabriel is a news writer at TechGoing (https://www.techgoing.com), a global tech media outlet that brings you the latest technology stories, including smartphones, electric vehicles, smart home devices, gaming, wearable gadgets, and other tech trends.
