OpenAI has published a post detailing its approach to AI safety, including conducting safety assessments, improving post-release safeguards, protecting children, and respecting privacy. The company said that ensuring AI systems are built, deployed, and used safely is critical to achieving its mission.
The following is the full text of the OpenAI post:
OpenAI is committed to keeping powerful AI safe and beneficial to as many people as possible. We know that our AI tools provide many benefits to people today. Users around the world have told us that ChatGPT helps make them more productive, enhances their creativity, and offers a tailored learning experience. But we also recognize that, as with any technology, these tools carry real risks. We therefore work to build safety into our systems at every level.
Building increasingly safe AI systems
Before releasing any new AI system, we conduct rigorous testing, seek feedback from external experts, and improve the model's behavior with techniques such as reinforcement learning from human feedback. We also build broad safety and monitoring systems.
For example, after our latest model, GPT-4, finished training, we spent more than six months working across the company to make it safer and more reliable before releasing it publicly.
We believe that powerful AI systems should be subject to rigorous safety assessments. Regulation is needed to ensure that such practices are widely adopted, and we are actively engaging with governments on the best form such regulation could take.
Learning from real-world use to improve safeguards
We work hard to prevent foreseeable risks before deployment, but there is a limit to what we can learn in a lab. Despite extensive research and testing, we cannot predict every way people will use our technology, or misuse it. That is why we believe that learning from real-world use is a key component of creating and releasing increasingly safe AI systems.
We release new AI systems cautiously and gradually to a widening group of people, with substantial safeguards in place, and we make continuous improvements based on the lessons we learn.
We make our most capable models available through our own services and through an API so that developers can build the technology directly into their applications. This allows us to monitor for misuse, act on it, and keep developing countermeasures that respond to the ways people actually misuse our systems, not just to theories about what misuse might look like.
Real-world use has also led us to develop increasingly granular policies against behavior that poses a real risk to people, while still allowing for the many beneficial uses of our technology.
We believe that society needs time to adapt to increasingly powerful AI, and that everyone affected by it should have a say in how it develops further. Iterative deployment helps different stakeholders engage more effectively in the conversation about AI technologies, because first-hand experience with these tools is critical to that conversation.
Protecting children
One of the main focuses of our safety efforts is the protection of children. We require that people using our AI tools be 18 or older, or 13 or older with parental consent. We are currently working on age verification options.
We do not allow our technology to be used to generate hateful, harassing, violent, or adult content. Compared to GPT-3.5, our latest model, GPT-4, is 82% less likely to respond to requests for disallowed content. We have also built robust systems to monitor for abuse. GPT-4 is now available to ChatGPT Plus subscribers, and we hope to make it available to more people over time.
We have taken significant steps to minimize the likelihood that our models will produce content that harms children. For example, when a user tries to upload child sexual abuse material to our image generation tool, we block it and report the matter to the National Center for Missing and Exploited Children.
In addition to our default safeguards, we work with developers such as the nonprofit Khan Academy to create safety measures tailored to their use case. Khan Academy has built an AI assistant that serves as a virtual tutor for students and as a classroom assistant for teachers. We are also working on features that will let developers set stricter standards for model output, to better support the developers and users who need them.
Respect for privacy
Our large language models are trained on a broad corpus of text that includes publicly available content, licensed content, and content generated by human reviewers. We do not use this data to sell our services, to advertise, or to build profiles of people. We use it only to make our models more helpful for people. ChatGPT, for instance, improves through further training on the conversations people have with it.
While some of our training data includes personal information that is available on the public web, we want our models to learn about the world, not about private individuals. We therefore work to remove personal information from the training dataset where feasible, fine-tune our models to reject requests for personal information, and respond to requests from individuals to delete their personal information from our systems. These measures minimize the likelihood that our models will generate responses that contain personal information.
Improving factual accuracy
Today’s large language models predict the next word based on patterns they have previously seen, including the text the user has entered. In some cases, however, the most likely next word is not factually accurate.
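The gap between "most likely" and "correct" can be illustrated with a toy model. The following is a minimal sketch, not how GPT-4 works: it counts word pairs in a tiny made-up corpus and picks the most frequent follower. The corpus and the function names `train_bigram_counts` and `most_likely_next` are hypothetical and exist only for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text (an assumption for illustration): the common
# mistake appears more often than the correct statement.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

counts = train_bigram_counts(corpus)
print(most_likely_next(counts, "is"))  # prints "sydney"
```

The model answers "sydney" because that continuation is the most frequent in its training text, even though Canberra is the capital of Australia. Real language models are far more sophisticated, but the same tension between frequency and factual accuracy applies.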
Improving factual accuracy is a major focus for OpenAI and many other AI research organizations, and we are making progress. By using user feedback on ChatGPT outputs that were flagged as incorrect as a primary source of data, we have improved the factual accuracy of GPT-4. GPT-4 is 40% more likely than GPT-3.5 to produce factually accurate content.
When users sign up for the tool, we strive to be as transparent as possible that ChatGPT may give incorrect responses. However, we recognize that there is still much work to be done to further reduce the likelihood of incorrect output and to educate the public about the current limitations of these AI tools.
Ongoing research and engagement
We believe that a practical way to address AI safety is to invest more time and resources in researching effective mitigation and alignment techniques, and in testing them against real-world misuse.
Importantly, we believe that improving the safety and the capabilities of AI should go hand in hand. Our best safety work to date has come from working with our most capable models, because they are better at following user instructions and are easier to steer or “guide”.
We will be increasingly cautious in creating and deploying more capable models, and we will continue to strengthen safety precautions as our AI systems evolve.
While we waited more than six months to deploy GPT-4 in order to better understand its capabilities, benefits, and risks, improving the safety of AI systems can sometimes take longer than that. Policymakers and AI developers therefore need to ensure that AI development and deployment is governed effectively at a global scale, so that no one cuts corners to get ahead. This is a daunting challenge that will require both technical and institutional innovation, but it is one to which we are eager to contribute.
Addressing AI safety will also require extensive debate, experimentation, and engagement, including on setting boundaries for the behavior of AI systems. We have fostered, and will continue to foster, collaboration and open dialogue among stakeholders to create a safer AI ecosystem.