OpenAI recently announced a new version of ChatGPT with two new features: voice input and image input. According to OpenAI, the features will roll out to ChatGPT Plus subscribers over the next two weeks, and other users will get access “soon”.
The voice input feature works much like a voice assistant on a cell phone: the user presses a button and speaks a question, and ChatGPT converts the speech to text, generates an answer, converts that answer back to speech, and plays it to the user. OpenAI says this is a much more natural and convenient way for users to interact with ChatGPT and that, thanks to the underlying large language model (LLM), the answers will be of higher quality. OpenAI has also developed a new text-to-speech model that generates human-sounding voices from a few seconds of sample speech; the text-to-speech feature will be available to ChatGPT Plus subscribers “very soon”. Users can choose from five options for ChatGPT’s voice, and the model has other potential uses. For example, OpenAI is working with Spotify to translate podcasts into other languages while preserving the voice of the podcast host. The model also carries risks, such as being used maliciously to impersonate public figures or commit fraud. As a result, OpenAI says the model will not be made widely available; access will be strictly controlled and limited.
The image input feature is similar to Google Lens: users can photograph something they are interested in and upload the picture to ChatGPT, which tries to work out what the user wants to ask and give an appropriate answer. Users can also use the app’s drawing tools to help express a question, or add context through voice or text input. ChatGPT also has the advantage of supporting a multi-turn conversation rather than a one-time search: if the user is not satisfied with an answer or wants more information, they can keep asking follow-up questions to get a more accurate and comprehensive answer. Image input does raise some potential problems. For example, OpenAI says it has limited ChatGPT’s ability to analyze and make direct statements about people in images, both to ensure accuracy and to protect privacy, so uploading a photo of a person to find out who they are is not yet possible.
Since launching ChatGPT in late 2022, OpenAI has been working to add more features and capabilities to its bot without creating new problems. With this update, the company is trying to strike that balance by deliberately limiting what its new models can do. But this approach is not a long-term solution: as more people use voice control and image input, and as ChatGPT evolves into a truly multimodal and useful virtual assistant, it will become increasingly difficult to maintain safe and sensible boundaries.