Tech giants including Google, Amazon, and Meta have each put powerful speech recognition systems at the heart of their software and services, yet speech recognition remains a challenging problem in artificial intelligence and machine learning. The good news is that OpenAI has now open-sourced Whisper, an automatic speech recognition system that the company says can produce robust transcriptions in multiple languages and translate them into English.
(Source: OpenAI Blog)
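For orientation, here is roughly what using the open-sourced package looks like, a minimal sketch based on the usage documented in the repository (the model name and “audio.mp3” path are placeholders):

```python
import whisper  # pip install openai-whisper

# Load one of the released checkpoints; "base" is small and fast.
model = whisper.load_model("base")

# Transcribe the audio in its original language.
result = model.transcribe("audio.mp3")
print(result["text"])

# Ask the same model to translate the speech into English instead.
translated = model.transcribe("audio.mp3", task="translate")
print(translated["text"])
```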
What makes Whisper different, OpenAI says, is that it was trained on 680,000 hours of multilingual and “multitask” data collected from the web, which improves the system’s ability to recognize distinctive accents, background noise, and technical terminology.
The overview on the official GitHub repository says:
"The primary target users of Whisper models are AI researchers who study the robustness, generalization, capability, bias, and constraints of current models. At the same time, it is also suitable as an automatic speech recognition solution for developers, especially English speech recognition. Interested friends can download multiple versions of the Whisper system from the hosting platform, and its models show strong ASR results in about 10 languages. In addition, if fine-tuned on certain tasks, they are also expected to show additional capabilities in application scenarios such as voice activity detection and narrator classification."
Unfortunately, Whisper has not been robustly evaluated in those adjacent areas, and the model has real limitations, particularly around text prediction.
Because the system was trained on a large amount of “noisy” data, OpenAI warns everyone in advance that Whisper may include words in its transcriptions that were never actually spoken.
The likely reason is that Whisper is simultaneously trying to predict the next word in the audio and to transcribe the audio itself.
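One practical response is to inspect the per-segment statistics that model.transcribe returns: a very low average log-probability or a high no-speech probability often accompanies hallucinated text. A minimal sketch, with illustrative thresholds that are assumptions rather than official recommendations (“interview.wav” is a placeholder):

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("interview.wav")

# Flag segments whose decoding statistics look unreliable.
# Thresholds here are illustrative, not OpenAI recommendations.
for seg in result["segments"]:
    suspicious = seg["avg_logprob"] < -1.0 or seg["no_speech_prob"] > 0.6
    marker = " [verify]" if suspicious else ""
    print(f"[{seg['start']:6.1f}s-{seg['end']:6.1f}s]{marker} {seg['text']}")
```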
In addition, Whisper’s performance is inconsistent across languages: error rates are higher for speakers of languages that are poorly represented in the training data.
But the latter is nothing new in speech recognition; even the industry’s premier systems have long been plagued by such biases.
A 2020 Stanford University study, for instance, found that systems from Amazon, Apple, Google, IBM, and Microsoft had a much lower error rate for white speakers than for Black speakers, whose word error rate averaged about 35%.
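Error rates in studies like this are usually reported as word error rate (WER): substitutions, deletions, and insertions divided by the number of words in the reference transcript. A quick way to compute it is the third-party jiwer package, which is an assumption here and not part of Whisper:

```python
import jiwer  # pip install jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# WER = (substitutions + deletions + insertions) / words in reference
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```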
About one-third of Whisper’s audio dataset is non-English.
Even so, OpenAI believes Whisper’s transcription capabilities can be used to improve existing accessibility tools. As the company writes on GitHub:
"While the Whisper model is not suitable for real-time transcription out of the box, its speed and size suggest that others can build on it for near real-time speech recognition and translation applications. Beneficial applications built on top of Whisper's models, whose value is a tangible indication of the different capabilities of these models, are expected to have real economic impact. We hope that everyone will actively use this technology for beneficial purposes, making improvements to automatic speech recognition technology easier and enabling more participants to create more responsible projects. With the dual benefits of speed and accuracy, Whisper will allow for an affordable automated transcription and translation experience for large volumes of communications."