
Microsoft unveils VALL-E, which imitates human speech from just 3 seconds of audio


Microsoft has recently announced an artificial intelligence tool called VALL-E that can imitate a person's speech from just 3 seconds of audio.

The tool was trained on 60,000 hours of English speech data and needs only a 3-second clip of a specific voice to generate new speech in that voice. Unlike many current AI tools, VALL-E can replicate the emotion and tone of a speaker, even for words the speaker has never actually said.

The researchers' paper, posted on arXiv (the preprint server hosted by Cornell University), uses VALL-E to synthesize several speech samples, and the AI-synthesized audio is available to listen to on GitHub.

In many cases, VALL-E outperforms current text-to-speech models, the researchers note. However, the paper also acknowledges several problems with the model. For example, some words in the text prompt may come out unclear, be missed entirely, or appear twice in the output. The model also has difficulty imitating certain voices, especially those with accents.

Like other new AI technologies, VALL-E has raised concerns about safety and ethics. Microsoft has issued an ethics statement regarding the use of VALL-E, but it remains unclear how the tool will be used in the future.

VALL-E is not yet open source. Microsoft has created a VALL-E repository on GitHub, but so far it contains only a description file.
