Microsoft’s VALL-E 2: First Time Human Parity in Zero-Shot Text-to-Speech Achieved

In recent years, advances in speech synthesis have been driven by neural networks and end-to-end modeling. Microsoft previously introduced VALL-E, a neural codec language model that can synthesize high-quality personalized speech from a short recording of an unseen speaker, outperforming earlier text-to-speech systems. A new paper now presents VALL-E 2, which the authors report is the first system to reach human parity in zero-shot text-to-speech synthesis.

VALL-E 2 builds on its predecessor with two enhancements. Repetition-aware sampling refines the original nucleus sampling process by taking token repetition in the decoding history into account, which stabilizes decoding and avoids the infinite-loop problem. Grouped code modeling organizes codec codes into groups to shorten the sequence length, which speeds up inference and eases the difficulties of modeling long sequences. The model also needs only simple utterance-level pairs of speech and transcription for training, which simplifies data collection and makes the approach easier to scale.

Experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity, reaching human parity on these benchmarks. It also consistently produces high-quality speech for sentences that are traditionally challenging because of their complexity or repetitive phrases.

Demos of VALL-E 2 will be available online, and the paper detailing the work is available on arXiv. Overall, VALL-E 2 is a notable step forward for zero-shot text-to-speech, improving both output quality and scalability.
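The article names repetition-aware sampling and grouped code modeling but does not show how they work. As a rough illustration only, the Python sketch below gives one plausible reading of each idea: repetition-aware sampling as nucleus (top-p) sampling that falls back to sampling from the full token distribution when the chosen token already dominates a recent window of the decoding history, and grouped code modeling as packing consecutive codec codes into fixed-size groups so that each autoregressive step covers several codes. The function names (`nucleus_sample`, `repetition_aware_sample`, `group_codes`) and the values for `top_p`, `window`, and `threshold` are hypothetical assumptions, and this is not Microsoft's implementation.

```python
import random
from typing import List, Optional, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Top-p sampling: draw from the smallest high-probability set whose mass reaches top_p."""
    # Visit token ids from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus: List[int] = []
    mass = 0.0
    for token in order:
        nucleus.append(token)
        mass += probs[token]
        if mass >= top_p:
            break
    return rng.choices(nucleus, weights=[probs[t] for t in nucleus], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    top_p: float = 0.8,      # hypothetical value
    window: int = 10,        # hypothetical look-back window size
    threshold: float = 0.1,  # hypothetical repetition threshold
    rng: Optional[random.Random] = None,
) -> int:
    """Nucleus-sample a token, but fall back to sampling from the full
    distribution when that token already fills too much of the recent history.
    (Illustrative assumption, not the published VALL-E 2 code.)"""
    if rng is None:
        rng = random.Random()
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history)[-window:]
    repetition_ratio = recent.count(token) / window
    if repetition_ratio > threshold:
        # Sampling over the whole vocabulary gives low-probability tokens a
        # chance, which can break a decoding loop stuck on one token.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token


def group_codes(codes: Sequence[int], group_size: int) -> List[List[int]]:
    """Pack consecutive codec codes into groups so a model can predict one
    group per step, shortening the sequence by a factor of group_size."""
    if group_size <= 0 or len(codes) % group_size != 0:
        raise ValueError("len(codes) must be a positive multiple of group_size")
    return [list(codes[i:i + group_size]) for i in range(0, len(codes), group_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(group_codes([11, 12, 13, 14, 15, 16], 2))  # [[11, 12], [13, 14], [15, 16]]
    # Token 0 fills the whole history window, so this call takes the fallback path.
    print(repetition_aware_sample([0.6, 0.4], [0] * 10, top_p=0.5, rng=rng))
```

In this sketch, grouping six codes with `group_size=2` leaves three autoregressive steps instead of six, which shows why shorter sequences can speed up inference. The fallback branch trades a little sampling precision for robustness: it only fires when a token repeats often enough to suggest the decoder is stuck.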

Source link: https://syncedreview.com/2024/06/11/microsofts-vall-e-2-first-time-human-parity-in-zero-shot-text-to-speech-achieved/amp/
