StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

From Papers Read on AI

Length: 41 minutes
Released: Jun 20, 2024
Format: Podcast episode

Description

Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while still receiving streaming speech input, which is critical for real-time communication. Beyond translating between speech, Simul-S2ST requires a policy that controls the model so that it generates the corresponding target speech at the opportune moment during the speech input, posing a double challenge of translation and policy. In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and the simultaneous policy in a unified multi-task learning framework. Following this multi-task learning approach, StreamSpeech can perform offline and simultaneous speech recognition, speech translation, and speech synthesis with a single seamless "All-in-One" model. Experiments on the CVSS benchmark demonstrate that StreamSpeech achieves state-of-the-art performance on both offline S2ST and Simul-S2ST tasks. In addition, StreamSpeech can present high-quality intermediate results (i.e., ASR or translation results) during the simultaneous translation process, offering a more comprehensive real-time communication experience.

2024: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

https://arxiv.org/pdf/2406.03049
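The description says Simul-S2ST needs a policy that decides when to emit target output while source speech is still arriving. StreamSpeech learns that policy jointly with translation, and the sketch here does not reproduce it. Instead, as an assumption for illustration only, it swaps in the classic fixed wait-k policy from simultaneous translation: read k source chunks, then alternate one write per read, and finish writing once the source is exhausted. It also computes Average Lagging (AL), a common latency metric for simultaneous translation. Source "chunks" and target "units" are abstract counts, and the function names (`wait_k_actions`, `delays`, `average_lagging`) are hypothetical.

```python
from typing import List, Tuple


def wait_k_actions(num_source_chunks: int, num_target_units: int, k: int) -> List[Tuple[str, int]]:
    """READ/WRITE sequence of a fixed wait-k policy (illustrative, not StreamSpeech's learned policy).

    Each action is paired with a running count: chunks read so far for READ,
    target units written so far for WRITE.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    actions: List[Tuple[str, int]] = []
    read, written = 0, 0
    while written < num_target_units:
        # Target unit t (0-indexed) may be written once min(k + t, S) chunks have been read.
        needed = min(k + written, num_source_chunks)
        if read < needed:
            read += 1
            actions.append(("READ", read))
        else:
            written += 1
            actions.append(("WRITE", written))
    return actions


def delays(actions: List[Tuple[str, int]]) -> List[int]:
    """g(t): number of source chunks read when the t-th target unit is written."""
    out: List[int] = []
    read = 0
    for action, count in actions:
        if action == "READ":
            read = count
        else:
            out.append(read)
    return out


def average_lagging(g: List[int], num_source_chunks: int, num_target_units: int) -> float:
    """AL = (1/tau) * sum_{t=1..tau} [g(t) - (t - 1) / r], with r = |y| / |x|."""
    r = num_target_units / num_source_chunks
    # tau: first (1-based) target index written after the full source was read.
    tau = next((t for t, d in enumerate(g, start=1) if d >= num_source_chunks), len(g))
    return sum(g[t - 1] - (t - 1) / r for t in range(1, tau + 1)) / tau


if __name__ == "__main__":
    acts = wait_k_actions(4, 3, k=2)
    print(acts)  # [('READ', 1), ('READ', 2), ('WRITE', 1), ('READ', 3), ('WRITE', 2), ('READ', 4), ('WRITE', 3)]
    print(delays(acts))  # [2, 3, 4]
    print(round(average_lagging(delays(acts), 4, 3), 3))  # 1.667
```

A fixed schedule like this ignores what is actually being said, which is exactly the limitation that motivates learning the policy jointly with translation, as the description claims StreamSpeech does.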

Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. We select papers by comparative results, citations, and influence to educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
