Injecting Text in Self-Supervised Speech Pre-training
Abstract:
A method includes receiving training data that includes unspoken textual utterances and un-transcribed non-synthetic speech utterances. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on the synthetic speech representations generated for the unspoken textual utterances and on the un-transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.
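For illustration only, the following is a minimal sketch of a pre-training step consistent with the abstract: unpaired text is converted to synthetic speech representations by a TTS model, mixed with un-transcribed real speech, and used to pre-train a shared audio encoder under a self-supervised objective. All names (PretrainingPipeline, tts_model, audio_encoder, ssl_loss) are hypothetical placeholders and do not come from the patent or any specific library.

```python
import torch


class PretrainingPipeline:
    """Hypothetical sketch of the pre-training flow described in the abstract."""

    def __init__(self, tts_model, audio_encoder):
        self.tts_model = tts_model          # maps text -> synthetic speech representation (assumed interface)
        self.audio_encoder = audio_encoder  # shared encoder being pre-trained (assumed interface)

    def pretrain_step(self, unspoken_texts, untranscribed_speech, optimizer, ssl_loss):
        # 1) Generate a synthetic speech representation for each unpaired text utterance.
        synthetic_speech = [self.tts_model(text) for text in unspoken_texts]

        # 2) Combine synthetic and real (un-transcribed) speech into one batch so the
        #    encoder is exposed to both sources and learns a shared representation.
        batch = synthetic_speech + list(untranscribed_speech)

        # 3) Apply some self-supervised objective (e.g., masked prediction or a
        #    contrastive loss) over the encoder outputs; the exact objective is
        #    not specified in the abstract and is assumed here.
        loss = sum(ssl_loss(self.audio_encoder(x)) for x in batch) / len(batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
```

The key design point reflected here is that the encoder never requires paired (speech, transcript) data: text contributes through TTS-generated speech representations, and real speech contributes without transcriptions.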