Speaker diarization using an end-to-end model

Invention Grant

US11545157B2 Speaker diarization using an end-to-end model (Active)

Patent Title: Speaker diarization using an end-to-end model
Application No.: US16617219

Application Date: 2019-04-15
Publication No.: US11545157B2

Publication Date: 2023-01-03
Inventor: Quan Wang, Yash Sheth, Ignacio Lopez Moreno, Li Wan
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Middleton Reutlinger
International Application: PCT/US2019/027519 WO 20190415
International Announcement: WO2019/209569 WO 20191031
Main IPC: G10L17/18
IPC: G10L17/18; G10L15/26; G10L17/04; G10L21/0216; G06K9/62; G10L15/16; G10L17/00

Abstract:

Techniques are described for training and/or utilizing an end-to-end speaker diarization model. In various implementations, the model is a recurrent neural network (RNN) model, such as an RNN model that includes at least one memory layer, such as a long short-term memory (LSTM) layer. Audio features of audio data can be applied as input to an end-to-end speaker diarization model trained according to implementations disclosed herein, and the model utilized to process the audio features to generate, as direct output over the model, speaker diarization results. Further, the end-to-end speaker diarization model can be a sequence-to-sequence model, where the sequence can have variable length. Accordingly, the model can be utilized to generate speaker diarization results for any of various length audio segments.
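
The data flow described in the abstract (per-frame audio features in, an LSTM layer in the middle, diarization results out, for any sequence length) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the class `TinyLSTMDiarizer`, its parameters, and in particular the output format, which is assumed to be one probability distribution over a fixed maximum number of speakers per input frame. The weights are random and untrained, so the sketch shows only the shape of the computation, not meaningful speaker labels.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyLSTMDiarizer:
    """Hypothetical, untrained sketch: one LSTM layer plus a per-frame softmax."""

    def __init__(self, feature_dim, hidden_dim, max_speakers, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to [input frame ; previous hidden
        # state]. Bias terms are omitted for brevity.
        self.gates = {name: rand_matrix(hidden_dim, feature_dim + hidden_dim)
                      for name in ("input", "forget", "output", "candidate")}
        # Assumed output layer: hidden state -> scores over max_speakers.
        self.out = rand_matrix(max_speakers, hidden_dim)

    def diarize(self, frames):
        """Map a sequence of feature frames to per-frame speaker probabilities."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # memory cell state
        results = []
        for frame in frames:
            x = list(frame) + h
            i = [sigmoid(v) for v in matvec(self.gates["input"], x)]
            f = [sigmoid(v) for v in matvec(self.gates["forget"], x)]
            o = [sigmoid(v) for v in matvec(self.gates["output"], x)]
            g = [math.tanh(v) for v in matvec(self.gates["candidate"], x)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            results.append(softmax(matvec(self.out, h)))
        return results


# The same model handles sequences of different lengths.
model = TinyLSTMDiarizer(feature_dim=4, hidden_dim=8, max_speakers=2)
rng = random.Random(1)
short_clip = [[rng.random() for _ in range(4)] for _ in range(3)]
long_clip = [[rng.random() for _ in range(4)] for _ in range(10)]
short_result = model.diarize(short_clip)
long_result = model.diarize(long_clip)
print(len(short_result), len(long_result))  # 3 10
```

Because the recurrent state is carried from frame to frame, the output sequence always has the same length as the input sequence, which is one simple way to read the abstract's statement that the model works on audio segments of various lengths.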

Public/Granted literature

US20200152207A1 SPEAKER DIARIZATION USING AN END-TO-END MODEL Public/Granted day: 2020-05-14

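IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
G10L17/00	Speaker identification or verification techniques
G10L17/18	.Artificial neural networks; connectionist approaches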