MULTI-SPEAKER SPEECH SEPARATION

Invention Application

WO2017112466A1 MULTI-SPEAKER SPEECH SEPARATION (Under examination - Published)

Patent Title: MULTI-SPEAKER SPEECH SEPARATION
Application No.: PCT/US2016/066430

Application Date: 2016-12-14
Publication No.: WO2017112466A1

Publication Date: 2017-06-29
Inventor: YU, Dong
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Applicant Address: Attn: Patent Group Docketing (Bldg. 8/1000) One Microsoft Way Redmond, Washington 98052-6399 US
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Current Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Current Assignee Address: Attn: Patent Group Docketing (Bldg. 8/1000) One Microsoft Way Redmond, Washington 98052-6399 US
Agency: MINHAS, Sandip et al.
Priority: US14/976,078 2015-12-21
Main IPC: G10L15/16
IPC: G10L15/16 ; G06N3/04 ; G10L15/07 ; G10L21/0272 ; G10L17/18

Abstract:

The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.
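The architecture in the abstract can be illustrated with a minimal NumPy sketch. It is not the implementation claimed in the application, and several details are assumptions made for illustration: the normalization across output layers is done with a softmax (the abstract only says the outputs can be normalized per output unit across all output layers), the previous frame's masks are fed back into the single shared hidden layer (the abstract also allows feeding them to the output layer), the hidden layer uses tanh, and the weights are random and untrained. All names (`init_params`, `separate`, `W_fb`, and so on) are hypothetical.

```python
import numpy as np


def init_params(n_freq, n_hidden, n_sources, seed=0):
    """Random, untrained weights for the sketch (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Shared layers: used by every output layer.
        "W_in": rng.normal(0, scale, (n_hidden, n_freq)),
        "W_rec": rng.normal(0, scale, (n_hidden, n_hidden)),
        # Feedback of the previous frame's masks into the hidden layer.
        "W_fb": rng.normal(0, scale, (n_hidden, n_sources * n_freq)),
        "b_h": np.zeros(n_hidden),
        # One output layer per source (speaker or noise), same dimensions each.
        "W_out": rng.normal(0, scale, (n_sources, n_freq, n_hidden)),
        "b_out": np.zeros((n_sources, n_freq)),
    }


def separate(spectrogram, params):
    """Return masks of shape (n_frames, n_sources, n_freq).

    spectrogram: magnitude spectrogram of the mixture, (n_frames, n_freq).
    """
    n_frames, n_freq = spectrogram.shape
    n_sources = params["W_out"].shape[0]
    h = np.zeros(params["W_rec"].shape[0])
    # Before the first frame there is no result yet: start from uniform masks.
    prev_masks = np.full((n_sources, n_freq), 1.0 / n_sources)
    masks = np.empty((n_frames, n_sources, n_freq))

    for t in range(n_frames):
        # Shared hidden layer sees the current frame, its own previous state,
        # and the masks computed for the previous frame.
        h = np.tanh(
            params["W_in"] @ spectrogram[t]
            + params["W_rec"] @ h
            + params["W_fb"] @ prev_masks.ravel()
            + params["b_h"]
        )
        # Each output layer produces one score per frequency bin.
        logits = params["W_out"] @ h + params["b_out"]  # (n_sources, n_freq)
        # Assumed normalization: softmax over the source axis, so that for
        # each output unit the masks of all output layers sum to 1.
        logits = logits - logits.max(axis=0, keepdims=True)
        exp = np.exp(logits)
        prev_masks = exp / exp.sum(axis=0, keepdims=True)
        masks[t] = prev_masks

    return masks


if __name__ == "__main__":
    mixture = np.abs(np.random.default_rng(1).normal(size=(5, 8)))
    params = init_params(n_freq=8, n_hidden=16, n_sources=3)
    masks = separate(mixture, params)
    # Per-source estimates: each mask applied to the mixture spectrogram.
    estimates = masks * mixture[:, None, :]
    print(masks.shape, estimates.shape)
```

Because the masks sum to 1 over the source axis in every time-frequency bin, the per-source estimates add back up to the mixture. Feeding `prev_masks` into the next step is what lets a given output layer keep following the same speaker from frame to frame.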

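IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	. Speech classification or search
G10L15/16	.. Using artificial neural networks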