Invention Application
- Patent Title: MULTI-SPEAKER SPEECH SEPARATION
- Patent Title (中): 多音箱语音分离
- Application No.: PCT/US2016/066430
- Application Date: 2016-12-14
- Publication No.: WO2017112466A1
- Publication Date: 2017-06-29
- Inventor: YU, Dong
- Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
- Applicant Address: Attn: Patent Group Docketing (Bldg. 8/1000) One Microsoft Way Redmond, Washington 98052-6399 US
- Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
- Current Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
- Current Assignee Address: Attn: Patent Group Docketing (Bldg. 8/1000) One Microsoft Way Redmond, Washington 98052-6399 US
- Agency: MINHAS, Sandip et al.
- Priority: US14/976,078 (2015-12-21)
- Main IPC: G10L15/16
- IPC: G10L15/16 ; G06N3/04 ; G10L15/07 ; G10L21/0272 ; G10L17/18
Abstract:
The technology described herein uses a multiple-output-layer recurrent neural network (RNN) to process an acoustic signal containing speech from multiple speakers and to trace each individual speaker's speech. The RNN has multiple output layers, each of which traces one speaker (or noise) and represents the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions, and each output unit can be normalized across all output layers. The remaining layers of the RNN are shared across all the output layers. The result from the previous frame is fed back as input to the output layer or to one of the hidden layers when calculating the result for the current frame. This feedback allows the model to carry information from earlier frames to later frames so that the same speaker can be traced.
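The architecture described in the abstract can be sketched roughly as follows. This is a minimal, untrained illustration in plain Python, not the applicant's implementation: the class name `MultiOutputRNN`, the layer sizes, the random weights, the tanh hidden layer, the ordinary recurrent hidden state, and the use of a softmax for the normalization across output layers are all assumptions made for the example. The previous frame's masks are fed back into the shared hidden layer, which is one of the two feedback options the abstract names.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MultiOutputRNN:
    """Hypothetical sketch: one shared hidden layer, one output layer per source."""

    def __init__(self, n_bins, n_hidden, n_sources, seed=0):
        rng = random.Random(seed)
        self.n_bins, self.n_hidden, self.n_sources = n_bins, n_hidden, n_sources
        # The shared hidden layer sees the current frame, the previous hidden
        # state, and the previous frame's masks (the fed-back result).
        in_dim = n_bins + n_hidden + n_sources * n_bins
        self.W_h = rand_matrix(n_hidden, in_dim, rng)
        # One output layer per speaker (or noise), all with the same dimensions.
        self.W_out = [rand_matrix(n_bins, n_hidden, rng) for _ in range(n_sources)]

    def step(self, frame, h_prev, masks_prev):
        fed_back = [m for mask in masks_prev for m in mask]
        h = [math.tanh(a) for a in matvec(self.W_h, frame + h_prev + fed_back)]
        logits = [matvec(W, h) for W in self.W_out]  # indexed [source][bin]
        masks = [[0.0] * self.n_bins for _ in range(self.n_sources)]
        # Normalize each output unit across all output layers (softmax here,
        # which is an assumption) so the masks for a bin sum to one.
        for k in range(self.n_bins):
            column = [logits[s][k] for s in range(self.n_sources)]
            peak = max(column)
            exps = [math.exp(c - peak) for c in column]
            total = sum(exps)
            for s in range(self.n_sources):
                masks[s][k] = exps[s] / total
        return h, masks

    def separate(self, frames):
        """Return, per frame, one masked copy of the frame per source."""
        h = [0.0] * self.n_hidden
        masks = [[1.0 / self.n_sources] * self.n_bins for _ in range(self.n_sources)]
        estimates = []
        for frame in frames:
            h, masks = self.step(frame, h, masks)
            estimates.append([[m * x for m, x in zip(mask, frame)] for mask in masks])
        return estimates


if __name__ == "__main__":
    model = MultiOutputRNN(n_bins=4, n_hidden=3, n_sources=2)
    mixture = [[1.0, 2.0, 0.5, 0.0], [0.3, 0.1, 0.9, 2.0]]  # toy magnitude frames
    for t, per_source in enumerate(model.separate(mixture)):
        print(t, per_source)
```

Because the masks for each bin sum to one across the output layers, the per-source estimates in this sketch add back up to the input frame. With untrained weights the split is arbitrary; the example only shows the data flow.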