Patent search ap:("Microsoft Technology Licensing Page LLC") AND inv:"Dong Yu"

1.

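Invention patent (granted)
Permutation invariant training for talker-independent multi-talker speech separation (status: in force)

Publication (announcement) No.: US11170785B2

Publication (announcement) date: 2021-11-09

Application No.: US16289403

Application date: 2019-02-28

Applicant: Microsoft Technology Licensing, LLC

Inventor: Dong Yu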

IPC: G10L17/04 , G06K9/62 , G10L21/0272 , G10L17/18 , G10L19/022 , G10L21/0208

Abstract: The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition (“ASR”) in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.
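
The permutation-considered assignment described in this abstract can be illustrated with a minimal sketch. It assumes mean squared error as the deviation measure and an exhaustive search over all pairings; the names `mse` and `pit_loss` are hypothetical and are not taken from the patent, which may use a different criterion or model structure.

```python
from itertools import permutations

def mse(a, b):
    """Mean squared error between two equal-length sequences of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_loss(estimates, sources):
    """Return (loss, assignment) for the best estimate-to-source pairing.

    estimates, sources: lists of S equal-length signals (lists of floats).
    assignment[i] is the index of the source paired with estimates[i].
    Assumption: the deviation is the MSE averaged over the S pairs.
    """
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(len(sources))):
        loss = sum(mse(estimates[i], sources[j]) for i, j in enumerate(perm)) / len(perm)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy example: the model emits the two talkers in swapped order.
sources = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
estimates = [[-0.9, -1.1, -1.0], [1.0, 0.9, 1.1]]
loss, assignment = pit_loss(estimates, sources)
print(assignment)  # estimates[0] pairs with sources[1], estimates[1] with sources[0]
```

Because the loss is taken over the best pairing rather than a fixed output order, a training step would not penalize the model for emitting the talkers in a different order than the labels list them.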

2.

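Invention patent (granted)
Permutation invariant training for talker-independent multi-talker speech separation (status: in force)

Publication (announcement) No.: US10249305B2

Publication (announcement) date: 2019-04-02

Application No.: US15226527

Application date: 2016-08-02

Applicant: Microsoft Technology Licensing, LLC

Inventor: Dong Yu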

IPC: G10L21/0272 , G06K9/62 , G10L17/04 , G10L17/18 , G10L19/022 , G10L21/0208

Abstract: The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition (“ASR”) in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.

3.

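Invention patent application
MULTI-TALKER SPEECH RECOGNIZER (status: under examination, published)

Publication (announcement) No.: US20180254040A1

Publication (announcement) date: 2018-09-06

Application No.: US15602366

Application date: 2017-05-23

Applicant: Microsoft Technology Licensing, LLC

Inventor: James Droppo , Xuedong Huang , Dong Yu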

IPC: G10L15/20 , G10L21/0308 , G10L15/32 , G10L15/06 , H04R3/00 , G10L15/187

CPC classification number: G10L15/20 , G10L15/063 , G10L15/32 , G10L17/04 , G10L17/18 , G10L21/0308 , G10L25/30 , G10L25/51 , H04R3/005

Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
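
To show what defining the criterion on an utterance can mean, the following sketch contrasts choosing one permutation for the whole utterance with choosing a permutation separately for each frame. It assumes squared error as the criterion and one scalar per stream per frame; the function names and the data layout (`estimates[t][i]` is output stream `i` at frame `t`) are hypothetical and not taken from the application.

```python
from itertools import permutations

def frame_error(est_frame, src_frame, perm):
    """Squared error for one frame when output i is paired with source perm[i]."""
    return sum((est_frame[i] - src_frame[j]) ** 2 for i, j in enumerate(perm))

def utterance_level_permutation(estimates, sources):
    """Pick ONE permutation that minimizes the error summed over all frames."""
    n_streams = len(sources[0])
    return min(
        permutations(range(n_streams)),
        key=lambda p: sum(frame_error(e, s, p) for e, s in zip(estimates, sources)),
    )

def frame_level_permutations(estimates, sources):
    """Pick the best permutation independently for every frame."""
    n_streams = len(sources[0])
    return [
        min(permutations(range(n_streams)), key=lambda p: frame_error(e, s, p))
        for e, s in zip(estimates, sources)
    ]

# Four frames, two streams; frame 2 is quiet and slightly mis-estimated.
sources = [[1.0, -1.0], [1.0, -1.0], [0.1, -0.1], [1.0, -1.0]]
estimates = [[0.9, -1.1], [1.1, -0.9], [-0.2, 0.2], [1.0, -1.0]]

utt_perm = utterance_level_permutation(estimates, sources)
frame_perms = frame_level_permutations(estimates, sources)
print(utt_perm)
print(frame_perms)
```

In this toy input the per-frame choice swaps the streams at the quiet frame, while the utterance-level choice keeps each output stream tied to the same talker throughout, which is the property a downstream speech decoder would need.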

4.

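Invention patent application
HYPERPARAMETER TUNING (status: under examination, published)

Publication (announcement) No.: US20180121814A1

Publication (announcement) date: 2018-05-03

Application No.: US15453342

Application date: 2017-03-08

Applicant: Microsoft Technology Licensing, LLC

Inventor: Dong Yu , Chi Jin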

IPC: G06N5/04 , G06N99/00

CPC classification number: G06N7/005 , G06N3/04

Abstract: Improvements in speed and reductions in computational resource expenditure are realized in the improved tuning of hyperparameters for machine learning processes. To ensure that the values selected for hyperparameters are tuned appropriately, but quickly, several rounds of optimization are performed, each with at least as many iterations of cross-validation as prior rounds, and the analysis of unpromising results is cut short to devote more time and resources to analyzing promising value sets. The results are used to build suggested sets of hyperparameter values for that round, which are also cross-validated and enable the tuning process to incorporate previous operations to improve its value sets. The most promising sets of hyperparameter values from each round are selected as the basis set for the next round until a final set of values for the hyperparameters is developed.
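
The round structure described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the method claimed in the application: `cross_validate` is a hypothetical stand-in that scores a single `lr` value with bounded noise instead of training a model, `suggest` simply perturbs the current basis set, and pruning happens between rounds rather than inside a cross-validation run.

```python
import random

def cross_validate(params, folds, rng):
    """Hypothetical stand-in for k-fold cross-validation (higher is better).

    Assumption: the true quality peaks at lr = 0.1 and every fold adds
    bounded noise, so more folds give a more reliable mean score.
    """
    true_score = -abs(params["lr"] - 0.1)
    return sum(true_score + rng.uniform(-0.01, 0.01) for _ in range(folds)) / folds

def suggest(basis, rng):
    """Build suggested value sets by perturbing each set in the basis."""
    return [{"lr": max(1e-4, p["lr"] * rng.uniform(0.5, 1.5))} for p in basis]

def tune(candidates, rounds=3, keep=2, seed=0):
    """Run several rounds, spending more cross-validation in later rounds."""
    rng = random.Random(seed)
    folds = 2
    basis = list(candidates)
    for _ in range(rounds):
        pool = basis + suggest(basis, rng)      # suggested sets are scored too
        ranked = sorted(pool, key=lambda p: cross_validate(p, folds, rng), reverse=True)
        basis = ranked[:keep]                   # drop unpromising value sets
        folds += 2                              # later rounds use more folds
    return basis[0]

best = tune([{"lr": 0.001}, {"lr": 0.01}, {"lr": 0.1}, {"lr": 1.0}])
print(best)
```

Keeping only a few value sets per round means the extra folds in later rounds are spent on candidates that have already looked promising.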

5.

Invention patent (granted)
Mixed speech recognition (status: in force)

Publication (announcement) No.: US09558742B2

Publication (announcement) date: 2017-01-31

Application No.: US15176381

Application date: 2016-06-08

Applicant: Microsoft Technology Licensing, LLC

Inventor: Dong Yu , Chao Weng , Michael L. Seltzer , James Droppo

IPC: G10L15/16 , G10L15/06 , G10L25/21 , G10L17/18

CPC classification number: G10L15/16 , G10L15/063 , G10L15/20 , G10L17/18 , G10L25/21 , G10L25/84 , G10L25/90

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

6.

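Invention patent application
MIXED SPEECH RECOGNITION (status: in force)

Publication (announcement) No.: US20160284348A1

Publication (announcement) date: 2016-09-29

Application No.: US15176381

Application date: 2016-06-08

Applicant: Microsoft Technology Licensing, LLC

Inventor: Dong Yu , Chao Weng , Michael L. Seltzer , James Droppo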

IPC: G10L15/16 , G10L25/21 , G10L17/18 , G10L15/06

CPC classification number: G10L15/16 , G10L15/063 , G10L15/20 , G10L17/18 , G10L25/21 , G10L25/84 , G10L25/90

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

7.

Invention patent (granted)
Computing system for training neural networks (status: in force)

Publication (announcement) No.: US11049006B2

Publication (announcement) date: 2021-06-29

Application No.: US15510356

Application date: 2014-09-12

Applicant: Microsoft Technology Licensing, LLC

Inventor: John Langford , Gang Li , Frank Torsten Bernd Seide , James Droppo , Dong Yu

IPC: G06N3/08 , G06N7/00 , G06N3/04

Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
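
The quantization with stored quantization error mentioned in this abstract can be sketched as follows. This is a minimal single-node illustration under assumptions: each value is reduced to its sign, the receiver reconstructs it as plus or minus a shared scale (here the mean absolute value), and the name `quantize_one_bit` is hypothetical. Partitioning across nodes and overlapping transfer with computation are not shown.

```python
def quantize_one_bit(gradient, residual):
    """Quantize each value to one bit (its sign) and carry the error forward.

    gradient: list of floats computed in this step.
    residual: quantization error left over from the previous step.
    Returns (bits, scale, new_residual); a receiver would reconstruct
    each value as +scale when the bit is True and -scale otherwise.
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    scale = sum(abs(c) for c in corrected) / len(corrected)
    bits = [c >= 0 for c in corrected]
    reconstructed = [scale if b else -scale for b in bits]
    new_residual = [c - q for c, q in zip(corrected, reconstructed)]
    return bits, scale, new_residual

gradient = [0.5, -0.1, 0.3, -0.7]
residual = [0.0, 0.0, 0.0, 0.0]
bits, scale, residual = quantize_one_bit(gradient, residual)
print(bits, scale, residual)
```

Feeding `residual` back into the next call means that what is lost in one step is added to the following step's values before they are quantized, so the error does not accumulate unboundedly in this sketch.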

8.

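Invention patent (granted)
Mixed speech recognition (status: in force)

Publication (announcement) No.: US09779727B2

Publication (announcement) date: 2017-10-03

Application No.: US15395640

Application date: 2016-12-30

Applicant: Microsoft Technology Licensing, LLC

Inventor: Dong Yu , Chao Weng , Michael L. Seltzer , James Droppo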

IPC: G10L15/16 , G10L15/06 , G10L25/21 , G10L17/18 , G10L15/20 , G10L25/84 , G10L25/90

CPC classification number: G10L15/16 , G10L15/063 , G10L15/20 , G10L17/18 , G10L25/21 , G10L25/84 , G10L25/90

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

9.

Invention patent (granted)
Automated predictive modeling and framework (status: in force)

Publication (announcement) No.: US10685281B2

Publication (announcement) date: 2020-06-16

Application No.: US15226196

Application date: 2016-08-02

Applicant: Microsoft Technology Licensing, LLC

Inventor: Ying Shan , Thomas Ryan Hoens , Jian Jiao , Haijing Wang , Dong Yu , JC Mao

IPC: G06N3/08 , G06F16/951 , G06Q30/02 , G06Q10/04 , G06N3/04

Abstract: Systems and methods for providing a predictive framework are provided. The predictive framework comprises plural neural layers of adaptable, executable neurons. Neurons accept one or more input signals and produce an output signal that may be used by an upper-level neural layer. Input signals are received by an encoding neural layer, where there is a 1:1 correspondence between an input signal and an encoding neuron. Input signals for a set of data are received at the encoding layer and processed successively by the plurality of neural layers. An objective function utilizes the output signals of the topmost neural layer to generate predictive results for the data set according to an objective. In one embodiment, the objective is to determine the likelihood of user interaction with regard to a specific item of content in a set of search results, or the likelihood of user interaction with regard to any item of content in a set of search results.

10.

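Invention patent (granted)
Multi-talker speech recognizer (status: in force)

Publication (announcement) No.: US10460727B2

Publication (announcement) date: 2019-10-29

Application No.: US15602366

Application date: 2017-05-23

Applicant: Microsoft Technology Licensing, LLC

Inventor: James Droppo , Xuedong Huang , Dong Yu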

IPC: G10L15/20 , G10L21/0308 , G10L15/06 , H04R3/00 , G10L25/51 , G10L25/30 , G10L17/04 , G10L17/18 , G10L15/32

Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
