-
Publication number: US11170785B2
Publication date: 2021-11-09
Application number: US16289403
Filing date: 2019-02-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Dong Yu
IPC: G10L17/04 , G06K9/62 , G10L21/0272 , G10L17/18 , G10L19/022 , G10L21/0208
Abstract: The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition (“ASR”) in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.
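The permutation-considered assignment described in this abstract can be illustrated with a minimal sketch. It is not taken from the patent: the function names `mse` and `pit_loss`, the use of mean squared error as the deviation, and the toy signals are all assumptions made for illustration. The sketch tries every assignment of estimates to sources and keeps the one with the lowest deviation, which is the value a model would be trained to minimize.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, sources):
    """Return (lowest mean deviation, best permutation).

    best_perm[i] is the index of the source assigned to estimates[i].
    Hypothetical helper: the deviation measure (MSE) is an assumption.
    """
    n = len(sources)
    best_loss, best_perm = float("inf"), None
    # Exhaustive search over all n! assignments; fine for two or three talkers.
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], sources[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data: the model emits the two talkers in swapped order.
sources = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
estimates = [[0.1, 0.0, 0.1], [0.9, 1.0, 1.1]]
loss, perm = pit_loss(estimates, sources)
print(perm, loss)
```

Because the loss is computed under the best assignment, the model is not penalized for the arbitrary order in which it outputs the talkers.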
-
Publication number: US10249305B2
Publication date: 2019-04-02
Application number: US15226527
Filing date: 2016-08-02
Applicant: Microsoft Technology Licensing, LLC
Inventor: Dong Yu
IPC: G10L21/0272 , G06K9/62 , G10L17/04 , G10L17/18 , G10L19/022 , G10L21/0208
Abstract: The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition (“ASR”) in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.
-
Publication number: US20180254040A1
Publication date: 2018-09-06
Application number: US15602366
Filing date: 2017-05-23
Applicant: Microsoft Technology Licensing, LLC
Inventor: James Droppo , Xuedong Huang , Dong Yu
IPC: G10L15/20 , G10L21/0308 , G10L15/32 , G10L15/06 , H04R3/00 , G10L15/187
CPC classification number: G10L15/20 , G10L15/063 , G10L15/32 , G10L17/04 , G10L17/18 , G10L21/0308 , G10L25/30 , G10L25/51 , H04R3/005
Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
-
Publication number: US20180121814A1
Publication date: 2018-05-03
Application number: US15453342
Filing date: 2017-03-08
Applicant: Microsoft Technology Licensing, LLC
Abstract: Improvements in speed and reductions in computational resource expenditure are realized in the improved tuning of hyperparameters for machine learning processes. To ensure that the values selected for hyperparameters are tuned appropriately, but quickly, several rounds of optimization are performed, each with as many or more iterations of cross-validation than prior rounds, cutting short the analysis of unpromising results to devote more time and resources to analyzing promising value sets. The results are used to build suggested sets of hyperparameter values for that round, which are also cross-validated and enable the tuning process to incorporate previous operations to improve its value sets. The most promising sets of hyperparameter values from each round are selected as the basis set for the next round until a final set of values for the hyperparameters is developed.
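The round structure described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the names `tune`, `mean_cv_score` and `toy_fold_score`, the rule of adding one fold per round, and the suggestion rule (averaging the top value sets) are all hypothetical. Early termination of unpromising evaluations within a round is omitted for brevity.

```python
def mean_cv_score(params, evaluate_fold, folds):
    """Average score of one hyperparameter set over `folds` cross-validation folds."""
    return sum(evaluate_fold(params, k) for k in range(folds)) / folds


def tune(candidates, evaluate_fold, rounds=3, start_folds=2, keep=2):
    """Run several rounds, each with more folds, keeping the best sets as the next basis."""
    basis = list(candidates)
    for r in range(rounds):
        folds = start_folds + r  # each round uses at least as many folds as the last
        scored = [(mean_cv_score(p, evaluate_fold, folds), p) for p in basis]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [p for _, p in scored[:keep]]
        # Assumed suggestion rule: average each hyperparameter over the top sets.
        suggestion = {k: sum(p[k] for p in top) / len(top) for k in top[0]}
        if suggestion not in basis:
            # The suggested set is cross-validated in the same round.
            scored.append((mean_cv_score(suggestion, evaluate_fold, folds), suggestion))
            scored.sort(key=lambda pair: pair[0], reverse=True)
        basis = [p for _, p in scored[:keep]]  # basis set for the next round
    return basis[0]


def toy_fold_score(params, fold):
    """Hypothetical objective: best at lr = 0.3, with a small per-fold offset."""
    return -(params["lr"] - 0.3) ** 2 - 0.01 * fold


best = tune([{"lr": 0.01}, {"lr": 0.1}, {"lr": 0.5}, {"lr": 1.0}], toy_fold_score)
print(best)
```

In this toy run the winning value is not among the initial candidates; it enters as a suggested set built from the first round's results.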
-
Publication number: US09558742B2
Publication date: 2017-01-31
Application number: US15176381
Filing date: 2016-06-08
Applicant: Microsoft Technology Licensing, LLC
Inventor: Dong Yu , Chao Weng , Michael L. Seltzer , James Droppo
CPC classification number: G10L15/16 , G10L15/063 , G10L15/20 , G10L17/18 , G10L25/21 , G10L25/84 , G10L25/90
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
-
Publication number: US20160284348A1
Publication date: 2016-09-29
Application number: US15176381
Filing date: 2016-06-08
Applicant: Microsoft Technology Licensing, LLC
Inventor: Dong Yu , Chao Weng , Michael L. Seltzer , James Droppo
CPC classification number: G10L15/16 , G10L15/063 , G10L15/20 , G10L17/18 , G10L25/21 , G10L25/84 , G10L25/90
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
-
Publication number: US11049006B2
Publication date: 2021-06-29
Application number: US15510356
Filing date: 2014-09-12
Applicant: Microsoft Technology Licensing, LLC
Inventor: John Langford , Gang Li , Frank Torsten Bernd Seide , James Droppo , Dong Yu
Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
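The one-bit quantization with stored quantization error mentioned in this abstract can be sketched as below. The sketch rests on assumptions not stated in the abstract: the function name `quantize_one_bit` is hypothetical, and reconstructing each value as plus or minus the mean magnitude is one possible choice of scale. It shows only the per-node quantization step, not the parallel exchange or partitioned aggregation.

```python
def quantize_one_bit(gradient, residual):
    """Quantize (gradient + carried-over error) to one bit per value.

    Returns (bits, scale, reconstructed, new_residual). The residual holds the
    quantization error, which is added to the next modification before it is
    quantized.
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    # Assumed reconstruction scale: mean magnitude of the corrected values.
    scale = sum(abs(c) for c in corrected) / len(corrected)
    bits = [1 if c >= 0 else 0 for c in corrected]
    reconstructed = [scale if b else -scale for b in bits]
    new_residual = [c - q for c, q in zip(corrected, reconstructed)]
    return bits, scale, reconstructed, new_residual


gradient = [0.5, -0.1, 0.3, -0.7]
residual = [0.0, 0.0, 0.0, 0.0]
bits, scale, reconstructed, residual = quantize_one_bit(gradient, residual)
print(bits, scale, residual)
```

Only `bits` and `scale` would need to be transmitted between nodes; the residual stays local, so the information discarded in one step is folded back into the next.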
-
Publication number: US09779727B2
Publication date: 2017-10-03
Application number: US15395640
Filing date: 2016-12-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Dong Yu , Chao Weng , Michael L. Seltzer , James Droppo
CPC classification number: G10L15/16 , G10L15/063 , G10L15/20 , G10L17/18 , G10L25/21 , G10L25/84 , G10L25/90
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
-
Publication number: US10685281B2
Publication date: 2020-06-16
Application number: US15226196
Filing date: 2016-08-02
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ying Shan , Thomas Ryan Hoens , Jian Jiao , Haijing Wang , Dong Yu , JC Mao
IPC: G06N3/08 , G06F16/951 , G06Q30/02 , G06Q10/04 , G06N3/04
Abstract: Systems and methods for providing a predictive framework are provided. The predictive framework comprises plural neural layers of adaptable, executable neurons. Neurons accept one or more input signals and produce an output signal that may be used by an upper-level neural layer. Input signals are received by an encoding neural layer, where there is a 1:1 correspondence between an input signal and an encoding neuron. Input signals for a set of data are received at the encoding layer and processed successively by the plurality of neural layers. An objective function utilizes the output signals of the topmost neural layer to generate predictive results for the data set according to an objective. In one embodiment, the objective is to determine the likelihood of user interaction with regard to a specific item of content in a set of search results, or the likelihood of user interaction with regard to any item of content in a set of search results.
-
Publication number: US10460727B2
Publication date: 2019-10-29
Application number: US15602366
Filing date: 2017-05-23
Applicant: Microsoft Technology Licensing, LLC
Inventor: James Droppo , Xuedong Huang , Dong Yu
IPC: G10L15/20 , G10L21/0308 , G10L15/06 , H04R3/00 , G10L25/51 , G10L25/30 , G10L17/04 , G10L17/18 , G10L15/32
Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.