Permutation invariant training for talker-independent multi-talker speech separation

    Publication No.: US11170785B2

    Publication date: 2021-11-09

    Application No.: US16289403

    Application date: 2019-02-28

    Inventor: Dong Yu

    Abstract: The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition (“ASR”) in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.
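
The permutation-considered assignment described in this abstract can be illustrated with a small sketch. The code below is not taken from the patent: the function name `pit_mse`, the array layout `(sources, frames, features)`, and the use of mean squared error as the deviation measure are all assumptions made for illustration. It tries every assignment of output streams to reference sources and keeps the one with the smallest error.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, sources):
    """Return (lowest MSE, best assignment) over all output-to-source pairings.

    estimates, sources: arrays of shape (num_sources, frames, features).
    The shape convention and the MSE criterion are illustrative assumptions.
    """
    num_sources = sources.shape[0]
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(num_sources)):
        # perm[i] is the reference source assigned to output stream i.
        loss = float(np.mean((estimates - sources[list(perm)]) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

In training, the loss of the best assignment would be the quantity that is minimized, so the model is never penalized for emitting the talkers in a different order than the labels. Exhaustive search costs S! evaluations for S sources, which is manageable for the two- or three-talker case.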

    Permutation invariant training for talker-independent multi-talker speech separation

    Publication No.: US10249305B2

    Publication date: 2019-04-02

    Application No.: US15226527

    Application date: 2016-08-02

    Inventor: Dong Yu

    Abstract: The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition (“ASR”) in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.

    HYPERPARAMETER TUNING
    Type: Patent application

    Publication No.: US20180121814A1

    Publication date: 2018-05-03

    Application No.: US15453342

    Application date: 2017-03-08

    Inventors: Dong Yu, Chi Jin

    CPC classification number: G06N7/005 G06N3/04

    Abstract: Improvements in speed and reductions in computational resource expenditure are realized in the improved tuning of hyperparameters for machine learning processes. To ensure that the values selected for hyperparameters are tuned appropriately, but quickly, several rounds of optimization are performed, each with as many or more iterations of cross-validation than prior rounds, cutting short the analysis of unpromising results to devote more time and resources to analyzing promising value sets. The results are used to build suggested sets of hyperparameter values for that round, which are also cross-validated and enable the tuning process to incorporate previous operations to improve its value sets. The most promising sets of hyperparameter values from each round are selected as the basis set for the next round until a final set of values for the hyperparameters is developed.
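
The round structure described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the patented procedure: the function `tune`, the callback `score_fn(params, fold)`, the linear growth of the fold count, and the random perturbation used to build suggested sets are all hypothetical choices. The "cutting short" of unpromising sets is approximated here by discarding low-ranked sets after each round.

```python
import random


def tune(score_fn, candidates, rounds=3, base_folds=2, keep=2, seed=0):
    """Multi-round search with a growing cross-validation budget (sketch).

    score_fn(params, fold) -> float, higher is better (hypothetical callback).
    candidates: list of dicts mapping hyperparameter names to numeric values.
    """
    rng = random.Random(seed)
    basis = list(candidates)
    for r in range(rounds):
        folds = base_folds * (r + 1)  # later rounds never use fewer folds

        def cv(params):
            # Average score over this round's cross-validation folds.
            return sum(score_fn(params, f) for f in range(folds)) / folds

        # Keep only the most promising sets from the current basis.
        survivors = sorted(basis, key=cv, reverse=True)[:keep]
        # Build suggested sets from the survivors (assumed: +/-20% jitter).
        suggestions = [
            {k: v * rng.uniform(0.8, 1.2) for k, v in p.items()}
            for p in survivors
        ]
        # Suggested sets are cross-validated too; the best form the next basis.
        basis = sorted(survivors + suggestions, key=cv, reverse=True)[:keep]
    return basis[0]
```

Because the survivors are always carried forward alongside the suggestions, the returned set scores at least as well as the best initial candidate under a deterministic `score_fn`.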

    Mixed speech recognition
    Type: Granted patent (in force)

    Publication No.: US09558742B2

    Publication date: 2017-01-31

    Application No.: US15176381

    Application date: 2016-06-08

    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

    MIXED SPEECH RECOGNITION
    Type: Patent application

    Publication No.: US20160284348A1

    Publication date: 2016-09-29

    Application No.: US15176381

    Application date: 2016-06-08

    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

    Computing system for training neural networks

    Publication No.: US11049006B2

    Publication date: 2021-06-29

    Application No.: US15510356

    Application date: 2014-09-12

    Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
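
The one-bit quantization with stored quantization error mentioned in this abstract can be illustrated with a short sketch. The function name `quantize_one_bit` and the choice of reconstruction levels (the mean of the non-negative values and the mean of the negative values) are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def quantize_one_bit(gradient, residual):
    """Quantize a gradient to one bit per value with error feedback (sketch).

    gradient, residual: float arrays of the same shape.
    Returns (bits, levels, reconstructed, new_residual).
    """
    # Add the error left over from the previous quantization step.
    corrected = gradient + residual
    bits = corrected >= 0
    # Assumed reconstruction levels: the mean of each sign group.
    pos_level = corrected[bits].mean() if bits.any() else 0.0
    neg_level = corrected[~bits].mean() if (~bits).any() else 0.0
    reconstructed = np.where(bits, pos_level, neg_level)
    # Store what the one-bit code could not represent for the next step.
    new_residual = corrected - reconstructed
    return bits, (pos_level, neg_level), reconstructed, new_residual
```

Only `bits` and the two levels would need to be transmitted between nodes; `new_residual` stays local and is passed back in as `residual` on the next call, so the quantization error is carried forward rather than lost.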

    Automated predictive modeling and framework

    Publication No.: US10685281B2

    Publication date: 2020-06-16

    Application No.: US15226196

    Application date: 2016-08-02

    Abstract: Systems and methods for providing a predictive framework are provided. The predictive framework comprises plural neural layers of adaptable, executable neurons. Neurons accept one or more input signals and produce an output signal that may be used by an upper-level neural layer. Input signals are received by an encoding neural layer, where there is a 1:1 correspondence between an input signal and an encoding neuron. Input signals for a set of data are received at the encoding layer and processed successively by the plurality of neural layers. An objective function utilizes the output signals of the topmost neural layer to generate predictive results for the data set according to an objective. In one embodiment, the objective is to determine the likelihood of user interaction with regard to a specific item of content in a set of search results, or the likelihood of user interaction with regard to any item of content in a set of search results.
