Abstract:
A system and method for speech recognition incorporating environmental variables are provided. The system comprises: a speech capture device (202); a feature extraction module (204); an environment variable module (206) that determines a value for an environment variable; and a speech recognition decoder (208) that uses a deep neural network (DNN) to recognize speech captured by the speech capture device, wherein one or more components of the DNN are modeled as a set of functions of the environment variable.
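As an illustration of what "modeled as a set of functions of the environment variable" could mean, the following minimal sketch treats one DNN weight matrix as a polynomial in a scalar environment variable v, that is, W(v) = H_0 + H_1 v + H_2 v^2 + ... The polynomial form, the function name `weights_for_environment`, and the interpretation of v (for example, a signal-to-noise estimate) are assumptions made for this example; the abstract does not specify the functional form.

```python
from typing import List

Matrix = List[List[float]]


def weights_for_environment(coeff_matrices: List[Matrix], v: float) -> Matrix:
    """Evaluate W(v) = sum_j H_j * v**j element by element.

    coeff_matrices holds the hypothetical coefficient matrices H_0, H_1, ...
    (all the same shape); v is the value reported by the environment
    variable module. The polynomial form is an assumption of this sketch.
    """
    rows = len(coeff_matrices[0])
    cols = len(coeff_matrices[0][0])
    weights = [[0.0] * cols for _ in range(rows)]
    for power, h in enumerate(coeff_matrices):
        scale = v ** power
        for r in range(rows):
            for c in range(cols):
                weights[r][c] += h[r][c] * scale
    return weights


# Two coefficient matrices give a weight matrix that is linear in v.
H0 = [[1.0, 0.0], [0.0, 1.0]]
H1 = [[0.5, 0.0], [0.0, -0.5]]
clean_weights = weights_for_environment([H0, H1], 0.0)  # equals H0
noisy_weights = weights_for_environment([H0, H1], 2.0)
```

In such a scheme the decoder would re-evaluate the weights whenever the environment variable module reports a new value, so a single set of coefficient matrices could cover a range of acoustic conditions.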
Abstract:
Adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition (ASR) are provided. An utterance that includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model, converting the original matrix into multiple new matrices that are each smaller than the original matrix. A square matrix may then be added to the new matrices, and speaker-specific parameters may be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to each of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
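The structure described above can be sketched as follows, assuming the decomposition has already produced two factors, `left` (m x k) and `right` (k x n), whose product approximates the original m x n matrix. The abstract does not name the decomposition method, so none is implemented here. The helper names and the choice to initialize the square matrix to the identity are assumptions of this example, not details taken from the abstract.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(k: int) -> Matrix:
    """k x k identity matrix, used here as the initial speaker-specific matrix."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def adapted_layer(left: Matrix, square: Matrix, right: Matrix) -> Matrix:
    """Effective layer matrix with the square matrix inserted between the factors."""
    return matmul(matmul(left, square), right)


def parameter_counts(m: int, n: int, k: int) -> Dict[str, int]:
    """Parameter counts for an m x n matrix factored through inner size k."""
    return {
        "original": m * n,
        "factored": k * (m + n),
        "speaker_specific": k * k,
    }


left = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2 factor (assumed given)
right = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # 2 x 3 factor (assumed given)
unadapted = adapted_layer(left, identity(2), right)
counts = parameter_counts(2048, 2048, 256)
```

With an identity square matrix the layer behaves exactly like the product of the two factors, so adaptation can start from the unadapted model and update only the k x k entries for each speaker. For the illustrative sizes m = n = 2048 and k = 256, that is 65,536 speaker-specific parameters per matrix, compared with 4,194,304 in the original matrix.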
Abstract:
Embodiments may include: collection of a first batch of acoustic feature frames of an audio signal, the number of acoustic feature frames in the first batch being equal to a first batch size; input of the first batch to a speech recognition network; collection, in response to detection of a word hypothesis output by the speech recognition network, of a second batch of acoustic feature frames of the audio signal, the number of acoustic feature frames in the second batch being equal to a second batch size; and input of the second batch to the speech recognition network.
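The batching behavior described above might be sketched as a loop that changes its batch size once the network emits a word hypothesis. The function `stream_batches`, the `recognize` callable standing in for the speech recognition network, and the choice to keep using the first batch size until a hypothesis appears are all assumptions made for this example; the abstract does not describe the network interface or the behavior between hypotheses.

```python
from typing import Callable, Iterable, List, Optional, Sequence

Frame = Sequence[float]


def stream_batches(
    frames: Iterable[Frame],
    recognize: Callable[[List[Frame]], Optional[str]],
    first_batch_size: int,
    second_batch_size: int,
) -> List[int]:
    """Feed frames to `recognize` in batches and return the batch sizes used.

    Batches have first_batch_size frames until `recognize` returns a word
    hypothesis (anything other than None); later batches have
    second_batch_size frames. A final partial batch is flushed at the end.
    """
    batch_size = first_batch_size
    batch: List[Frame] = []
    sizes: List[int] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            hypothesis = recognize(batch)
            sizes.append(len(batch))
            if hypothesis is not None:
                batch_size = second_batch_size
            batch = []
    if batch:
        recognize(batch)
        sizes.append(len(batch))
    return sizes


def make_stub_network() -> Callable[[List[Frame]], Optional[str]]:
    """Hypothetical stand-in network: reports a word on its first call only."""
    calls = {"count": 0}

    def recognize(batch: List[Frame]) -> Optional[str]:
        calls["count"] += 1
        return "hello" if calls["count"] == 1 else None

    return recognize


ten_frames = [[float(i)] for i in range(10)]
sizes = stream_batches(ten_frames, make_stub_network(), 2, 4)
```

In this sketch a small first batch keeps latency low until the first word hypothesis, after which larger batches are used. Whether the second batch size is larger or smaller than the first is not stated in the abstract, so the values here are purely illustrative.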