Abstract:
An interactive response system combines HSR subsystems with ASR subsystems to improve the overall capability of user interfaces. The system allows imperfect ASR subsystems to relieve the burden on HSR subsystems nonetheless. An ASR proxy is used to implement an IVR system, and the proxy dynamically selects one or more recognizers from a language model and a human agent to recognize user input. The selection is based on factors such as the confidence thresholds of the ASRs and the availability of human resources for the HSRs.
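The selection step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `AsrResult` and `choose_recognizer`, the single confidence threshold, and the fallback rule are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class AsrResult:
    """Hypothetical container for one ASR hypothesis."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def choose_recognizer(result: AsrResult,
                      confidence_threshold: float,
                      agents_available: int) -> str:
    """Pick a recognizer for one utterance.

    Returns "asr" when the ASR confidence meets the threshold,
    "hsr" when it does not and a human agent is free, and
    "asr_fallback" when no human agent is available.
    """
    if result.confidence >= confidence_threshold:
        return "asr"
    if agents_available > 0:
        return "hsr"
    # Assumed policy: accept the low-confidence ASR output rather than wait.
    return "asr_fallback"
```

Under these assumptions, a confident ASR result never consumes human capacity, which is one way an imperfect ASR subsystem could relieve the HSR subsystem.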
Abstract:
A system and method for constructing training dictionaries with multichannel information are disclosed. An exemplary method takes into account the effect of the acoustic path when training on multichannel acoustic data. A method that uses different time-frequency resolutions in machine learning training is also presented.
Abstract:
Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data. The automatic speech recognition device may perform a syntactic parse based on the recognized words, and determine an end of a sentence based on the syntactic parse. For example, if the syntactic parse indicates that a certain set of consecutive recognized words form a syntactically complete and correct sentence, the automatic speech recognition device may determine that there is an end of a sentence at the end of that set of words.
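The idea of using a syntactic parse to find a sentence end can be sketched with a toy grammar. The lexicon, the tag set, and the pattern "optional determiner, noun, verb, optional object" are all assumptions chosen to keep the example small; the disclosure does not specify a grammar or a parser.

```python
import re

# Hypothetical toy lexicon: D = determiner, N = noun, V = verb.
LEXICON = {
    "the": "D", "a": "D",
    "dog": "N", "cat": "N", "ball": "N",
    "chased": "V", "sees": "V", "sleeps": "V",
}

# Toy grammar: S -> NP V (NP)?  with  NP -> D? N
SENTENCE = re.compile(r"D?NV(D?N)?")


def is_complete_sentence(words):
    """True if the word sequence matches the toy sentence pattern."""
    tags = "".join(LEXICON.get(w.lower(), "?") for w in words)
    return SENTENCE.fullmatch(tags) is not None


def candidate_sentence_ends(words):
    """Return every prefix length at which the words so far form a
    syntactically complete sentence under the toy grammar."""
    return [i for i in range(1, len(words) + 1)
            if is_complete_sentence(words[:i])]


print(candidate_sentence_ends("the dog chased a ball".split()))  # [3, 5]
```

The example yields two candidates because "the dog chased" is already complete under this grammar. How a device would choose among several candidates is not stated in the abstract, so the sketch only enumerates them.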
Abstract:
This patent disclosure relates to voice technology and discloses a voice recognition method and an electronic device. In some embodiments of this disclosure, a soft clustering calculation is performed in advance on the N Gaussians obtained by model training, to obtain M soft clustering Gaussians. When voice recognition is performed, the voice is converted to obtain an eigenvector, and the top L soft clustering Gaussians with the highest scores are calculated according to the eigenvector, where L is less than M. The member Gaussians of the L soft clustering Gaussians are then used as the Gaussians that participate in the acoustic model calculation during voice recognition, to calculate the likelihood of the acoustic model.
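The shortlist idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: diagonal covariances, the dictionary layout of each cluster, and the function names `log_gauss`, `shortlist`, and `approx_log_likelihood` are all hypothetical. "Soft" is modelled here by letting one member Gaussian belong to more than one cluster.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def shortlist(x, clusters, top_l):
    """Indices of the member Gaussians of the top-L scoring clusters."""
    ranked = sorted(clusters,
                    key=lambda c: log_gauss(x, c["mean"], c["var"]),
                    reverse=True)
    members = set()
    for cluster in ranked[:top_l]:
        members.update(cluster["members"])
    return sorted(members)


def approx_log_likelihood(x, gaussians, weights, active):
    """Log-sum-exp over only the active (shortlisted) weighted Gaussians."""
    terms = [math.log(weights[i]) + log_gauss(x, *gaussians[i])
             for i in active]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# N = 4 member Gaussians as (mean, variance) pairs, M = 3 clusters, L = 1.
gaussians = [([0.0], [1.0]), ([1.0], [1.0]), ([10.0], [1.0]), ([11.0], [1.0])]
weights = [0.25, 0.25, 0.25, 0.25]
clusters = [
    {"mean": [0.5], "var": [2.0], "members": [0, 1]},
    {"mean": [10.5], "var": [2.0], "members": [2, 3]},
    {"mean": [5.0], "var": [30.0], "members": [1, 2]},  # overlaps both groups
]

active = shortlist([0.2], clusters, top_l=1)
print(active)  # [0, 1]
print(approx_log_likelihood([0.2], gaussians, weights, active))
```

Only the shortlisted members are evaluated, so the per-frame cost scales with the size of the shortlist rather than with N. The result is a lower bound on the full mixture likelihood, since the omitted terms are non-negative.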
Abstract:
Systems and processes for natural language processing are provided. In accordance with one example, a method includes, at a first electronic device with one or more processors and memory, receiving a plurality of words, mapping each of the plurality of words to a word representation, and associating the mapped words to provide a plurality of phrases. In some examples, each of the plurality of phrases has a representation of a first type. The method further includes encoding each of the plurality of phrases to provide a respective plurality of encoded phrases. In some examples, each of the plurality of encoded phrases has a representation of a second type different than the first type. The method further includes determining a value of each of the plurality of encoded phrases and identifying one or more phrases of the plurality of encoded phrases having a value exceeding a threshold.
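The pipeline of mapping, phrase formation, encoding, and thresholding can be sketched with toy data. All specifics below are assumptions for illustration only: the two-dimensional embeddings, the use of sliding bigrams as phrases, mean pooling as the encoder, and a fixed linear scorer as the value function. The disclosure does not name any of these choices.

```python
# Hypothetical word representations (word -> 2-D vector).
EMBEDDINGS = {
    "please": [0.0, 0.0],
    "set": [0.9, 0.1],
    "an": [0.0, 0.1],
    "alarm": [0.8, 0.3],
}
UNKNOWN = [0.0, 0.0]


def map_words(words):
    """Map each word to its vector representation."""
    return [EMBEDDINGS.get(w, UNKNOWN) for w in words]


def make_phrases(vectors, size=2):
    """First representation type: a phrase is a sequence of word vectors."""
    return [vectors[i:i + size] for i in range(len(vectors) - size + 1)]


def encode(phrase):
    """Second representation type: one fixed-length vector (mean pooling)."""
    dim = len(phrase[0])
    return [sum(vec[d] for vec in phrase) / len(phrase) for d in range(dim)]


def value(encoded, scorer=(1.0, 0.5)):
    """Assumed value function: dot product with a fixed scoring vector."""
    return sum(e * s for e, s in zip(encoded, scorer))


def salient_phrases(words, threshold, size=2):
    """Return the phrases whose encoded value exceeds the threshold."""
    phrases = make_phrases(map_words(words), size)
    return [" ".join(words[i:i + size])
            for i, phrase in enumerate(phrases)
            if value(encode(phrase)) > threshold]


print(salient_phrases(["please", "set", "an", "alarm"], threshold=0.49))
# ['set an', 'an alarm']
```

In a learned system the encoder and the value function would presumably be trained rather than fixed; the fixed versions here only make the data flow between the two representation types visible.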
Abstract:
A method and a system for automatically generating a back-channel in an interactive agent system are provided. According to an embodiment of the disclosure, an automatic back-channel generation method includes: predicting a back-channel by analyzing a user utterance that is input to a back-channel prediction model; and generating the predicted back-channel. The back-channel prediction model is an AI model trained to predict, from the utterance of the user, a back-channel to express. Accordingly, a back-channel is automatically generated by a back-channel prediction module based on a language model, so that natural dialogue interaction with a user may be implemented in an interactive agent system and the quality of the dialogue service provided to the user may be enhanced.