Abstract:
A computer-implemented method in a language-independent system generates audio-driven facial animation given a speech recognition system for just one language. The method is based on the recognition that, once alignment is generated, the mapping and the animation have hardly any language dependency. Translingual visual speech synthesis can be achieved if the first step, alignment generation, can be made language independent. Given a speech recognition system for a base language, the method synthesizes video with speech of any novel language as the input.
Abstract:
The present invention provides a hybrid call handling method and system. The method comprises navigating a plurality of calls received from a plurality of callers. The method further comprises monitoring a call health status for each of the calls being navigated for the entire call duration, and notifying a human agent of a bad call health status so that the agent can employ at least one rectification action. The call health status is determined by monitoring and measuring one or more call parameters. The invention provides a system in which an automated system handles and navigates calls while a human agent assists it in rectifying calls with a bad call health status. Once a call with bad health is transferred to the human agent, the agent assists the automated system either by communicating directly with the caller or by communicating through a machine interface.
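The health check described above can be illustrated with a minimal sketch. The abstract does not name the call parameters or thresholds, so the fields `recognition_confidence`, `reprompt_count`, and `silence_seconds`, and every threshold value below, are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallMetrics:
    """Hypothetical per-call parameters; the abstract does not specify them."""
    call_id: str
    recognition_confidence: float  # assumed recognizer confidence, 0.0 to 1.0
    reprompt_count: int            # times the automated system had to re-ask
    silence_seconds: float         # cumulative caller silence


def call_health(m: CallMetrics) -> str:
    """Return 'bad' if any assumed threshold is violated, else 'good'."""
    if m.recognition_confidence < 0.5:
        return "bad"
    if m.reprompt_count >= 3:
        return "bad"
    if m.silence_seconds > 10.0:
        return "bad"
    return "good"


def calls_needing_agent(calls):
    """Collect IDs of calls with bad health so a human agent can be notified."""
    return [m.call_id for m in calls if call_health(m) == "bad"]


calls = [
    CallMetrics("c1", 0.9, 0, 1.0),
    CallMetrics("c2", 0.3, 1, 2.0),  # low recognition confidence
    CallMetrics("c3", 0.8, 4, 0.5),  # too many re-prompts
]
print(calls_needing_agent(calls))  # ['c2', 'c3']
```

In a live system the check would run repeatedly for the whole duration of each call rather than once over a fixed list.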
Abstract:
Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is to be done, bootstrapping does not produce good initial models, and the new language data is not properly aligned to these models. The present invention provides techniques to generate context-dependent labeling of the new language data using the recognition system of another language. This labeled data is then used to generate models for the new language phones.
Abstract:
A method of speech-driven lip synthesis applies viseme-based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence, with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM-based model) or with the nodes of the network (in the case of a neural-network-based system), which is then used for animation.
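The grouping of phonemes into fewer visemes can be sketched as a simple lookup. The table below is a small hypothetical mapping, not the viseme set of the invention, and the phone labels and class names are assumptions. The sketch covers only the grouping step, not HMM or neural network training.

```python
from itertools import groupby

# Hypothetical phoneme-to-viseme table: several phonemes that look alike
# on the lips share one viseme class.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "uw": "rounded", "ow": "rounded",
}


def visemes_for(phonemes):
    """Map phonemes to viseme classes and merge consecutive repeats."""
    mapped = [PHONEME_TO_VISEME[p] for p in phonemes]
    return [viseme for viseme, _ in groupby(mapped)]


print(visemes_for(["m", "aa", "p", "b", "uw"]))
# ['bilabial', 'open', 'bilabial', 'rounded']
```

Nine phonemes collapse to four classes here, which is the reduction in model units that the abstract relies on.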
Abstract:
A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
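One way to picture the vocabulary reduction is the sketch below. It assumes each word stores one pronunciation per style and that a speaker's style is chosen by majority vote over a few observed examples. The style labels `A` and `B`, the phone strings, and the voting rule are all hypothetical and are not taken from the abstract.

```python
from collections import Counter

# Hypothetical vocabulary: each word maps style label -> phone string.
VOCAB = {
    "tomato": {"A": "t ah m ey t ow", "B": "t ah m aa t ow"},
    "either": {"A": "iy dh er", "B": "ay dh er"},
    "data": {"A": "d ey t ah", "B": "d ae t ah"},
}


def identify_style(examples):
    """Pick the style whose pronunciations match the most observed examples.

    examples: list of (word, observed phone string) pairs.
    """
    votes = Counter()
    for word, phones in examples:
        for style, pron in VOCAB[word].items():
            if pron == phones:
                votes[style] += 1
    if not votes:
        raise ValueError("no example matched a known pronunciation")
    return votes.most_common(1)[0][0]


def adapt_vocabulary(style):
    """Keep only the pronunciation belonging to the speaker's style."""
    return {word: prons[style] for word, prons in VOCAB.items()}


style = identify_style([("tomato", "t ah m aa t ow"), ("either", "ay dh er")])
adapted = adapt_vocabulary(style)
print(style, adapted["data"])  # B d ae t ah
```

The adapted vocabulary holds three pronunciations instead of six, so the recognizer has fewer competing alternatives to score.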
Abstract:
A method, system, and computer program product for spoken language grammar evaluation are provided. The method includes playing a recorded question to a candidate, recording a spoken answer from the candidate, and converting the spoken answer into text. The method further includes comparing the text to a grammar database, calculating a spoken language grammar evaluation score based on the comparison, and outputting the spoken language grammar evaluation score.
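The comparison and scoring steps can be sketched as follows, assuming the spoken answer has already been converted to text. The "grammar database" here is a hypothetical list of error patterns, and the scoring rule (percentage of sentences with no matched error) is an assumption; the abstract does not specify either.

```python
import re

# Hypothetical grammar database: patterns for a few common spoken errors.
GRAMMAR_ERRORS = [
    re.compile(r"\bhe don't\b", re.IGNORECASE),
    re.compile(r"\bi has\b", re.IGNORECASE),
    re.compile(r"\bmore better\b", re.IGNORECASE),
]


def grammar_score(text):
    """Return the percentage of sentences that match no known error pattern."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    clean = sum(
        1 for s in sentences
        if not any(pattern.search(s) for pattern in GRAMMAR_ERRORS)
    )
    return 100.0 * clean / len(sentences)


answer = ("I has worked in sales. I enjoy meeting customers. "
          "He don't agree. My plan is clear.")
print(grammar_score(answer))  # 50.0
```

Playing the question, recording the answer, and speech-to-text conversion are outside this sketch.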
Abstract:
A method, a computer system and a computer program product for generating baseforms, or phonetic spellings, from input text are disclosed. The baseforms are initially generated using rules defined for a particular language. Then, phones in the language that are exceptions to the defined rules are identified, and an action is associated with each identified phone. A statistical technique is applied to determine whether the identified phones can be modified. Finally, baseforms containing the identified phones that can be modified are corrected according to the associated actions. Preferably, the statistical technique is only applied to baseforms containing phones that are exceptions to the defined rules. The defined rules can comprise spelling-to-sound rules for a particular phonetic language that incorporate all possible alternative pronunciations of each baseform.
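The rule-then-correct flow can be sketched with a toy example. The letter-to-phone rules, the exception phone `ah` with its `delete_final` action, the per-word deletion rates, and the 0.5 threshold are all hypothetical stand-ins; the abstract does not say which rules, phones, or statistical technique are used.

```python
# Hypothetical spelling-to-sound rules for a toy phonetic language.
RULES = {"k": "k", "m": "m", "l": "l", "r": "r", "a": "ah", "i": "ih"}

# Hypothetical exception phone and the action associated with it.
EXCEPTIONS = {"ah": "delete_final"}

# Hypothetical statistic: fraction of observed pronunciations of each word
# in which the word-final exception phone was dropped.
DELETION_RATE = {"kamala": 0.9, "mira": 0.2}


def rule_baseform(word):
    """Generate the initial baseform purely from the letter-to-phone rules."""
    return [RULES[ch] for ch in word]


def correct_baseform(word, threshold=0.5):
    """Apply the exception action only when the statistic supports it."""
    phones = rule_baseform(word)
    # The statistic is consulted only for baseforms ending in an exception phone.
    if phones and EXCEPTIONS.get(phones[-1]) == "delete_final":
        if DELETION_RATE.get(word, 0.0) > threshold:
            return phones[:-1]
    return phones


print(correct_baseform("kamala"))  # ['k', 'ah', 'm', 'ah', 'l']
print(correct_baseform("mira"))    # ['m', 'ih', 'r', 'ah']
```

Words without an exception phone in final position, such as `kamal` in this toy language, keep their rule-generated baseform and never reach the statistical check.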