摘要:
A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.
摘要:
A method, apparatus and computer instructions is provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and a SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the efforts required for human annotation is reduced.
摘要:
In order to achieve low error rates in a speech recognition system, for example, in a system employing rank-based decoding, we discriminate the most confusable incorrect leaves from the correct leaf by lowering their ranks. That is, we increase the likelihood of the correct leaf of a frame, while decreasing the likelihoods of the confusable leaves. In order to do this, we use the auxiliary information from the prediction of the neighboring frames to augment the likelihood computation of the current frame. We then use the residual errors in the predictions of neighboring frames to discriminate between the correct (best) and incorrect leaves of a given frame. We present a new methodology that incorporates prediction error likelihoods into the overall likelihood computation to improve the rank position of the correct leaf.
摘要:
A multi-lingual time-synchronized translation system and method provide automatic time-synchronized spoken translations of spoken phrases. The multi-lingual time-synchronized translation system includes a phrase-spotting mechanism, optionally, a language understanding mechanism, a translation mechanism, a speech output mechanism and an event measuring mechanism. The phrase-spotting mechanism identifies a spoken phrase from a restricted domain of phrases. The language understanding mechanism, if present, maps the identified phrase onto a small set of formal phrases. The translation mechanism maps the formal phrase onto a well-formed phrase in one or more target languages. The speech output mechanism produces high-quality output speech using the output of the event measuring mechanism for time synchronization. The event-measuring mechanism measures the duration of various key events in the source phrase. Event duration could be, for example, the overall duration of the input phrase, the duration of the phrase with interword silences omitted, or some other relevant durational features. The present invention recognizes the quality improvements can be achieved by restricting the task domain under consideration.
摘要:
A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
摘要:
Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
摘要:
In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
摘要:
A method for fast match processing, comprising two stages, a pre-processing stage and an on-line stage. The pre-processing stage comprises the steps of computing an a-priori probability of occurrence for each word from an acoustic vocabulary; deriving a penalty score for each word from said acoustic vocabulary based on each words a-priori probability of occurrence in an input text. The on-line stage operates on an input text stream, comprising the steps of, computing a path score for each word from said input text; combining the computed path score with the derived penalty score to form a combined score and testing the combined score against a threshold to determine top ranking candidate words.
摘要:
Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
摘要:
A technique for producing speech output in an automatic dialog system is provided. Communication is received from a user at the automatic dialog system. A context of the communication from the user is detected in a context detector of the automatic dialog system. A message is provided to the user from a text-to-speech system of the automatic dialog system in communication with the context detector, wherein the message is provided in accordance with the detected context of the communication.