摘要:
The present invention disclose modular speech-to-speech translation systems and methods that provide adaptable platforms to enable verbal communication between speakers of different languages within the context of specific domains. The components of the preferred embodiments of the present invention includes: (1) speech recognition; (2) machine translation; (3) N-best merging module; (4) verification; and (5) text-to-speech. Characteristics of the speech recognition module here are that the modules are structured to provide N-best selections and multi-stream processing, where multiple speech recognition engines may be active at any one time. The N-best lists from the one or more speech recognition engines may be handled either separately or collectively to improve both recognition and translation results. A merge module is responsible for integrating the N-best outputs of the translation engines along with confidence/translation scores to create a ranked list or recognition-translation pairs.
摘要:
The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with, larger domain size, scarce training data as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.
摘要:
The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with, larger domain size, scarce training data as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.
摘要:
Interpretation from a first language to a second language via one or more communication devices is performed through a communication network (e.g. phone network or the internet) using a server for performing recognition and interpretation tasks, comprising the steps of: receiving an input speech utterance in a first language on a first mobile communication device; conditioning said input speech utterance; first transmitting said conditioned input speech utterance to a server; recognizing said first transmitted speech utterance to generate one or more recognition results; interpreting said recognition results to generate one or more interpretation results in an interlingua; mapping the interlingua to a second language in a first selected format; second transmitting said interpretation results in the first selected format to a second mobile communication device; and presenting said interpretation results in a second selected format on said second communication device.
摘要:
The present invention discloses a speech-to-speech translation device which allows one or more users to input a spoken utterance in one language, translates the utterance into one or more second languages, and outputs the translation in speech form. Additionally, the device allows for translation both directions, recognizing inputs in the one or more second languages and translating them back into the first language. The device recognizes and translates utterances in a limited domain as in a phrase book translation system, so the translation accuracy is essentially 100%. By limiting the domain the system increases the accuracy of the speech recognition component and thus the accuracy of the overall system. However unlike other phrase book systems, the device also allows wide variations and paraphrasing in the input, so that the user is much more likely to find the desired phrase from the stored list of phrases. The device paraphrases the input to a basic canonical form and performs the translation on that canonical form, ignoring the non-essential variations in the surface form of the input. The device can provide visual and/or auditory feedback to confirm the recognized input and makes the system usable for non-bilingual users with absolute confidence.
摘要:
The present invention adopts the fundamental architecture of a statistical machine translation system which utilizes statistical models learned from the training data and does not require expert knowledge for rule-based machine translation systems. Out of the training parallel data, a certain amount of sentence pairs are selected for manual alignment. These sentences are aligned at the phrase level instead of at the word level. Depending on the size of the training data, the optimal amount for manual alignment may vary. The alignment is done using an alignment tool with a graphical user interface which is convenient and intuitive to the users. Manually aligned data are then utilized to improve the automatic word alignment component. Model combination methods are also introduced to improve the accuracy and the coverage of statistical models for the task of statistical machine translation.
摘要:
The present invention adopts the fundamental architecture of a statistical machine translation system which utilizes statistical models learned from the training data and does not require expert knowledge for rule-based machine translation systems. Out of the training parallel data, a certain amount of sentence pairs are selected for manual alignment. These sentences are aligned at the phrase level instead of at the word level. Depending on the size of the training data, the optimal amount for manual alignment may vary. The alignment is done using an alignment tool with a graphical user interface which is convenient and intuitive to the users. Manually aligned data are then utilized to improve the automatic word alignment component. Model combination methods are also introduced to improve the accuracy and the coverage of statistical models for the task of statistical machine translation.
摘要:
Interpretation from a first language to a second language via one or more communication devices is performed through a communication network (e.g. phone network or the internet) using a server for performing recognition and interpretation tasks, comprising the steps of: receiving an input speech utterance in a first language on a first mobile communication device; conditioning said input speech utterance; first transmitting said conditioned input speech utterance to a server; recognizing said first transmitted speech utterance to generate one or more recognition results; interpreting said recognition results to generate one or more interpretation results in an interlingua; mapping the interlingua to a second language in a first selected format; second transmitting said interpretation results in the first selected format to a second mobile communication device; and presenting said interpretation results in a second selected format on said second communication device.
摘要:
The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). The invention enables phrase-based modeling of generic structures of verbal interaction to be used for the purpose of automating part of the design of such grammar networks. Most particularly, the invention enables such grammar networks to be used in providing a voice-controlled user interface to human readable text data that is also machine-readable (such as a Web page, a word processing document, a PDF document, or a spreadsheet).
摘要:
The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks.