摘要:
This application is related to a means and a method for facilitating the use of translation memories by aligning words of an input source language sentence with the correspondent translated words in target language sentence. More specifically, this invention relates to such a means and method where there is an enhanced translation memory comprising an alignment function.
摘要:
This application is related to a means and a method for facilitating the use of translation memories by aligning words of an input source language sentence with the correspondent translated words in target language sentence. More specifically, this invention relates to such a means and method where there is an enhanced translation memory comprising an alignment function.
摘要:
The invention relates to a method and a means for automatically post-editing a translated text. A source language text is translated into an initial target language text. This initial target language text is then post-edited by an automatic post-editor into an improved target language text. The automatic post-editor is trained on a sentence aligned parallel corpus created from sentence pairs T′ and T, where T′ is an initial training translation of a source training language text, and T is second, independently derived, training translation of a source training language text.
摘要:
The vehicular monitoring system obtains audio information from the speech of occupants within the vehicle and then processes that audio information to extract information about the behavior of the vehicle occupants. Using the behavioral information the system then assesses whether said behavior is in compliance with a predefined set of rules. Severe violations are reported to a third party via cellular telephone or other means. Less severe violations are recorded in a log file that is subsequently uploaded to a networked computer system for review and analysis. The behavioral rules used to assess violations may be modified by the administrative user.
摘要:
The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker so that context independent parameters not provided by the new speaker may be estimated.
摘要:
A phoneticizer converts spelled words or names into one or an n-best number of phonetic transcriptions. The n-best transcriptions may be generated from a single transcription using a confusion matrix. These n-best transcriptions are then transformed into hybrid units. Preferably only the most frequently encountered units are stored as syllables, with the remainder being stored as smaller units such as demi-syllables or phonemes. Voice input is then used to rescore the n-best transcriptions and these are stored preferably as speaker-independent, similarity-based hybrid units concatenated into a string representing the spelled word.
摘要:
The remote control unit supports multi-modal dialog with the user, through which the user can easily select programs for viewing or recording. The remote control houses a microphone into which the user can input natural language speech. The input speech is recognized and interpreted by a natural language parser that extracts the semantic content of the user's speech. The parser works in conjunction with an electronic program guide, through which the remote control system is able to ascertain what programs are available for viewing or recording and supply appropriate prompts to the user. In one embodiment, the remote control includes a touch screen display upon which the user may view prompts or make selections by pen input or tapping. Selections made on the touch screen automatically limit the context of the ongoing dialog between user and remote control, allowing the user to interact naturally with the unit. The remote control unit can control virtually any audio-video component, including those designed before the current technology. The remote control system can be packaged entirely within the remote control handheld unit, or components may be distributed in other systems attached to the user's multimedia equipment.
摘要:
Speech recognition and natural language parsing components are used to extract the meaning of the user's spoken input. The system stores a semantic representation of an electronic program guide, and the contents of the program guide can be mapped into the grammars used by the natural language parser. Thus, when the user wishes to navigate through the complex menu structure of the electronic program guide, he or she only needs to speak in natural language sentences. The system automatically filters the contents of the program guide and supplies the user with on-screen display or synthesized speech responses to the user's request.
摘要:
Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.
摘要:
A two-stage pronunciation generator utilizes mixed decision trees that includes a network of yes-no questions about letter, syntax, context, and dialect in a spelled word sequence. A second stage utilizes decision trees that includes a network of yes-no questions about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision trees provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.