Abstract:
The present invention combines a conventional audio microphone with an additional speech sensor that provides a speech sensor signal based on an input. The speech sensor signal is generated based on an action undertaken by a speaker during speech, such as facial movement, bone vibration, throat vibration, throat impedance changes, etc. A speech detector component receives an input from the speech sensor and outputs a speech detection signal indicative of whether a user is speaking. The speech detector generates the speech detection signal based on the microphone signal and the speech sensor signal.
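One way to picture the detector is as a per-frame decision that weighs both inputs. The sketch below is a minimal illustration, not the claimed implementation. It assumes frame energy as the only feature. The names `detect_speech`, `mic_weight`, `sensor_weight` and `threshold`, and their default values, are hypothetical. The speech sensor gets the larger weight on the assumption that a bone- or throat-coupled sensor picks up far less ambient noise than the microphone.

```python
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def detect_speech(
    mic_frame: Sequence[float],
    sensor_frame: Sequence[float],
    mic_weight: float = 0.3,     # hypothetical weight for the air microphone
    sensor_weight: float = 0.7,  # hypothetical weight for the speech sensor
    threshold: float = 0.01,     # hypothetical decision threshold
) -> bool:
    """Return True if the weighted energy of both signals suggests speech.

    Under our assumption, the speech sensor responds mainly to the speaker's
    own voice rather than to room noise, so its energy counts for more.
    """
    score = (mic_weight * frame_energy(mic_frame)
             + sensor_weight * frame_energy(sensor_frame))
    return score > threshold
```

With these defaults, moderate noise on the microphone alone (`detect_speech([0.1] * 160, [0.0] * 160)`) yields `False`. Activity on the speech sensor (`detect_speech([0.0] * 160, [0.5] * 160)`) yields `True`.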
Abstract:
A method and system use an alternative sensor signal received from a sensor other than an air conduction microphone to estimate a clean speech value. The estimation uses either the alternative sensor signal alone or the alternative sensor signal in conjunction with the air conduction microphone signal. The clean speech value is estimated without using a model trained from noisy training data collected from an air conduction microphone. Under one embodiment, correction vectors are added to a vector formed from the alternative sensor signal in order to form a filter, which is applied to the air conduction microphone signal to produce the clean speech estimate. In other embodiments, the pitch of a speech signal is determined from the alternative sensor signal and is used to decompose an air conduction microphone signal. The decomposed signal is then used to determine a clean signal estimate.
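The first embodiment can be illustrated one frequency bin at a time. The sketch below is a simplified reading under stated assumptions:

- It works on power spectra.
- It treats the correction vector as a single precomputed per-bin offset in the log domain. The abstract does not say how correction vectors are trained or selected.
- It turns the corrected estimate into a gain between 0 and 1 that filters the air conduction spectrum.

The function name `enhance_frame` is hypothetical.

```python
import math
from typing import List, Sequence


def enhance_frame(
    alt_log_power: Sequence[float],  # log power spectrum of the alternative sensor frame
    correction: Sequence[float],     # hypothetical per-bin correction vector (log domain)
    noisy_power: Sequence[float],    # power spectrum of the air conduction frame
) -> List[float]:
    """Return a clean speech power estimate for one frame.

    1. Add the correction vector to the alternative sensor vector to get a
       rough clean speech estimate.
    2. Form a per-bin filter gain from that estimate, capped at 1.
    3. Apply the gain to the air conduction microphone spectrum.
    """
    if not (len(alt_log_power) == len(correction) == len(noisy_power)):
        raise ValueError("all spectra must have the same number of bins")
    out: List[float] = []
    for a, c, n in zip(alt_log_power, correction, noisy_power):
        rough_clean = math.exp(a + c)
        gain = min(1.0, rough_clean / n) if n > 0 else 0.0
        out.append(gain * n)
    return out
```

For example, `enhance_frame([0.0, 0.0], [0.0, math.log(2.0)], [4.0, 1.0])` attenuates the first bin from 4.0 to 1.0. It leaves the second bin unchanged at 1.0, because the corrected estimate there (2.0) already exceeds the observed power.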
Abstract:
The present invention pertains to a concatenative speech synthesis system and method that produces more natural-sounding speech. The system provides multiple instances of each acoustic unit, which can be used to generate a speech waveform representing a linguistic expression. The multiple instances are formed during an analysis or training phase of the synthesis process and are limited to a robust representation of the highest-probability instances. Providing multiple instances enables the synthesizer to select the instance that most closely resembles the desired instance, thereby eliminating the need to alter the stored instance to match the desired instance. In essence, this minimizes the spectral distortion at the boundaries between adjacent instances, thereby producing more natural-sounding speech.
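The selection step can be sketched as a nearest-neighbour search over the stored instances of each unit. This illustration rests on several assumptions:

- Each instance is reduced to a spectral feature vector, for example the frames at its boundaries.
- The distance is plain squared Euclidean distance.
- The names `select_instance` and `select_sequence` are hypothetical.

A production unit-selection synthesizer would typically also score the joins between consecutive units.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_instance(candidates: Sequence[Vector], target: Vector) -> int:
    """Index of the stored instance that most closely resembles the target."""
    if not candidates:
        raise ValueError("no stored instances for this unit")
    return min(range(len(candidates)), key=lambda i: sq_dist(candidates[i], target))


def select_sequence(
    targets: Sequence[Vector], inventory: Sequence[Sequence[Vector]]
) -> List[int]:
    """Pick one stored instance per unit; the chosen instances are used unmodified."""
    return [select_instance(cands, t) for cands, t in zip(inventory, targets)]
```

Because a close stored instance is chosen rather than a single instance being reshaped, no signal modification appears anywhere in the sketch.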
Abstract:
A text-to-speech system includes a storage device for storing a clustered set of context-dependent phoneme-based units of a target speaker. In one embodiment, decision trees are used, wherein each decision-tree-based context-dependent phoneme-based unit is arranged based on the context of at least one immediately preceding phoneme and at least one immediately succeeding phoneme. At least one of the context-dependent phoneme-based units represents other, non-stored context-dependent phoneme units that sound similar because their contexts are similar. A text analyzer obtains a string of phonetic symbols representative of text to be converted to speech. A concatenation module selects stored decision-tree-based context-dependent phoneme-based units from the set of decision-tree-based context-dependent phoneme-based units based on the context of the phonetic symbols, and synthesizes the selected phoneme-based units to generate speech corresponding to the text.
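A decision tree lets one stored unit stand in for contexts that were never stored. This can be sketched as a lookup keyed on the neighbouring phonemes. The phoneme classes, questions and unit names below are invented for illustration and are not taken from the described system, which may ask different questions and cluster differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Union

VOWELS = {"aa", "iy", "uw", "eh"}  # hypothetical phoneme classes
NASALS = {"m", "n", "ng"}


@dataclass
class Leaf:
    unit_id: str  # identifier of a stored unit


@dataclass
class Node:
    question: Callable[[str, str], bool]  # (left, right) context -> answer
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def lookup(tree: Tree, left: str, right: str) -> str:
    """Walk the tree using the preceding and succeeding phonemes."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(left, right) else node.no
    return node.unit_id


def select_units(phonemes: Sequence[str], trees: Dict[str, Tree]) -> List[str]:
    """Choose one stored unit per phoneme, padding both ends with silence."""
    padded = ["sil", *phonemes, "sil"]
    return [lookup(trees[p], padded[i], padded[i + 2]) for i, p in enumerate(phonemes)]


# Hypothetical trees: every nasal-preceded "aa" shares one stored unit.
TREES: Dict[str, Tree] = {
    "aa": Node(lambda left, right: left in NASALS,
               Leaf("aa_after_nasal"),
               Node(lambda left, right: right in VOWELS,
                    Leaf("aa_before_vowel"), Leaf("aa_default"))),
    "m": Leaf("m_any"),
    "n": Leaf("n_any"),
}
```

Here `select_units(["m", "aa"], TREES)` returns `["m_any", "aa_after_nasal"]`. `select_units(["n", "aa"], TREES)` reuses the same `aa_after_nasal` unit even though no separate unit for "aa" after "n" is stored.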
Abstract:
Content management architecture for a portable wireless device. Caching and fetching techniques are provided to improve content handling for portable devices such as cellular telephones and portable computers. A search component automatically performs searches as a background process, and potentially desired content is received and cached by a content storing component to be available in the future when and if needed, mitigating latency associated with slow download speeds, refresh rates, and other system and/or network impediments. Content from background search results can be trickled into the device as part of the background process so as not to burden system resources for other processes. As part of memory management, aged and/or low priority or low interest content can be selectively removed or archived to increase available cache or memory space, as well as to maintain relevant content within the device. A presentation component facilitates presentation of the pre-stored content.
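The memory management step can be illustrated with a small cache that works in two passes. It first drops aged content, then evicts the lowest-priority entries (oldest first among equals) until the cache fits its budget. Several details are assumptions made for the sketch:

- The class name, the byte budget and the integer priority score are hypothetical.
- The clock is supplied by the caller.
- The abstract does not specify how age, priority or interest are measured.

Archiving, as opposed to deletion, is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    data: bytes
    priority: int      # higher = more interesting to the user (assumed scale)
    fetched_at: float  # seconds, from a caller-supplied clock


class ContentCache:
    """Hypothetical cache for content fetched by a background search."""

    def __init__(self, capacity_bytes: int, max_age_s: float):
        self.capacity = capacity_bytes
        self.max_age = max_age_s
        self.entries: Dict[str, Entry] = {}

    def used(self) -> int:
        return sum(len(e.data) for e in self.entries.values())

    def put(self, key: str, data: bytes, priority: int, now: float) -> None:
        """Store content trickled in by the background process, then trim."""
        self.entries[key] = Entry(data, priority, now)
        self._evict(now)

    def get(self, key: str) -> Optional[bytes]:
        entry = self.entries.get(key)
        return entry.data if entry is not None else None

    def _evict(self, now: float) -> None:
        # Pass 1: remove aged content regardless of priority.
        for key in [k for k, e in self.entries.items() if now - e.fetched_at > self.max_age]:
            del self.entries[key]
        # Pass 2: remove the lowest-priority (then oldest) entries until within budget.
        while self.used() > self.capacity:
            victim = min(self.entries,
                         key=lambda k: (self.entries[k].priority, self.entries[k].fetched_at))
            del self.entries[victim]
```

Passing `now` explicitly keeps the sketch deterministic. A device would read its own clock instead.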
Abstract:
The claimed subject matter provides a system and/or a method that facilitates replicating a real-world physical meeting within a telepresence session. A telepresence session can be initiated within a communication framework that includes two or more virtually represented users that communicate therein. A trigger component can monitor the telepresence session in real time to identify a participant interaction with an object, wherein the object is at least one of a real-world physical object or a virtually represented object within the telepresence session. A feedback component can provide force feedback to at least one participant within the telepresence session based upon the identified participant interaction with the object, wherein the force feedback is delivered via a device associated with at least one participant.
Abstract:
A system that employs an explicitly and/or implicitly trained model in order to return entity-specific computer-based search results is provided. The innovation can provide a customized search model that focuses the search on obtaining information that is meaningful with respect to the goals of an entity. The model can be used to modify a search query in accordance with a goal of the entity, or to generate the search query, thereby returning meaningful and/or targeted results to the user. The system can automatically gather entity-related data and thereafter determine or infer a goal as well as train the model. Moreover, the system can selectively configure (e.g., order, rank, filter) and render results to a user based upon the model.
Abstract:
A language processing system includes a unified language model. The unified language model comprises a plurality of context-free grammars having non-terminal tokens representing semantic or syntactic concepts and terminals, and an N-gram language model having non-terminal tokens. A language processing module capable of receiving an input signal indicative of language accesses the unified language model to recognize the language. The language processing module generates hypotheses for the received language as a function of words of the unified language model and/or provides an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.
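A single score can combine the N-gram and the context-free grammars. This can be sketched with a toy bigram whose vocabulary includes a non-terminal token. In this illustration, a span of words covered by a non-terminal is replaced by that token in the N-gram sequence, and the span's probability under the grammar is multiplied in. That scoring rule is an assumption rather than the described implementation. The grammar, the bigram probabilities and the `floor` value for unseen events are also hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical CFG: each non-terminal maps terminal strings to probabilities.
CFG: Dict[str, Dict[Tuple[str, ...], float]] = {
    "<CITY>": {("new", "york"): 0.5, ("seattle",): 0.5},
}

# Hypothetical bigram probabilities over words AND non-terminal tokens.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "fly"): 0.5,
    ("fly", "to"): 0.8,
    ("to", "<CITY>"): 0.6,
    ("<CITY>", "</s>"): 0.9,
}


def log_prob(words: Sequence[str], spans: List[Tuple[int, int, str]],
             floor: float = 1e-6) -> float:
    """Log score of a word sequence under the toy unified model.

    spans: non-overlapping (start, end, non_terminal) word ranges that the
    grammar covers; each range is collapsed to its non-terminal token.
    """
    seq = ["<s>"]
    logp = 0.0
    i = 0
    for start, end, nt in sorted(spans):
        seq.extend(words[i:start])
        seq.append(nt)
        # Grammar probability of the covered words given the non-terminal.
        logp += math.log(CFG[nt].get(tuple(words[start:end]), floor))
        i = end
    seq.extend(words[i:])
    seq.append("</s>")
    # N-gram probability of the mixed word / non-terminal sequence.
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(BIGRAM.get((prev, cur), floor))
    return logp
```

For "fly to new york" with `new york` parsed as `<CITY>`, the score is 0.5 (grammar) × 0.5 × 0.8 × 0.6 × 0.9 (bigrams) = 0.108.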
Abstract:
A language model is used in a speech recognition system that has access to a first, smaller data store and a second, larger data store. The language model is adapted by formulating an information retrieval query based on information contained in the first data store and querying the second data store with it. Information retrieved from the second data store is used in adapting the language model. Language models are also used in retrieving information from the second data store: a first language model is built from information in the first data store, and a second language model is built from information in the second data store. The perplexity of a document in the second data store is determined given the first language model and given the second language model. The relevancy of the document is determined based upon the first and second perplexities, and documents whose relevancy measure exceeds a threshold level are retrieved.
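The perplexity-based relevancy test can be sketched with two smoothed unigram models standing in for the first and second language models. The abstract does not say how the two perplexities are combined. This sketch therefore assumes a ratio: perplexity under the general model divided by perplexity under the domain model, so that larger values mean more domain-like documents. The names, the add-one smoothing and the threshold are assumptions as well.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set


class UnigramLM:
    """Add-one smoothed unigram model; a stand-in for a real N-gram model."""

    def __init__(self, tokens: Iterable[str], vocab: Set[str]):
        self.counts = Counter(tokens)
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab)

    def prob(self, word: str) -> float:
        return (self.counts[word] + 1) / (self.total + self.vocab_size)


def perplexity(lm: UnigramLM, doc: Sequence[str]) -> float:
    """exp of the average negative log probability per token."""
    if not doc:
        raise ValueError("empty document")
    log_sum = sum(math.log(lm.prob(w)) for w in doc)
    return math.exp(-log_sum / len(doc))


def relevancy(doc: Sequence[str], domain_lm: UnigramLM, general_lm: UnigramLM) -> float:
    """Assumed measure: > 1 when the domain model predicts the document better."""
    return perplexity(general_lm, doc) / perplexity(domain_lm, doc)


def retrieve(docs: Dict[str, Sequence[str]], domain_lm: UnigramLM,
             general_lm: UnigramLM, threshold: float) -> List[str]:
    """Names of documents whose relevancy exceeds the threshold."""
    return [name for name, toks in docs.items()
            if relevancy(toks, domain_lm, general_lm) > threshold]
```

With a threshold of 1.0, a document is retrieved only when the model built from the first (domain) data store predicts it better than the model built from the second (general) data store does.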
Abstract:
A speech recognition system is extensible in that new terms may be added to a list of terms that are recognized by the speech recognition system. The speech recognition system provides audio feedback when new terms are added so that a user may hear how the system expects the word to be pronounced. The user may then accept the pronunciation or provide his own pronunciation. The user may also selectively change the pronunciation of words to avoid misrecognitions by the system. The system may provide appropriate user interface elements for enabling a user to change the pronunciation of words. The system may also include intelligence for automatically changing the pronunciation of words used in recognition based upon empirically derived information.