Abstract:
A speech coding system, responsive to an input speech signal provided by a system user, comprises: a speech coding portion including a speech recognition system responsive to the input speech signal and having a word vocabulary associated therewith, the speech recognition system recognizing the input speech signal in accordance with the vocabulary and generating phonetic tokens, such as at least one sequence of lefemes, representative of the input speech signal; a channel, responsive to the at least one sequence of lefemes, for transmitting and/or storing the at least one sequence of lefemes; and a speech synthesizing portion, responsive to the transmitted/stored sequence of lefemes, for generating, using the at least one sequence of lefemes, a synthesized speech signal which is representative of the input speech signal provided by the system user. The speech recognition system preferably generates acoustic parameters from the input speech signal which include voice characteristics of the system user. The speech coding system also preferably comprises a labeler which processes the input speech signal, including words uttered by the system user which are not in the word vocabulary associated with the speech recognition system, the labeler generating phonetic tokens, such as at least one sequence of lefemes, optimally representative of the input speech signal. The lefeme sequences from the labeler and from the speech recognition system are compared for each speech segment, and the sequence most similar to the input speech is selected for transmission/storage. The speech synthesizing portion of the system preferably performs speech synthesis using pre-enrolled phonetic sub-units or tokens.
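The per-segment choice between the recognizer's and the labeler's lefeme sequences can be sketched as follows. This is only an illustration: the `Candidate` type, the `score` field (assumed to mean "higher is closer to the input speech"), and the function name are hypothetical, and the abstract does not say how similarity is measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    lefemes: List[str]  # phonetic tokens proposed for one speech segment
    score: float        # assumed similarity to the input speech (higher = closer)


def select_per_segment(recognizer: List[Candidate],
                       labeler: List[Candidate]) -> List[List[str]]:
    """For each segment, keep whichever lefeme sequence scores closer to the input."""
    if len(recognizer) != len(labeler):
        raise ValueError("both sources must cover the same segments")
    selected = []
    for rec, lab in zip(recognizer, labeler):
        # Ties go to the recognizer; this tie-break is an arbitrary choice.
        selected.append(rec.lefemes if rec.score >= lab.score else lab.lefemes)
    return selected
```

Under these assumptions, an out-of-vocabulary word would typically be a segment where the labeler's sequence scores higher and is therefore the one sent over the channel.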
Abstract:
A method of training at least one new word for addition to a vocabulary of a speech recognition engine containing existing words comprises the steps of: a user uttering the at least one new word; computing respective measures between the at least one newly uttered word and at least a portion of the existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; if no measure is within a threshold range, automatically adding the at least one newly uttered word to the vocabulary; and if at least one measure is within the threshold range, refraining from automatically adding the at least one newly uttered word to the vocabulary.
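A minimal sketch of the threshold-gated addition follows. The abstract does not specify the acoustic similarity measure, so the sketch substitutes an edit distance over phoneme sequences as a stand-in; the function names, the `threshold` default, and the phoneme spellings are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def try_add_word(word: str, phones: List[str],
                 vocab: Dict[str, List[str]],
                 threshold: int = 1) -> Tuple[bool, List[str]]:
    """Add `word` only if no existing word lies within the threshold range.

    Returns (added, confusable_words). A distance <= threshold is treated
    as "within the threshold range" (an assumption of this sketch).
    """
    confusable = [w for w, p in vocab.items()
                  if edit_distance(phones, p) <= threshold]
    if confusable:
        return False, confusable  # refrain from adding automatically
    vocab[word] = phones
    return True, []
```

The list of confusable words returned on rejection is what a caller could show to the user before asking for an alternative word.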
Abstract:
A method of determining potential acoustic confusion between at least one new word and at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: a user inputting the at least one new word; computing respective measures between the at least one new word and the at least a portion of existing vocabulary words, the respective measures indicative of acoustic similarity between the at least one word and the at least a portion of existing words; if at least one measure is within a threshold range, indicating results associated with the at least one measure and prompting the user to input an alternative word or additional information pertaining to the at least one new word; and if no measure is within the threshold range, adding the at least one new word to the vocabulary.
Abstract:
In a text-to-speech system, a method of converting text to speech can include receiving a text input and comparing the received text input to at least one entry in a text-to-speech cache memory. Each entry in the text-to-speech cache memory can specify a corresponding spoken output. If the text input matches one of the entries in the text-to-speech cache memory, the cached speech output specified by the matching entry can be provided.
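The cache lookup described above might look like the sketch below. The class name, the whitespace and case normalization of the key, and the miss path (synthesize, then store) are assumptions; the abstract only states that a matching entry's cached output is provided.

```python
from typing import Callable, Dict


class CachedTTS:
    """Wraps a synthesizer with a text-keyed cache of spoken output."""

    def __init__(self, synthesize: Callable[[str], bytes]):
        self._synthesize = synthesize       # hypothetical back-end synthesizer
        self._cache: Dict[str, bytes] = {}  # text entry -> cached audio

    def speak(self, text: str) -> bytes:
        # Normalizing case and whitespace is an assumption of this sketch.
        key = " ".join(text.lower().split())
        if key in self._cache:
            return self._cache[key]         # hit: provide the cached output
        audio = self._synthesize(key)       # miss: synthesize and remember
        self._cache[key] = audio
        return audio
```

A cache of this kind pays off for prompts that recur often, such as fixed menu phrases, since synthesis runs only on the first request.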
Abstract:
A method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: a user uttering the word; decoding the uttered word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; if at least one measure is within a threshold range, indicating, to the user, results associated with the at least one measure, the results preferably including the decoded word and the other existing vocabulary word associated with the at least one measure; and the user preferably making a selection depending on the word the user intended to utter.
Abstract:
A Bell Tree data structure is provided to model the process of chaining the mentions, from one or more documents, into entities, tracking the entire process; where the data structure is used in an entity tracking process that produces multiple results ranked by a product of probability scores.
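One way to picture the Bell Tree is as a search in which each new mention either joins an existing entity or starts a new one, so that with no pruning the number of leaves is the Bell number of the mention count. The sketch below assumes this reading; `link_prob`, `start_prob`, and the optional `beam` width are hypothetical stand-ins for whatever models and pruning the actual process uses.

```python
from typing import Callable, List, Optional, Tuple

Entity = Tuple[str, ...]
Partition = Tuple[Entity, ...]


def bell_tree(mentions: List[str],
              link_prob: Callable[[Entity, str], float],
              start_prob: Callable[[Partition, str], float],
              beam: Optional[int] = None) -> List[Tuple[float, Partition]]:
    """Expand the tree one mention at a time; return hypotheses ranked by score.

    Each hypothesis is (score, partition), where score is the product of the
    probabilities of the decisions taken along its path from the root.
    """
    hyps: List[Tuple[float, Partition]] = [(1.0, ())]
    for m in mentions:
        expanded = []
        for score, part in hyps:
            # Branch 1: link the mention to each existing entity in turn.
            for i, ent in enumerate(part):
                new_part = part[:i] + (ent + (m,),) + part[i + 1:]
                expanded.append((score * link_prob(ent, m), new_part))
            # Branch 2: start a new entity with this mention.
            expanded.append((score * start_prob(part, m), part + ((m,),)))
        expanded.sort(key=lambda h: h[0], reverse=True)
        hyps = expanded[:beam] if beam else expanded
    return hyps
```

Because every path through the tree is kept with its score, the caller receives several ranked entity groupings rather than a single answer.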
Abstract:
A system and method for signal analysis in a network. The method includes attempting, by a first processor, to compute optimal coefficients for filtering a signal, determining that computing the optimal coefficients exceeds the computational capabilities of the first processor, notifying a second processor that computing the optimal coefficients exceeds the computational capabilities of the first processor, and computing, by the second processor, the optimal coefficients. The system and method account for limited computational resources allocated to certain processors in a telecommunications system.
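The hand-off between the two processors can be sketched as below. Everything concrete here is an assumption: the cubic cost estimate, the `max_ops` budget, the class and function names, and especially the returned coefficients, which are uniform averaging taps used purely as a placeholder for a real optimal-filter solver.

```python
from typing import List, Tuple


class CapacityExceeded(Exception):
    """Raised when a processor cannot afford the coefficient computation."""


class Processor:
    def __init__(self, name: str, max_ops: int):
        self.name = name
        self.max_ops = max_ops              # hypothetical operation budget
        self.notifications: List[str] = []  # messages received from peers

    def compute_coefficients(self, n_taps: int) -> List[float]:
        estimated_ops = n_taps ** 3         # assumed cost model, not from the source
        if estimated_ops > self.max_ops:
            raise CapacityExceeded(
                f"{self.name}: {estimated_ops} ops exceeds budget {self.max_ops}")
        # Placeholder result: a real system would solve for optimal taps here.
        return [1.0 / n_taps] * n_taps


def compute_with_fallback(first: Processor, second: Processor,
                          n_taps: int) -> Tuple[str, List[float]]:
    """Try the first processor; on overload, notify and use the second."""
    try:
        return first.name, first.compute_coefficients(n_taps)
    except CapacityExceeded as exc:
        second.notifications.append(str(exc))  # the notification step
        return second.name, second.compute_coefficients(n_taps)
```

The returned name records which processor actually produced the coefficients, which makes the fallback visible to the caller.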
Abstract:
An audio splitting system for sharing speech data associated with the same utterance between multiple speech technologies (consumers). In one aspect, the system comprises one or more queues for storing data, a plurality of consumers each sharing the data stored in the one or more queues and a scheduler for managing the storage of the data in the one or more queues and the consumption of the data in the one or more queues by each of the plurality of consumers. The consumers will register their data requirements and priority requests with the scheduler. The scheduler assigns each of the plurality of consumers to one or more of the queues based on the registered data requirements.
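A rough sketch of the registration and scheduling idea follows. It assumes one queue per kind of data (for example raw PCM versus extracted features), a per-consumer read position so that every consumer sees the same stored frames, and a numeric priority where larger means served first. None of these details come from the abstract, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Scheduler:
    """Shares queued speech data among registered consumers."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[str]] = defaultdict(list)  # kind -> frames
        self.consumers: List[Tuple[int, str, str]] = []        # (priority, name, kind)
        self.cursors: Dict[str, int] = {}                      # name -> next unread index

    def register(self, name: str, kind: str, priority: int) -> None:
        # A consumer declares its data requirement (kind) and its priority;
        # the kind determines which queue it is assigned to.
        self.consumers.append((priority, name, kind))
        self.cursors[name] = 0

    def push(self, kind: str, frame: str) -> None:
        self.queues[kind].append(frame)

    def dispatch(self) -> List[Tuple[str, str]]:
        """Deliver unread frames to each consumer, highest priority first."""
        delivered = []
        for _, name, kind in sorted(self.consumers, key=lambda c: -c[0]):
            queue = self.queues[kind]
            for frame in queue[self.cursors[name]:]:
                delivered.append((name, frame))
            self.cursors[name] = len(queue)  # mark everything as consumed
        return delivered
```

Keeping a read position per consumer, rather than popping frames, is what lets several speech technologies consume the same utterance without copying it for each of them.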