Abstract:
A method is provided for generating an artificial language for use, for example, in human speech interfaces to devices. In a preferred implementation, the language generation method uses a genetic algorithm to evolve a population of individuals over a plurality of generations, the individuals forming or being used to form candidate artificial-language words. The method is carried out in a manner favouring the production of artificial-language words which are more easily correctly recognised by a speech recognition system and which are familiar to a human user. This is achieved, for example, by selecting words for evolution on the basis of an evaluation carried out using a fitness function that takes account both of correct recognition of candidate words when spoken to a speech recognition system, and of the similarity of candidate words to words in a set of user-favourite words.
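The evolutionary loop described in this abstract can be illustrated with a minimal sketch. It is not the patented method: `recognition_score` is a hypothetical stand-in for speaking a word to a real recogniser, the use of `difflib.SequenceMatcher` for familiarity, and the weights, population size, and mutation rate, are all assumptions chosen for illustration.

```python
import random
from difflib import SequenceMatcher

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def recognition_score(word):
    # ASSUMPTION: placeholder for "correctly recognised when spoken to a
    # speech recogniser". Here: the fraction of distinct letters in the word.
    return len(set(word)) / len(word)

def familiarity_score(word, favourites):
    # Similarity to the closest user-favourite word, in [0, 1].
    return max(SequenceMatcher(None, word, fav).ratio() for fav in favourites)

def fitness(word, favourites, w_rec=0.5):
    # Weighted mix of the two criteria named in the abstract.
    return (w_rec * recognition_score(word)
            + (1 - w_rec) * familiarity_score(word, favourites))

def evolve(favourites, pop_size=20, word_len=5, generations=30, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(word_len))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, favourites),
                        reverse=True)
        parents = ranked[: pop_size // 2]   # selection: keep the fitter half
        children = [ranked[0]]              # elitism: best word always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, word_len)        # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:                  # occasional point mutation
                child[rng.randrange(word_len)] = rng.choice(ALPHABET)
            children.append("".join(child))
        population = children
    return sorted(population, key=lambda w: fitness(w, favourites), reverse=True)

if __name__ == "__main__":
    words = evolve(["robot", "hello"])
    print(words[:5])
```

Because the best individual is copied unchanged into each new generation, the top fitness in this sketch never decreases from one generation to the next.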
Abstract:
The present invention provides a speech feature extraction system, suitable for use in a speech recognition system or other voice processing system, that extracts features related to the frequency and amplitude characteristics of an input speech signal by using a plurality of complex bandpass filters and processing the outputs of adjacent bandpass filters.
Abstract:
An interactive speech-activated information retrieval application for use in automated telephone systems includes a control manager that interfaces between the caller's speech input and applications and enables several applications to be open at the same time. The control manager continually monitors for control words, enabling the user to switch between applications at will. When a user switches to another application, the control manager suspends the first application and stores its context, enabling the user to later return to the application at the point where the application was previously suspended.
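The suspend-and-resume behaviour described in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the patented design: the class name `ControlManager`, the `handle` method, and the use of a simple history list as the stored "context" are all hypothetical.

```python
class ControlManager:
    """Routes recognised utterances and switches applications on control words."""

    def __init__(self, control_words):
        self.control_words = control_words  # control word -> application name
        self.contexts = {}                  # suspended application -> saved context
        self.active = None                  # name of the application in focus
        self.context = None                 # context of the active application

    def handle(self, utterance):
        word = utterance.strip().lower()
        # Control words are checked on every utterance, before the active
        # application sees the input.
        if word in self.control_words:
            self._switch(self.control_words[word])
            return f"switched to {self.active}"
        if self.active is None:
            return "no active application"
        self.context["history"].append(word)
        return f"{self.active} received '{word}'"

    def _switch(self, app):
        if self.active is not None:
            # Suspend the current application and store its context.
            self.contexts[self.active] = self.context
        self.active = app
        # Resume from the stored context if there is one, else start fresh.
        self.context = self.contexts.pop(app, {"history": []})


if __name__ == "__main__":
    cm = ControlManager({"weather": "weather", "stocks": "stocks"})
    for spoken in ["weather", "boston", "stocks", "ibm", "weather"]:
        print(cm.handle(spoken))
    print(cm.context)
```

After the final "weather" utterance in the usage example, the weather application resumes with the history it had when it was suspended.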
Abstract:
A conversation manager processes spoken utterances from a user of a computer, and develops responses to the spoken utterances. The conversation manager includes a reasoning facility and a language generation module. Each response has a domain model associated with it. The domain model includes an ontology (i.e., world view for the relevant domain of the spoken utterances and responses), lexicon, and syntax definitions. The language generation module receives a response in the form of a formal belief structure from other components of the conversation manager. The reasoning facility selects a syntax template to use in generating a response output from the formal belief structure. The language generation module produces the response output based on the formal structure, the selected syntax template, and the domain model.
Abstract:
A speech recognition system uses a phoneme counter to determine the length of a word to be recognized. The result is used to split a lexicon into one or more sub-lexicons containing only words which have the same or similar length to that of the word to be recognized, so restricting the search space significantly. In another aspect, a phoneme counter is used to estimate the number of phonemes in a word so that a transition bias can be calculated. This bias is applied to the transition probabilities between phoneme models in an HNN based recognizer to improve recognition performance for relatively short or long words.
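The first aspect of this abstract, restricting the search to words of similar length, can be sketched as below. The toy lexicon, the function names, and the `tolerance` parameter are assumptions made for illustration; the abstract does not say how "similar length" is defined.

```python
from collections import defaultdict

def build_sublexicons(lexicon):
    """Group words by phoneme count. `lexicon` maps word -> list of phonemes."""
    by_length = defaultdict(list)
    for word, phonemes in lexicon.items():
        by_length[len(phonemes)].append(word)
    return by_length

def candidate_words(by_length, estimated_count, tolerance=1):
    """Return the words whose phoneme count is within `tolerance` of the estimate."""
    words = []
    for n in range(estimated_count - tolerance, estimated_count + tolerance + 1):
        words.extend(by_length.get(n, []))
    return sorted(words)

# Hypothetical toy lexicon with illustrative phoneme transcriptions.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
    "tree": ["t", "r", "iy"],
    "window": ["w", "ih", "n", "d", "ow"],
    "recognition": ["r", "eh", "k", "ax", "g", "n", "ih", "sh", "ax", "n"],
}

if __name__ == "__main__":
    sublexicons = build_sublexicons(LEXICON)
    # If the phoneme counter estimates 3 phonemes, only short words are searched.
    print(candidate_words(sublexicons, 3, tolerance=0))
```

Only the words returned by `candidate_words` would then be passed to the recognizer, which is what shrinks the search space.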
Abstract:
A system and method for servicing natural language requests with a plurality of remote host systems. The system utilizes a computer program that comprises: (1) an input system for inputting an NL command; (2) a translation system that extracts a request from the NL command and stores the request in a host-independent format; and (3) a routing system for servicing the request, wherein the routing system comprises a mechanism for selecting a host, for converting the request into a host dependent directive, and for forwarding the directive to the selected host. The system may further include a voice recognition system, a local data source for servicing the NL command, templates for converting the request into the host dependent directive, a heuristic for selecting the host, and an output system for obtaining and outputting the response. The invention further comprises a context mechanism to interpret natural language instructions, wherein the context mechanism comprises: (1) a context database for storing sets of command elements and sets of response elements; (2) a context requirement mechanism that determines if a current NL command comprised of a current set of command elements is ambiguous; (3) a context retrieving mechanism that retrieves a previous set of response and/or command elements from the context database; and (4) a disambiguation mechanism that uses the retrieved set of response and/or command elements to disambiguate the current set of command elements.
Abstract:
The present invention provides a method to automate the validation of dynamic data presented over telecommunications paths. The invention utilizes continuous speaker-independent speech recognition together with a process known generally as natural language recognition to reduce dynamic utterances to machine-encoded text without requiring a prior training phase. Further, when configured by the end user to do so, the test system will convert common examples of dynamic speech, such as numbers, dates, times, and currency utterances, into their usual textual representation. This eliminates the limitation that all tested utterances need to be known by the test system in advance of the test. By converting the dynamic utterances to machine-encoded text, the invention facilitates automated validation of the data so converted, by allowing its use as input into an automated system which can independently access and validate the data.
Abstract:
A voice recognition device is provided to improve the recognition rate for objective recognition terms on display. The device includes a voice pickup unit 5 for picking up the user's voice, a storing unit for storing a plurality of objective recognition terms, a display unit 1a for displaying a designated number of the objective recognition terms stored in the storing unit, and a voice recognition unit 2. The voice recognition unit 2 has a weighting section for weighting the objective recognition terms on display more heavily than the other objective recognition terms which are not on display, and a calculating section for calculating respective degrees of agreement between the weighted objective recognition terms and the user's voice picked up by the unit 5. Based on the calculated degrees of agreement, the voice recognition device recognizes the user's voice input.
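The weighting idea in this abstract can be sketched in a few lines. The multiplicative weight, its value of 1.5, and the example scores are assumptions for illustration; the abstract does not specify how the weighting section and calculating section combine their results.

```python
def recognize(scores, displayed, display_weight=1.5):
    """Pick the term with the highest weighted degree of agreement.

    scores:    term -> raw degree of agreement with the picked-up voice
    displayed: set of terms currently shown on the display
    """
    weighted = {
        term: score * (display_weight if term in displayed else 1.0)
        for term, score in scores.items()
    }
    best = max(weighted, key=weighted.get)
    return best, weighted

if __name__ == "__main__":
    # Hypothetical raw scores: "Kyoto" matches slightly better acoustically,
    # but "Tokyo" is on display and so receives the larger weight.
    raw = {"Tokyo": 0.60, "Kyoto": 0.65, "Osaka": 0.30}
    best, weighted = recognize(raw, displayed={"Tokyo", "Osaka"})
    print(best, weighted)
```

With nothing on display, the same raw scores would select "Kyoto", so the weight only changes the outcome when a displayed term is a close acoustic competitor.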
Abstract:
The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.
Abstract:
A characteristic of one or more human resonating cavities may be utilized to provide information for speech recognition, independent from the actual sounds produced. In one embodiment, information about the changing shape of the human oral cavity may provide information useful in determining the nature of a person's vocalizations for speech recognition purposes.