Abstract:
A graphical user interface system for use with a content-based retrieval system includes an active display having display areas. For example, the display areas include a main area that provides an overview of database contents by displaying representative samples of those contents. The display areas also include one or more query areas into which a user can move one or more of the representative samples from the main area using gesture-based interaction. A query formulation module uses the representative samples moved into the query areas to provide feedback to the content-based retrieval system.
Abstract:
A method for discriminatively training acoustic models is provided for automated speaker verification (SV) and speech (or utterance) verification (UV) systems. The method includes: defining a likelihood ratio for a given speech segment whose speaker identity (for an SV system) or linguistic identity (for a UV system) is known, using a corresponding acoustic model and an alternative acoustic model that represents all other speakers (in SV) or all other linguistic identities (in UV); determining an average likelihood ratio score over a set of training utterances (the true data set) whose speaker identities (for SV) or linguistic identities (for UV) are the same; determining an average likelihood ratio score over a competing set of training utterances that excludes the speech data in the true data set (the competing data set); and optimizing the difference between the average likelihood ratio score over the true data set and the average likelihood ratio score over the competing data set, thereby improving the acoustic model.
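The training criterion described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes log-domain likelihood ratios, one-dimensional Gaussian "acoustic models", and segments given as lists of floats. The names `gaussian_loglik`, `average_llr`, and `separation_objective` are hypothetical.

```python
import math

def gaussian_loglik(mean, var=1.0):
    """Return a toy log-likelihood function for a segment (list of floats).
    Assumption: a 1-D Gaussian stands in for a real acoustic model."""
    def loglik(segment):
        return sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x in segment
        )
    return loglik

def average_llr(segments, target_loglik, alternative_loglik):
    """Average log-likelihood ratio (target vs. alternative) over a data set."""
    scores = [target_loglik(s) - alternative_loglik(s) for s in segments]
    return sum(scores) / len(scores)

def separation_objective(true_set, competing_set, target_loglik, alternative_loglik):
    """Difference between the average score on the true set and on the competing set.
    A training procedure would adjust the models to increase this value."""
    return (average_llr(true_set, target_loglik, alternative_loglik)
            - average_llr(competing_set, target_loglik, alternative_loglik))

# Toy data: the true set clusters near +1, the competing set near -1.
true_set = [[1.0, 1.2], [0.9, 1.1]]
competing_set = [[-1.0, -0.8], [-1.2, -0.9]]
alternative = gaussian_loglik(-1.0)

obj_good = separation_objective(true_set, competing_set, gaussian_loglik(1.0), alternative)
obj_bad = separation_objective(true_set, competing_set, gaussian_loglik(0.0), alternative)
```

In this toy setup, a target model centered on the true data yields a larger objective than a poorly placed one, which is the direction the optimization step pushes.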
Abstract:
A computer-implemented method is provided for identifying objects in an image. The method includes: capturing a series of images of a scene using a camera; receiving a topographical map for the scene that defines distances between objects in the scene; determining distances between objects in the scene from a given image; and approximating identities of objects in the given image by comparing the distances between objects determined from the given image with the distances between objects from the map. The identities of objects can be re-estimated using features of the objects extracted from the other images.
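The distance-comparison step can be illustrated as follows. This is a sketch under stated assumptions: object positions recovered from the image are given as 2-D coordinates, the map is a table of pairwise distances between labeled objects, and identities are assigned by exhaustive search over labelings. The abstract does not specify the matching procedure; the brute-force search and the names `match_objects_to_map` and `map_distances` are hypothetical.

```python
import math
from itertools import permutations

def match_objects_to_map(detected_positions, map_distances, labels):
    """Assign a map label to each detected object by minimizing the total
    mismatch between image-derived and map-defined pairwise distances.
    Assumption: brute-force search, practical only for a handful of objects."""
    n = len(detected_positions)
    best_cost, best_assignment = math.inf, None
    for perm in permutations(labels, n):
        cost = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                image_dist = math.dist(detected_positions[i], detected_positions[j])
                map_dist = map_distances[frozenset((perm[i], perm[j]))]
                cost += abs(image_dist - map_dist)
        if cost < best_cost:
            best_cost, best_assignment = cost, list(perm)
    return best_assignment, best_cost

# Toy map: three objects forming a 3-4-5 triangle.
map_distances = {
    frozenset(("A", "B")): 3.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 5.0,
}
detected = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
assignment, cost = match_objects_to_map(detected, map_distances, ["A", "B", "C"])
```

A re-estimation pass using appearance features from other images, as the abstract mentions, would refine this initial labeling; it is not shown here.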
Abstract:
A virtual keypad system for inputting text is provided. The virtual keypad system includes a remote controller having at least one touchpad incorporated therein and divided into a plurality of touch zones. A display device is in data communication with the remote controller and is operable to display a user interface including a keypad, where each key of the keypad is mapped to a touch zone of the touchpad. A prediction module, in response to an operator pressing a given touch zone to select a particular character, performs one or more key prediction methods to predict one or more next plausible keys. A key mapping module remaps the touch zones of the touchpad to the keys of the keypad based on the one or more next plausible keys.
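The interplay between the prediction module and the key mapping module might look like the sketch below. It assumes a simple prefix-frequency predictor over a small vocabulary and a remapping that splits the touch zones evenly among the predicted keys. Both choices, and the names `predict_next_keys` and `remap_zones`, are illustrative assumptions rather than the methods the system actually uses.

```python
from collections import Counter

def predict_next_keys(prefix, vocabulary, top_n=2):
    """Predict plausible next characters from words that start with the prefix.
    Assumption: a word-list frequency count stands in for the prediction methods."""
    counts = Counter(
        word[len(prefix)]
        for word in vocabulary
        if word.startswith(prefix) and len(word) > len(prefix)
    )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked[:top_n]]

def remap_zones(num_zones, default_mapping, predicted_keys):
    """Return a zone -> key mapping that spreads the touch zones over the
    predicted keys; fall back to the default mapping when nothing is predicted."""
    if not predicted_keys:
        return dict(default_mapping)
    return {
        zone: predicted_keys[zone * len(predicted_keys) // num_zones]
        for zone in range(num_zones)
    }

vocabulary = ["the", "then", "that", "this", "toy"]
default_mapping = {0: "a", 1: "e", 2: "i", 3: "o"}

predicted = predict_next_keys("th", vocabulary)          # after the operator typed "th"
mapping = remap_zones(4, default_mapping, predicted)     # enlarged targets for likely keys
```

Giving each predicted key a larger share of the touchpad is one plausible reading of "remaps the touch zones"; other layouts would fit the description equally well.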
Abstract:
An improved discriminative training method is provided for hidden Markov models. The method includes: defining a measure of separation margin for the data; identifying a subset of training utterances having utterances misrecognized by the models; defining a training criterion for the models based on maximizing the separation margin; formulating the training criterion as a constrained minimax optimization problem; and solving the constrained minimax optimization problem over the subset of training utterances, thereby discriminatively training the models.
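A small sketch can make the separation margin and the minimax criterion concrete. It assumes that each utterance carries per-model scores (for example log-likelihoods), that the margin is the true model's score minus the best competing score, and that the training subset is chosen by a margin threshold. None of these specifics come from the abstract, and the function names are hypothetical. The constraints and the solver are not shown.

```python
def separation_margin(scores, true_label):
    """Margin = score of the true model minus the best competing score.
    A negative margin means the utterance is misrecognized."""
    best_rival = max(v for k, v in scores.items() if k != true_label)
    return scores[true_label] - best_rival

def select_subset(utterances, threshold):
    """Keep utterances whose margin falls below the threshold.
    Assumption: with threshold >= 0 this includes every misrecognized utterance."""
    return [
        u for u in utterances
        if separation_margin(u["scores"], u["label"]) < threshold
    ]

def minimax_objective(subset):
    """Worst-case negated margin over the subset; training would minimize this,
    which is equivalent to maximizing the smallest margin."""
    return max(-separation_margin(u["scores"], u["label"]) for u in subset)

utterances = [
    {"label": "a", "scores": {"a": -10.0, "b": -14.0}},  # margin  4.0
    {"label": "b", "scores": {"a": -9.0, "b": -11.0}},   # margin -2.0 (misrecognized)
    {"label": "a", "scores": {"a": -5.0, "b": -30.0}},   # margin 25.0
]
subset = select_subset(utterances, threshold=5.0)
objective = minimax_objective(subset)
```

Well-separated utterances such as the third one drop out of the subset, so the optimization concentrates on the cases near or beyond the decision boundary.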
Abstract:
A method and apparatus for data entry by voice under adverse conditions are disclosed. More specifically, they provide efficient and robust form filling by voice. A form typically contains one or several fields that must be filled in. The user speaks to a speech recognition system, and word spotting is performed on the utterance. The spotted words of an utterance form a phrase that can contain field-specific values and/or commands. Recognized values are echoed back to the speaker via a text-to-speech system. Unreliable or unsafe inputs for which the confidence measure is low (e.g., ill-pronounced speech or noise) are rejected by the spotter. Speaker adaptation is performed transparently to improve speech recognition accuracy. Other input modalities (e.g., keyboard and touch screen) can also be supported. The system maintains a dialogue history to enable editing and correction operations on all active fields.
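The form-filling flow with confidence-based rejection and a dialogue history can be sketched as below. The sketch assumes the word spotter delivers `(word, confidence)` pairs, that each field has a closed vocabulary, and that a single `undo` command reverts the most recent change. The threshold value, the command name, and the function `fill_form` are illustrative assumptions.

```python
def fill_form(spotted_words, field_vocab, threshold=0.6):
    """Fill form fields from spotted words, rejecting low-confidence input.
    Returns the filled form and the remaining dialogue history."""
    form = {}
    history = []  # stack of (field, previous_value) for correction operations
    for word, confidence in spotted_words:
        if confidence < threshold:
            continue  # unreliable input is rejected
        if word == "undo":  # hypothetical correction command
            if history:
                field, previous = history.pop()
                if previous is None:
                    form.pop(field, None)
                else:
                    form[field] = previous
            continue
        for field, values in field_vocab.items():
            if word in values:
                history.append((field, form.get(field)))
                form[field] = word
                break
    return form, history

field_vocab = {"color": {"red", "blue"}, "size": {"small", "large"}}

# "blue" is spotted with low confidence and is rejected.
form_a, _ = fill_form([("red", 0.9), ("blue", 0.3), ("large", 0.8)], field_vocab)

# "blue" overwrites "red", then "undo" restores the earlier value.
form_b, _ = fill_form([("red", 0.9), ("blue", 0.85), ("undo", 0.9)], field_vocab)
```

A real system would also echo each accepted value through text-to-speech and adapt to the speaker; those parts are omitted here.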
Abstract:
A noise robustness method operates jointly in a signal domain and a model domain. For example, energy is added in the signal domain for frequency bands where the actual noise level of an incoming signal is lower than the noise level used to train the models, thus obtaining a compensated signal. Also, energy is added in the model domain for frequency bands where the noise level of the incoming signal or the compensated signal is higher than the noise level used to train the models. Moreover, energy is never removed, thereby avoiding the higher sensitivity of energy removal to estimation errors.
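The per-band rule can be written compactly. This sketch assumes linear (not log) band energies, noise estimates already available for each band, and an additive boost equal to the noise-level difference. The abstract states only where energy is added, so the amount added and the name `compensate` are assumptions.

```python
def compensate(signal_energy, model_energy, incoming_noise, training_noise):
    """Add energy per frequency band in the signal or the model domain.
    Energy is only ever added, never removed."""
    compensated_signal, compensated_model = [], []
    for s, m, n_in, n_tr in zip(signal_energy, model_energy,
                                incoming_noise, training_noise):
        # Signal domain: incoming noise is below the training noise level.
        signal_boost = max(n_tr - n_in, 0.0)
        # Model domain: incoming noise is above the training noise level.
        model_boost = max(n_in - n_tr, 0.0)
        compensated_signal.append(s + signal_boost)
        compensated_model.append(m + model_boost)
    return compensated_signal, compensated_model

signal = [1.0, 2.0]
model = [1.5, 1.5]
incoming_noise = [0.2, 0.9]   # band 0 is quieter than training, band 1 is noisier
training_noise = [0.5, 0.5]

comp_signal, comp_model = compensate(signal, model, incoming_noise, training_noise)
```

In band 0 the signal is raised to match the training condition, and in band 1 the model is raised to match the incoming condition, so in each band only one side changes.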