Abstract:
Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
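The two-pass extraction described above might be sketched as follows. This is only an illustration: the keyword tables, the `best_label` helper, and the class names are hypothetical, and an actual system would more likely use trained classifiers than keyword overlap.

```python
# Hypothetical keyword tables (not from the abstract); a stand-in for
# whatever statistical classifier a real system would use.
CLASS_KEYWORDS = {
    "radio": {"radio", "station", "music"},
    "climate": {"temperature", "warmer", "cooler", "fan"},
}
SUBCLASS_KEYWORDS = {
    "radio": {"tune": {"station", "tune"},
              "volume": {"louder", "quieter", "volume"}},
    "climate": {"raise": {"warmer", "raise"},
                "lower": {"cooler", "lower"}},
}


def best_label(words, table):
    """Return the label whose keyword set overlaps `words` most, or None."""
    scores = {label: len(words & keys) for label, keys in table.items()}
    label = max(scores, key=scores.get)
    return label if scores[label] > 0 else None


def extract_intent(decoded_speech):
    """Return (first_class, sub_class) for the decoded utterance, or None."""
    words = set(decoded_speech.lower().split())
    # First iteration: determine the top-level class (the target).
    first_class = best_label(words, CLASS_KEYWORDS)
    if first_class is None:
        return None
    # Second iteration: determine a sub-class within that class only.
    sub_class = best_label(words, SUBCLASS_KEYWORDS[first_class])
    return first_class, sub_class
```

Restricting the second pass to the sub-classes of the first result is what makes the two labels hierarchical rather than independent.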
Abstract:
Systems and methods for intelligent control of microphones in speech processing applications, which allow the capturing, recording, and preprocessing of speech data in the captured audio in a way that optimizes speech decoding accuracy.
Abstract:
A method and system of masking a group of related data values. A record in an unmasked data file of n records is read. The record includes a first set of data values of data elements included in a related data group (RDG) and one or more data values of one or more data elements external to the RDG. A random number k is received. A second set of data values is retrieved from a lookup table that associates n key values with n sets of data values. Retrieving the second set of data values includes identifying that the second set of data values is associated with a key value of k. The n sets of data values are included in the unmasked data file's n records. The record is masked by replacing the first set of data values with the retrieved second set of data values.
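A minimal sketch of the masking step, under stated assumptions: records are dictionaries, the RDG is an address (`street`, `city`, `zip`), and keys run from 1 to n. The field names and both function names are hypothetical.

```python
import random

# Hypothetical related data group: fields that must stay consistent together.
RDG_FIELDS = ("street", "city", "zip")


def build_lookup(records):
    """Associate key values 1..n with the RDG value sets of the n records."""
    return {i: tuple(rec[f] for f in RDG_FIELDS)
            for i, rec in enumerate(records, start=1)}


def mask_record(record, lookup, k):
    """Return a copy of `record` whose RDG values are replaced by entry k."""
    masked = dict(record)                       # fields outside the RDG are kept
    masked.update(zip(RDG_FIELDS, lookup[k]))   # whole group replaced at once
    return masked


records = [
    {"name": "A", "street": "1 Elm St", "city": "Springfield", "zip": "11111"},
    {"name": "B", "street": "2 Oak Ave", "city": "Riverton", "zip": "22222"},
    {"name": "C", "street": "3 Pine Rd", "city": "Lakeside", "zip": "33333"},
]
lookup = build_lookup(records)
k = random.randint(1, len(records))   # the random number k
masked = mask_record(records[0], lookup, k)
```

Because the group is swapped as a unit, the masked street, city, and ZIP code still belong together, which field-by-field randomization would not guarantee.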
Abstract:
A machine, system and method for user-guided teaching and modifications of voice commands and actions to be executed by a conversational learning system. The machine includes a system bus for communicating data and control signals received from the conversational learning system to a computer system, a vehicle data and control bus for connecting devices and sensors in the machine, a bridge module for connecting the vehicle data and control bus to the system bus, machine subsystems coupled to the vehicle data and control bus having a respective user interface for receiving a voice command or input signal from a user, a memory coupled to the system bus for storing action command sequences learned for a new voice command and a processing unit coupled to the system bus for automatically executing the action command sequences learned when the new voice command is spoken.
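The role of the memory and processing unit, storing a learned action sequence and replaying it when the new command is spoken, could be sketched as below. The `CommandMemory` class, its method names, and the action strings are hypothetical; the abstract does not describe a software interface.

```python
class CommandMemory:
    """Stores action command sequences learned for new voice commands."""

    def __init__(self):
        self._sequences = {}

    def teach(self, phrase, actions):
        """Remember the sequence of actions to run for a spoken phrase."""
        self._sequences[phrase.strip().lower()] = list(actions)

    def execute(self, phrase, dispatch):
        """Run the learned actions via `dispatch`; return False if unknown."""
        actions = self._sequences.get(phrase.strip().lower())
        if actions is None:
            return False
        for action in actions:
            dispatch(action)   # e.g. forward to a machine subsystem
        return True
```

For example, after `teach("I am cold", ["close_windows", "heat_on"])`, calling `execute("i am cold", dispatch)` passes both actions to `dispatch` in order.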
Abstract:
Speakers are automatically identified in an audio (or video) source. The audio information is processed to identify potential segment boundaries. Homogeneous segments are clustered substantially concurrently with the segmentation routine, and a cluster identifier is assigned to each identified segment. A segmentation subroutine identifies potential segment boundaries using the BIC model selection criterion. A clustering subroutine uses a BIC model selection criterion to assign a cluster identifier to each of the identified segments. If the difference of BIC values for each model is positive, the two clusters are merged.
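The merge test in the last sentence can be illustrated with a simplified BIC comparison. The sketch below makes several assumptions not stated in the abstract: features are one-dimensional, each cluster is modeled by a single Gaussian, and the difference is taken as BIC(one model) minus BIC(two models), so that a positive value favors merging. Practical systems typically use multivariate acoustic features.

```python
import math
from statistics import pvariance


def log_var(xs, floor=1e-6):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    return math.log(max(pvariance(xs), floor))


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """BIC of one shared Gaussian minus BIC of two separate Gaussians."""
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    merged = list(seg_a) + list(seg_b)
    # Log-likelihood difference; the constant terms cancel.
    fit_term = (-0.5 * n * log_var(merged)
                + 0.5 * n_a * log_var(seg_a)
                + 0.5 * n_b * log_var(seg_b))
    # Two models carry 2 extra parameters (one more mean and variance).
    penalty_term = penalty_weight * 0.5 * 2 * math.log(n)
    return fit_term + penalty_term


def should_merge(seg_a, seg_b):
    """Merge the clusters when the BIC difference is positive."""
    return delta_bic(seg_a, seg_b) > 0
```

The fit term is never positive, since two Gaussians always fit at least as well as one, so merging happens only when the gain from splitting is too small to pay for the extra parameters.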
Abstract:
Illustrative embodiments provide a computer implemented method and apparatus, in the form of a data processing system, and a computer program product for optimizing a natural language translation. In one illustrative embodiment, the computer implemented method comprises receiving a request from a requester, wherein the request comprises source language data, an indication of a source language and a destination language, and determining whether a translation between the source language and the destination language is needed. The method identifies a mapping between the source language and the destination language, the mapping including a set of hops. Responsive to a determination that the translation is needed, the method translates the source language data into destination language data associated with each successive hop in the set of hops in the mapping and returns the destination language data to the requester at a destination hop.
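One way to picture translation over a set of hops is sketched below. The abstract does not say how the mapping is found; breadth-first search over the available language pairs is an assumption here, and the `translators` dictionary of per-pair functions is hypothetical.

```python
from collections import deque


def find_hops(graph, source, destination):
    """Breadth-first search for the shortest chain of languages, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def translate(text, source, destination, translators):
    """Translate `text` hop by hop; `translators` maps (src, dst) to a function."""
    if source == destination:          # no translation is needed
        return text
    graph = {}
    for src, dst in translators:
        graph.setdefault(src, []).append(dst)
    hops = find_hops(graph, source, destination)
    if hops is None:
        raise ValueError(f"no mapping from {source} to {destination}")
    for src, dst in zip(hops, hops[1:]):
        text = translators[(src, dst)](text)   # translate at each successive hop
    return text
```

With only French-to-English and English-to-Japanese translators available, a French-to-Japanese request would pass through English as an intermediate hop.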
Abstract:
An improved apparatus and method are provided for operating devices and systems in a motor vehicle, while at the same time reducing vehicle operator distractions. One or more touch sensitive pads are mounted on the steering wheel of the motor vehicle, and the vehicle operator touches the pads in a pre-specified synchronized pattern, to perform functions such as controlling operation of the radio or adjusting a window. At least some of the touch patterns used to generate different commands may be selected by the vehicle operator. Usefully, the system of touch pad sensors and the signals generated thereby are integrated with speech recognition and/or facial gesture recognition systems, so that commands may be generated by synchronized multi-mode inputs.
Abstract:
In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input includes interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality.
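Aggregation within a temporal window, with one modality giving meaning to another, might look like the following sketch. The event tuple layout, the 1.5-second window, and the rule that a touch target replaces the spoken word "that" are all assumptions made for illustration.

```python
def aggregate(events, window=1.5):
    """Group (timestamp, modality, value) events that start within `window`
    seconds of the first event in the current group."""
    groups = []
    for event in sorted(events):
        if groups and event[0] - groups[-1][0][0] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


def resolve(group):
    """Use a touch target to fill in the spoken word 'that', if both exist."""
    speech = next((v for _, m, v in group if m == "speech"), None)
    target = next((v for _, m, v in group if m == "touch"), None)
    if speech and target:
        return " ".join(target if w == "that" else w for w in speech.split())
    return speech
```

Given a spoken "open that" at 0.0 s and a touch on `window_left` at 0.2 s, the two events fall in one window and resolve to "open window_left"; an utterance several seconds later forms its own group.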
Abstract:
A method, a system, and an apparatus for identifying and correcting sources of problems in synthesized speech which is generated using a concatenative text-to-speech (CTTS) technique. The method can include the step of displaying a waveform corresponding to synthesized speech generated from concatenated phonetic units. The synthesized speech can be generated from text input received from a user. The method further can include the step of displaying parameters corresponding to at least one of the phonetic units. The method can include the step of displaying the original recordings containing selected phonetic units. An editing input can be received from the user and the parameters can be adjusted in accordance with the editing input.