摘要:
Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The multi-stage intent extraction approach may have more than two iterations. By way of example only, the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user.
摘要:
A technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a class is determined after a first iteration and a sub-class of the class is determined after a second iteration. The class and the sub-class of the class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The user intent extracting step may further determine a sub-class of the sub-class of the class after a third iteration, such that the class, the sub-class of the class, and the sub-class of the sub-class of the class are hierarchically indicative of the intent of the user.
摘要:
Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
摘要:
Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The multi-stage intent extraction approach may have more than two iterations. By way of example only, the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user.
摘要:
The present invention concerns methods and apparatus for identifying and assigning meaning to words not recognized by a vocabulary or grammar of a speech recognition system. In an embodiment of the invention, the word may be in an acoustic vocabulary of the speech recognition system, but may be unrecognized by an embedded grammar of a language model of the speech recognition system. In another embodiment of the invention, the word may not be recognized by any vocabulary associated with the speech recognition system. In embodiments of the invention, at least one hypothesis is generated for an utterance not recognized by the speech recognition system. If the at least one hypothesis meets at least one predetermined criterion, a sword or more corresponding to the at least one hypothesis is added to the vocabulary of the speech recognition system. In other embodiments of the invention, before adding the word to the vocabulary of the speech recognition system, the at least one hypothesis may be presented to the user of the speech recognition system to determine if that is what the used intended when the user spoke.
摘要:
Systems and methods are provided for processing and executing commands in automated systems. For example, command processing systems and methods are provided which can automatically determine, evaluate or otherwise predict consequences of execution of misrecognized or misinterpreted user commands in automated systems and thus prevent undesirable or dangerous consequences that can result from execution of misrecognized/misinterpreted commands.
摘要:
Systems and methods are provided for processing and executing commands in automated systems. For example, command processing systems and methods are provided which can automatically determine, evaluate or otherwise predict consequences of execution of misrecognized or misinterpreted user commands in automated systems and thus prevent undesirable or dangerous consequences that can result from execution of misrecognized/misinterpreted commands.
摘要:
Techniques for managing vehicular emergencies are disclosed. For example, a method of managing a vehicular emergency includes the steps of collecting biometric data regarding at least one occupant of a vehicle, collecting data regarding at least one operational characteristic of the vehicle, and detecting vehicular emergencies through analysis of at least a portion of the biometric data and the operational characteristic data. This method may also include communicating at least one message relating to the data, wherein the content of the message is determined by the processing device based at least in part on the data and/or controlling a function of the vehicle in response to the data. The method may also include collecting data regarding at least one operational characteristic of at least one proximate vehicle.
摘要:
The present invention concerns methods and apparatus for identifying and assigning meaning to words not recognized by a vocabulary or grammar of a speech recognition system. In an embodiment of the invention, the word may be in an acoustic vocabulary of the speech recognition system, but may be unrecognized by an embedded grammar of a language model of the speech recognition system. In another embodiment of the invention, the word may not be recognized by any vocabulary associated with the speech recognition system. In embodiments of the invention, at least one hypothesis is generated for an utterance not recognized by the speech recognition system. If the at least one hypothesis meets at least one predetermined criterion, a sword or more corresponding to the at least one hypothesis is added to the vocabulary of the speech recognition system. In other embodiments of the invention, before adding the word to the vocabulary of the speech recognition system, the at least one hypothesis may be presented to the user of the speech recognition system to determine if that is what the used intended when the user spoke.
摘要:
In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input include interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality.