Abstract:
Speech recognition techniques are disclosed herein. In one embodiment, a novice mode is available such that when the user is unfamiliar with the speech recognition system, a voice user interface (VUI) may be provided to guide them. The VUI may display one or more speech commands that are presently available. The VUI may also provide feedback to train the user. After the user becomes more familiar with speech recognition, the user may enter speech commands without the aid of the novice mode. In this “experienced mode,” the VUI need not be displayed. Therefore, the user interface is not cluttered.
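The difference between the two modes can be illustrated with a minimal Python sketch. The function name `render_vui` and the `Say "..."` hint format are hypothetical and not taken from the disclosure; the sketch only assumes that novice mode lists the presently available speech commands and that experienced mode displays nothing.

```python
def render_vui(available_commands, novice_mode):
    """Return the hint lines a VUI overlay would display.

    Hypothetical helper: in novice mode the currently available speech
    commands are listed; in experienced mode nothing is shown, so the
    user interface stays uncluttered.
    """
    if not novice_mode:
        return []
    return [f'Say "{command}"' for command in sorted(available_commands)]


if __name__ == "__main__":
    commands = {"play", "pause", "go home"}
    print(render_vui(commands, novice_mode=True))
    print(render_vui(commands, novice_mode=False))
```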
Abstract:
Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from a microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational data related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor that captured the image, and the confidence data is adjusted depending on this determination.
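One possible form of the comparison and confidence adjustment is sketched below in Python. Several details are assumptions made for illustration only: locations are reduced to a single bearing angle in degrees, and the names `RecognizedSegment` and `adjust_confidence`, the angular tolerance, and the boost and penalty amounts are all hypothetical. The abstract states only that the confidence data is adjusted depending on whether the speech originated from a person in the field of view.

```python
from dataclasses import dataclass


@dataclass
class RecognizedSegment:
    text: str
    source_angle_deg: float  # acoustic locational data (assumed: bearing from the microphone array)
    confidence: float        # recognition confidence value in [0, 1]


def adjust_confidence(segment, person_angles_deg,
                      tolerance_deg=10.0, boost=0.1, penalty=0.3):
    """Return an adjusted confidence for the segment.

    person_angles_deg holds the visual locational data, assumed here to be
    one bearing per person detected in the image. The tolerance, boost, and
    penalty values are illustrative assumptions.
    """
    from_visible_person = any(
        abs(segment.source_angle_deg - angle) <= tolerance_deg
        for angle in person_angles_deg
    )
    delta = boost if from_visible_person else -penalty
    # Clamp so the result remains a valid confidence value.
    return min(1.0, max(0.0, segment.confidence + delta))


if __name__ == "__main__":
    segment = RecognizedSegment("pause", source_angle_deg=12.0, confidence=0.6)
    print(adjust_confidence(segment, [10.0, -40.0]))  # speaker matches a visible person
    print(adjust_confidence(segment, [-40.0]))        # no visible person near the sound source
```

Lowering the confidence when no visible person matches the sound source is what lets a downstream threshold reject speech from, for example, a television or a person outside the field of view.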
Abstract:
A self-service checkout terminal includes a base having a bagwell defined therein. The terminal also includes a first counter supported on the base. The first counter has a first surface which is positioned at a first height. The terminal further includes a scanner secured at a first end of the first counter. The terminal yet further includes an automated teller machine secured at a second end of the first counter. Moreover, the terminal includes an arcuate-shaped second counter secured to the first counter, the second counter having a second surface which is positioned at a second height. The first counter has a bagwell opening defined therein at a location interposed between the scanner and the automated teller machine. The bagwell opening is aligned with the bagwell. The first height is less than the second height.
Abstract:
The subject disclosure is directed towards detecting symbolic activity within a given environment using a context-dependent grammar. In response to receiving sets of input data corresponding to one or more input modalities, a context-aware interactive system processes a model associated with interpreting the symbolic activity using context data for the given environment. Based on the model, related sets of input data are determined. The context-aware interactive system uses the input data to interpret user intent with respect to the input and thereby identify one or more commands for a target output mechanism.
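A rough Python sketch of this idea follows. It is an illustration under stated assumptions rather than the disclosed model: the grammar is reduced to a lookup table keyed by a (speech token, gesture token) pair, "related" inputs are taken to be those that occur within a short time window of each other, and the names `GRAMMARS` and `interpret`, the context labels, and the command strings are all hypothetical.

```python
# Hypothetical context-dependent grammars: the same pair of inputs maps to
# a different command depending on the environment's context.
GRAMMARS = {
    "presentation": {("next", "point"): "advance_slide"},
    "living_room": {("next", "point"): "next_channel"},
}


def interpret(events, context, window_s=1.0):
    """Map multimodal input events to commands for the given context.

    events is a list of (timestamp_s, modality, token) tuples. A speech
    event and a gesture event are treated as related when they occur
    within window_s seconds of each other (an assumed relatedness rule).
    """
    grammar = GRAMMARS.get(context, {})
    speech = [(t, tok) for t, modality, tok in events if modality == "speech"]
    gestures = [(t, tok) for t, modality, tok in events if modality == "gesture"]

    commands = []
    for t_speech, word in speech:
        for t_gesture, gesture in gestures:
            if abs(t_speech - t_gesture) <= window_s:
                command = grammar.get((word, gesture))
                if command is not None:
                    commands.append(command)
    return commands


if __name__ == "__main__":
    events = [(0.0, "speech", "next"), (0.4, "gesture", "point")]
    print(interpret(events, "presentation"))
    print(interpret(events, "living_room"))
```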
Abstract:
A method of monitoring item shuffling in a post-scan area of a self-service checkout terminal having a post-scan shelf, a bagwell with a grocery container positioned therein, and a weight scale positioned so as to detect weight of items positioned both on the post-scan shelf and in the grocery container, includes the step of detecting removal of a first number of items from the post-scan shelf with the weight scale and generating a first weight decrease value in response thereto which corresponds to the weight of the first number of items. The method also includes the step of detecting placement of a second number of items into the grocery container with the weight scale and generating a first weight increase value in response thereto which corresponds to the weight of the second number of items. The method further includes the step of comparing the first weight decrease value to the first weight increase value and generating a first match control signal in response thereto if the first weight decrease value matches the first weight increase value.
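The comparison step can be sketched in Python as follows. The sketch assumes that the single weight scale reports a total weight in grams at three moments: before the items are lifted from the post-scan shelf, while they are in the user's hands, and after items are placed into the grocery container. The function name `match_control_signal` and the tolerance parameter are hypothetical; the abstract says only that the two values are compared for a match.

```python
def match_control_signal(weight_before, weight_lifted, weight_after,
                         tolerance_g=5):
    """Return True when the weight removed from the shelf matches the
    weight added to the grocery container.

    All three arguments are total scale readings in grams (an assumed
    unit). tolerance_g is an illustrative allowance for scale noise.
    """
    weight_decrease = weight_before - weight_lifted  # first weight decrease value
    weight_increase = weight_after - weight_lifted   # first weight increase value
    return abs(weight_decrease - weight_increase) <= tolerance_g


if __name__ == "__main__":
    # 400 g lifted from the shelf, 400 g placed into the container.
    print(match_control_signal(1500, 1100, 1500))
    # 400 g lifted, but 650 g placed: an extra item may have been added.
    print(match_control_signal(1500, 1100, 1750))
```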
Abstract:
A multimedia entertainment system combines both gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture, and may be used to provide context to recognize user-generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command, and may be used to provide context to recognize a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture is loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands.
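The gesture-scoped and hierarchical command sets might be organized along the lines of the following Python sketch. The gesture names, command words, and the `CommandContext` class are hypothetical examples, and the input weighting mentioned in the abstract is not modeled here.

```python
# Hypothetical mapping from a recognized gesture to its limited command set.
GESTURE_COMMANDS = {
    "raise_hand": {"play", "pause", "menu"},
    "swipe": {"next", "previous"},
}

# Hypothetical hierarchy: speaking a command on the left loads the set on the right.
NEXT_COMMAND_SETS = {
    "menu": {"settings", "library", "exit"},
}


class CommandContext:
    """Tracks which voice commands are currently loaded for recognition."""

    def __init__(self):
        self.active = set()

    def on_gesture(self, gesture):
        # Load only the commands associated with the recognized gesture.
        self.active = set(GESTURE_COMMANDS.get(gesture, ()))

    def on_speech(self, word):
        # Words outside the loaded set are ignored.
        if word not in self.active:
            return None
        # A recognized command may lead to loading the next set in the hierarchy.
        if word in NEXT_COMMAND_SETS:
            self.active = set(NEXT_COMMAND_SETS[word])
        return word


if __name__ == "__main__":
    context = CommandContext()
    print(context.on_speech("play"))      # no gesture yet, so nothing is loaded
    context.on_gesture("raise_hand")
    print(context.on_speech("menu"))      # accepted, loads the next command set
    print(context.on_speech("settings"))  # accepted from the next set
```

Restricting recognition to a small active set is one way the gesture can provide context for speech input, since fewer candidate words need to be distinguished at any moment.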