Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting an end of a query are disclosed. In one aspect, a method includes the actions of receiving audio of an utterance. The actions further include applying an end of query model to the audio. The actions further include determining a confidence score that reflects a likelihood that the utterance is a complete utterance. The actions further include comparing the confidence score to a threshold. The actions further include determining, based on the comparison, whether the utterance is likely complete. The actions further include providing a microphone instruction. This approach may increase the utility of the computing device for users, in particular for users with speech disorders or impediments. This approach may also conserve power, since the microphone does not need to be activated. The use of computational resources in interpreting and performing tasks based on additional audio detected by the microphone may also be avoided.
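The sequence of actions in this abstract (score the audio, compare to a threshold, issue a microphone instruction) can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `MicrophoneInstruction`, `decide_microphone`, and `toy_model`, the threshold value of 0.8, and the assumption that the instruction is a simple keep-active/deactivate flag are all hypothetical. The stand-in model scores completeness by trailing silence only, which is a placeholder for a trained end of query model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MicrophoneInstruction:
    # Assumed form of the instruction: True keeps the microphone active,
    # False deactivates it.
    keep_active: bool


def decide_microphone(
    audio: Sequence[float],
    end_of_query_model: Callable[[Sequence[float]], float],
    threshold: float = 0.8,  # hypothetical value, not taken from the abstract
) -> MicrophoneInstruction:
    """Score the audio, compare the score to a threshold, emit an instruction."""
    confidence = end_of_query_model(audio)    # likelihood the utterance is complete
    likely_complete = confidence >= threshold  # comparison step
    return MicrophoneInstruction(keep_active=not likely_complete)


def toy_model(audio: Sequence[float]) -> float:
    """Stand-in model: fraction of near-silent samples among the last 10."""
    tail = list(audio[-10:])
    if not tail:
        return 0.0
    quiet = sum(1 for sample in tail if abs(sample) < 0.01)
    return quiet / len(tail)


# Speech followed by silence: treated as likely complete.
finished = decide_microphone([0.5] * 10 + [0.0] * 10, toy_model)
# Speech right up to the end of the buffer: treated as likely incomplete.
ongoing = decide_microphone([0.5] * 20, toy_model)
```

In this sketch a lower threshold closes the microphone sooner, while a higher one tolerates longer pauses, which is the trade-off relevant to users who speak with pauses.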
Abstract:
Example implementations described herein are directed to a dialog system with self-learning natural language understanding (NLU), involving a client-server configuration. If the NLU results on the client are not confident, NLU is performed again on the server. In the dialog system, the human user and the system communicate via speech or text information. Examples of such products include robots, interactive voice response (IVR) systems for call centers, voice-enabled personal devices, car navigation systems, smartphones, and voice input devices in work environments where the human operator cannot operate the devices by hand.
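The client-to-server fallback described in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed system: `NLUResult`, `understand`, the two toy NLU functions, and the 0.7 confidence cutoff are hypothetical, and the self-learning part (updating the client from server results) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# An NLU function maps text to (intent, confidence). Hypothetical interface.
NLUFunction = Callable[[str], Tuple[str, float]]


@dataclass
class NLUResult:
    intent: str
    confidence: float
    source: str  # "client" or "server"


def understand(
    text: str,
    client_nlu: NLUFunction,
    server_nlu: NLUFunction,
    min_confidence: float = 0.7,  # hypothetical cutoff
) -> NLUResult:
    """Run NLU on the client first; redo it on the server if not confident."""
    intent, confidence = client_nlu(text)
    if confidence >= min_confidence:
        return NLUResult(intent, confidence, "client")
    intent, confidence = server_nlu(text)
    return NLUResult(intent, confidence, "server")


def toy_client_nlu(text: str) -> Tuple[str, float]:
    """Small on-device model: only recognizes one intent confidently."""
    return ("play_music", 0.9) if "play" in text else ("unknown", 0.2)


def toy_server_nlu(text: str) -> Tuple[str, float]:
    """Larger server-side model: covers an intent the client does not."""
    return ("set_navigation", 0.95) if "route" in text else ("unknown", 0.5)


local = understand("play some jazz", toy_client_nlu, toy_server_nlu)
remote = understand("find a route home", toy_client_nlu, toy_server_nlu)
```

Keeping the confident cases on the client avoids a network round trip, and only uncertain inputs incur the cost of the server call.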
Abstract:
Systems and methods for responding to spoken language input or multi-modal input are described herein. More specifically, one or more user intents are determined or inferred from the spoken language input or multi-modal input to determine one or more user goals via a dialogue belief tracking system. The systems and methods disclosed herein utilize the dialogue belief tracking system to perform actions based on the determined one or more user goals and allow a device to engage in human-like conversation with a user over multiple turns of a conversation. Sparing the user from having to explicitly state each intent and desired goal, while the device still delivers the desired result, improves the user's ability to accomplish tasks, perform commands, and get desired products and/or services. Additionally, the improved response to spoken language inputs from a user improves user interactions with the device.
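The abstract does not say how the dialogue belief tracking system works internally. As one possible illustration of carrying goals across turns, the Python sketch below keeps a belief state of slot values with confidences and merges each turn's observations into it. The `Belief` type, `update_belief`, the slot names, and the keep-the-higher-confidence rule are all assumptions made for this example, not the disclosed method.

```python
from typing import Dict, List, Tuple

# Belief state: slot name -> (value, confidence). Hypothetical representation.
Belief = Dict[str, Tuple[str, float]]


def update_belief(belief: Belief, observation: Belief) -> Belief:
    """Merge one turn's observations, keeping the more confident value per slot."""
    updated = dict(belief)
    for slot, (value, confidence) in observation.items():
        if slot not in updated or confidence >= updated[slot][1]:
            updated[slot] = (value, confidence)
    return updated


# Three turns of a conversation; the user never restates earlier slots.
turns: List[Belief] = [
    {"intent": ("find_restaurant", 0.9), "cuisine": ("thai", 0.8)},
    {"time": ("7pm", 0.85)},
    {"cuisine": ("italian", 0.4)},  # low-confidence mishearing, ignored here
]

belief: Belief = {}
for observation in turns:
    belief = update_belief(belief, observation)
```

After the three turns the belief state still holds the intent and cuisine from the first turn together with the time from the second, so the device can act without the user repeating each goal.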
Abstract:
A method (300) and apparatus for determining a motion environment profile to adapt voice recognition processing includes a device receiving (302) an acoustic signal including a speech signal, which is provided to a voice recognition module. The method also includes determining (304) a motion profile for the device, determining (306) a temperature profile for the device, and determining (308) a noise profile for the acoustic signal. The method further includes determining (310), from the motion, temperature, and noise profiles, a motion environment profile for the device and adapting (312) voice recognition processing for the speech signal based on the motion environment profile.
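Steps 304 through 312 can be illustrated with a minimal Python sketch that combines the three profiles into one motion environment profile and derives recognition settings from it. The profile labels, the 60 dB noise cutoff, the setting names, and the adaptation rules are hypothetical placeholders chosen for this example; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MotionEnvironmentProfile:
    motion: str       # e.g. "stationary", "walking", "running" (hypothetical labels)
    temperature: str  # e.g. "cold", "moderate", "hot" (hypothetical labels)
    noise: str        # "quiet" or "noisy" (hypothetical labels)


def determine_noise_profile(noise_level_db: float, noisy_above_db: float = 60.0) -> str:
    """Step 308 (sketch): classify the acoustic signal's noise level."""
    return "noisy" if noise_level_db > noisy_above_db else "quiet"


def determine_motion_environment(
    motion: str, temperature: str, noise: str
) -> MotionEnvironmentProfile:
    """Step 310 (sketch): combine the three profiles into one."""
    return MotionEnvironmentProfile(motion, temperature, noise)


def adapt_voice_recognition(profile: MotionEnvironmentProfile) -> Dict[str, str]:
    """Step 312 (sketch): choose recognition settings from the combined profile."""
    settings = {
        "noise_suppression": "standard",
        "acoustic_model": "default",
        "wind_filter": "off",
    }
    if profile.noise == "noisy":
        settings["noise_suppression"] = "aggressive"
    if profile.motion == "running":
        settings["acoustic_model"] = "exertion"  # assumed model for strained speech
    # Assumption: extreme temperature while moving suggests the user is outdoors.
    if profile.temperature in ("cold", "hot") and profile.motion != "stationary":
        settings["wind_filter"] = "on"
    return settings


profile = determine_motion_environment(
    "running", "cold", determine_noise_profile(72.0)
)
settings = adapt_voice_recognition(profile)
```

The point of combining the profiles before adapting is that a single signal can be ambiguous, whereas motion, temperature, and noise together narrow down the likely environment.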