Abstract:
Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
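To make the parallel multi-domain flow concrete, the following Python sketch runs several stand-in domain interpreters on a transcription concurrently and applies hint-based score boosts before selecting the best result. The interpreters, score values, and hint format are illustrative assumptions, not the disclosed implementation.

```python
# Hypothetical sketch of parallel multi-domain NLU with hint-based score
# adjustment; the domain interpreters and scores are invented stand-ins.
from concurrent.futures import ThreadPoolExecutor

def interpret_music(text):
    # Stand-in for a real music-domain NLU model.
    if "play" in text:
        return {"domain": "music", "intent": "PlayTrack", "score": 0.72}
    return {"domain": "music", "intent": None, "score": 0.05}

def interpret_weather(text):
    if "weather" in text or "rain" in text:
        return {"domain": "weather", "intent": "GetForecast", "score": 0.81}
    return {"domain": "weather", "intent": None, "score": 0.04}

def interpret_shopping(text):
    if "buy" in text or "order" in text:
        return {"domain": "shopping", "intent": "AddToCart", "score": 0.77}
    return {"domain": "shopping", "intent": None, "score": 0.03}

DOMAIN_INTERPRETERS = [interpret_music, interpret_weather, interpret_shopping]

def multi_domain_nlu(transcription, hints=None):
    """Run every domain interpreter on the transcription in parallel,
    boost domains suggested by hints, and return the best result."""
    hints = hints or {}
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda f: f(transcription), DOMAIN_INTERPRETERS))
    for result in results:
        # A hint from a previous interaction (e.g. the user was just
        # browsing music) nudges that domain's score upward.
        result["score"] += hints.get(result["domain"], 0.0)
    return max(results, key=lambda r: r["score"])

print(multi_domain_nlu("play some jazz", hints={"music": 0.1}))
```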
Abstract:
Features are disclosed for processing and interpreting natural language, such as user utterances, in multi-turn dialog interactions. Context information regarding interpretations of user utterances and system responses to those utterances can be maintained. Subsequent user utterances can be interpreted using the context information rather than in isolation. In some cases, interpretations of subsequent user utterances can be merged with interpretations of prior user utterances using a rule-based framework. Rules may be defined to determine which interpretations may be merged and under what circumstances.
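A minimal sketch of the rule-based merging idea, assuming a simple interpretation shape with a domain, an intent, and slots; the single merge rule shown (same domain, new turn supplies slots but no new intent) is one invented example of the kind of rule such a framework could define.

```python
# Illustrative sketch of rule-based interpretation merging; the
# Interpretation shape and the merge rule are assumptions chosen for
# clarity, not the framework's actual rules.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interpretation:
    domain: str
    intent: Optional[str]
    slots: dict = field(default_factory=dict)

def can_merge(previous: Interpretation, current: Interpretation) -> bool:
    # Example rule: merge only within the same domain, and only when the
    # new turn refines slots rather than starting a different intent.
    return (previous.domain == current.domain
            and (current.intent is None or current.intent == previous.intent))

def merge(previous: Interpretation, current: Interpretation) -> Interpretation:
    # Carry the prior intent forward and overlay the new slot values.
    merged_slots = {**previous.slots, **current.slots}
    return Interpretation(previous.domain,
                          current.intent or previous.intent,
                          merged_slots)

# Turn 1: "what's the weather in Seattle"
turn1 = Interpretation("weather", "GetForecast", {"city": "Seattle"})
# Turn 2: "how about tomorrow" -- no intent of its own, just a new slot
turn2 = Interpretation("weather", None, {"date": "tomorrow"})

if can_merge(turn1, turn2):
    print(merge(turn1, turn2))
    # -> intent 'GetForecast' carried forward, slots for city and date merged
```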
Abstract:
A system capable of resolving anaphora using timing data received from a local device. The local device outputs audio representing a list of entries; the audio may be synthesized speech of the list. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system, which can then use the offset time and stored data to identify which entry in the list was most recently output by the local device when the user interrupted. The system can then resolve the anaphora to match that entry and can perform additional processing based on the referred-to item.
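The offset lookup itself can be illustrated with a short sketch: given stored start offsets for each entry in the synthesized list audio, a binary search over those offsets finds the entry that was playing when the user interrupted. The entry names and timings here are invented for the example.

```python
# Minimal sketch of resolving "that one" from a playback offset, assuming
# the system stored the start offset of each list entry when the audio
# was synthesized.
import bisect

# Start time (in seconds from playback start) of each entry in the TTS audio.
entry_offsets = [0.0, 2.4, 5.1, 7.9]
entries = ["Thai Palace", "Pizza Loft", "Sushi Go", "Taco Stand"]

def resolve_anaphora(interrupt_offset):
    """Return the entry most recently read out when the user interrupted.

    bisect_right finds how many entries had started playing by the
    interrupt time; the last of those is the referent.
    """
    index = bisect.bisect_right(entry_offsets, interrupt_offset) - 1
    return entries[max(index, 0)]

# User said "that one" 5.8 s after playback began, i.e. while the third
# entry (starting at 5.1 s) was being read.
print(resolve_anaphora(5.8))  # -> Sushi Go
```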
Abstract:
Features are disclosed for determining a definition or value of a nonstandard term. A user utterance may be processed into one or more candidate transcriptions, and an interpretation of the utterance can be generated from the transcriptions. If a transcription includes a word, phrase, or term that is not recognized or is used in a nonstandard way, one or more data stores may be queried for the proper value or definition of the term. If a definition or value is not available in the data stores, the user may be prompted to provide one. The user-supplied definition can be saved for future use and may be used as a general definition of the term for other users.
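The lookup-then-prompt flow might look like the following sketch, which consults a per-user store, then a shared store, and finally asks the user and saves the answer; the store contents and prompt wording are assumptions for illustration.

```python
# Hedged sketch of resolving a nonstandard term: check a per-user data
# store, then a shared one, and prompt the user only as a last resort,
# remembering the answer for future use.
user_definitions = {}                           # per-user data store
shared_definitions = {"lol": "laugh out loud"}  # community data store

def resolve_term(term, user_id):
    """Return a value for an unrecognized term, prompting if necessary."""
    personal = user_definitions.get((user_id, term))
    if personal is not None:
        return personal
    shared = shared_definitions.get(term)
    if shared is not None:
        return shared
    # No stored definition: ask the user and remember the answer. A saved
    # definition could later be promoted to the shared store for others.
    definition = input(f"What do you mean by '{term}'? ")
    user_definitions[(user_id, term)] = definition
    return definition

print(resolve_term("lol", user_id="u1"))
print(resolve_term("my jam", user_id="u1"))  # prompts once, then remembers
```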