Abstract:
Systems and methods for responding to spoken language input or multi-modal input are described herein. More specifically, one or more user intents are determined or inferred from the spoken language input or multi-modal input to determine one or more user goals via a dialogue belief tracking system. The systems and methods disclosed herein utilize the dialogue belief tracking system to perform actions based on the determined one or more user goals and allow a device to engage in human-like conversation with a user over multiple turns. Relieving the user of having to explicitly state each intent and goal, while still delivering the desired result, improves the user's ability to accomplish tasks, perform commands, and obtain desired products and/or services. Additionally, the improved response to spoken language inputs improves user interactions with the device.
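The multi-turn behavior described above can be illustrated with a minimal sketch of a slot-filling belief tracker. This is not the patent's implementation; the class, slot names, and the readiness check are all illustrative assumptions about how a belief state might accumulate intents and goal parameters across turns.

```python
# Minimal multi-turn dialogue belief tracker (illustrative sketch).
# Each turn yields an inferred intent and slot values; the tracker folds
# them into a belief state until the goal is complete enough to act on,
# so the user never has to restate every intent and goal explicitly.

class BeliefTracker:
    def __init__(self, required_slots):
        self.required_slots = required_slots  # slots needed before acting
        self.belief = {}                      # accumulated slot -> value

    def update(self, intent, slots):
        """Fold one conversation turn into the belief state."""
        self.belief["intent"] = intent
        self.belief.update(slots)
        return self.belief

    def missing_slots(self):
        return [s for s in self.required_slots if s not in self.belief]

    def ready_to_act(self):
        return not self.missing_slots()

tracker = BeliefTracker(required_slots=["cuisine", "time"])
tracker.update("book_restaurant", {"cuisine": "thai"})   # turn 1
print(tracker.ready_to_act())                            # False: "time" missing
tracker.update("book_restaurant", {"time": "7pm"})       # turn 2
print(tracker.ready_to_act())                            # True: goal complete
```

The device would act (e.g. place the booking) only once `ready_to_act()` holds, otherwise it asks a follow-up question for a missing slot.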
Abstract:
A method or associated system for motion adaptive speech processing includes dynamically estimating a motion profile that is representative of a user's motion based on data from one or more resources, such as sensors and non-speech resources, associated with the user. The method includes effecting processing of a speech signal received from the user, for example, while the user is in motion, the processing taking into account the estimated motion profile to produce an interpretation of the speech signal. Dynamically estimating the motion profile can include computing a motion weight vector using the data from the one or more resources associated with the user, and can further include interpolating a plurality of models using the motion weight vector to generate a motion adaptive model. The motion adaptive model can be used to enhance voice destination entry for the user and re-used for other users who do not provide motion profiles.
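The weight-vector interpolation step can be sketched as follows. The prototype features, the softmax-style weighting, and the toy parameter sets are assumptions for illustration; the abstract only specifies that a motion weight vector computed from sensor data interpolates a plurality of models.

```python
import numpy as np

# Sketch: interpolate per-condition model parameters (e.g. stationary /
# walking / driving models) using a motion weight vector derived from
# sensor features. The similarity-based weighting is an assumption.

def motion_weight_vector(sensor_features, condition_prototypes):
    """Weight each motion condition by closeness to the sensed features."""
    dists = np.linalg.norm(condition_prototypes - sensor_features, axis=1)
    scores = -dists                      # closer prototype -> higher score
    w = np.exp(scores - scores.max())    # softmax for a normalized weight vector
    return w / w.sum()

def interpolate_models(weights, model_params):
    """Weighted combination of the per-condition parameter sets."""
    return np.tensordot(weights, model_params, axes=1)

prototypes = np.array([[0.0, 0.0], [1.0, 0.2], [3.0, 1.0]])  # still/walk/drive
params = np.array([[0.1, 0.1], [0.5, 0.4], [0.9, 0.8]])      # toy model params
w = motion_weight_vector(np.array([1.1, 0.3]), prototypes)   # user is walking
adapted = interpolate_models(w, params)                      # motion adaptive model
```

Because the adapted parameters depend only on the weight vector, the same per-condition models can be reused for other users, matching the abstract's note about users who do not provide motion profiles.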
Abstract:
A voice-to-text model used by a voice-enabled electronic device is updated dynamically and in a context-sensitive manner to facilitate recognition of entities that may be spoken by a user in a voice input directed to the voice-enabled electronic device. The dynamic update to the voice-to-text model may be performed, for example, based upon processing of a first portion of a voice input, e.g., based upon detection of a particular type of voice action, and may be targeted to facilitate the recognition of entities that may occur in a later portion of the same voice input, e.g., entities that are particularly relevant to one or more parameters associated with the detected type of voice action.
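The two-phase idea can be sketched simply: decode the first portion, detect the action type, and augment the recognizer's vocabulary with entities relevant to that action's parameters before the later portion is decoded. The action names, entity lists, and prefix-based detector below are illustrative assumptions, not the patent's mechanism.

```python
# Sketch: after processing the first portion of a voice input, detect the
# voice-action type and load entities relevant to its parameters, so the
# model is biased toward them for the remainder of the same input.

ACTION_ENTITIES = {
    "play_music": ["Bohemian Rhapsody", "Abbey Road"],  # e.g. media library
    "call_contact": ["Alice Smith", "Bob Jones"],       # e.g. contact list
}

def detect_action(first_portion):
    """Toy action-type detector over the already-decoded first portion."""
    if first_portion.startswith("play"):
        return "play_music"
    if first_portion.startswith("call"):
        return "call_contact"
    return None

def biased_vocabulary(base_vocab, first_portion):
    """Return the vocabulary augmented with action-relevant entities."""
    action = detect_action(first_portion)
    return list(base_vocab) + ACTION_ENTITIES.get(action, [])

vocab = biased_vocabulary(["play", "call", "stop"], "play some")
# song titles are now candidates for the later portion of the input
```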
Abstract:
A speech recognition system used for hands-free data entry receives and analyzes speech input to recognize and accept a user's response. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve performance. For example, if the hypothesis of a user's response matches the expected response, then there is a high probability that the user's response was recognized correctly. This information may be used to make adjustments. An expected response may include expected response parts, each part containing expected words. By treating an expected response as the concatenation of expected response parts, each part may be considered independently for the purposes of adjusting an acceptance algorithm, adjusting a model, or recording an apparent error. In this way, the speech recognition system may make modifications based on a wide range of user responses.
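Per-part acceptance adjustment can be sketched as below. The length-based alignment and the specific threshold values are assumptions; the point is only that each expected-response part gets its own acceptance decision, with the threshold relaxed when that part matches expectation.

```python
# Sketch: treat an expected response as a concatenation of parts and
# compare the hypothesis part-by-part, so the acceptance threshold can
# be adjusted independently per part. Threshold values are toy numbers.

def split_hypothesis(hyp_words, parts):
    """Align hypothesis words to expected parts by part length, in order."""
    out, i = [], 0
    for part in parts:
        out.append(hyp_words[i:i + len(part)])
        i += len(part)
    return out

def accept(hyp_words, expected_parts, scores, lowered=0.3, normal=0.7):
    """Accept each part, relaxing the threshold when it matches expectation."""
    results = []
    for hyp_part, exp_part, score in zip(
            split_hypothesis(hyp_words, expected_parts), expected_parts, scores):
        threshold = lowered if hyp_part == exp_part else normal
        results.append(score >= threshold)
    return results

# Expected response: part "one two" then part "alpha". The second part
# scored low, but it matches expectation, so the relaxed threshold accepts it.
print(accept(["one", "two", "alpha"],
             [["one", "two"], ["alpha"]],
             scores=[0.9, 0.5]))   # [True, True]
```

A part that fails its (unrelaxed) threshold could be re-prompted or logged as an apparent error without discarding the parts that were accepted.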
Abstract:
A speech recognition system used in a workflow receives and analyzes speech input to recognize and accept a user's response to a task. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve recognition accuracy. For example, if the hypothesis of a user's response matches the expected response, then there is a high probability that the user's response was recognized correctly. An expected response may include expected words and wildcard words. Wildcard words represent any recognized word in a user's response. By including wildcard words in the expected response, the speech recognition system may make modifications based on a wide range of user responses.
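Wildcard matching reduces to a short comparison. The `"*"` token and positional matching are assumptions for illustration; the abstract only states that a wildcard word stands for any recognized word in the user's response.

```python
# Sketch: match a hypothesis against an expected response that mixes
# literal expected words with wildcard words ("*"), where a wildcard
# matches any single recognized word at that position.

def matches_expected(hypothesis, expected):
    """True if every word is either a wildcard match or a literal match."""
    if len(hypothesis) != len(expected):
        return False
    return all(e == "*" or h == e for h, e in zip(hypothesis, expected))

# Workflow prompt expects "quantity <number> confirmed"; the middle word
# may be any recognized word, so every spoken quantity matches.
print(matches_expected(["quantity", "five", "confirmed"],
                       ["quantity", "*", "confirmed"]))   # True
print(matches_expected(["quantity", "five", "cancel"],
                       ["quantity", "*", "confirmed"]))   # False
```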
Abstract:
An electronic device, a method, and a chip set are provided. The electronic device includes a memory configured to store at least one of audio feature data of audio data and speech recognition data obtained by speech recognition of audio data; and a control module connected to the memory, wherein the control module is configured to update a voice command that is set to execute a function through voice, the function being selected based on at least one of the audio feature data, the speech recognition data, and function execution data executed in relation to the audio data.
Abstract:
Examples of methods and systems for building speech recognition systems from speech recording logs are described. In some examples, a method may be performed by a computing device within a system to generate modified data logs to use as a training data set for an acoustic model for a particular language. A device may receive one or more data logs that comprise one or more recordings of spoken queries and transcribe the recordings. Based on comparisons, the device may identify transcriptions indicative of noise and remove them from the data logs. Further, the device may remove unwanted transcriptions from the data logs and provide the modified data logs as a training data set to one or more acoustic models for particular languages.
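The filtering stages can be sketched as a small pipeline. The noise marker, confidence threshold, and blacklist below are assumed heuristics standing in for whatever comparisons the system actually performs; the structure (transcribe, drop noise, drop unwanted, keep the rest for training) follows the abstract.

```python
# Sketch of the log-cleaning pipeline: drop transcriptions indicative of
# noise (unrecognizable or low-confidence results), drop unwanted ones
# (e.g. blacklisted content), and return the remainder as training data.

NOISE_MARKER = "<unk>"          # assumed marker for unrecognizable audio
UNWANTED = {"offensive-term"}   # assumed blacklist of unwanted content

def clean_logs(transcriptions, min_confidence=0.5):
    """Filter (text, confidence) pairs down to usable training sentences."""
    kept = []
    for text, confidence in transcriptions:
        if not text or NOISE_MARKER in text or confidence < min_confidence:
            continue  # indicative of noise
        if any(word in UNWANTED for word in text.split()):
            continue  # unwanted transcription
        kept.append(text)
    return kept

logs = [("navigate home", 0.92), ("<unk> <unk>", 0.90),
        ("play music", 0.20), ("offensive-term query", 0.95)]
print(clean_logs(logs))   # ['navigate home']
```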
Abstract:
A computer-implemented method, comprising: receiving, by a computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device; determining an estimate for the unknown stage of the multi-stage voice dialog; providing, to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the estimate for the unknown stage of the multi-stage voice dialog; obtaining, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a particular prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the particular prediction for the unknown stage of the multi-stage voice dialog based on (i) the estimate for the unknown stage of the multi-stage voice dialog and (ii) additional information that indicates a context of the voice input; and presenting the transcription of the voice input with the computing device.
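The server-side biasing step can be sketched as hypothesis rescoring. The stage names, bias values, and the rule of preferring context over the client's raw estimate are assumptions; the abstract specifies only that the prediction combines the stage estimate with contextual information and biases the model accordingly.

```python
# Sketch: combine the client's stage estimate with context to predict the
# dialog stage, then rescore recognition hypotheses with stage-specific
# biases before choosing a transcription.

STAGE_BIAS = {
    "confirmation": {"yes": 0.3, "no": 0.3},       # boost likely answers
    "destination":  {"airport": 0.3, "home": 0.3},
}

def predict_stage(estimate, context):
    """Prefer contextual evidence (e.g. the last prompt) over the estimate."""
    return context.get("last_prompt_stage", estimate)

def transcribe(hypotheses, estimate, context):
    """Pick the hypothesis whose score plus stage bias is highest."""
    bias = STAGE_BIAS.get(predict_stage(estimate, context), {})
    return max(hypotheses, key=lambda h: h[1] + bias.get(h[0], 0.0))[0]

hyps = [("yes", 0.5), ("yeah s", 0.6)]            # acoustically ambiguous
ctx = {"last_prompt_stage": "confirmation"}       # contextual information
print(transcribe(hyps, "destination", ctx))       # 'yes' (0.5 + 0.3 > 0.6)
```

Without the contextual stage prediction, the raw acoustic score would win and the garbled hypothesis would be returned, which is exactly the failure the stage-biased model avoids.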