-
Publication No.: US11908463B1
Publication date: 2024-02-20
Application No.: US17361761
Filing date: 2021-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Arjit Biswas, Shishir Bharathi, Anushree Venkatesh, Yun Lei, Ashish Kumar Agrawal, Siddhartha Reddy Jonnalagadda, Prakash Krishnan, Arindam Mandal, Raefer Christopher Gabriel, Abhay Kumar Jha, David Chi-Wai Tang, Savas Parastatidis
IPC: G10L15/22, G06F40/35, G10L15/183, G10L15/18, G06F40/279, G06F40/295, G10L15/19, G06F40/30
CPC classification number: G10L15/183, G06F40/279, G10L15/1815, G10L15/22, G06F40/295, G06F40/30, G06F40/35, G10L15/1822, G10L15/19, G10L2015/228
Abstract: Techniques for storing and using multi-session context are described. A system may store context data corresponding to a first interaction, where the context data may include action data, entity data and a profile identifier for a user. Later the stored context data may be retrieved during a second interaction corresponding to the entity of the second interaction. The second interaction may take place at a system different than the first interaction. The system may generate a response during the second interaction using the stored context data of the prior interaction.
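The store-then-retrieve flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `ContextRecord`, `ContextStore`, `save`, and `lookup` are hypothetical, and keying records by profile identifier plus entity is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ContextRecord:
    """Context saved from one interaction (hypothetical structure)."""
    profile_id: str  # identifies the user profile
    entity: str      # entity the interaction was about
    action: str      # action taken during the interaction


class ContextStore:
    """In-memory store keyed by (profile_id, entity); an assumption, not the patent's design."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ContextRecord] = {}

    def save(self, record: ContextRecord) -> None:
        # A later interaction about the same entity overwrites the earlier record.
        self._records[(record.profile_id, record.entity)] = record

    def lookup(self, profile_id: str, entity: str) -> Optional[ContextRecord]:
        # Returns None when no earlier interaction mentioned this entity.
        return self._records.get((profile_id, entity))


# First interaction: the user asks about a movie.
store = ContextStore()
store.save(ContextRecord(profile_id="user-1", entity="movie:Inception", action="get_showtimes"))

# Second interaction, possibly on another device: the same entity comes up again.
previous = store.lookup("user-1", "movie:Inception")
```

Because the lookup key does not include a device or session identifier, the second interaction can take place on a different system than the first, which is the property the abstract emphasizes.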
-
Publication No.: US11804225B1
Publication date: 2023-10-31
Application No.: US17375458
Filing date: 2021-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Ashish Kumar Agrawal, Kemal Oral Cansizlar, Suranjit Adhikari, Shucheng Zhu, Raefer Christopher Gabriel, Arindam Mandal
CPC classification number: G10L15/22, G10L15/1815, G10L15/30, G10L2015/223
Abstract: Techniques for conversation recovery in a dialog management system are described. A system may determine, using dialog models, that a predicted action to be performed by a skill component is likely to result in an undesired response or that the skill component is unable to respond to a user input of a dialog session. Rather than informing the user that the skill component is unable to respond, the system may send data to the skill component to enable the skill component to determine a correct action responsive to the user input. The data may include an indication of the predicted action and/or entity data corresponding to the user input. The system may receive, from the skill component, response data corresponding to the user input, and may use the response data to update a dialog context for the dialog session and an inference engine of the dialog management system.
-
Publication No.: US20210304774A1
Publication date: 2021-09-30
Application No.: US17228950
Filing date: 2021-04-13
Applicant: Amazon Technologies, Inc.
Inventor: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
Abstract: Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
-
Publication No.: US11043214B1
Publication date: 2021-06-22
Application No.: US16204670
Filing date: 2018-11-29
Applicant: Amazon Technologies, Inc.
Inventor: Behnam Hedayatnia, Anirudh Raju, Ankur Gandhe, Chandra Prakash Khatri, Ariya Rastrow, Anushree Venkatesh, Arindam Mandal, Raefer Christopher Gabriel, Ahmad Shikib Mehri
Abstract: Described herein is a system for rescoring automatic speech recognition hypotheses for conversational devices that have multi-turn dialogs with a user. The system leverages dialog context by incorporating data related to past user utterances and data related to the system generated response corresponding to the past user utterance. Incorporation of this data improves recognition of a particular user utterance within the dialog.
-
Publication No.: US10121494B1
Publication date: 2018-11-06
Application No.: US15474603
Filing date: 2017-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
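The per-frame scoring and smoothing step can be sketched as follows. This is an illustration only: the abstract does not say how scores are smoothed, so the centered moving average, the `window` size, and the `threshold` value are all assumptions, and the function names are hypothetical.

```python
from typing import List


def smooth_scores(scores: List[float], window: int = 5) -> List[float]:
    """Average each frame's score with its neighbors (assumed smoothing method)."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)                # clamp the window at the start
        hi = min(len(scores), i + half + 1)  # clamp the window at the end
        neighborhood = scores[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


def presence_decisions(scores: List[float], window: int = 5,
                       threshold: float = 0.5) -> List[bool]:
    """Turn smoothed per-frame scores into per-frame presence decisions."""
    return [s >= threshold for s in smooth_scores(scores, window)]


# An isolated high-scoring frame is suppressed by its low-scoring neighbors.
spike = presence_decisions([0.0, 0.0, 1.0, 0.0, 0.0], window=3)

# A single low-scoring frame inside sustained activity is still counted as presence.
sustained = presence_decisions([0.9, 0.8, 0.1, 0.9, 0.8], window=3)
```

Smoothing before thresholding keeps a single noisy frame from flipping the decision in either direction, which matters if the decisions feed a periodic presence report.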
-
Publication No.: US11935525B1
Publication date: 2024-03-19
Application No.: US16895377
Filing date: 2020-06-08
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani
IPC: G10L15/22, G06F40/40, G10L15/187, G10L15/26, G10L15/30, G10L21/0208, H04R3/00, G10L15/08, G10L21/0216, H04W4/02
CPC classification number: G10L15/22, G06F40/40, G10L15/187, G10L15/26, G10L15/30, G10L21/0208, H04R3/005, G10L2015/088, G10L2015/223, G10L2021/02166, H04W4/025
Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
-
Publication No.: US11657832B2
Publication date: 2023-05-23
Application No.: US17022197
Filing date: 2020-09-16
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
IPC: G10L15/00, G10L25/30, G10L25/51, G10L15/02, G10L15/16, G10L15/22, G10L15/30, G10L25/78, G10L15/08
CPC classification number: G10L25/30, G10L15/02, G10L15/16, G10L15/22, G10L15/30, G10L25/51, G10L25/78, G10L2015/088, G10L2025/783
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
Publication No.: US20220093094A1
Publication date: 2022-03-24
Application No.: US17112512
Filing date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, Angeliki Metallinou, Vincent Auvray, Minmin Shen, Josey Diego Sandoval, Rohit Prasad, Thomas Taylor, Amotz Maimon
Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
-
Publication No.: US11200884B1
Publication date: 2021-12-14
Application No.: US16181925
Filing date: 2018-11-06
Applicant: Amazon Technologies, Inc.
Inventor: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
Abstract: Techniques for labeling user inputs for updating user recognition voice profiles are described. A system may leverage various signals, generated during or after processing of a user input, to retroactively determine which user spoke the user input. For example, after the system receives the user input, the user may provide the system with non-spoken user verification information. Based on such user verification information, the system may label the previously spoken user input as originating from the particular user. The system may also or alternatively use system usage history to retroactively label user inputs.
-
Publication No.: US20200349928A1
Publication date: 2020-11-05
Application No.: US16932049
Filing date: 2020-07-17
Applicant: Amazon Technologies, Inc.
Inventor: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.