-
Publication number: US20230410833A1
Publication date: 2023-12-21
Application number: US18131531
Application date: 2023-04-06
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal
CPC classification number: G10L25/30 , G10L25/51 , G10L15/02 , G10L15/16 , G10L15/22 , G10L15/30 , G10L25/78 , G10L2015/088
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present near the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect human presence. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to the speech-capture device.
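The per-frame scoring and smoothing described in this abstract can be sketched as below. The window size, threshold, and score values are illustrative assumptions, not values from the patent, and the moving-average smoother stands in for whatever smoothing the actual system uses.

```python
def smooth_scores(frame_scores, window=3):
    """Smooth each frame's presence score with a moving average
    over its neighboring frames."""
    smoothed = []
    n = len(frame_scores)
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        smoothed.append(sum(frame_scores[lo:hi]) / (hi - lo))
    return smoothed

def presence_decisions(frame_scores, threshold=0.5, window=3):
    """Binary human-presence decision per frame, made after smoothing
    so isolated noisy frames do not flip the result."""
    return [s >= threshold for s in smooth_scores(frame_scores, window)]
```

Smoothing first, then thresholding, is what turns noisy per-frame scores into a stable decision that can be reported on a periodic "heartbeat" schedule.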
-
Publication number: US20220189458A1
Publication date: 2022-06-16
Application number: US17584489
Application date: 2022-01-26
Applicant: Amazon Technologies, Inc.
Inventor: Spyridon Matsoukas , Aparna Khare , Vishwanathan Krishnamoorthy , Shamitha Somashekar , Arindam Mandal
Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. Additionally or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
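One simple way to modify a verification confidence with an ASR confidence, as the abstract describes, is a weighted blend. This is a minimal sketch; the blending weight and the linear form are assumptions for illustration, not the patented formula.

```python
def adjust_verification_confidence(uv_conf, asr_conf, weight=0.3):
    """Blend user-verification confidence with ASR confidence, so a
    low-confidence transcription pulls the verification score down.
    Both inputs are assumed to lie in [0, 1]."""
    return (1 - weight) * uv_conf + weight * asr_conf
```

With this form, a confident verification paired with an uncertain transcription yields a noticeably lower final score than verification alone would.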
-
Publication number: US20220093101A1
Publication date: 2022-03-24
Application number: US17112520
Application date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan , Arindam Mandal , Siddhartha Reddy Jonnalagadda , Nikko Strom , Ariya Rastrow , Ying Shi , David Chi-Wai Tang , Nishtha Gupta , Aaron Challenner , Bonan Zheng , Angeliki Metallinou , Vincent Auvray , Minmin Shen
Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system, which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred-to item.
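Mapping a barge-in offset back to the list entry being spoken can be sketched as a lookup over per-entry playback start times. The `(start_ms, entity)` representation here is a hypothetical stand-in for whatever stored data the speech processing system keeps.

```python
def entry_at_offset(entries, offset_ms):
    """Given (start_ms, entity) pairs in TTS playback order, return
    the entry whose audio had most recently started when the user
    barged in at offset_ms from the start of playback."""
    current = None
    for start_ms, entity in entries:
        if start_ms <= offset_ms:
            current = entity
        else:
            break
    return current
```

An utterance like "that one" at 2,000 ms would then resolve to whichever entry began playing most recently before that offset.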
-
Publication number: US11200885B1
Publication date: 2021-12-14
Application number: US16219228
Application date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Arindam Mandal , Nikko Strom , Angeliki Metallinou , Tagyoung Chung , Dilek Hakkani-Tur , Suranjit Adhikari , Sridhar Yadav Manoharan , Ankita De , Qing Liu , Raefer Christopher Gabriel , Rohit Prasad
IPC: G10L15/22 , G10L21/00 , G10L15/06 , G10L15/18 , G06F16/332
Abstract: A dialog manager receives text data corresponding to a dialog with a user. Entities represented in the text data are identified. Context data relating to the dialog is maintained, which may include prior dialog, prior API calls, user profile information, or other data. Using the text data and the context data, an N-best list of one or more dialog models is selected to process the text data. After processing the text data, the outputs of the N-best models are ranked and a top-scoring output is selected. The top-scoring output may be an API call and/or an audio prompt.
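The final step described above, ranking the N-best model outputs and taking the top-scoring one, reduces to a max over scored candidates. The `(score, output)` pair representation is an assumption made for this sketch.

```python
def select_response(model_outputs):
    """Rank the outputs produced by the N-best dialog models and
    return the top-scoring one. Each element is a (score, output)
    pair; the output might be an API call or an audio prompt."""
    return max(model_outputs, key=lambda pair: pair[0])[1]
```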
-
Publication number: US10964315B1
Publication date: 2021-03-30
Application number: US15639330
Application date: 2017-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Minhua Wu , Sankaran Panchapagesan , Ming Sun , Shiv Naga Prasad Vitaladevuni , Bjorn Hoffmeister , Ryan Paul Thomas , Arindam Mandal
Abstract: An approach to wakeword detection uses an explicit representation of non-wakeword speech in the form of subword (e.g., phonetic monophone) units that do not necessarily occur in the wakeword and that broadly represent general speech. These subword units are arranged in a “background” model, which at runtime essentially competes with the wakeword model such that a wakeword is less likely to be declared as occurring when the input matches the background model well. An HMM may be used with the model to locate possible occurrences of the wakeword. Features are determined from portions of the input corresponding to subword units of the wakeword detected using the HMM. A secondary classifier is then used to process the features to yield a decision of whether the wakeword occurred.
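The two-stage decision in this abstract, model competition followed by a secondary classifier, can be sketched as below. The log-likelihood inputs, margin, and classifier are illustrative assumptions; the real system derives these scores from the HMM decode.

```python
def wakeword_decision(ww_loglik, bg_loglik, subword_features, classifier, margin=1.0):
    """Two-stage wakeword decision. Stage 1: the wakeword model must
    beat the competing background model by a margin, so input that the
    background model explains well is rejected. Stage 2: a secondary
    classifier examines features from the wakeword's subword segments."""
    if ww_loglik - bg_loglik <= margin:
        return False
    return classifier(subword_features)
```

A plausible toy classifier for the sketch is a mean-feature threshold, e.g. `lambda feats: sum(feats) / len(feats) > 0.5`.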
-
Publication number: US10726830B1
Publication date: 2020-07-28
Application number: US16143910
Application date: 2018-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Arindam Mandal , Kenichi Kumatani , Nikko Strom , Minhua Wu , Shiva Sundaram , Bjorn Hoffmeister , Jeremie Lecomte
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
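The three-model chain above is just function composition: raw multi-channel audio in, a beamformed-like feature out, a lower-dimensional feature next, then a classification. The bodies below are trivial stand-ins (channel averaging, truncation, a threshold) for the actual DNNs; only the pipeline shape is taken from the abstract.

```python
def multi_channel_dnn(raw_channels):
    """Stand-in for the multi-channel DNN: fuse per-channel samples
    into one feature vector (here, by averaging across channels)."""
    n = len(raw_channels)
    return [sum(samples) / n for samples in zip(*raw_channels)]

def feature_extraction_dnn(features, out_dim=2):
    """Stand-in for the feature-extraction DNN: map the fused feature
    to a lower-dimensional representation (here, truncation)."""
    return features[:out_dim]

def classification_dnn(features):
    """Stand-in for the classification DNN: label the feature vector
    (here, a simple threshold on the mean)."""
    return "speech" if sum(features) / len(features) > 0.0 else "non-speech"

def acoustic_front_end(raw_channels):
    """Jointly composed pipeline, mirroring the three-stage front-end."""
    return classification_dnn(feature_extraction_dnn(multi_channel_dnn(raw_channels)))
```

In the patented approach the three stages are trained jointly for the speech-processing objective, rather than each stage being tuned in isolation for signal enhancement.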
-
Publication number: US10679621B1
Publication date: 2020-06-09
Application number: US15927764
Application date: 2018-03-21
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Minhua Wu , Anirudh Raju , Spyridon Matsoukas , Arindam Mandal , Kenichi Kumatani
IPC: G10L15/22 , G10L15/187 , G10L15/26 , G10L15/30 , H04R3/00 , G10L21/0208 , G06F40/40 , H04W4/02 , G10L21/0216 , G10L15/08
Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
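Using the microphone configuration as an input vector alongside the audio features can be sketched as a one-hot encoding concatenated onto the acoustic features. The configuration names and the one-hot encoding are assumptions for illustration; the abstract does not specify the encoding.

```python
def mic_config_vector(config, known_configs):
    """One-hot encode the microphone-array configuration against a
    list of known configurations (hypothetical examples)."""
    return [1.0 if config == c else 0.0 for c in known_configs]

def build_model_input(audio_features, config, known_configs):
    """Concatenate acoustic features with the configuration vector
    before feeding the combined input to the acoustic model."""
    return audio_features + mic_config_vector(config, known_configs)
```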
-
Publication number: US09875081B2
Publication date: 2018-01-23
Application number: US14860400
Application date: 2015-09-21
Applicant: Amazon Technologies, Inc.
Inventor: James David Meyers , Shah Samir Pravinchandra , Yue Liu , Arlen Dean , Daniel Miller , Arindam Mandal
IPC: G10L15/22 , G10L15/00 , G06F3/16 , G10L15/26 , G10L15/18 , G10L15/06 , G10L15/32 , G01L21/00 , G10L15/08
CPC classification number: G06F3/167 , G10L15/00 , G10L15/063 , G10L15/1815 , G10L15/22 , G10L15/222 , G10L15/26 , G10L15/32 , G10L2015/088 , G10L2015/223 , G10L2015/226
Abstract: A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.
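Proximity-based arbitration as described above can be sketched as picking the device whose metadata indicates it is nearest the user. The metadata shape and the single `proximity` score (higher meaning closer, e.g. derived from captured signal energy) are assumptions for this sketch.

```python
def arbitrate(device_metadata):
    """Select which of several speech interface devices that heard the
    same utterance should respond: the one whose metadata indicates
    it is nearest the user."""
    return max(device_metadata, key=lambda d: d["proximity"])["device_id"]
```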
-
Publication number: US20170278514A1
Publication date: 2017-09-28
Application number: US15196540
Application date: 2016-06-29
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Lambert Mathias , Thomas Kollar , Arindam Mandal , Angeliki Metallinou
CPC classification number: G10L15/22 , G06F17/277 , G06F17/279 , G06F17/30637 , G06F17/30654 , G06F17/30705 , G10L15/02 , G10L15/142 , G10L15/1815 , G10L15/26 , G10L2015/223
Abstract: A system capable of performing natural language understanding (NLU) without the concept of a domain that influences NLU results. The present system uses a hierarchical organization of intents/commands and entity types, and trained models associated with those hierarchies, so that commands and entity types may be determined for incoming text queries without necessarily determining a domain for the incoming text. The system thus operates in a domain-agnostic manner, in a departure from multi-domain architecture NLU processing, where a system determines NLU results for multiple domains simultaneously and then ranks them to determine which to select as the result.
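Classifying against an intent hierarchy without first committing to a domain can be sketched as a top-down walk, scoring the children at each level and descending into the best one. The hierarchy, intent labels, and word-overlap scorer below are all hypothetical illustrations, not the trained models of the patent.

```python
def classify_hierarchical(text, hierarchy, scorer):
    """Walk an intent hierarchy top-down, choosing the best-scoring
    child at each level, and return the label path. No domain is ever
    selected; the hierarchy itself organizes the intents."""
    node = hierarchy
    path = []
    while isinstance(node, dict):
        label = max(node, key=lambda child: scorer(text, child))
        path.append(label)
        node = node[label]
    return path
```

A toy scorer for the sketch counts query words appearing in the lowercased label, which is enough to route "play music please" to a music intent in a small example hierarchy.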
-
Publication number: US20170083285A1
Publication date: 2017-03-23
Application number: US14860400
Application date: 2015-09-21
Applicant: Amazon Technologies, Inc.
Inventor: James David Meyers , Shah Samir Pravinchandra , Yue Liu , Arlen Dean , Daniel Miller , Arindam Mandal
CPC classification number: G06F3/167 , G10L15/00 , G10L15/063 , G10L15/1815 , G10L15/22 , G10L15/222 , G10L15/26 , G10L15/32 , G10L2015/088 , G10L2015/223 , G10L2015/226
Abstract: A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.