-
Publication No.: US10600419B1
Publication date: 2020-03-24
Application No.: US15712676
Filing date: 2017-09-22
Applicant: Amazon Technologies, Inc.
Inventor: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
Abstract: Techniques for performing command processing are described. A system receives, from a device, input data corresponding to a command. The system determines NLU processing results associated with multiple applications. The system also determines NLU confidences for the NLU processing results for each application. The system sends NLU processing results to a portion of the multiple applications, and receives output data or instructions from the portion of the applications. The system ranks the portion of the applications based at least in part on the NLU processing results associated with the portion of the applications as well as the output data or instructions provided by the portion of the applications. The system may also rank the portion of the applications using other data. The system causes content corresponding to output data or instructions provided by the highest ranked application to be output to a user.
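The shortlist-then-rank flow in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function name `rank_applications`, the `top_k` cutoff, the idea that each application returns a numeric quality score for its output, and the weighted sum are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def rank_applications(
    nlu_confidences: Dict[str, float],
    handlers: Dict[str, Callable[[], float]],
    top_k: int = 2,
    weight: float = 0.5,
) -> List[str]:
    """Rank a shortlisted portion of applications, best first.

    nlu_confidences: hypothetical per-application NLU confidence in [0, 1].
    handlers: hypothetical callables that stand in for each application's
        response; each returns a score in [0, 1] for the output it produced.
    """
    # Send NLU results only to a portion of the applications (top_k by confidence).
    shortlisted = sorted(nlu_confidences, key=nlu_confidences.get, reverse=True)[:top_k]

    scored = []
    for app in shortlisted:
        output_score = handlers[app]()  # the application's output, reduced to a score
        # Assumed combination rule: a simple weighted sum of the two signals.
        combined = weight * nlu_confidences[app] + (1.0 - weight) * output_score
        scored.append((combined, app))

    scored.sort(reverse=True)
    return [app for _, app in scored]


confidences = {"music": 0.9, "video": 0.8, "weather": 0.1}
handlers = {
    "music": lambda: 0.2,
    "video": lambda: 0.9,
    "weather": lambda: 1.0,
}
ranking = rank_applications(confidences, handlers)
```

In this toy run the "weather" application is never consulted because it falls outside the shortlist, and "video" outranks "music" once the output scores are taken into account.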
-
Publication No.: US10522134B1
Publication date: 2019-12-31
Application No.: US15388458
Filing date: 2016-12-22
Applicant: Amazon Technologies, Inc.
Inventor: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
-
Publication No.: US12112752B1
Publication date: 2024-10-08
Application No.: US17688279
Filing date: 2022-03-07
Applicant: Amazon Technologies, Inc.
Inventor: Rahul Gupta, Jwala Dhamala, Apurv Verma, Qingwen Ye, Mayur Himmatbhai Dabhi, Srinivasan Rengarajan Veeravanallur, Spyridon Matsoukas, Melanie C B Gens, Seyed Omid Razavi, Avni Khatri, Premkumar Natarajan
CPC classification number: G10L15/22, G10L15/01, G10L15/063, G10L15/08, G10L2015/0631, G10L2015/223
Abstract: Devices and techniques are generally described for cohort determination in natural language processing. In various examples, a first natural language input to a natural language processing system may be determined. The first natural language input may be associated with a first account identifier. A first machine learning model may determine first data representing one or more words of the first natural language input. A second machine learning model may determine second data representing one or more acoustic characteristics of the first natural language input. Third data may be determined, the third data including a predicted performance for processing the first natural language input by the natural language processing system. The third data may be determined based on the first data representation and the second data representation.
-
Publication No.: US12039998B1
Publication date: 2024-07-16
Application No.: US17665129
Filing date: 2022-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyridon Matsoukas, Chao Wang
Abstract: An acoustic event detection system may employ self-supervised federated learning to update encoder and/or classifier machine learning models. In an example operation, an encoder may be pre-trained to extract audio feature data from an audio signal. A decoder may be pre-trained to predict a subsequent portion of audio data (e.g., a subsequent frame of audio data represented by log filterbank energies). The encoder and decoder may be trained using self-supervised learning to improve the decoder's predictions and, by extension, the quality of the audio feature data generated by the encoder. The system may apply federated learning to share encoder updates across user devices. The system may fine-tune the classifier to improve inferences based on the improved audio feature data. The system may distribute classifier updates to the user device(s) to update the on-device classifier.
-
Publication No.: US11893999B1
Publication date: 2024-02-06
Application No.: US16055755
Filing date: 2018-08-06
Applicant: Amazon Technologies, Inc.
Inventor: Sai Sailesh Kopuri, John Moore, Sundararajan Srinivasan, Aparna Khare, Arindam Mandal, Spyridon Matsoukas, Rohit Prasad
Abstract: Techniques for enrolling a user in a system's user recognition functionality without requiring the user to speak particular speech are described. The system may determine characteristics unique to a user input. The system may generate an implicit voice profile from user inputs having similar characteristics. After an implicit voice profile is generated, the system may receive a user input having speech characteristics similar to those of the implicit voice profile. The system may ask the user if the user wants the system to associate the implicit voice profile with a particular user identifier. If the user responds affirmatively, the system may request an identifier of a user profile (e.g., a user name). In response to receiving the user's name, the system may identify a user profile associated with the name and associate the implicit voice profile with the user profile, thereby converting the implicit voice profile into an explicit voice profile.
-
Publication No.: US20230410833A1
Publication date: 2023-12-21
Application No.: US18131531
Filing date: 2023-04-06
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
CPC classification number: G10L25/30, G10L25/51, G10L15/02, G10L15/16, G10L15/22, G10L15/30, G10L25/78, G10L2015/088
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present near the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect presence. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
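The score, smooth, decide sequence in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the moving-average smoother, the `radius` and `threshold` parameters, and the rule that a heartbeat reports presence if any frame in the interval is positive are all hypothetical choices.

```python
from typing import List


def smooth_scores(scores: List[float], radius: int = 1) -> List[float]:
    """Average each frame score with its neighbors within `radius` frames."""
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - radius)
        hi = min(len(scores), i + radius + 1)
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def frame_decisions(scores: List[float], radius: int = 1, threshold: float = 0.5) -> List[bool]:
    """Per-frame presence decision made on the smoothed scores."""
    return [s >= threshold for s in smooth_scores(scores, radius)]


def heartbeat(decisions: List[bool]) -> bool:
    """Assumed heartbeat payload: was presence detected in this interval?"""
    return any(decisions)


# An isolated high score is suppressed by smoothing ...
isolated = frame_decisions([0.0, 0.0, 0.9, 0.0, 0.0])
# ... while a sustained run of high scores survives it.
sustained = frame_decisions([0.1, 0.8, 0.9, 0.8, 0.1])
```

Smoothing before thresholding is what keeps a single noisy frame from flipping the presence decision; a device would compute `heartbeat(...)` over each reporting interval and send the result on its periodic schedule.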
-
Publication No.: US11670299B2
Publication date: 2023-06-06
Application No.: US17321999
Filing date: 2021-05-17
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun, Thibaud Senechai, Yixin Gao, Anish N. Shah, Spyridon Matsoukas, Chao Wang, Shiv Naga Prasad Vitaladevuni
Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
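The two post-processing paths named in this abstract (softmax, smoothing, and spike detection for the wakeword model; a sigmoid for the acoustic-event model) can be sketched in plain Python. The models themselves are not shown. The two-class logit layout, the moving-average smoother, the local-maximum spike rule, and the thresholds are assumptions for illustration, and a fixed threshold stands in for the classifier the abstract mentions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def moving_average(values: List[float], radius: int = 1) -> List[float]:
    """Smooth a sequence by averaging each value with its neighbors."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def wakeword_spikes(frame_logits: List[List[float]], threshold: float = 0.8) -> List[int]:
    """Indices of frames where the smoothed wakeword posterior peaks.

    Assumes each frame has logits ordered [wakeword, not_wakeword].
    """
    posteriors = [softmax(frame)[0] for frame in frame_logits]
    s = moving_average(posteriors)
    return [
        i for i in range(1, len(s) - 1)
        if s[i] >= threshold and s[i] >= s[i - 1] and s[i] > s[i + 1]
    ]


def event_detected(event_logit: float, threshold: float = 0.5) -> bool:
    """Stand-in for the classifier stage: threshold the sigmoid output."""
    return sigmoid(event_logit) >= threshold


logits = [[-3, 3], [-3, 3], [4, -4], [5, -5], [4, -4], [-3, 3], [-3, 3]]
spikes = wakeword_spikes(logits)
```

Softmax fits the wakeword path because its classes are mutually exclusive within a frame, whereas independent sigmoids allow several acoustic events to be flagged in the same frame.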
-
Publication No.: US11657804B2
Publication date: 2023-05-23
Application No.: US17090716
Filing date: 2020-11-05
Applicant: Amazon Technologies, Inc.
Inventor: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
CPC classification number: G10L15/18, G10L15/08, G10L15/30, G10L2015/088
Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.
-
Publication No.: US11410646B1
Publication date: 2022-08-09
Application No.: US16368399
Filing date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Cengiz Erbas, Thomas Kollar, Avnish Sikka, Spyridon Matsoukas, Simon Peter Reavely
Abstract: A system capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse text of incoming utterances. The models may identify complex utterance structures and may identify what command portions of an utterance go with what conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true so the system may cause the commands to be executed (and stopped) at the appropriate times.
-
Publication No.: US20220189458A1
Publication date: 2022-06-16
Application No.: US17584489
Filing date: 2022-01-26
Applicant: Amazon Technologies, Inc.
Inventor: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.