-
Publication No.: US10032463B1
Publication date: 2018-07-24
Application No.: US14982587
Filing date: 2015-12-29
Applicant: Amazon Technologies, Inc.
Inventor: Ariya Rastrow, Nikko Ström, Spyridon Matsoukas, Markus Dreyer, Ankur Gandhe, Denis Sergeyevich Filimonov, Julian Chan, Rohit Prasad
IPC: G10L15/183, G10L15/197, G10L15/16, G10L25/30, G10L15/26, G10L15/06, G10L15/22
Abstract: An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded representation of a user's interactions with the system based on, e.g., transcriptions of prior utterances made by the user. This user-specific encoded representation of interaction history is then used by the language model to customize ASR processing for the user.
-
Publication No.: US20180012593A1
Publication date: 2018-01-11
Application No.: US15641169
Filing date: 2017-07-03
Applicant: Amazon Technologies, Inc.
Inventor: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
IPC: G10L15/18
CPC classification number: G10L15/18, G10L15/08, G10L15/30, G10L2015/088
Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.
-
Publication No.: US09653093B1
Publication date: 2017-05-16
Application No.: US14463411
Filing date: 2014-08-19
Applicant: Amazon Technologies, Inc.
Inventor: Spyridon Matsoukas, Nikko Ström, Ariya Rastrow, Sri Venkata Surya Siva Rama Krishna Garimella
CPC classification number: G10L15/16, G10L15/08, G10L15/142, G10L15/144
Abstract: Features are disclosed for using an artificial neural network to generate customized speech recognition models during the speech recognition process. By dynamically generating the speech recognition models during the speech recognition process, the models can be customized based on the specific context of individual frames within the audio data currently being processed. In this way, dependencies between frames in the current sequence can form the basis of the models used to score individual frames of the current sequence. Thus, each frame of the current sequence (or some subset thereof) may be scored using one or more models customized for the particular frame in context.
-
Publication No.: US11270685B2
Publication date: 2022-03-08
Application No.: US16726051
Filing date: 2019-12-23
Applicant: Amazon Technologies, Inc.
Inventor: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
-
Publication No.: US11081104B1
Publication date: 2021-08-03
Application No.: US15838917
Filing date: 2017-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Chengwei Su, Sankaranarayanan Ananthakrishnan, Spyridon Matsoukas, Shirin Saleem, Rahul Gupta, Kavya Ravikumar, John Will Crimmins, Kelly James Vanee, John Pelak, Melanie Chie Bomke Gens
IPC: G10L15/18, G10L15/22, G10L15/06, G10L15/183, H04L29/08, G10L15/32, G06K9/00, H04W4/02, G10L15/26, G06F16/31, G06F40/295
Abstract: A natural language understanding system that can determine an overall score for a natural language hypothesis using hypothesis-specific component scores from different aspects of NLU processing as well as context data describing the context surrounding the utterance corresponding to the natural language hypotheses. The individual component scores may be input into a feature vector at a location corresponding to a type of a device captured by the utterance. Other locations in the feature vector corresponding to other device types may be populated with zero values. The feature vector may also be populated with other values representing other context data. The feature vector may then be multiplied by a weight vector comprising trained weights corresponding to the feature vector positions to determine a new overall score for each hypothesis, where the overall score incorporates the impact of the context data. Natural language hypotheses can be ranked using their respective new overall scores.
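The feature-vector scoring described in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the patent record: the device type names, the number of component scores per hypothesis, the single context value, and the function names `build_feature_vector`, `overall_score`, and `rank_hypotheses`. The weights would be trained in a real system; the caller supplies them here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical device types; the abstract does not enumerate them.
DEVICE_TYPES = ["speaker", "tablet", "tv"]
# Assumed number of component scores per hypothesis.
N_COMPONENTS = 3


def build_feature_vector(component_scores: Sequence[float],
                         device_type: str,
                         context_values: Sequence[float]) -> List[float]:
    """Place the scores in the slot for this device type, zero the other slots,
    then append the remaining context values."""
    if len(component_scores) != N_COMPONENTS:
        raise ValueError("unexpected number of component scores")
    vec = [0.0] * (len(DEVICE_TYPES) * N_COMPONENTS)
    start = DEVICE_TYPES.index(device_type) * N_COMPONENTS
    vec[start:start + N_COMPONENTS] = list(component_scores)
    return vec + list(context_values)


def overall_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Dot product of the feature vector with the weight vector."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors differ in length")
    return sum(f * w for f, w in zip(features, weights))


def rank_hypotheses(hypotheses: Dict[str, Sequence[float]],
                    device_type: str,
                    context_values: Sequence[float],
                    weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Score every hypothesis and sort from highest to lowest overall score."""
    scored = [
        (name, overall_score(
            build_feature_vector(scores, device_type, context_values), weights))
        for name, scores in hypotheses.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the scores for one device type occupy their own positions, the same component score can receive a different weight depending on the device, which is how the device type influences the ranking in this sketch.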
-
Publication No.: US11043218B1
Publication date: 2021-06-22
Application No.: US16452964
Filing date: 2019-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun, Thibaud Senechal, Yixin Gao, Anish N. Shah, Spyridon Matsoukas, Chao Wang, Shiv Naga Prasad Vitaladevuni
Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
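The post-processing steps named in this abstract (softmax, smoothing, spike detection, sigmoid) can be sketched in plain Python as follows. The abstract does not say how smoothing or spike detection is performed, so the trailing moving average and the thresholded local-maximum rule below are stand-ins chosen for illustration. The function names, the window size, and the threshold are likewise assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame of model outputs."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for one acoustic-event output."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average (assumed smoothing method)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def detect_spikes(values: Sequence[float], threshold: float) -> List[int]:
    """Indices of local maxima at or above the threshold (assumed spike rule)."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] >= threshold
        and values[i] > values[i - 1]
        and values[i] >= values[i + 1]
    ]


def wakeword_posteriors(frame_logits: Sequence[Sequence[float]],
                        wake_index: int = 1) -> List[float]:
    """Per-frame wakeword probability; wake_index is a hypothetical class index."""
    return [softmax(frame)[wake_index] for frame in frame_logits]
```

A caller would typically chain these as `detect_spikes(moving_average(wakeword_posteriors(logits), window), threshold)`, while the acoustic-event branch would apply `sigmoid` to each event output before passing the result to a classifier.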
-
Publication No.: US11043205B1
Publication date: 2021-06-22
Application No.: US15838974
Filing date: 2017-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Chengwei Su, Sankaranarayanan Ananthakrishnan, Spyridon Matsoukas, Rahul Gupta, Kelly James Vanee
IPC: G10L15/22, G10L15/18, G10L15/06, G10L15/16, G10L15/183, G06N3/02, G06N20/00, G06F16/31, G06F40/295
Abstract: A natural language processing system that can determine an overall score for a natural language hypothesis using hypothesis-specific component scores from different aspects of NLU processing. The individual component scores may be weighted by weights trained to optimize the overall scores relative to each other. Each domain of the system may be configured with a separate component that determines the overall score with respect to the domain. Natural language hypotheses can be ranked using the overall score either within a specific domain or on a cross-domain basis.
-
Publication No.: US11004454B1
Publication date: 2021-05-11
Application No.: US16182021
Filing date: 2018-11-06
Applicant: Amazon Technologies, Inc.
Inventor: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
Abstract: Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
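The cluster-then-update flow in this abstract can be sketched as below. The abstract names neither a clustering algorithm nor a similarity measure, so this sketch substitutes a simple greedy clustering over cosine similarity and a count-weighted average for the profile update. It also assumes that each stored utterance has already been reduced to a fixed-length numeric vector. All function names and thresholds are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def greedy_clusters(vectors: Sequence[Vector],
                    threshold: float) -> List[List[Vector]]:
    """Assign each vector to the first cluster whose centroid is similar enough,
    otherwise start a new cluster (assumed stand-in for the clustering step)."""
    clusters: List[List[Vector]] = []
    for vec in vectors:
        for cluster in clusters:
            if cosine(vec, centroid(cluster)) >= threshold:
                cluster.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters


def update_profile(profile: Vector, profile_count: int,
                   cluster: Sequence[Vector],
                   threshold: float) -> Tuple[List[float], int]:
    """Merge the cluster into the profile only if the two are similar enough;
    otherwise return the profile unchanged."""
    center = centroid(cluster)
    if cosine(profile, center) < threshold:
        return list(profile), profile_count
    n = len(cluster)
    total = profile_count + n
    merged = [(p * profile_count + c * n) / total
              for p, c in zip(profile, center)]
    return merged, total
```

Weighting the merge by utterance counts keeps a long-established profile from being pulled too far by a small new cluster, which is one reasonable design choice among several.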
-
Publication No.: US10832662B2
Publication date: 2020-11-10
Application No.: US15641169
Filing date: 2017-07-03
Applicant: Amazon Technologies, Inc.
Inventor: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.
-
Publication No.: US20200152195A1
Publication date: 2020-05-14
Application No.: US16693826
Filing date: 2019-11-25
Applicant: Amazon Technologies, Inc.
Inventor: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
IPC: G10L15/22, G06F40/295, G10L15/26, G10L15/08
Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.