-
Publication number: US11790932B2
Publication date: 2023-10-17
Application number: US17547644
Application date: 2021-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Qingming Tang , Chieh-Chi Kao , Qin Zhang , Ming Sun , Chao Wang , Sumit Garg , Rong Chen , James Garnet Droppo , Chia-Jung Chang
CPC classification number: G10L25/51 , G06N3/045 , G06N3/08 , G10L25/21 , G10L25/30 , G10L15/08 , G10L15/22 , G10L2015/088 , G10L2015/223
Abstract: A system may include a first acoustic event detection (AED) component configured to detect a predetermined set of acoustic events, and include a second AED component configured to detect custom acoustic events that a user configures a device to detect. The first and second AED components are configured to perform task-specific processing, and may receive as input the same acoustic feature data corresponding to audio data that potentially represents occurrence of one or more events. Based on processing by the first and second AED components, a device may output data indicating that one or more acoustic events occurred, where the acoustic events may be a predetermined acoustic event and/or a custom acoustic event.
-
Publication number: US11132990B1
Publication date: 2021-09-28
Application number: US16453063
Application date: 2019-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , Thibaud Senechal , Yixin Gao , Anish N. Shah , Spyridon Matsoukas , Chao Wang , Shiv Naga Prasad Vitaladevuni
Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
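The two post-processing paths in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the stand-in models `wakeword_logits` and `event_logit`, the moving-average smoother, the thresholds, and the mean-posterior "classifier" are all assumptions chosen to keep the example self-contained. The only point taken from the abstract is that both paths read the same per-frame features, with a softmax, smoothing, and spike check on the wakeword side and a sigmoid plus a classifier on the acoustic-event side.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def moving_average(values, window):
    """Causal moving average used here as the (assumed) smoother."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def wakeword_logits(frame):
    """Hypothetical stand-in model: returns [background, wakeword] logits."""
    return [0.0, sum(frame) / len(frame)]


def event_logit(frame):
    """Hypothetical stand-in model: returns one logit for a single event."""
    return max(frame) - 3.0


def detect(frames, window=3, wakeword_threshold=0.8, event_threshold=0.5):
    """Run both detectors over the same feature frames.

    Returns (wakeword_detected, event_detected).
    """
    # Wakeword path: softmax posterior, smoothing, then spike detection.
    posteriors = [softmax(wakeword_logits(f))[1] for f in frames]
    smoothed = moving_average(posteriors, window)
    wakeword_detected = any(p >= wakeword_threshold for p in smoothed)

    # Event path: sigmoid posterior, then a trivial threshold classifier
    # on the mean posterior (an assumption standing in for the classifier).
    event_posteriors = [sigmoid(event_logit(f)) for f in frames]
    mean_posterior = sum(event_posteriors) / len(event_posteriors)
    event_detected = mean_posterior >= event_threshold

    return wakeword_detected, event_detected
```

Calling `detect` on a list of feature frames (each a list of floats standing in for LFBE values) returns one boolean per detector; computing the features once and feeding both paths is the design point being illustrated.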
-
Publication number: US11069353B1
Publication date: 2021-07-20
Application number: US16404536
Application date: 2019-05-06
Applicant: Amazon Technologies, Inc.
Inventor: Yixin Gao , Ming Sun , Jason Krone , Shiv Naga Prasad Vitaladevuni , Yuzong Liu
Abstract: A system and method performs multilingual wakeword detection by determining a language corresponding to the wakeword. A first wakeword-detection component, which may execute using a digital-signal processor, determines that audio data includes a representation of the wakeword and determines a language corresponding to the wakeword. A second, more accurate wakeword-detection component may then process the audio data using the language to confirm that it includes the representation of the wakeword. The audio data may then be sent to a remote system for further processing.
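The two-stage flow described in this abstract can be sketched as a small cascade. Everything concrete here is hypothetical: the function name `cascade_detect`, the callable signatures, and the toy detectors in the usage note are illustrative assumptions, not the components claimed in the patent.

```python
from typing import Callable, Dict, List, Tuple

Audio = List[float]


def cascade_detect(
    audio: Audio,
    first_stage: Callable[[Audio], Tuple[bool, str]],
    second_stage_by_language: Dict[str, Callable[[Audio], bool]],
    send_to_remote: Callable[[Audio], None],
) -> bool:
    """Confirm a wakeword with a language-specific second stage.

    first_stage returns (detected, language). Audio is forwarded only
    when the second-stage detector for that language confirms it.
    """
    detected, language = first_stage(audio)
    if not detected:
        return False

    confirm = second_stage_by_language.get(language)
    if confirm is None or not confirm(audio):
        return False

    send_to_remote(audio)
    return True
```

A toy invocation, with made-up detectors, might look like `cascade_detect([0.9, 0.7], lambda a: (max(a) > 0.5, "en"), {"en": lambda a: sum(a) / len(a) > 0.6}, outbox.append)`, where `outbox` is a list standing in for the remote system.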
-
Publication number: US10460722B1
Publication date: 2019-10-29
Application number: US15639175
Application date: 2017-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , David Snyder , Yixin Gao , Nikko Strom , Spyros Matsoukas , Shiv Naga Prasad Vitaladevuni
Abstract: A method for selective transmission of audio data to a speech processing server uses detection of an acoustic trigger in the audio data in determining the data to transmit. Detection of the acoustic trigger makes use of an efficient computation approach that reduces the amount of run-time computation required, or equivalently improves accuracy for a given amount of computation, by combining a “time delay” structure in which intermediate results of computations are reused at various time delays, thereby avoiding computing new results, and decomposition of certain transformations to require fewer arithmetic operations without sacrificing significant performance. For a given amount of computation capacity, the combination of these two techniques provides improved accuracy as compared to current approaches.
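The two savings named in this abstract can be illustrated with a toy example. The sketch below is built on assumptions: the layer (`CountingLayer`), the delay taps `(-2, 0, 2)`, and the reading of "decomposition" as a low-rank factorization of a weight matrix are not taken from the patent. It shows only that caching per-frame intermediate results avoids recomputing them at each delay, and that replacing one `rows x cols` matrix with two factors of rank `rank` changes the multiplication count from `rows * cols` to `rank * (rows + cols)`.

```python
class CountingLayer:
    """Toy lower layer that counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return 2.0 * x  # placeholder transform


def upper_layer_naive(inputs, layer, delays=(-2, 0, 2)):
    """Recompute the lower layer for every delayed tap."""
    outputs = []
    for t in range(len(inputs)):
        taps = [layer(inputs[t + d]) for d in delays
                if 0 <= t + d < len(inputs)]
        outputs.append(sum(taps))
    return outputs


def upper_layer_cached(inputs, layer, delays=(-2, 0, 2)):
    """Compute each lower-layer result once and reuse it at each delay."""
    hidden = [layer(x) for x in inputs]
    outputs = []
    for t in range(len(hidden)):
        taps = [hidden[t + d] for d in delays if 0 <= t + d < len(hidden)]
        outputs.append(sum(taps))
    return outputs


def dense_multiplications(rows, cols):
    """Multiplications for one matrix-vector product with a full matrix."""
    return rows * cols


def factored_multiplications(rows, cols, rank):
    """Multiplications when the matrix is replaced by two rank-`rank` factors."""
    return rank * (rows + cols)
```

For a 512 x 512 transform factored at rank 64 (illustrative sizes), `dense_multiplications(512, 512)` is 262144 while `factored_multiplications(512, 512, 64)` is 65536; whether the accuracy cost of such a factorization is acceptable depends on the model.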
-
Publication number: US09600231B1
Publication date: 2017-03-21
Application number: US14751975
Application date: 2015-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , Björn Hoffmeister , Shiv Naga Prasad Vitaladevuni , Varun Kumar Nagaraja
CPC classification number: G06F3/167 , G06F3/165 , G10L15/02 , G10L15/063 , G10L15/14 , G10L15/18 , G10L15/22 , G10L2015/0638 , G10L2015/088 , G10L2015/223
Abstract: A revised support vector machine (SVM) classifier is offered to distinguish between true keywords and false positives based on output from a keyword spotting component of a speech recognition system. The SVM operates on a reduced set of feature dimensions, where the feature dimensions are selected based on their ability to distinguish between true keywords and false positives. Further, support vector pairs are merged to create a reduced set of re-weighted support vectors. These techniques result in an SVM that may be operated using reduced computing resources, thus improving system performance.
-
Publication number: US12039998B1
Publication date: 2024-07-16
Application number: US17665129
Application date: 2022-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Chieh-Chi Kao , Qingming Tang , Ming Sun , Viktor Rozgic , Spyridon Matsoukas , Chao Wang
Abstract: An acoustic event detection system may employ self-supervised federated learning to update encoder and/or classifier machine learning models. In an example operation, an encoder may be pre-trained to extract audio feature data from an audio signal. A decoder may be pre-trained to predict a subsequent portion of audio data (e.g., a subsequent frame of audio data represented by log filterbank energies). The encoder and decoder may be trained using self-supervised learning to improve the decoder's predictions and, by extension, the quality of the audio feature data generated by the encoder. The system may apply federated learning to share encoder updates across user devices. The system may fine-tune the classifier to improve inferences based on the improved audio feature data. The system may distribute classifier updates to the user device(s) to update the on-device classifier.
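Two pieces of this abstract lend themselves to a short sketch: the self-supervised objective (predict the next frame) and the sharing of encoder updates across devices. The code below is an assumption-laden illustration. Mean squared error as the prediction loss and example-count-weighted averaging (in the style of federated averaging) as the aggregation rule are choices made here, not details stated in the abstract, and the function names are hypothetical.

```python
def next_frame_loss(predicted, actual):
    """Mean squared error between a predicted and an observed next frame.

    Both arguments are equal-length lists of floats, e.g. stand-ins for
    log filterbank energies of one frame.
    """
    if len(predicted) != len(actual):
        raise ValueError("frames must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def federated_average(device_updates):
    """Combine per-device encoder weights into one shared set of weights.

    device_updates is a list of (weights, num_examples) pairs, where
    weights is a flat list of floats. Each device's contribution is
    weighted by how many local examples it trained on.
    """
    total_examples = sum(count for _, count in device_updates)
    dimension = len(device_updates[0][0])
    return [
        sum(weights[i] * count for weights, count in device_updates)
        / total_examples
        for i in range(dimension)
    ]
```

In this sketch each device would minimize `next_frame_loss` locally on its own audio, then report only its updated weights and example count; raw audio never needs to leave the device for the averaging step.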
-
Publication number: US11996097B2
Publication date: 2024-05-28
Application number: US17359937
Application date: 2021-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Yixin Gao , Ming Sun , Jason Krone , Shiv Naga Prasad Vitaladevuni , Yuzong Liu
CPC classification number: G10L15/22 , G10L15/005 , G10L15/08 , G10L15/142 , G10L15/16 , G10L25/78 , G06F40/263 , G10L2015/088
Abstract: A system and method performs multilingual wakeword detection by determining a language corresponding to the wakeword. A first wakeword-detection component, which may execute using a digital-signal processor, determines that audio data includes a representation of the wakeword and determines a language corresponding to the wakeword. A second, more accurate wakeword-detection component may then process the audio data using the language to confirm that it includes the representation of the wakeword. The audio data may then be sent to a remote system for further processing.
-
Publication number: US11961514B1
Publication date: 2024-04-16
Application number: US17547610
Application date: 2021-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Chia-Jung Chang , Qingming Tang , Ming Sun , Chao Wang
CPC classification number: G10L15/16
Abstract: An acoustic event detection system may employ one or more recurrent neural networks (RNNs) to extract features from audio data, and use the extracted features to determine the presence of an acoustic event. The system may use self-attention to emphasize features extracted from portions of audio data that may include features more useful for detecting acoustic events. The system may perform self-attention in an iterative manner to reduce the amount of memory used to store hidden states of the RNN while processing successive portions of the audio data. The system may process the portions of the audio data using the RNN to generate a hidden state for each portion. The system may calculate an interim embedding for each hidden state. An interim embedding calculated for the last hidden state may be normalized to determine a final embedding representing features extracted from the input data by the RNN.
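The memory saving described in this abstract can be sketched with a running accumulation. The scoring rule used here (a dot product between each hidden state and a `query` vector) and the function names are assumptions for illustration; the abstract does not specify them. The sketch keeps one interim embedding and one running normalizer instead of storing every hidden state, then normalizes once after the last state. A batch version that stores all states is included only to check that both give the same result.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def iterative_attention_pool(hidden_states, query):
    """Attention-weighted average computed one hidden state at a time.

    hidden_states may be any iterable (including a generator), so the
    full sequence never has to be held in memory. Assumes at least one
    hidden state is provided.
    """
    interim = [0.0] * len(query)  # unnormalized interim embedding
    normalizer = 0.0
    for state in hidden_states:
        weight = math.exp(dot(state, query))
        interim = [acc + weight * x for acc, x in zip(interim, state)]
        normalizer += weight
    # Normalize the interim embedding from the last step.
    return [acc / normalizer for acc in interim]


def batch_attention_pool(hidden_states, query):
    """Reference version that stores every hidden state first."""
    states = list(hidden_states)
    weights = [math.exp(dot(s, query)) for s in states]
    total = sum(weights)
    return [
        sum(w * s[i] for w, s in zip(weights, states)) / total
        for i in range(len(query))
    ]
```

Memory use in the iterative version is constant in the sequence length, since only `interim` and `normalizer` persist between steps. For long sequences or large scores, a production version would likely also track a running maximum score to keep `math.exp` from overflowing; that refinement is omitted here for clarity.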
-
Publication number: US20230186939A1
Publication date: 2023-06-15
Application number: US17547644
Application date: 2021-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Qingming Tang , Chieh-Chi Kao , Qin Zhang , Ming Sun , Chao Wang , Sumit Garg , Rong Chen , James Garnet Droppo , Chia-Jung Chang
Abstract: A system may include a first acoustic event detection (AED) component configured to detect a predetermined set of acoustic events, and include a second AED component configured to detect custom acoustic events that a user configures a device to detect. The first and second AED components are configured to perform task-specific processing, and may receive as input the same acoustic feature data corresponding to audio data that potentially represents occurrence of one or more events. Based on processing by the first and second AED components, a device may output data indicating that one or more acoustic events occurred, where the acoustic events may be a predetermined acoustic event and/or a custom acoustic event.
-
Publication number: US11670299B2
Publication date: 2023-06-06
Application number: US17321999
Application date: 2021-05-17
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , Thibaud Senechal , Yixin Gao , Anish N. Shah , Spyridon Matsoukas , Chao Wang , Shiv Naga Prasad Vitaladevuni
Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.