-
公开(公告)号:US11302329B1
公开(公告)日:2022-04-12
申请号:US16914589
申请日:2020-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , Spyridon Matsoukas , Venkata Naga Krishna Chaitanya Puvvada , Chao Wang , Chieh-Chi Kao
Abstract: A system may include an acoustic event detection component for detecting acoustic events, which may be non-speech sounds. Upon detection of a command to detect a new sound, a device may prompt a user to cause occurrence of the sound one or more times. The acoustic event detection component may then be reconfigured, using audio data corresponding to the occurrences, to detect future occurrences of the event.
-
公开(公告)号:US11227585B2
公开(公告)日:2022-01-18
申请号:US16815188
申请日:2020-03-11
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Alexandra R. Shapiro , Melanie Chie Bomke Gens , Spyridon Matsoukas , Kellen Gillespie , Rahul Goel
Abstract: Methods and systems for determining an intent of an utterance using contextual information associated with a requesting device are described herein. Voice activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Entity data representing the content rendered by the display screen may describe entities having similar attributes as an identified intent from natural language understanding processing. Natural language understanding processing may attempt to resolve one or more declared slots for a particular intent and may generate an initial list of intent hypotheses ranked to indicate which are most likely to correspond to the utterance. The entity data may be compared with the declared slots for the intent hypotheses, and the list of intent hypothesis may be re-ranked to account for matching slots from the contextual metadata. The top ranked intent hypothesis after re-ranking may then be selected as the utterance's intent.
-
公开(公告)号:US11200884B1
公开(公告)日:2021-12-14
申请号:US16181925
申请日:2018-11-06
Applicant: Amazon Technologies, Inc.
Inventor: Sundararajan Srinivasan , Arindam Mandal , Krishna Subramanian , Spyridon Matsoukas , Aparna Khare , Rohit Prasad
Abstract: Techniques for labeling user inputs for updating user recognition voice profiles are described. A system may leverage various signals, generated during or after processing of a user input, to retroactively determine which user spoke the user input. For example, after the system receives the user input, the user may provide the system with non-spoken user verification information. Based on such user verification information, the system may label the previously spoken user input as originating from the particular user. The system may also or alternatively use system usage history to retroactively label user inputs.
-
公开(公告)号:US09697828B1
公开(公告)日:2017-07-04
申请号:US14311163
申请日:2014-06-20
Applicant: Amazon Technologies, Inc.
Inventor: Rohit Prasad , Kenneth John Basye , Spyridon Matsoukas , Rajiv Ramachandran , Shiv Naga Prasad Vitaladevuni , Bjorn Hoffmeister
IPC: G10L15/18
CPC classification number: G10L15/18 , G10L15/08 , G10L15/30 , G10L2015/088
Abstract: Features are disclosed for detecting words in audio using environmental information and/or contextual information in addition to acoustic features associated with the words to be detected. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based usage patterns associated with the users.
-
公开(公告)号:US20210304774A1
公开(公告)日:2021-09-30
申请号:US17228950
申请日:2021-04-13
Applicant: Amazon Technologies, Inc.
Inventor: Sundararajan Srinivasan , Arindam Mandal , Krishna Subramanian , Spyridon Matsoukas , Aparna Khare , Rohit Prasad
Abstract: Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
-
公开(公告)号:US11132990B1
公开(公告)日:2021-09-28
申请号:US16453063
申请日:2019-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Ming Sun , Thibaud Senechal , Yixin Gao , Anish N. Shah , Spyridon Matsoukas , Chao Wang , Shiv Naga Prasad Vitaladevuni
Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may be also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
-
公开(公告)号:US20210134276A1
公开(公告)日:2021-05-06
申请号:US17090716
申请日:2020-11-05
Applicant: Amazon Technologies, Inc.
Inventor: Rohit Prasad , Kenneth John Basye , Spyridon Matsoukas , Rajiv Ramachandran , Shiv Naga Prasad Vitaladevuni , Bjorn Hoffmeister
Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based usage patterns associated with the users.
-
公开(公告)号:US20200279555A1
公开(公告)日:2020-09-03
申请号:US16815188
申请日:2020-03-11
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Alexandra R. Shapiro , Melanie Chie Bomke Gens , Spyridon Matsoukas , Kellen Gillespie , Rahul Goel
Abstract: Methods and systems for determining an intent of an utterance using contextual information associated with a requesting device are described herein. Voice activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Entity data representing the content rendered by the display screen may describe entities having similar attributes as an identified intent from natural language understanding processing. Natural language understanding processing may attempt to resolve one or more declared slots for a particular intent and may generate an initial list of intent hypotheses ranked to indicate which are most likely to correspond to the utterance. The entity data may be compared with the declared slots for the intent hypotheses, and the list of intent hypothesis may be re-ranked to account for matching slots from the contextual metadata. The top ranked intent hypothesis after re-ranking may then be selected as the utterance's intent.
-
公开(公告)号:US10600406B1
公开(公告)日:2020-03-24
申请号:US15463339
申请日:2017-03-20
Applicant: Amazon Technologies, Inc.
Inventor: Alexandra R. Shapiro , Melanie Chie Bomke Gens , Spyridon Matsoukas , Kellen Gillespie , Rahul Goel
Abstract: Methods and systems for determining an intent of an utterance using contextual information associated with a requesting device are described herein. Voice activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Entity data representing the content rendered by the display screen may describe entities having similar attributes as an identified intent from natural language understanding processing. Natural language understanding processing may attempt to resolve one or more declared slots for a particular intent and may generate an initial list of intent hypotheses ranked to indicate which are most likely to correspond to the utterance. The entity data may be compared with the declared slots for the intent hypotheses, and the list of intent hypothesis may be re-ranked to account for matching slots from the contextual metadata. The top ranked intent hypothesis after re-ranking may then be selected as the utterance's intent.
-
公开(公告)号:US10121494B1
公开(公告)日:2018-11-06
申请号:US15474603
申请日:2017-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal
Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
-
-
-
-
-
-
-
-
-