-
Publication No.: US20220093101A1
Publication date: 2022-03-24
Application No.: US17112520
Filing date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.
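The offset-time lookup described in this abstract can be illustrated with a minimal sketch. It assumes the "stored data" is a list of entries with the time at which each one starts in the synthesized audio; the names `SpokenEntry`, `start_ms`, and `entry_at_offset` are hypothetical and do not come from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpokenEntry:
    label: str
    start_ms: int  # assumed: offset at which this entry begins in the output audio


def entry_at_offset(entries, offset_ms):
    """Return the entry most recently started at offset_ms, or None if none has started."""
    starts = [e.start_ms for e in entries]  # must be sorted ascending
    idx = bisect_right(starts, offset_ms) - 1
    return entries[idx] if idx >= 0 else None


# Hypothetical list read aloud by the device.
playlist = [
    SpokenEntry("first option", 0),
    SpokenEntry("second option", 1800),
    SpokenEntry("third option", 3900),
]

# The user says "that one" 2500 ms after playback began.
selected = entry_at_offset(playlist, 2500)
```

A binary search over start times is only one way to do the mapping; the abstract does not say how the stored data is organized.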
-
Publication No.: US20250149036A1
Publication date: 2025-05-08
Application No.: US18966827
Filing date: 2024-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Eli Joshua Fidler, Aaron Challenner, Zoe Adams, Sree Hari Krishnan Parthasarathi, Gengshen Fu
IPC: G10L15/22, G10L15/02, G10L15/08, G10L15/187
Abstract: Systems and methods for preemptive wakeword detection are disclosed. For example, a first part of a wakeword is detected from audio data representing a user utterance. When this occurs, on-device speech processing is initiated prior to when the entire wakeword is detected. When the entire wakeword is detected, results from the on-device speech processing and/or the audio data is sent to a speech processing system to determine a responsive action to be performed by the device. When the entire wakeword is not detected, on-device processing is canceled and the device refrains from sending the audio data to the speech processing system.
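The control flow in this abstract (start early on a partial wakeword, then either commit or cancel) can be sketched as a small state machine. This is an illustration under assumptions, not the patented implementation: the class `PreemptiveGate`, its method names, and the use of a frame buffer to stand in for "on-device speech processing" are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SPECULATING = auto()  # first part of the wakeword heard; early processing running


class PreemptiveGate:
    def __init__(self):
        self.state = State.IDLE
        self.buffer = []  # stands in for on-device processing results
        self.sent = []    # stands in for data sent to the speech processing system

    def on_partial_wakeword(self):
        # Begin speculative processing before the whole wakeword is confirmed.
        if self.state is State.IDLE:
            self.state = State.SPECULATING
            self.buffer = []

    def on_audio(self, frame):
        # Frames are processed only while speculating; idle audio is ignored.
        if self.state is State.SPECULATING:
            self.buffer.append(frame)

    def on_full_wakeword(self):
        # Entire wakeword detected: forward the accumulated results.
        if self.state is State.SPECULATING:
            self.sent.append(list(self.buffer))
            self.buffer = []
            self.state = State.IDLE

    def on_wakeword_rejected(self):
        # Entire wakeword not detected: cancel and send nothing.
        if self.state is State.SPECULATING:
            self.buffer = []
            self.state = State.IDLE


gate = PreemptiveGate()
gate.on_partial_wakeword()
gate.on_audio("frame-a")
gate.on_audio("frame-b")
gate.on_full_wakeword()      # confirmed: both frames are forwarded

gate.on_partial_wakeword()
gate.on_audio("frame-c")
gate.on_wakeword_rejected()  # false start: frame-c is dropped
```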
-
Publication No.: US12205601B1
Publication date: 2025-01-21
Application No.: US17853183
Filing date: 2022-06-29
Applicant: Amazon Technologies, Inc.
Inventor: David McGuire, Ahmed Abdelal, Sai Kiran Venkata Subramanya Rupanagudi, Sumit Garg, Terrence Yu, Nathaniel White, Siddharth Agrawal, Pavas Kant, Yuxuan Hao, Nagaraj Mahajan, Ameya Agaskar, Aaron Challenner
IPC: G10L19/018, G06F21/62, G06V20/40, G11B27/34, H04R3/00
Abstract: A system configured to perform content recognition using fingerprinting to recognize known media content. A device determines fingerprints based on decoded content data to be sent using a media interface component to an output component. Metadata related to the content/device/fingerprint may also be created. The fingerprints and metadata are sent by the device to a supporting system for orchestration and matching of the fingerprints to known media content.
-
Publication No.: US12190875B1
Publication date: 2025-01-07
Application No.: US17490572
Filing date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Eli Joshua Fidler, Aaron Challenner, Zoe Adams, Sree Hari Krishnan Parthasarathi, Gengshen Fu
IPC: G10L15/00, G10L15/02, G10L15/22, G10L15/08, G10L15/187
Abstract: Systems and methods for preemptive wakeword detection are disclosed. For example, a first part of a wakeword is detected from audio data representing a user utterance. When this occurs, on-device speech processing is initiated prior to when the entire wakeword is detected. When the entire wakeword is detected, results from the on-device speech processing and/or the audio data is sent to a speech processing system to determine a responsive action to be performed by the device. When the entire wakeword is not detected, on-device processing is canceled and the device refrains from sending the audio data to the speech processing system.
-
Publication No.: US20240412728A1
Publication date: 2024-12-12
Application No.: US18333041
Filing date: 2023-06-12
Applicant: Amazon Technologies, Inc.
Inventor: Michael Thomas Peterson, Gengshen Fu, Aaron Challenner, Rong Chen, Cody Jacques, Stefan M Bradstreet
Abstract: A device is configured to detect multiple different wakewords. A device may operate a joint encoder that operates on audio data to determine encoded audio data. The device may operate multiple different decoders which process the encoded audio data to determine if a wakeword is detected. Each decoder may correspond to a different wakeword. The decoders may use fewer computing resources than the joint encoder, allowing for the device to more easily perform multiple wakeword processing. Enabling/disabling wakeword(s) may involve the reconfiguring of a wakeword detector to add/remove data for respective decoder(s). Specific decoders may be activated/deactivated depending on device context, thereby efficiently managing device resources.
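The shared-encoder arrangement in this abstract can be sketched as follows: the encoder runs once per audio frame and each enabled wakeword has its own lightweight decoder over the encoded result. The class `MultiWakewordDetector` and the toy string-based encoder and decoders are hypothetical stand-ins, not the models described in the patent.

```python
class MultiWakewordDetector:
    def __init__(self, encoder):
        self.encoder = encoder  # shared (joint) encoder, the expensive step
        self.decoders = {}      # wakeword -> cheap per-wakeword decoder

    def enable(self, wakeword, decoder):
        # Adding a wakeword only adds decoder data; the encoder is untouched.
        self.decoders[wakeword] = decoder

    def disable(self, wakeword):
        self.decoders.pop(wakeword, None)

    def detect(self, audio_frame):
        encoded = self.encoder(audio_frame)  # computed once, reused by all decoders
        return [w for w, decode in self.decoders.items() if decode(encoded)]


calls = {"encoder": 0}


def toy_encoder(frame):
    # Toy stand-in: counts invocations and normalizes case.
    calls["encoder"] += 1
    return frame.lower()


detector = MultiWakewordDetector(toy_encoder)
detector.enable("alpha", lambda enc: "alpha" in enc)
detector.enable("beta", lambda enc: "beta" in enc)

hits = detector.detect("ALPHA, play music")
```

Disabling a wakeword here is a dictionary removal, which mirrors the abstract's point that decoders can be added or removed without reconfiguring the encoder.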
-
Publication No.: US11908468B2
Publication date: 2024-02-20
Application No.: US17112520
Filing date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
IPC: G10L25/78, G10L15/22, G10L15/24, G10L15/08, G10L15/06, G06V40/20, G06F3/16, G10L13/08, G10L15/20, G06V40/10, G06V10/40, G10L15/02, G06F18/24
CPC classification number: G10L15/22, G06F3/167, G06F18/24, G06V10/40, G06V40/10, G06V40/20, G10L13/08, G10L15/02, G10L15/063, G10L15/08, G10L15/20, G10L15/222, G10L15/24, G10L2015/0635, G10L2015/088, G10L2015/223, G10L2015/227
Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.
-
Publication No.: US20220093093A1
Publication date: 2022-03-24
Application No.: US17112227
Filing date: 2020-12-04
Applicant: Amazon Technologies, Inc.
Inventor: Prakash Krishnan, Arindam Mandal, Nikko Strom, Pradeep Natarajan, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, David Chi-Wai Tang, Aaron Challenner, Xu Zhang, Krishna Anisetty, Josey Diego Sandoval, Rohit Prasad, Premkumar Natarajan
Abstract: A system can operate a speech-controlled device in a mode where the speech-controlled device determines that an utterance is directed at the speech-controlled device using image data showing the user speaking the utterance. If the user is directing the user's gaze at the speech-controlled device while speaking, the system may determine the utterance is system directed and thus may perform further speech processing based on the utterance. If the user's gaze is directed elsewhere, the system may determine the utterance is not system directed (for example directed at another user) and thus the system may not perform further speech processing based on the utterance and may take other actions, for example discarding audio data of the utterance.