-
Publication No.: US12132952B1
Publication Date: 2024-10-29
Application No.: US17895661
Filing Date: 2022-08-25
IPC Classification: H04N21/439, H04N21/422, H04N21/4363, H04N21/44
CPC Classification: H04N21/4394, H04N21/42203, H04N21/43637, H04N21/44
Abstract: A system configured to use keywords to augment audio playback with visual effects and/or other effects to provide an immersive audio experience. For example, a device can detect a keyword and control a color and intensity of external lights to provide visual feedback. In addition to visual effects, the device can trigger additional effects using smart plugs or other smart devices. In a listening enhancement mode in which the device outputs audio content, the device performs keyword detection by monitoring playback audio data for preconfigured keywords. In a storytelling mode in which a user reads a book out loud, the device may perform keyword detection by monitoring microphone audio data for the preconfigured keywords. Controlling the visual effects in response to keyword detection is enabled by a new pipeline that sends information back from a wakeword engine to an audio processor.
-
Publication No.: USD1045337S1
Publication Date: 2024-10-08
Application No.: US29871611
Filing Date: 2023-02-23
Applicant: Joy Chopak
Designer: Joy Chopak
Abstract: FIG. 1 is a front perspective view of a swaddle showing my new design;
FIG. 2 is a front elevation view thereof;
FIG. 3 is a rear elevation view thereof;
FIG. 4 is a right side elevation view thereof;
FIG. 5 is a left side elevation view thereof;
FIG. 6 is the top plan view thereof;
FIG. 7 is the bottom plan view thereof; and,
FIG. 8 is a front perspective view of the swaddle shown in an open configuration.
All of the broken lines illustrate portions of the swaddle that form no part of the claimed design.
-
Publication No.: US12101599B1
Publication Date: 2024-09-24
Application No.: US17952806
Filing Date: 2022-09-26
Inventor: Mohamed Mansour
Abstract: Disclosed are techniques for an improved method for performing sound source localization (SSL) to determine a direction of arrival of an audible sound using a combination of timing information and amplitude information. For example, a device may decompose an observed sound field into directional components, then estimate a time-delay likelihood value and an energy-based likelihood value for each of the directional components. Using a combination of these likelihood values, the device can determine the direction of arrival corresponding to a maximum likelihood value. In some examples, the device may perform Acoustic Wave Decomposition (AWD) processing to determine the directional components. In order to reduce a processing consumption associated with performing AWD processing, the device splits this process into two phases: a search phase that selects a subset of a device dictionary to reduce a complexity, and a decomposition phase that solves an optimization problem using the subset of the device dictionary.
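
The final selection step described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the two likelihood values are combined, so multiplying them is an assumption made here for illustration only; the function name `pick_direction` and the tuple layout are likewise hypothetical.

```python
def pick_direction(components):
    """Return the azimuth whose combined likelihood is largest.

    components: list of (azimuth_deg, time_delay_likelihood, energy_likelihood).
    Assumption: the two likelihoods are combined by multiplication; the
    abstract only says "a combination of these likelihood values".
    """
    if not components:
        raise ValueError("at least one directional component is required")
    best = max(components, key=lambda c: c[1] * c[2])
    return best[0]


# Hypothetical directional components for one observed sound field.
candidates = [
    (0.0, 0.2, 0.5),    # combined likelihood 0.10
    (90.0, 0.6, 0.4),   # combined likelihood 0.24
    (180.0, 0.3, 0.3),  # combined likelihood 0.09
]
direction = pick_direction(candidates)
```

With these made-up values the 90-degree component wins, because it has the largest product even though its energy-based likelihood alone is not the highest.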
-
Publication No.: US12087299B2
Publication Date: 2024-09-10
Application No.: US18085763
Filing Date: 2022-12-21
CPC Classification: G10L15/22, G06F3/017, G06F3/167, G10L15/30, G10L2015/223
Abstract: A speech-processing system may provide access to multiple virtual assistants via one or more voice-controlled devices. Each assistant may leverage language processing and language generation features of the speech-processing system, while handling different commands and/or providing access to different back applications. Different assistants may be available for use with a particular voice-controlled device based on time, location, the particular user, etc. The voice-controlled device may include components for facilitating user interaction with multiple assistants. For example, a multi-assistant component may facilitate enabling/disabling assistants, assigning gestures and/or wakewords, etc. The multi-assistant component may handle routing commands to a command processing subsystem corresponding to an assistant invoked by the command. The voice controlled device may further include observer components, each configured to monitor the voice-controlled device for invocations of a particular assistant.
-
Publication No.: US12080291B2
Publication Date: 2024-09-03
Application No.: US17708077
Filing Date: 2022-03-30
Inventors: Fabian Andreas Bumberger, Sabria Farheen, Maciej Makowski, Eli Joshua Fidler, Sasitheran Shanmugarajah
CPC Classification: G10L15/22, G10L15/02, G10L2015/223, G10L15/30
Abstract: This disclosure proposes systems and methods enabling on-device/hybrid processing of speech requests using a hub device. The hub device is capable of receiving audio data from surrounding devices and performing speech processing on the audio data to improve latency and/or provide functionality to other devices within a private network. The hub device may receive multiple requests corresponding to different utterances. If the hub device receives a second utterance while processing a first utterance, the hub device may send an error notification, process the first utterance and the second utterance sequentially, suspend processing of the first utterance to process the second utterance first, send the second utterance to another hub device or remote system, or suspend processing of the first utterance and send the first utterance to the remote system in order to process the second utterance.
-
Publication No.: US12058509B1
Publication Date: 2024-08-06
Application No.: US17546567
Filing Date: 2021-12-09
Abstract: A system configured to create a flexible home theater group using a variety of different devices. To enable the home theater group to generate synchronized audio, the system performs device localization to generate map data, which represents locations of devices in a device map. The map data may include a listening position and/or television, such that the map data is centered on the listening position with the television along a vertical axis. To generate the map data, the system selects a primary device that determines calibration data indicating a sequence when each of the individual devices generates playback audio. The primary device sends the calibration data to secondary devices and each device generates playback audio at a designated time in the sequence, enabling other devices to capture the output audio and determine a relative position of the playback device (for example using angle of arrival and distance information).
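
As a rough illustration of turning angle-of-arrival and distance information into a relative position, the Python sketch below converts a polar measurement into map coordinates. The coordinate convention (angle measured clockwise from the vertical axis, with the listener at the origin) and the function name `relative_position` are assumptions made for this example and are not taken from the abstract.

```python
import math


def relative_position(angle_deg, distance_m):
    """Convert an (angle of arrival, distance) pair to (x, y) in metres.

    Assumed convention: the origin is the measuring position, the vertical
    (+y) axis is 0 degrees, and angles increase clockwise toward +x.
    """
    theta = math.radians(angle_deg)
    return (distance_m * math.sin(theta), distance_m * math.cos(theta))


# A hypothetical device heard 2 m away, 90 degrees clockwise from vertical.
x, y = relative_position(90.0, 2.0)
```

Under this convention a device straight ahead on the vertical axis maps to `(0, distance)`, which matches the abstract's description of a map with the television along the vertical axis.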
-
Publication No.: US12057115B2
Publication Date: 2024-08-06
Application No.: US17146997
Filing Date: 2021-01-12
Inventors: Christo Frank Devaraj, Venkata Krishnan Ramamoorthy, Gregory Michael Hart, Samuel Scott Gigliotti, Scott Southwood, Ran Mokady, Hale Sostock, Roman Yusufov
IPC Classification: G10L15/22, G10L13/08, G10L15/18, G10L17/06, G10L17/22, H04L51/02, H04L51/046, H04L51/10, H04L51/224, H04L51/52, H04L67/104, H04L67/303, H04L67/306
CPC Classification: G10L15/22, G10L13/08, G10L15/1815, G10L17/06, G10L17/22, H04L51/02, H04L51/046, H04L51/224, H04L67/1044, H04L67/303, H04L67/306, G10L2015/223, H04L51/10, H04L51/52
Abstract: Methods and systems for facilitating communications between shared electronic devices are described herein. In some embodiments, a group account may be assigned to a shared electronic device. The group account may include one or more user accounts, where individuals associated with those user accounts may interact with the shared electronic device, and also may form a part of the group account. When a message is sent from one shared electronic device to another personal device or shared electronic device, the message may be indicated as being sent from the group account, as if the shared electronic device corresponds to its own separate account. In some embodiments, speaker identification processing may be employed to determine a speaker of the message and, if the speaker is able to be identified, the message may be sent from the corresponding speaker's user account instead of the shared electronic device's corresponding group account.
-
Publication No.: US12033618B1
Publication Date: 2024-07-09
Application No.: US17546502
Filing Date: 2021-12-09
Inventors: Kai Wei, Thanh Dac Tran, Grant Strimel
CPC Classification: G10L15/1815, G06N3/08, G10L15/063, G10L15/16, G10L15/22, G10L15/28, G10L2015/228
Abstract: Techniques for determining and storing relevant context information for a user input, such as a spoken input, are described. In some embodiments, context information is determined to be relevant on an audio frame basis. Context scores for different types of context data (e.g., prior dialog turn data, user profile data, device information, etc.) are determined for individual audio frames corresponding to a spoken input. Based on the corresponding context scores, the most relevant context is stored in a local context cache. The local context cache is updated as subsequent audio frames, of the user input, are processed. The data stored in the context cache is provided to downstream components to perform tasks such as ASR, NLU and SLU.
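
The per-frame cache update described in this abstract might look something like the Python sketch below. The abstract does not specify how scores from successive frames are merged or how many entries the cache holds, so keeping each context type's highest score so far and retaining the `top_k` best entries are both assumptions; `update_context_cache` is a hypothetical name.

```python
def update_context_cache(cache, frame_scores, top_k=2):
    """Fold one audio frame's context scores into the cache.

    cache:        dict mapping context type -> best score seen so far.
    frame_scores: dict mapping context type -> score for the current frame.
    Assumptions: scores are merged by taking the maximum, and only the
    top_k highest-scoring context types are retained.
    """
    merged = dict(cache)
    for ctx_type, score in frame_scores.items():
        merged[ctx_type] = max(score, merged.get(ctx_type, float("-inf")))
    keep = sorted(merged, key=merged.get, reverse=True)[:top_k]
    return {ctx_type: merged[ctx_type] for ctx_type in keep}


# Hypothetical scores for two consecutive audio frames of one spoken input.
frames = [
    {"prior_dialog": 0.2, "user_profile": 0.7, "device_info": 0.1},
    {"prior_dialog": 0.9, "user_profile": 0.3, "device_info": 0.4},
]
cache = {}
for scores in frames:
    cache = update_context_cache(cache, scores)
```

After the second frame the cache holds `prior_dialog` and `user_profile`, showing how a context type that scored low early on can displace another once later frames make it more relevant.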
-
Publication No.: US11990120B2
Publication Date: 2024-05-21
Application No.: US16902992
Filing Date: 2020-06-16
Inventor: Travis Grizzel
CPC Classification: G10L15/01, G06F3/017, G10L13/00, G10L15/18, G10L15/187, G10L15/24, G10L2015/088
Abstract: A system and method for associating motion data with utterance audio data for use with a speech processing system. A device, such as a wearable device, may be capable of capturing utterance audio data and sending it to a remote server for speech processing, for example for execution of a command represented in the utterance. The device may also capture motion data using motion sensors of the device. The motion data may correspond to gestures, such as head gestures, that may be interpreted by the speech processing system to determine and execute commands. The device may associate the motion data with the audio data so the remote server knows what motion data corresponds to what portion of audio data for purposes of interpreting and executing commands. Metadata sent with the audio data and/or motion data may include association data such as timestamps, session identifiers, message identifiers, etc.
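
One way to picture the association step is the Python sketch below, which matches each motion event to the audio chunk that shares its session identifier and covers its timestamp. The record layouts (`AudioChunk`, `MotionEvent`), the half-open time interval, and the function `associate` are all hypothetical; the abstract only names timestamps and session identifiers as examples of association data.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    session_id: str
    start_ms: int
    end_ms: int  # exclusive (assumed convention)


@dataclass
class MotionEvent:
    session_id: str
    timestamp_ms: int
    gesture: str


def associate(chunks, events):
    """Return (chunk_index, gesture) pairs for events that fall in a chunk.

    An event matches a chunk when the session identifiers are equal and
    start_ms <= timestamp_ms < end_ms. Unmatched events are skipped.
    """
    pairs = []
    for event in events:
        for index, chunk in enumerate(chunks):
            same_session = event.session_id == chunk.session_id
            in_window = chunk.start_ms <= event.timestamp_ms < chunk.end_ms
            if same_session and in_window:
                pairs.append((index, event.gesture))
                break
    return pairs


# Hypothetical data: two audio chunks from session "s1" and three events.
chunks = [AudioChunk("s1", 0, 500), AudioChunk("s1", 500, 1000)]
events = [
    MotionEvent("s1", 620, "nod"),
    MotionEvent("s2", 100, "shake"),  # different session, so no match
    MotionEvent("s1", 10, "tilt"),
]
pairs = associate(chunks, events)
```

Here the nod lands in the second chunk and the tilt in the first, while the event from the other session is dropped.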
-
Publication No.: US11990118B2
Publication Date: 2024-05-21
Application No.: US18206301
Filing Date: 2023-06-06
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
-