-
Publication No.: US20240364553A1
Publication date: 2024-10-31
Application No.: US18770400
Filing date: 2024-07-11
Applicant: Dropbox, Inc.
Inventor: Shehzad Daredia , Behrooz Khorashadi
CPC classification number: H04L12/1831 , G06F16/345 , G06N20/00 , G10L15/083 , H04L12/1818 , H04L12/1822 , G10L2015/088
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating meeting insights based on media data and device input data. In one or more embodiments, the system analyzes media data and inputs to client devices associated with a meeting to determine a portion of the meeting that is relevant for a user. In one or more embodiments, the system generates a meeting summary, meeting highlights, or action items related to the media data to provide to the client device. In one or more embodiments, the system also uses the summary, highlights, or action items to train a machine-learning model for use with future meetings.
-
Publication No.: US20240363115A1
Publication date: 2024-10-31
Application No.: US18770316
Filing date: 2024-07-11
Applicant: GOOGLE LLC
Inventor: Jonathan Hayden Gomes , Shashank Goel , Oscar Armando Azucena , Patrick Berny , Keun-Young Park , Matthew William Crowley
IPC: G10L15/22 , G06F3/14 , G06F3/16 , G10L15/08 , G10L15/16 , G10L15/30 , G10L21/0208 , H04M1/27 , H04R3/00
CPC classification number: G10L15/22 , G06F3/14 , G06F3/165 , G10L15/16 , G10L21/0208 , H04M1/271 , H04R3/00 , G10L2015/088 , G10L15/30 , G10L2021/02082
Abstract: Techniques are described herein for concurrent voice assistants. A method includes: providing first and second automated assistants with access to one or more microphones; receiving, from the first automated assistant, an indication that the first automated assistant has initiated a first session, and in response: continuing providing, to the first automated assistant, access to the one or more microphones; discontinuing providing, to the second automated assistant, access to the one or more microphones; and preventing the second automated assistant from accessing one or more portions of an output audio data stream; receiving, from the first automated assistant, an indication that the first session has ended, and in response: continuing providing, to the first automated assistant, access to the one or more microphones; resuming providing, to the second automated assistant, access to the one or more microphones; and resuming providing, to the second automated assistant, the output audio data stream.
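Illustrative sketch (not part of the patent text): the session-based arbitration in this abstract can be pictured as a small state holder. The class name `MicArbiter`, its method names, and the assistant identifiers are hypothetical, and the sketch assumes that every assistant other than the one holding a session is cut off.

```python
class MicArbiter:
    """Hypothetical arbiter tracking microphone and output-audio access per assistant."""

    def __init__(self, assistants):
        # Every assistant starts with access to the microphones and the output audio stream.
        self.mic_access = {name: True for name in assistants}
        self.output_audio_access = {name: True for name in assistants}
        self.active = None

    def session_started(self, name):
        # The assistant that opened the session keeps the microphones; all
        # others lose microphone access and the output audio stream.
        self.active = name
        for other in self.mic_access:
            if other != name:
                self.mic_access[other] = False
                self.output_audio_access[other] = False

    def session_ended(self, name):
        # Only the assistant that holds the session can end it.
        if self.active != name:
            return
        self.active = None
        for other in self.mic_access:
            self.mic_access[other] = True
            self.output_audio_access[other] = True
```

Calling `session_started("first")` on an arbiter built from `["first", "second"]` leaves only `"first"` with microphone access; `session_ended("first")` restores access for both.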
-
Publication No.: US20240363113A1
Publication date: 2024-10-31
Application No.: US18766909
Filing date: 2024-07-09
Applicant: Google LLC
Inventor: Kenneth Mixter , Diego Melendo Casado , Alexander H. Gruenstein , Terry Tai , Christopher Thaddeus Hughes , Matthew Nirvan Sharifi
CPC classification number: G10L15/22 , G10L15/32 , G10L2015/088 , G10L2015/223 , G10L25/60
Abstract: The various implementations described herein include methods and systems for determining device leadership among voice interface devices. In one aspect, a method is performed at a first electronic device of a plurality of electronic devices, each having microphones, a speaker, processors, and memory storing programs for execution by the processors. The first device detects a voice input. It determines a device state and a relevance of the voice input. It identifies a subset of electronic devices from the plurality to which the voice input is relevant. In accordance with a determination that the subset includes the first device, the first device determines a first score of a criterion associated with the voice input and receives second scores of the criterion from other devices in the subset. In accordance with a determination that the first score is higher than the second scores, the first device responds to the detected input.
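Illustrative sketch (not part of the patent text): the leadership decision in this abstract reduces to a score comparison among the devices to which the voice input is relevant. The function name, parameter names, and device labels are hypothetical, and how the score itself is computed is left out because the abstract does not specify it.

```python
def should_respond(first_device, first_score, peer_scores, relevant_devices):
    """Return True if the first device should respond to the voice input.

    peer_scores maps other device names to the scores they reported;
    relevant_devices is the subset to which the voice input is relevant.
    """
    # A device outside the relevant subset never responds.
    if first_device not in relevant_devices:
        return False
    # Only scores from other devices in the relevant subset are compared.
    others = [
        score
        for device, score in peer_scores.items()
        if device in relevant_devices and device != first_device
    ]
    # Respond only when the first score is higher than every peer score.
    return all(first_score > score for score in others)
```

For example, a device scoring 0.9 responds when the only other relevant device reported 0.4, even if a device outside the subset reported a higher score.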
-
Publication No.: US12131730B2
Publication date: 2024-10-29
Application No.: US17298368
Filing date: 2019-11-19
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takashi Nakamura , Tomohiro Tanaka
CPC classification number: G10L15/18 , G10L15/16 , G10L2015/088
Abstract: A keyword is extracted robustly despite a voice recognition result including an error. A model storage unit 10 stores a keyword extraction model that accepts word vector representations of a plurality of words as an input and extracts and outputs a word vector representation of a word to be extracted as a keyword. A speech detection unit 11 detects a speech part from a voice signal. A voice recognition unit 12 executes voice recognition on the speech part of the voice signal and outputs a confusion network which is a voice recognition result. A word vector representation generating unit 13 generates a word vector representation including reliability of voice recognition with regard to each candidate word for each confusion set. A keyword extraction unit 14 inputs the word vector representation of the candidate word to the keyword extraction model in descending order of the reliability and obtains the word vector representation of the keyword.
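Illustrative sketch (not part of the patent text): the input preparation described in this abstract can be pictured as appending each candidate's recognition reliability to its word vector and ordering the candidates of each confusion set by descending reliability. The data layout (a list of confusion sets holding `(word, reliability)` pairs), the function name, and the `embed` callback are assumptions; the keyword extraction model itself is not shown.

```python
def build_model_inputs(confusion_network, embed):
    """Return (word, vector) pairs in the order they would be fed to a model.

    confusion_network: list of confusion sets, each a list of (word, reliability).
    embed: hypothetical callback mapping a word to a list of floats.
    """
    inputs = []
    for confusion_set in confusion_network:
        # Within each confusion set, the most reliable candidate comes first.
        ranked = sorted(confusion_set, key=lambda cand: cand[1], reverse=True)
        for word, reliability in ranked:
            # The reliability is appended as an extra vector component.
            inputs.append((word, embed(word) + [reliability]))
    return inputs
```

With a toy `embed` that returns the word length, a confusion set holding "rain" (0.2) and "train" (0.7) yields "train" first, followed by "rain".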
-
Publication No.: US12130901B2
Publication date: 2024-10-29
Application No.: US18511324
Filing date: 2023-11-16
Applicant: Q (CUE) LTD.
Inventor: Yonatan Wexler
IPC: G10L15/25 , A61B5/1171 , G06F16/532 , G06F21/32 , G06F40/40 , G06V20/50 , G06V40/16 , G10L13/02 , G10L15/08 , G10L15/20 , G10L15/22 , H04R1/02 , H04R1/10
CPC classification number: G06F21/32 , A61B5/1176 , G06F16/532 , G06F40/40 , G06V20/50 , G06V40/166 , G06V40/171 , G06V40/172 , G06V40/176 , G10L13/02 , G10L15/08 , G10L15/20 , G10L15/22 , G10L15/25 , H04R1/028 , H04R1/10 , G10L2015/088 , G10L2015/223
Abstract: Systems, methods, and non-transitory computer-readable media including instructions for detecting and utilizing facial skin micromovements are disclosed. In some non-limiting embodiments, the detection of the facial skin micromovements occurs using a speech detection system that may include a wearable housing, a light source (either a coherent light source or a non-coherent light source), a light detector, and at least one processor. One or more processors may be configured to analyze light reflections received from a facial region to determine the facial skin micromovements, and extract meaning from the determined facial skin micromovements. Examples of meaning that may be extracted from the determined facial skin micromovements may include words spoken by the individual (either silently spoken or vocally spoken), an identification of the individual, an emotional state of the individual, a heart rate of the individual, a respiration rate of the individual, or any other biometric, emotion, or speech-related indicator.
-
Publication No.: US12128765B2
Publication date: 2024-10-29
Application No.: US17475828
Filing date: 2021-09-15
Applicant: Hyundai Motor Company , Kia Corporation
Inventor: Jihoon Kim
CPC classification number: B60K35/00 , G06F3/14 , G06F16/27 , G06F40/279 , G10L15/08 , G10L15/22 , B60K35/10 , B60K35/22 , B60K2360/148 , G10L2015/088 , G10L2015/223
Abstract: An embodiment vehicle includes an audio video navigation (AVN) device configured to execute an application, a display configured to display a screen of the application, an input device configured to receive a command from a user, and a processor configured to receive a backup command through the input device, in response to the backup command, generate snapshot data of the application being executed, extract a keyword based on the screen displayed on the display, generate metadata corresponding to the snapshot data and including the keyword, receive a restoration command that includes the keyword, the restoration command received through the input device, based on the received restoration command, select the metadata including the keyword, and restore data of the application based on the snapshot data corresponding to the selected metadata.
-
Publication No.: US20240347060A1
Publication date: 2024-10-17
Application No.: US18750663
Filing date: 2024-06-21
Applicant: GOOGLE LLC
Inventor: Victor Carbune , Matthew Sharifi , Ondrej Skopek , Justin Lu , Daniel Valcarce , Kevin Kilgour , Mohamad Hassan Rom , Nicolo D'Ercole , Michael Golikov
CPC classification number: G10L15/22 , G10L15/05 , G10L15/1815 , G10L25/78 , G10L2015/088 , G10L2015/223
Abstract: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
-
Publication No.: US20240347057A1
Publication date: 2024-10-17
Application No.: US18404254
Filing date: 2024-01-04
Applicant: Sonos, Inc.
Inventor: Connor Smith
IPC: G10L15/22 , G10L15/07 , G10L15/08 , H04L43/0811
CPC classification number: G10L15/22 , G10L15/07 , G10L15/08 , H04L43/0811 , G10L2015/088 , G10L2015/223
Abstract: Example techniques relate to offline voice control. A local voice input engine may process voice inputs locally when processing voice inputs via a cloud-based voice assistant service is not possible. Some techniques involve local (on-device) voice-assisted set-up of a cloud-based voice assistant service. Further example techniques involve local voice-assisted troubleshooting of the cloud-based voice assistant service. Other techniques relate to interactions between local and cloud-based processing of voice inputs on a device that supports both local and cloud-based processing.
-
Publication No.: US20240345801A1
Publication date: 2024-10-17
Application No.: US18432733
Filing date: 2024-02-05
Applicant: Sonos, Inc.
Inventor: Dayn Wilberding , John Tolomei
IPC: G06F3/16 , G06F3/04817 , G06F3/0488 , G06F9/451 , G10L15/08 , G10L15/22 , G10L17/22 , H04L12/28 , H04N21/422 , H04N21/436 , H04N21/439
CPC classification number: G06F3/167 , G06F3/04817 , G10L15/08 , G10L15/22 , H04L12/282 , H04N21/42203 , H04N21/43615 , H04N21/4394 , G06F3/0488 , G06F9/453 , G10L2015/088 , G10L2015/223 , G10L17/22
Abstract: Example techniques involve invoking voice assistance for a media playback system. In some embodiments, an NMD stores in memory a set of command information comprising a listing of playback commands and associated command criteria. The NMD captures a voice input and detects inclusion, within the voice input, of one or more particular playback commands from among the playback commands in the listing. In response, the NMD selects a local voice assistant that supports (a) one or more additional playback commands relative to a cloud-based VAS and (b) fewer non-playback commands relative to the cloud-based VAS, determines, via the local voice assistant, an intent in the captured voice input, and performs a response to the determined intent. The NMD forgoes selection of the cloud-based VAS when the local voice assistant is selected.
-
Publication No.: US12119000B2
Publication date: 2024-10-15
Application No.: US18316434
Filing date: 2023-05-12
Applicant: Sonos, Inc.
Inventor: Connor Kristopher Smith , Matthew David Anderson
CPC classification number: G10L15/22 , G06F3/165 , G06F3/167 , G10L15/1815 , G10L15/30 , G10L25/51 , G10L2015/088 , G10L2015/223
Abstract: A device, such as a Network Microphone Device or a playback device, detects an event associated with the device or a system comprising the device. In response, an input detection window is opened for a given time period. During the given time period, the device is arranged to receive an input sound data stream representing sound detected by a microphone. The input sound data stream is analyzed for a plurality of keywords and/or a wake-word for a Voice Assistant Service (VAS) and, based on the analysis, it is determined that the input sound data stream includes voice input data comprising a keyword or a wake-word for a VAS. In response, the device takes appropriate action, such as causing the media playback system to perform a command corresponding to the keyword or sending at least part of the input sound data stream to the VAS.
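Illustrative sketch (not part of the patent text): the event-triggered input detection window in this abstract can be pictured as a timer that gates keyword and wake-word handling. The class name, method names, and returned action labels are hypothetical, and a text transcript stands in for the input sound data stream, which a real device would analyze as audio.

```python
import time


class InputDetectionWindow:
    """Hypothetical time-limited window that opens when an event is detected."""

    def __init__(self, duration_s, keywords, wake_word, clock=time.monotonic):
        self.duration_s = duration_s
        self.keywords = set(keywords)
        self.wake_word = wake_word
        self.clock = clock  # injectable so the window can be tested without waiting
        self.opened_at = None

    def on_event(self):
        # Detecting an event opens (or reopens) the window.
        self.opened_at = self.clock()

    def is_open(self):
        if self.opened_at is None:
            return False
        return self.clock() - self.opened_at < self.duration_s

    def handle(self, transcript):
        # Input arriving outside the window is ignored.
        if not self.is_open():
            return ("ignored", None)
        words = transcript.lower().split()
        # A wake-word forwards the input to the voice assistant service.
        if self.wake_word in words:
            return ("send_to_vas", transcript)
        # Otherwise the first matching keyword maps to a local command.
        for word in words:
            if word in self.keywords:
                return ("run_command", word)
        return ("ignored", None)
```

Passing a fake `clock` makes the expiry behaviour easy to check: the same input that triggers a command inside the window is ignored once the given time period has elapsed.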