-
Publication number: US20220116707A1
Publication date: 2022-04-14
Application number: US17091482
Application date: 2020-11-06
Applicant: Google LLC
Inventor: Jamie Alexander Zyskowski , Karolis Misiunas , Thomás William Inskip, VI , Mohamad Hassan bin Mohamad Rom
Abstract: Methods, systems, devices, and computer-readable storage media for activity detection of a user of a computing device, using multi-modal sensing. A device can be configured to receive sensor data corresponding to multiple modalities and process the sensor data to predict an activity performed by a user of a computing device. The device in response to the detected activity can perform a response action, such as muting or pausing audio playback from the computing device. Different modalities can be combined, such as body vibration data, air vibration data, and image data, which can be processed to distinguish user activity, e.g., speaking versus not speaking, to allow the computing device to perform the correct corresponding action.
-
Publication number: US20230013370A1
Publication date: 2023-01-19
Application number: US17856292
Application date: 2022-07-01
Applicant: Google LLC
Inventor: Yunpeng Li , Marco Tagliasacchi , Dominik Roblek , Félix de Chaumont Quitry , Beat Gfeller , Hannah Raphaelle Muckenhirn , Victor Ungureanu , Oleg Rybakov , Karolis Misiunas , Zalán Borsos
IPC: G10L19/022 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
-
Publication number: US20250014591A1
Publication date: 2025-01-09
Application number: US18886136
Application date: 2024-09-16
Applicant: Google LLC
Inventor: Julian Maclaren , Karolis Misiunas , Vahe Tshitoyan , Brian Foo , Kelly Dobson
Abstract: Various systems, devices, and methods for social interaction measurement that preserve privacy are presented. An audio signal can be captured using a microphone. The audio signal can be processed using an audio-based machine learning model that is trained to detect the presence of speech. The audio signal can be discarded such that content of the audio signal is not stored after the audio signal is processed using the machine learning model. An indication of whether speech is present within the audio signal can be output based at least in part on processing the audio signal using the audio-based machine learning model.
-
Publication number: US11895474B2
Publication date: 2024-02-06
Application number: US17751094
Application date: 2022-05-23
Applicant: Google LLC
Inventor: Jamie Alexander Zyskowski , Karolis Misiunas , Thomas William Inskip, VI , Mohamad Hassan bin Mohamad Rom
IPC: H04R5/02 , H04R5/033 , H04R5/04 , H04R3/04 , H04R1/10 , G06F3/01 , G06F3/16 , G06N20/00 , G06N3/084
CPC classification number: H04R5/033 , G06F3/011 , G06F3/017 , G06F3/16 , G06N20/00 , H04R1/1091 , H04R3/04 , H04R5/04 , G06N3/084 , H04R2420/07 , H04R2430/01
Abstract: Methods, systems, devices, and computer-readable storage media for activity detection of a user of a computing device, using multi-modal sensing. A device can be configured to receive sensor data corresponding to multiple modalities and process the sensor data to predict an activity performed by a user of a computing device. The device in response to the detected activity can perform a response action, such as muting or pausing audio playback from the computing device. Different modalities can be combined, such as body vibration data, air vibration data, and image data, which can be processed to distinguish user activity, e.g., speaking versus not speaking, to allow the computing device to perform the correct corresponding action.
-
Publication number: US20230230612A1
Publication date: 2023-07-20
Application number: US17578217
Application date: 2022-01-18
Applicant: Google LLC
Inventor: Julian Maclaren , Karolis Misiunas , Vahe Tshitoyan , Brian Foo , Kelly Dobson
CPC classification number: G10L25/51 , H04R1/406 , H04R1/04 , H04R3/005 , G10L21/028 , G10L25/78 , G10L15/22 , A61B5/02416 , A61B2503/12
Abstract: Various systems, devices, and methods for social interaction measurement that preserve privacy are presented. An audio signal can be captured using a microphone. The audio signal can be processed using an audio-based machine learning model that is trained to detect the presence of speech. The audio signal can be discarded such that content of the audio signal is not stored after the audio signal is processed using the machine learning model. An indication of whether speech is present within the audio signal can be output based at least in part on processing the audio signal using the audio-based machine learning model.
-
Publication number: US12190896B2
Publication date: 2025-01-07
Application number: US17856292
Application date: 2022-07-01
Applicant: Google LLC
Inventor: Yunpeng Li , Marco Tagliasacchi , Dominik Roblek , Félix de Chaumont Quitry , Beat Gfeller , Hannah Raphaelle Muckenhirn , Victor Ungureanu , Oleg Rybakov , Karolis Misiunas , Zalán Borsos
IPC: G10L19/022 , G06N3/045
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
-
Publication number: US12119019B2
Publication date: 2024-10-15
Application number: US17578217
Application date: 2022-01-18
Applicant: Google LLC
Inventor: Julian Maclaren , Karolis Misiunas , Vahe Tshitoyan , Brian Foo , Kelly Dobson
CPC classification number: G10L25/51 , A61B5/02416 , G10L15/22 , G10L21/028 , G10L25/78 , H04R1/04 , H04R1/406 , H04R3/005 , A61B2503/12
Abstract: Various systems, devices, and methods for social interaction measurement that preserve privacy are presented. An audio signal can be captured using a microphone. The audio signal can be processed using an audio-based machine learning model that is trained to detect the presence of speech. The audio signal can be discarded such that content of the audio signal is not stored after the audio signal is processed using the machine learning model. An indication of whether speech is present within the audio signal can be output based at least in part on processing the audio signal using the audio-based machine learning model.
-
Publication number: US20220303688A1
Publication date: 2022-09-22
Application number: US17751094
Application date: 2022-05-23
Applicant: Google LLC
Inventor: Jamie Alexander Zyskowski , Karolis Misiunas , Thomas William Inskip, VI , Mohamad Hassan bin Mohamad Rom
Abstract: Methods, systems, devices, and computer-readable storage media for activity detection of a user of a computing device, using multi-modal sensing. A device can be configured to receive sensor data corresponding to multiple modalities and process the sensor data to predict an activity performed by a user of a computing device. The device in response to the detected activity can perform a response action, such as muting or pausing audio playback from the computing device. Different modalities can be combined, such as body vibration data, air vibration data, and image data, which can be processed to distinguish user activity, e.g., speaking versus not speaking, to allow the computing device to perform the correct corresponding action.
-
Publication number: US11343612B2
Publication date: 2022-05-24
Application number: US17091482
Application date: 2020-11-06
Applicant: Google LLC
Inventor: Jamie Alexander Zyskowski , Karolis Misiunas , Thomás William Inskip, VI , Mohamad Hassan bin Mohamad Rom
IPC: H04R5/02 , H04R5/033 , H04R5/04 , H04R3/04 , H04R1/10 , G06F3/01 , G06F3/16 , G06N20/00 , G06N3/08
Abstract: Methods, systems, devices, and computer-readable storage media for activity detection of a user of a computing device, using multi-modal sensing. A device can be configured to receive sensor data corresponding to multiple modalities and process the sensor data to predict an activity performed by a user of a computing device. The device in response to the detected activity can perform a response action, such as muting or pausing audio playback from the computing device. Different modalities can be combined, such as body vibration data, air vibration data, and image data, which can be processed to distinguish user activity, e.g., speaking versus not speaking, to allow the computing device to perform the correct corresponding action.