Patent search ap:("Google LLC") AND inv:"Marco Tagliasacchi" Page 1

1.

发明授权
Self-supervised audio representation learning for mobile devices 有权

公开(公告)号：US12165663B2

公开(公告)日：2024-12-10

申请号：US17986477

申请日：2022-11-14

Applicant: Google LLC

Inventor： Beat Gfeller , Dominik Roblek , Félix de Chaumont Quitry , Marco Tagliasacchi

IPC: G10L19/035 , G06N20/00 , G10L19/038 , G10L25/18

Abstract: Systems and methods for training a machine-learned model are provided. A method can include can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

2.

发明公开
GENERATING CODED DATA REPRESENTATIONS USING NEURAL NETWORKS AND VECTOR QUANTIZERS 审中-公开

公开(公告)号：US20240185870A1

公开(公告)日：2024-06-06

申请号：US18400992

申请日：2023-12-29

Applicant: Google LLC

Inventor： Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek

IPC: G10L19/038 , G06N3/045 , G06N3/08 , G10L25/30

CPC classification number: G10L19/038 , G06N3/045 , G06N3/08 , G10L25/30 , G10L2019/0002

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. According to one aspect, there is provided a method comprising: receiving a new input; processing the new input using an encoder neural network to generate a feature vector representing the new input; and generating a coded representation of the feature vector using a sequence of vector quantizers that are each associated with a respective codebook of code vectors, wherein the coded representation of the feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector.

3.

发明授权
Self-supervised audio representation learning for mobile devices 有权

公开(公告)号：US11501787B2

公开(公告)日：2022-11-15

申请号：US16548146

申请日：2019-08-22

Applicant: Google LLC

Inventor： Beat Gfeller , Dominik Roblek , Félix de Chaumont Quitry , Marco Tagliasacchi

IPC: G10L19/035 , G06N20/00 , G10L19/038 , G10L25/18

Abstract: Systems and methods for training a machine-learned model are provided. A method can include can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

4.

发明授权
Generating audio waveforms using encoder and decoder neural networks 有权

公开(公告)号：US12190896B2

公开(公告)日：2025-01-07

申请号：US17856292

申请日：2022-07-01

Applicant: Google LLC

Inventor： Yunpeng Li , Marco Tagliasacchi , Dominik Roblek , Félix de Chaumont Quitry , Beat Gfeller , Hannah Raphaelle Muckenhirn , Victor Ungureanu , Oleg Rybakov , Karolis Misiunas , Zalán Borsos

IPC: G10L19/022 , G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.

5.

发明授权
Methods and systems for implementing on-device non-semantic representation fine-tuning for speech classification 有权

公开(公告)号：US11996116B2

公开(公告)日：2024-05-28

申请号：US17000583

申请日：2020-08-24

Applicant: Google LLC

Inventor： Joel Shor , Ronnie Maor , Oran Lang , Omry Tuval , Marco Tagliasacchi , Ira Shavitt , Felix de Chaumont Quitry , Dotan Emanuel , Aren Jansen

IPC: G10L25/30 , G06F18/21 , G06N3/084 , G06N3/088 , G06N5/046 , G10L25/48

CPC classification number: G10L25/30 , G06F18/217 , G06N3/084 , G06N3/088 , G06N5/046 , G10L25/48

Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.

6.

发明授权
Compressing audio waveforms using neural networks and vector quantizers 有权

公开(公告)号：US11990148B2

公开(公告)日：2024-05-21

申请号：US18106094

申请日：2023-02-06

Applicant: Google LLC

Inventor： Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek

IPC: G10L19/038 , G06N3/045 , G06N3/08 , G10L19/00 , G10L25/30

CPC classification number: G10L19/038 , G06N3/045 , G06N3/08 , G10L25/30 , G10L2019/0002

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

7.

发明公开
Machine Learning for Microphone Style Transfer 审中-公开

公开(公告)号：US20230395087A1

公开(公告)日：2023-12-07

申请号：US18249126

申请日：2021-10-15

Applicant: Google LLC

Inventor： Marco Tagliasacchi , Beat Gfeller , Yunpeng Li , Zalán Borsos

IPC: G10L21/007 , G10L15/06 , G10L15/08 , G10L25/18 , G10L21/0208 , G10L25/21

CPC classification number: G10L21/007 , G10L15/063 , G10L15/08 , G10L25/18 , G10L21/0208 , G10L25/21 , G10L2015/088

Abstract: Example implementations of the present disclosure relate to machine learning for microphone style transfer, for example, to facilitate augmentation of audio data such as speech data to improve robustness of machine learning models trained on the audio data. Systems and methods for microphone style transfer can include one or more machine-learned microphone models trained to obtain and augment signal data to mimic characteristics of signal data obtained from a target microphone. The systems and methods can include a speech enhancement network for enhancing a sample before the style transfer. The augmentation output can then be utilized for a variety of downstream tasks.

8.

发明公开
LEARNED AUDIO FRONTEND MACHINE LEARNING MODEL FOR AUDIO UNDERSTANDING 审中-公开

公开(公告)号：US20230377561A1

公开(公告)日：2023-11-23

申请号：US18029843

申请日：2021-10-04

Applicant: Google LLC

Inventor： Neil Zeghidour , Olivier Teboul , Félix de Chaumont Quitry , Marco Tagliasacchi

IPC: G10L15/02 , G10L15/16

CPC classification number: G10L15/02 , G10L15/16

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio inputs using a learned audio frontend machine learning model that processes the audio input to generate a representation of the audio input. The representation can then be processed by an audio understanding model to generate a respective output for each of one or more audio understanding tasks.

9.

发明申请
COMPRESSING AUDIO WAVEFORMS USING NEURAL NETWORKS AND VECTOR QUANTIZERS 有权

公开(公告)号：US20230019128A1

公开(公告)日：2023-01-19

申请号：US17856856

申请日：2022-07-01

Applicant: Google LLC

Inventor： Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek

IPC: G10L19/038 , G10L25/30 , G06N3/04 , G06N3/08

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

10.

发明申请
Text-Conditioned Speech Inpainting 有权

公开(公告)号：US20250149022A1

公开(公告)日：2025-05-08

申请号：US18837723

申请日：2023-02-13

Applicant: Google LLC

Inventor： Zalán Borsos , Marco Tagliasacchi , Matthew Sharifi

IPC: G10L13/08 , G10L25/30

Abstract: Provided are systems, methods, and machine learning models for filling in gaps (e.g., of up to one second) in speech samples by leveraging an auxiliary textual input. Example machine learning models described herein can perform speech inpainting with the appropriate content, while maintaining speaker identity, prosody and recording environment conditions, and generalizing to unseen speakers. This approach significantly outperforms baselines constructed using adaptive TTS, as judged by human raters in side-by-side preference and MOS tests.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification