Patent search ap:("GOOGLE LLC") AND inv:"Neil Zeghidour" Page 2

11.

Invention application
Machine-Learned Models for Generation of Musical Accompaniments Based on Input Vocals (Legal status: in force)

Publication (announcement) number: US20240395233A1

Publication (announcement) date: 2024-11-28

Application number: US18671577

Application date: 2024-05-22

Applicant: Google LLC

Inventor: Adam Joseph Roberts , Jesse Hart Engel , Ian Stuart Simon , Andrea Agostinelli , Neil Zeghidour , Christopher James Donahue , Antoine Caillon

IPC: G10H1/00 , G10H1/36 , G10L15/06 , G10L15/18 , G10L15/183

Abstract: Training data comprising a plurality of training pairs is obtained. Each training pair comprises instrumental audio data and vocal audio data separated from audio data of a musical work of a respective plurality of musical works. For one or more training pairs of the plurality of training pairs, the vocal audio data is processed with machine-learned model(s) of a machine-learned generative audio model grouping to obtain a vocal intermediate representation for the vocal audio data. The instrumental audio data is processed with a pre-trained encoding model to obtain an instrumental intermediate representation for the instrumental audio data. A loss function is evaluated that evaluates a difference between the vocal intermediate representation and the instrumental intermediate representation. Values of parameters of a machine-learned model of the machine-learned generative audio model grouping are modified based on the loss function.

12.

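Invention publication
LEARNING NEURAL NETWORK ARCHITECTURES BY BACKPROPAGATION USING DIFFERENTIABLE MASKS (Legal status: under examination, published)

Publication (announcement) number: US20240296331A1

Publication (announcement) date: 2024-09-05

Application number: US18437202

Application date: 2024-02-08

Applicant: Google LLC

Inventor: David Wilson Romero Guzman , Neil Zeghidour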

IPC: G06N3/084 , G06N3/048

CPC classification number: G06N3/084 , G06N3/048

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for jointly learning the architecture of a neural network during the training of the neural network. In particular, the architecture of the neural network is learned using differentiable parametric masks.

13.

Invention grant
Compressing audio waveforms using neural networks and vector quantizers (Legal status: in force)

Publication (announcement) number: US11600282B2

Publication (announcement) date: 2023-03-07

Application number: US17856856

Application date: 2022-07-01

Applicant: Google LLC

Inventor: Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek

IPC: G10L19/038 , G10L25/30 , G10L19/00 , G06N3/08 , G06N3/04

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
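The quantization step in this abstract can be illustrated with a minimal NumPy sketch. It assumes a residual arrangement, in which each quantizer encodes what the previous one left over; the abstract only says that several quantizers each contribute one code vector, so this arrangement is an assumption. The function names `rvq_encode` and `rvq_decode` are hypothetical, and the encoder network and the final compression of the indices are not shown.

```python
import numpy as np

def rvq_encode(features, codebooks):
    """Pick one code index per quantizer for each feature vector.

    features:  array of shape (num_vectors, dim)
    codebooks: list of arrays, each of shape (codebook_size, dim)
    Assumption: quantizers are applied residually, one after another.
    """
    residual = np.asarray(features, dtype=np.float64)
    indices = []
    for codebook in codebooks:
        # Squared distance from every residual to every code vector.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        indices.append(idx)
        # The next quantizer sees only what this one failed to capture.
        residual = residual - codebook[idx]
    return np.stack(indices, axis=1)  # shape (num_vectors, num_quantizers)

def rvq_decode(indices, codebooks):
    """Sum the selected code vectors to form the quantized representation."""
    return sum(cb[indices[:, q]] for q, cb in enumerate(codebooks))

# Toy codebooks chosen so that [1, 1] is exactly representable.
codebooks = [np.array([[0.0, 0.0], [1.0, 0.0]]),
             np.array([[0.0, 0.0], [0.0, 1.0]])]
features = np.array([[1.0, 1.0]])
codes = rvq_encode(features, codebooks)
quantized = rvq_decode(codes, codebooks)
```

Each row of `codes` is the coded representation of one feature vector: a tuple of indices, one per codebook, which together identify the code vectors that sum to the quantized vector.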

14.

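Invention application
Learning Strides in Convolutional Neural Networks (Legal status: in force)

Publication (announcement) number: US20250005354A1

Publication (announcement) date: 2025-01-02

Application number: US18698691

Application date: 2022-10-05

Applicant: Google LLC

Inventor: Neil Zeghidour , Rachid Riad , Olivier Teboul , David Grangier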

IPC: G06N3/08

Abstract: A method of training a machine learning model includes receiving training data for the machine learning model, wherein the training data comprises a plurality of batches. The method also includes applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer. Applying the downsampling layer of the machine learning model to a batch of the training data includes projecting an input in a spatial domain to a Fourier domain, constructing a mask in the Fourier domain based on a current value of the stride and dimensions of the input, applying the mask as a low-pass filter to the projected input to produce a tensor in the Fourier domain, cropping the tensor based on the mask, and transforming the cropped tensor to the spatial domain.
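The forward pass of the downsampling layer described in this abstract can be sketched in NumPy for a 1-D real signal. This is only an illustration under stated assumptions: the function name `spectral_downsample` is hypothetical, and the mask here is a hard rectangle, whereas learning the stride by gradient descent would need a mask that varies smoothly with the stride, which is not shown.

```python
import numpy as np

def spectral_downsample(x, stride):
    """Downsample the last axis of x by a (possibly fractional) stride.

    Assumption: hard rectangular low-pass mask, forward pass only.
    """
    n = x.shape[-1]
    # 1. Project the input from the spatial domain to the Fourier domain.
    spectrum = np.fft.rfft(x, axis=-1)
    # 2. Build a mask from the current stride and the input length.
    out_len = max(2, int(round(n / stride)))
    keep = out_len // 2 + 1          # rfft bins needed for out_len samples
    mask = np.zeros(spectrum.shape[-1])
    mask[:keep] = 1.0
    # 3. Apply the mask as a low-pass filter.
    filtered = spectrum * mask
    # 4. Crop the tensor to the region the mask keeps.
    cropped = filtered[..., :keep]
    # 5. Transform back to the spatial domain, rescaling so that the
    #    signal amplitude is preserved at the new length.
    return np.fft.irfft(cropped, n=out_len, axis=-1) * (out_len / n)

constant = np.ones(8)
halved = spectral_downsample(constant, stride=2.0)   # length 4
fractional = spectral_downsample(np.arange(16.0), stride=2.5)
```

Because the output length is derived from `n / stride`, a non-integer stride such as 2.5 is handled the same way as an integer one, which is what makes the stride a candidate for a continuous, learnable parameter.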

15.

Invention grant
Compressing audio waveforms using neural networks and vector quantizers (Legal status: in force)

Publication (announcement) number: US11990148B2

Publication (announcement) date: 2024-05-21

Application number: US18106094

Application date: 2023-02-06

Applicant: Google LLC

Inventor: Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek

IPC: G10L19/038 , G06N3/045 , G06N3/08 , G10L19/00 , G10L25/30

CPC classification number: G10L19/038 , G06N3/045 , G06N3/08 , G10L25/30 , G10L2019/0002

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

16.

Invention publication
GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS (Legal status: under examination, published)

Publication (announcement) number: US20240079001A1

Publication (announcement) date: 2024-03-07

Application number: US18463196

Application date: 2023-09-07

Applicant: Google LLC

Inventor: Andrea Agostinelli , Timo Immanuel Denk , Antoine Caillon , Neil Zeghidour , Jesse Engel , Mauro Verzetti , Christian Frank , Zalán Borsos , Matthew Sharifi , Adam Joseph Roberts

IPC: G10L15/16 , G10H1/00 , G10L15/06 , G10L15/18

CPC classification number: G10L15/16 , G10H1/0008 , G10L15/063 , G10L15/1815 , G10H2210/056 , G10H2250/311

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

17.

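Invention grant
End-to-end speech diarization via iterative speaker embedding (Legal status: in force)

Publication (announcement) number: US11887623B2

Publication (announcement) date: 2024-01-30

Application number: US17304514

Application date: 2021-06-22

Applicant: Google LLC

Inventor: David Grangier , Neil Zeghidour , Oliver Teboul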

IPC: G10L25/78 , G06N3/04 , G10L15/06 , G10L15/07 , G10L17/18 , G10L19/008

CPC classification number: G10L25/78 , G06N3/04 , G10L15/063 , G10L15/07 , G10L17/18 , G10L19/008

Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.

18.

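Invention publication
LEARNED AUDIO FRONTEND MACHINE LEARNING MODEL FOR AUDIO UNDERSTANDING (Legal status: under examination, published)

Publication (announcement) number: US20230377561A1

Publication (announcement) date: 2023-11-23

Application number: US18029843

Application date: 2021-10-04

Applicant: Google LLC

Inventor: Neil Zeghidour , Olivier Teboul , Félix de Chaumont Quitry , Marco Tagliasacchi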

IPC: G10L15/02 , G10L15/16

CPC classification number: G10L15/02 , G10L15/16

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio inputs using a learned audio frontend machine learning model that processes the audio input to generate a representation of the audio input. The representation can then be processed by an audio understanding model to generate a respective output for each of one or more audio understanding tasks.

19.

Invention application
SEPARATING SPEECH BY SOURCE IN AUDIO RECORDINGS BY PREDICTING ISOLATED AUDIO SIGNALS CONDITIONED ON SPEAKER REPRESENTATIONS (Legal status: in force)

Publication (announcement) number: US20230112265A1

Publication (announcement) date: 2023-04-13

Application number: US17967726

Application date: 2022-10-17

Applicant: Google LLC

Inventor: Neil Zeghidour , David Grangier

IPC: G10L21/028 , G06N3/08 , G10L17/04 , G10L17/18 , G10L21/0316 , G06N3/045

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.

20.

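Invention application
COMPRESSING AUDIO WAVEFORMS USING NEURAL NETWORKS AND VECTOR QUANTIZERS (Legal status: in force)

Publication (announcement) number: US20230019128A1

Publication (announcement) date: 2023-01-19

Application number: US17856856

Application date: 2022-07-01

Applicant: Google LLC

Inventor: Neil Zeghidour , Marco Tagliasacchi , Dominik Roblek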

IPC: G10L19/038 , G10L25/30 , G06N3/04 , G06N3/08

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
