Patent search ap:("Google LLC") AND inv:"David Grangier" Page 1

1.

Granted invention patent
Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations (Status: Active)

Publication number: US12236970B2

Publication date: 2025-02-25

Application number: US17967726

Application date: 2022-10-17

Applicant: Google LLC

Inventor: Neil Zeghidour, David Grangier

IPC: G10L21/028, G06N3/045, G06N3/08, G10L17/04, G10L17/18, G10L21/0208, G10L21/0272, G10L21/0316, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.

2.

Published invention application
END-TO-END SPEECH DIARIZATION VIA ITERATIVE SPEAKER EMBEDDING (Status: Under examination, published)

Publication number: US20240144957A1

Publication date: 2024-05-02

Application number: US18544647

Application date: 2023-12-19

Applicant: Google LLC

Inventor: David Grangier, Neil Zeghidour, Oliver Teboul

IPC: G10L25/78, G06N3/04, G10L15/06, G10L15/07, G10L17/18, G10L19/008

CPC classification number: G10L25/78, G06N3/04, G10L15/063, G10L15/07, G10L17/18, G10L19/008

Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
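The iterative selection described in this abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the toy scorer `new_speaker_probability`, the cosine-threshold detector `voice_activity`, the choice of the highest-probability frame, and the 2-dimensional frame embeddings are hypothetical stand-ins for the trained components, which the abstract does not specify.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def new_speaker_probability(frame: Vector, selected: List[Vector]) -> float:
    """Toy scorer (assumption): high when the frame has energy and is unlike
    every speaker embedding selected in earlier iterations."""
    energy = min(1.0, math.sqrt(sum(x * x for x in frame)))
    closest = max((max(0.0, cosine(frame, s)) for s in selected), default=0.0)
    return energy * (1.0 - closest)


def select_speakers(frames: List[Vector], num_speakers: int) -> List[Vector]:
    """One iteration per speaker: score every temporal embedding, then take
    the best-scoring one as that speaker's embedding (argmax is an assumption)."""
    selected: List[Vector] = []
    for _ in range(num_speakers):
        probs = [new_speaker_probability(f, selected) for f in frames]
        best_t = max(range(len(frames)), key=probs.__getitem__)
        selected.append(frames[best_t])
    return selected


def voice_activity(frames: List[Vector], speakers: List[Vector],
                   threshold: float = 0.8) -> List[List[bool]]:
    """Per time step and per speaker: a toy voice activity indicator based on
    similarity between the temporal embedding and the speaker embedding."""
    return [[cosine(f, s) >= threshold for s in speakers] for f in frames]


# Five hypothetical temporal embeddings: speaker A, speaker A, silence, speaker B, speaker B.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 0.0], [0.0, 1.0], [0.1, 0.9]]
speakers = select_speakers(frames, num_speakers=2)
activity = voice_activity(frames, speakers)
```

In this toy input the first and fourth frames are selected as the two speaker embeddings, and the silent third frame is marked inactive for both speakers.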

3.

Published invention application
GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS (Status: Under examination, published)

Publication number: US20240078412A1

Publication date: 2024-03-07

Application number: US18463092

Application date: 2023-09-07

Applicant: Google LLC

Inventor: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos

IPC: G06N3/0455, G06N3/0475

CPC classification number: G06N3/0455, G06N3/0475

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

4.

Published invention application
Minimum Bayes Risk Decoding with Neural Quality Metrics (Status: Under examination, published)

Publication number: US20230259759A1

Publication date: 2023-08-17

Application number: US17673714

Application date: 2022-02-16

Applicant: Google LLC

Inventor: Qijun Tan, Markus Freitag, David Grangier

IPC: G06N3/08

CPC classification number: G06N3/08

Abstract: Provided are systems and methods for sequence-to-sequence modeling with neural quality metrics. More particularly, example aspects of the present disclosure relate to minimum Bayes risk (MBR) decoding with neural metrics for machine translation. According to example aspects of the present disclosure, a set of candidate outputs can be sampled from a machine translation model given a source sequence. Given the set of candidate outputs, systems and methods according to example aspects of the present disclosure can select a hypothesis with high expected utility with respect to the distribution over a set of pseudo-references from the machine translation model.
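The selection step in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `mbr_select` and `token_overlap` are hypothetical names, the Jaccard token overlap is a toy stand-in for a neural quality metric, and the sampled candidates are reused as the pseudo-references.

```python
from typing import Callable, Sequence


def mbr_select(
    candidates: Sequence[str],
    pseudo_references: Sequence[str],
    utility: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest average utility against the
    pseudo-references (a Monte Carlo estimate of expected utility)."""
    if not candidates or not pseudo_references:
        raise ValueError("need at least one candidate and one pseudo-reference")

    def expected_utility(hypothesis: str) -> float:
        total = sum(utility(hypothesis, ref) for ref in pseudo_references)
        return total / len(pseudo_references)

    return max(candidates, key=expected_utility)


def token_overlap(hypothesis: str, reference: str) -> float:
    """Toy utility (assumption): Jaccard overlap of the two token sets.
    A learned neural metric would take this function's place."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    if not hyp and not ref:
        return 1.0
    return len(hyp & ref) / len(hyp | ref)


# Hypothetical samples from a translation model for one source sentence.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog slept on the rug",
]
best = mbr_select(samples, samples, token_overlap)
print(best)  # prints "the cat sat on a mat"
```

The second sample wins because it agrees most, on average, with the other samples. The cost grows with the product of the candidate and pseudo-reference counts, since the utility is evaluated once per pair.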

5.

Invention application
End-To-End Speech Diarization Via Iterative Speaker Embedding (Status: Active)

Publication number: US20220375492A1

Publication date: 2022-11-24

Application number: US17304514

Application date: 2021-06-22

Applicant: Google LLC

Inventor: David Grangier, Neil Zeghidour, Oliver Teboul

IPC: G10L25/78, G10L19/008, G06N3/04, G10L15/07, G10L15/06, G10L17/18

Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.

6.

Invention application
GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS (Status: Active)

Publication number: US20240371366A1

Publication date: 2024-11-07

Application number: US18663899

Application date: 2024-05-14

Applicant: Google LLC

Inventor: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos

IPC: G10L15/16, G06N3/0455, G06N3/0475, G10H1/00, G10L15/06, G10L15/18

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

7.

Granted invention patent
Generating audio using auto-regressive generative neural networks (Status: Active)

Publication number: US12020138B2

Publication date: 2024-06-25

Application number: US18463092

Application date: 2023-09-07

Applicant: Google LLC

Inventor: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos

IPC: G06N3/0455, G06N3/0475

CPC classification number: G06N3/0455, G06N3/0475

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

8.

Granted invention patent
Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations (Status: Active)

Publication number: US11475909B2

Publication date: 2022-10-18

Application number: US17170657

Application date: 2021-02-08

Applicant: Google LLC

Inventor: Neil Zeghidour, David Grangier

IPC: G10L21/028, G10L21/0316, G10L17/04, G10L17/18, G06N3/04, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.

9.

Invention application
Learning Strides in Convolutional Neural Networks (Status: Active)

Publication number: US20250005354A1

Publication date: 2025-01-02

Application number: US18698691

Application date: 2022-10-05

Applicant: Google LLC

Inventor: Neil Zeghidour, Rachid Riad, Olivier Teboul, David Grangier

IPC: G06N3/08

Abstract: A method of training a machine learning model includes receiving training data for the machine learning model, wherein the training data comprises a plurality of batches. The method also includes applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer. Applying the downsampling layer of the machine learning model to a batch of the training data includes projecting an input in a spatial domain to a Fourier domain, constructing a mask in the Fourier domain based on a current value of the stride and dimensions of the input, applying the mask as a low-pass filter to the projected input to produce a tensor in the Fourier domain, cropping the tensor based on the mask, and transforming the cropped tensor to the spatial domain.
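The forward pass of the downsampling layer described in this abstract (project, mask, crop, transform back) can be sketched for a 1-D input in plain Python. This is a minimal sketch under stated assumptions: `spectral_downsample` is a hypothetical name, the mask is a hard rectangular low-pass filter, the stride is a fixed value rather than a learned parameter, and a naive DFT is used to keep the example dependency-free. The gradient-based update of the stride is not shown.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Naive inverse DFT with 1/n normalization."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_downsample(signal: Sequence[float], stride: float) -> List[float]:
    """Downsample by low-pass masking and cropping in the Fourier domain."""
    n = len(signal)
    if n == 0 or stride < 1:
        raise ValueError("need a non-empty signal and stride >= 1")
    out_len = max(1, int(n / stride))  # output length implied by the stride

    # 1. Project the input from the spatial domain to the Fourier domain.
    spectrum = dft(signal)

    # 2. Build the mask from the stride and the input length: keep only
    #    frequencies strictly below the Nyquist frequency of the output.
    def signed_freq(k: int) -> int:
        return k if k <= n // 2 else k - n

    mask = [1.0 if abs(signed_freq(k)) < out_len / 2 else 0.0 for k in range(n)]

    # 3. Apply the mask as a low-pass filter.
    filtered = [c * m for c, m in zip(spectrum, mask)]

    # 4. Crop to the out_len lowest-frequency bins, in DFT bin order.
    cropped = []
    for j in range(out_len):
        f = j if j < (out_len + 1) // 2 else j - out_len
        cropped.append(filtered[f % n])

    # 5. Transform back to the spatial domain; rescale to preserve amplitude.
    scale = out_len / n
    return [value.real * scale for value in idft(cropped)]


# One period of a cosine over 8 samples, downsampled with stride 2.
fine = [math.cos(2 * math.pi * t / 8) for t in range(8)]
coarse = spectral_downsample(fine, stride=2.0)
```

Because the cosine lies below the new Nyquist frequency, the four output samples match the input at every second position, up to floating-point error.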

10.

Granted invention patent
End-to-end speech diarization via iterative speaker embedding (Status: Active)

Publication number: US11887623B2

Publication date: 2024-01-30

Application number: US17304514

Application date: 2021-06-22

Applicant: Google LLC

Inventor: David Grangier, Neil Zeghidour, Oliver Teboul

IPC: G10L25/78, G06N3/04, G10L15/06, G10L15/07, G10L17/18, G10L19/008

CPC classification number: G10L25/78, G06N3/04, G10L15/063, G10L15/07, G10L17/18, G10L19/008

Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
