Patent search ap:("Google LLC") AND inv:"Aren Jansen" Page 1

1.

发明申请
Audio-Visual Separation of On-Screen Sounds Based on Machine Learning Models 有权

公开(公告)号：US20220310113A1

公开(公告)日：2022-09-29

申请号：US17214186

申请日：2021-03-26

Applicant: Google LLC

Inventor： Efthymios Tzinis , Scott Wisdom , Aren Jansen , John R. Hershey

IPC: G10L25/57 , G06K9/00 , G06K9/62 , G10L25/30 , G06N3/08

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

2.

发明申请
Minimum-Example/Maximum-Batch Entropy-Based Clustering with Neural Networks 审中-公开

公开(公告)号：US20200372295A1

公开(公告)日：2020-11-26

申请号：US16880456

申请日：2020-05-21

Applicant: Google LLC

Inventor： Aren Jansen , Ryan Michael Rifkin , Daniel Ellis

IPC: G06K9/62 , G06N20/00

Abstract: A computing system can include an embedding model and a clustering model. The computing system input each of the plurality of inputs into the embedding model and receiving respective embeddings for the plurality of inputs as outputs of the embedding model. The computing system can input the respective embeddings for the plurality of inputs into the clustering model and receiving respective cluster assignments for the plurality of inputs as outputs of the clustering model. The computing system can evaluate a clustering loss function that evaluates a first average, across the plurality of inputs, of a respective first entropy of each respective probability distribution; and a second entropy of a second average of the probability distributions for the plurality of inputs. The computing system can modify parameter(s) of one or both of the clustering model and the embedding model based on the clustering loss function.

3.

发明授权
Methods and systems for implementing on-device non-semantic representation fine-tuning for speech classification 有权

公开(公告)号：US11996116B2

公开(公告)日：2024-05-28

申请号：US17000583

申请日：2020-08-24

Applicant: Google LLC

Inventor： Joel Shor , Ronnie Maor , Oran Lang , Omry Tuval , Marco Tagliasacchi , Ira Shavitt , Felix de Chaumont Quitry , Dotan Emanuel , Aren Jansen

IPC: G10L25/30 , G06F18/21 , G06N3/084 , G06N3/088 , G06N5/046 , G10L25/48

CPC classification number: G10L25/30 , G06F18/217 , G06N3/084 , G06N3/088 , G06N5/046 , G10L25/48

Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.

4.

发明授权
Unsupervised learning of semantic audio representations 有权

公开(公告)号：US11335328B2

公开(公告)日：2022-05-17

申请号：US16758564

申请日：2018-10-26

Applicant: Google LLC

Inventor： Aren Jansen , Manoj Plakal , Richard Channing Moore , Shawn Hershey , Ratheet Pandya , Ryan Rifkin , Jiayang Liu , Daniel Ellis

IPC: G10L15/06 , G10L15/16 , G10L15/02 , G10L25/30 , G06N3/04 , G06N3/08 , G10L25/18 , G10L25/51

Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represent a respective heuristic about the semantic structure of audio recordings.

5.

发明申请
Training Machine-Learned Models for Perceptual Tasks Using Biometric Data 有权

公开(公告)号：US20220130134A1

公开(公告)日：2022-04-28

申请号：US17428659

申请日：2020-01-16

Applicant: Google LLC

Inventor： Aren Jansen , Malcolm Slaney

IPC: G06V10/774 , G06V10/80 , G06V40/10

Abstract: Generally, the present disclosure is directed to systems and methods that train machine-learned models (e.g., artificial neural networks) to perform perceptual or cognitive task(s) based on biometric data (e.g., brain wave recordings) collected from living organism(s) while the living organism(s) are performing the perceptual or cognitive task(s). In particular, aspects of the present disclosure are directed to a new supervision paradigm, by which machine-learned feature extraction models are trained using example stimuli paired with companion biometric data such as neural activity recordings (e g electroencephalogram data, electrocorticography data, functional near-infrared spectroscopy, and/or magnetoencephalography data) collected from a living organism (e.g., human being) while the organism perceived those examples (e.g., viewing the image, listening to the speech, etc.).

6.

发明公开
Diffusion Models for Generation of Audio Data Based on Descriptive Textual Prompts 审中-公开

公开(公告)号：US20240282294A1

公开(公告)日：2024-08-22

申请号：US18651296

申请日：2024-04-30

Applicant: Google LLC

Inventor： Qingqing Huang , Daniel Sung-Joon Park , Aren Jansen , Timo Immanuel Denk , Yue Li , Ravi Ganti , Dan Ellis , Tao Wang , Wei Han , Joonseok Lee

IPC: G10L15/06 , G10L15/16

CPC classification number: G10L15/063 , G10L15/16

Abstract: A corpus of textual data is generated with a machine-learned text generation model. The corpus of textual data includes a plurality of sentences. Each sentence is descriptive of a type of audio. For each of a plurality of audio recordings, the audio recording is processed with a machine-learned audio classification model to obtain training data including the audio recording and one or more sentences of the plurality of sentences closest to the audio recording within a joint audio-text embedding space of the machine-learned audio classification model. The sentence(s) are processed with a machine-learned generation model to obtain an intermediate representation of the one or more sentences. The intermediate representation is processed with a machine-learned cascaded diffusion model to obtain audio data. The machine-learned cascaded diffusion model is trained based on a difference between the audio data and the audio recording.

7.

发明申请
System and Method for Generating Diagnostic Health Information Using Deep Learning and Sound Understanding 有权

公开(公告)号：US20210361227A1

公开(公告)日：2021-11-25

申请号：US17045318

申请日：2018-05-04

Applicant: Google LLC

Inventor： Katherine Chou , Michael Dwight Howell , Kasumi Widner , Ryan Rifkin , Henry George Wei , Daniel Ellis , Alvin Rajkomar , Aren Jansen , David Michael Parish , Michael Philip Brenner

IPC: A61B5/00 , G10L25/66 , G10L25/63 , G06N20/00

Abstract: The present disclosure provides systems and methods that generating health diagnostic information from an audio recording. A computing system can include a machine-learned health model comprising that includes a sound model trained to receive data descriptive of a patient audio recording and output sound description data. The computing system can include a diagnostic model trained to receive the sound description data and output a diagnostic score. The computing system can include at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed, cause the processor to perform operations. The operations can include obtaining the patient audio recording; inputting data descriptive of the patient audio recording into the sound model; receiving, as an output of the sound model, the sound description data; inputting the sound description data into the diagnostic model; and receiving, as an output of the diagnostic model, the diagnostic score.

8.

发明申请
Unsupervised Learning of Semantic Audio Representations 审中-公开

公开(公告)号：US20200349921A1

公开(公告)日：2020-11-05

申请号：US16758564

申请日：2018-10-26

Applicant: Google LLC

Inventor： Aren Jansen , Manoj Plakal , Richard Channing Moore , Shawn Hershey , Ratheet Pandya , Ryan Rifkin , Jiayang Liu , Daniel Ellis

IPC: G10L15/06 , G10L25/18 , G10L15/02 , G10L25/51 , G06N3/04 , G06N3/08

Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represent a respective heuristic about the semantic structure of audio recordings.

9.

发明授权
Audio-visual separation of on-screen sounds based on machine learning models 有权

公开(公告)号：US12217768B2

公开(公告)日：2025-02-04

申请号：US18226545

申请日：2023-07-26

Applicant: Google LLC

Inventor： Efthymios Tzinis , Scott Wisdom , Aren Jansen , John R. Hershey

IPC: G10L25/57 , G06F18/214 , G06N3/088 , G06V20/40 , G10L25/30

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

10.

发明授权
System and method for generating diagnostic health information using deep learning and sound understanding 有权

公开(公告)号：US12070323B2

公开(公告)日：2024-08-27

申请号：US17045318

申请日：2018-05-04

Applicant: Google LLC

Inventor： Katherine Chou , Michael Dwight Howell , Kasumi Widner , Ryan Rifkin , Henry George Wei , Daniel Ellis , Alvin Rajkomar , Aren Jansen , David Michael Parish , Michael Philip Brenner

IPC: A61B5/00 , G06N20/00 , G10L25/63 , G10L25/66

CPC classification number: A61B5/4803 , A61B5/7264 , A61B5/7275 , G06N20/00 , G10L25/63 , G10L25/66

Abstract: The present disclosure provides systems and methods that generating health diagnostic information from an audio recording. A computing system can include a machine-learned health model comprising that includes a sound model trained to receive data descriptive of a patient audio recording and output sound description data. The computing system can include a diagnostic model trained to receive the sound description data and output a diagnostic score. The computing system can include at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed, cause the processor to perform operations. The operations can include obtaining the patient audio recording; inputting data descriptive of the patient audio recording into the sound model; receiving, as an output of the sound model, the sound description data; inputting the sound description data into the diagnostic model; and receiving, as an output of the diagnostic model, the diagnostic score.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification