-
Publication Number: US11335328B2
Publication Date: 2022-05-17
Application Number: US16758564
Application Date: 2018-10-26
Applicant: Google LLC
Inventor: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represent a respective heuristic about the semantic structure of audio recordings.
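The abstract does not disclose the specific sampling heuristics, so the sketch below illustrates just one plausible heuristic in Python: the anchor and positive are segments drawn close together in time from the same recording, while the negative comes from a different recording. Every function and parameter name here is hypothetical, not taken from the patent.

```python
import random

def sample_triplet(corpus, max_offset=2):
    """Draw one (anchor, positive, negative) triplet from a corpus.

    corpus: list of recordings, each a list of fixed-length audio segments.
    Assumed heuristic: segments close in time within one recording are
    semantically similar; segments from different recordings are not.
    """
    rec_idx = random.randrange(len(corpus))
    recording = corpus[rec_idx]
    anchor_idx = random.randrange(len(recording))
    # Positive: a segment within max_offset positions of the anchor.
    lo = max(0, anchor_idx - max_offset)
    hi = min(len(recording) - 1, anchor_idx + max_offset)
    positive_idx = random.randint(lo, hi)
    # Negative: any segment drawn from a different recording.
    other_idx = random.choice([i for i in range(len(corpus)) if i != rec_idx])
    negative = random.choice(corpus[other_idx])
    return recording[anchor_idx], recording[positive_idx], negative
```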
-
Publication Number: US20230308823A1
Publication Date: 2023-09-28
Application Number: US18042258
Application Date: 2020-08-26
Applicant: Manoj PLAKAL, Dan ELLIS, Shawn HERSHEY, Richard Channing MOORE, III, Aren JANSEN, Google LLC
Inventor: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Richard Channing Moore, III
IPC: H04S7/00
CPC classification number: H04S7/301, H04S2400/01
Abstract: A computer-implemented method for upmixing audiovisual data can include obtaining audiovisual data including input audio data and video data accompanying the input audio data. Each frame of the video data can depict only a portion of a larger scene. The input audio data can have a first number of audio channels. The computer-implemented method can include providing the audiovisual data as input to a machine-learned audiovisual upmixing model. The audiovisual upmixing model can include a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data. The computer-implemented method can include receiving upmixed audio data from the audiovisual upmixing model. The upmixed audio data can have a second number of audio channels. The second number of audio channels can be greater than the first number of audio channels.
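As a rough illustration of the described setup, the sketch below wires per-frame audio and video features into a recurrent sequence-to-sequence model that emits more output channels than it receives (e.g. stereo in, 5.1 out). The class name, layer choices, and feature dimensions are assumptions made for illustration, not the patented design.

```python
import torch
from torch import nn

class AudioVisualUpmixer(nn.Module):
    """Sketch of a sequence-to-sequence audiovisual upmixer."""

    def __init__(self, in_channels=2, out_channels=6,
                 audio_dim=128, video_dim=256, hidden_dim=512):
        super().__init__()
        self.encoder = nn.LSTM(audio_dim * in_channels + video_dim,
                               hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, audio_dim * out_channels)
        self.out_channels = out_channels

    def forward(self, audio_feats, video_feats):
        # audio_feats: (batch, frames, in_channels * audio_dim)
        # video_feats: (batch, frames, video_dim); each frame shows only part
        # of the larger scene, so the recurrence carries source locations
        # across frames.
        x = torch.cat([audio_feats, video_feats], dim=-1)
        enc, _ = self.encoder(x)
        dec, _ = self.decoder(enc)
        out = self.head(dec)
        batch, frames, _ = out.shape
        # (batch, frames, out_channels, audio_dim): more channels out than in.
        return out.view(batch, frames, self.out_channels, -1)
```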
-
Publication Number: US10372991B1
Publication Date: 2019-08-06
Application Number: US15944415
Application Date: 2018-04-03
Applicant: Google LLC
Inventor: James Niemasik, Manoj Plakal
Abstract: Systems, methods, and devices for curating audiovisual content are provided. A mobile image capture device can be operable to capture one or more images; receive an audio signal; analyze at least a portion of the audio signal with a first machine-learned model to determine a first audio classifier label descriptive of an audio event; identify a first image associated with the first audio classifier label; analyze the first image with a second machine-learned model to determine a desirability of a scene depicted by the first image; and determine, based at least in part on the desirability of the scene depicted by the first image, whether to store a copy of the first image associated with the first audio classifier label in the non-volatile memory of the mobile image capture device or to discard the first image without storing a copy of the first image.
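A compact sketch of the decision flow described above, with the two machine-learned models passed in as callables; the threshold value and every name here are illustrative assumptions rather than details from the patent.

```python
DESIRABILITY_THRESHOLD = 0.5  # assumed tuning knob, not specified in the abstract

def curate(audio_signal, candidate_image, audio_classifier, desirability_model, storage):
    """Decide whether an image tied to a detected audio event is kept.

    audio_classifier: first machine-learned model, audio -> event label.
    desirability_model: second machine-learned model, image -> score in [0, 1].
    storage: object persisting images to non-volatile memory via .save().
    """
    label = audio_classifier(audio_signal)          # e.g. "laughter", "applause"
    score = desirability_model(candidate_image)     # desirability of the depicted scene
    if score >= DESIRABILITY_THRESHOLD:
        storage.save(candidate_image, label=label)  # keep a copy, tagged with the label
        return True
    return False                                    # discard without storing a copy
```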
-
Publication Number: US12273697B2
Publication Date: 2025-04-08
Application Number: US18042258
Application Date: 2020-08-26
Applicant: Google LLC
Inventor: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Richard Channing Moore, III
Abstract: A computer-implemented method for upmixing audiovisual data can include obtaining audiovisual data including input audio data and video data accompanying the input audio data. Each frame of the video data can depict only a portion of a larger scene. The input audio data can have a first number of audio channels. The computer-implemented method can include providing the audiovisual data as input to a machine-learned audiovisual upmixing model. The audiovisual upmixing model can include a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data. The computer-implemented method can include receiving upmixed audio data from the audiovisual upmixing model. The upmixed audio data can have a second number of audio channels. The second number of audio channels can be greater than the first number of audio channels.
-
Publication Number: US20200349921A1
Publication Date: 2020-11-05
Application Number: US16758564
Application Date: 2018-10-26
Applicant: Google LLC
Inventor: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represent a respective heuristic about the semantic structure of audio recordings.
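This publication shares its abstract with US11335328B2 above. To complement the sampling sketch given there, the fragment below shows a conventional triplet hinge loss over embedding distances, which is the standard way such (anchor, positive, negative) triplets drive training; the embedding network and the margin value are assumptions, not taken from the document.

```python
import torch.nn.functional as F

def triplet_loss(embed, anchor, positive, negative, margin=1.0):
    """Pull (anchor, positive) together and push (anchor, negative) apart.

    embed: any module mapping a batch of audio features to multidimensional embeddings.
    """
    za, zp, zn = embed(anchor), embed(positive), embed(negative)
    d_pos = (za - zp).pow(2).sum(dim=-1)   # squared distance to the positive
    d_neg = (za - zn).pow(2).sum(dim=-1)   # squared distance to the negative
    return F.relu(d_pos - d_neg + margin).mean()
```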