-
Publication Number: US11335328B2
Publication Date: 2022-05-17
Application Number: US16758564
Application Date: 2018-10-26
Applicant: Google LLC
Inventor: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represent a respective heuristic about the semantic structure of audio recordings.
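The abstract does not disclose the specific sampling heuristics, so the sketch below illustrates just one plausible heuristic in Python: the anchor and positive are segments drawn close together in time from the same recording, while the negative comes from a different recording. Every function and parameter name here is hypothetical, not taken from the patent.

```python
import random

def sample_triplet(corpus, max_offset=2):
    """Draw one (anchor, positive, negative) triplet from a corpus.

    corpus: list of recordings, each a list of fixed-length audio segments.
    Assumed heuristic: segments close in time within one recording are
    semantically similar; segments from different recordings are not.
    """
    rec_idx = random.randrange(len(corpus))
    recording = corpus[rec_idx]
    anchor_idx = random.randrange(len(recording))
    # Positive: a segment within max_offset positions of the anchor.
    lo = max(0, anchor_idx - max_offset)
    hi = min(len(recording) - 1, anchor_idx + max_offset)
    positive_idx = random.randint(lo, hi)
    # Negative: any segment drawn from a different recording.
    other_idx = random.choice([i for i in range(len(corpus)) if i != rec_idx])
    negative = random.choice(corpus[other_idx])
    return recording[anchor_idx], recording[positive_idx], negative
```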
-
Publication Number: US20230308823A1
Publication Date: 2023-09-28
Application Number: US18042258
Application Date: 2020-08-26
Applicant: Manoj PLAKAL, Dan ELLIS, Shawn HERSHEY, Richard Channing MOORE, III, Aren JANSEN, Google LLC
Inventor: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Richard Channing Moore, III
IPC: H04S7/00
CPC classification number: H04S7/301, H04S2400/01
Abstract: A computer-implemented method for upmixing audiovisual data can include obtaining audiovisual data including input audio data and video data accompanying the input audio data. Each frame of the video data can depict only a portion of a larger scene. The input audio data can have a first number of audio channels. The computer-implemented method can include providing the audiovisual data as input to a machine-learned audiovisual upmixing model. The audiovisual upmixing model can include a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data. The computer-implemented method can include receiving upmixed audio data from the audiovisual upmixing model. The upmixed audio data can have a second number of audio channels. The second number of audio channels can be greater than the first number of audio channels.
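As a rough illustration of the described setup, the sketch below wires per-frame audio and video features into a recurrent sequence-to-sequence model that emits more output channels than it receives (e.g. stereo in, 5.1 out). The class name, layer choices, and feature dimensions are assumptions made for illustration, not the patented design.

```python
import torch
from torch import nn

class AudioVisualUpmixer(nn.Module):
    """Sketch of a sequence-to-sequence audiovisual upmixer."""

    def __init__(self, in_channels=2, out_channels=6,
                 audio_dim=128, video_dim=256, hidden_dim=512):
        super().__init__()
        self.encoder = nn.LSTM(audio_dim * in_channels + video_dim,
                               hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, audio_dim * out_channels)
        self.out_channels = out_channels

    def forward(self, audio_feats, video_feats):
        # audio_feats: (batch, frames, in_channels * audio_dim)
        # video_feats: (batch, frames, video_dim); each frame shows only part
        # of the larger scene, so the recurrence carries source locations
        # across frames.
        x = torch.cat([audio_feats, video_feats], dim=-1)
        enc, _ = self.encoder(x)
        dec, _ = self.decoder(enc)
        out = self.head(dec)
        batch, frames, _ = out.shape
        # (batch, frames, out_channels, audio_dim): more channels out than in.
        return out.view(batch, frames, self.out_channels, -1)
```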
-
Publication Number: US10372991B1
Publication Date: 2019-08-06
Application Number: US15944415
Application Date: 2018-04-03
Applicant: Google LLC
Inventor: James Niemasik, Manoj Plakal
Abstract: Systems, methods, and devices for curating audiovisual content are provided. A mobile image capture device can be operable to capture one or more images; receive an audio signal; analyze at least a portion of the audio signal with a first machine-learned model to determine a first audio classifier label descriptive of an audio event; identify a first image associated with the first audio classifier label; analyze the first image with a second machine-learned model to determine a desirability of a scene depicted by the first image; and determine, based at least in part on the desirability of the scene depicted by the first image, whether to store a copy of the first image associated with the first audio classifier label in the non-volatile memory of the mobile image capture device or to discard the first image without storing a copy of the first image.
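A compact sketch of the decision flow described above, with the two machine-learned models passed in as callables; the threshold value and every name here are illustrative assumptions rather than details from the patent.

```python
DESIRABILITY_THRESHOLD = 0.5  # assumed tuning knob, not specified in the abstract

def curate(audio_signal, candidate_image, audio_classifier, desirability_model, storage):
    """Decide whether an image tied to a detected audio event is kept.

    audio_classifier: first machine-learned model, audio -> event label.
    desirability_model: second machine-learned model, image -> score in [0, 1].
    storage: object persisting images to non-volatile memory via .save().
    """
    label = audio_classifier(audio_signal)          # e.g. "laughter", "applause"
    score = desirability_model(candidate_image)     # desirability of the depicted scene
    if score >= DESIRABILITY_THRESHOLD:
        storage.save(candidate_image, label=label)  # keep a copy, tagged with the label
        return True
    return False                                    # discard without storing a copy
```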
-
Publication Number: US12273697B2
Publication Date: 2025-04-08
Application Number: US18042258
Application Date: 2020-08-26
Applicant: Google LLC
Inventor: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Richard Channing Moore, III
Abstract: A computer-implemented method for upmixing audiovisual data can include obtaining audiovisual data including input audio data and video data accompanying the input audio data. Each frame of the video data can depict only a portion of a larger scene. The input audio data can have a first number of audio channels. The computer-implemented method can include providing the audiovisual data as input to a machine-learned audiovisual upmixing model. The audiovisual upmixing model can include a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data. The computer-implemented method can include receiving upmixed audio data from the audiovisual upmixing model. The upmixed audio data can have a second number of audio channels. The second number of audio channels can be greater than the first number of audio channels.
-
Publication Number: US20200349921A1
Publication Date: 2020-11-05
Application Number: US16758564
Application Date: 2018-10-26
Applicant: Google LLC
Inventor: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represent a respective heuristic about the semantic structure of audio recordings.
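This publication shares its abstract with US11335328B2 above. To complement the sampling sketch given there, the fragment below shows a conventional triplet hinge loss over embedding distances, which is the standard way such (anchor, positive, negative) triplets drive training; the embedding network and the margin value are assumptions, not taken from the document.

```python
import torch.nn.functional as F

def triplet_loss(embed, anchor, positive, negative, margin=1.0):
    """Pull (anchor, positive) together and push (anchor, negative) apart.

    embed: any module mapping a batch of audio features to multidimensional embeddings.
    """
    za, zp, zn = embed(anchor), embed(positive), embed(negative)
    d_pos = (za - zp).pow(2).sum(dim=-1)   # squared distance to the positive
    d_neg = (za - zn).pow(2).sum(dim=-1)   # squared distance to the negative
    return F.relu(d_pos - d_neg + margin).mean()
```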