Patent search ap:("Adobe Inc.") AND inv:"Justin SALAMON" Page 1

1.

Invention application
MULTI-LEVEL AUDIO SEGMENTATION USING DEEP EMBEDDINGS
Status: Granted

Publication number: US20230115212A1

Publication date: 2023-04-13

Application number: US17742313

Application date: 2022-05-11

Applicant: Adobe Inc.

Inventor: Justin SALAMON, Oriol NIETO-CABALLERO, Nicholas J. BRYAN

IPC: G10H1/00

Abstract: Embodiments are disclosed for generating an audio segmentation of an audio sequence using deep embeddings. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an audio sequence and extracting features for each frame of the audio sequence, where each frame is associated with a beat of the audio sequence. The method may further comprise clustering frames of the audio sequence into one or more clusters based on the extracted features and generating segments of the audio sequence based on the clustered frames, where each segment includes frames of the audio sequence from a same cluster. The method may further comprise constructing a multi-level audio segmentation of the audio sequence and performing a segment fusioning process that merges shorter segments with neighboring segments based on cluster assignments.
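The segment-building and fusion steps described in this abstract can be illustrated with a minimal Python sketch. It assumes per-beat cluster labels are already available. The merge rule is an assumption made for illustration (a short segment is absorbed by the preceding segment, or by the following one if it comes first); the abstract does not state the actual rule. The names `runs`, `fuse_short_segments`, and `min_len` are hypothetical.

```python
from itertools import groupby


def runs(labels):
    """Collapse per-frame cluster labels into [label, length] segments."""
    return [[label, len(list(group))] for label, group in groupby(labels)]


def fuse_short_segments(labels, min_len=2):
    """Merge runs shorter than min_len into a neighboring segment.

    Assumed rule (not taken from the abstract): a short run is absorbed by
    the preceding segment, or by the following one if it is the first run.
    """
    segments = runs(labels)
    fused = []
    for i, (label, length) in enumerate(segments):
        if length < min_len and fused:
            fused[-1][1] += length        # absorb into the previous segment
        elif length < min_len and i + 1 < len(segments):
            segments[i + 1][1] += length  # first run: absorb into the next one
        elif fused and fused[-1][0] == label:
            fused[-1][1] += length        # neighbors now share a label: join them
        else:
            fused.append([label, length])
    return fused
```

Under these assumptions, calling `fuse_short_segments` on the label sequence `[0, 0, 0, 1, 0, 0, 2, 2, 2]` absorbs the single-beat segment with label 1, leaving one six-beat segment followed by one three-beat segment.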

2.

Invention application
NATURAL LANGUAGE-GUIDED MUSIC AUDIO RECOMMENDATION FOR VIDEO USING MACHINE LEARNING
Status: Granted

Publication number: US20240386048A1

Publication date: 2024-11-21

Application number: US18319202

Application date: 2023-05-17

Applicant: Adobe Inc.

Inventor: Bryan RUSSELL, Justin SALAMON, Daniel McKEE, Josef SIVIC

IPC: G06F16/438, G06F16/432

Abstract: Embodiments are disclosed for an audio recommendation system trained to recommend music audio sequences for pairing with query video sequences using neural networks. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including a query video sequence and natural language text. The disclosed systems and methods further comprise generating a fused visual-text embedding based on a visual embedding and a text embedding corresponding to the input. The disclosed systems and methods further comprise comparing audio embeddings for music audio sequences of a music audio sequences database with the fused visual-text embedding. The disclosed systems and methods further comprise determining a music audio sequence from the music audio sequences database as the recommended music audio sequence for pairing with the query video sequence based on a similarity metric calculated between an audio embedding for the music audio sequence and the fused visual-text embedding.
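The retrieval step in this abstract (comparing a fused visual-text embedding against catalog audio embeddings) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fusion here is a plain average of unit-normalized embeddings and the similarity metric is cosine similarity, neither of which is confirmed by the abstract. The names `normalize`, `cosine`, `recommend`, and `audio_catalog` are hypothetical, and all embeddings are assumed to be nonzero vectors of equal length.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def recommend(visual_emb, text_emb, audio_catalog):
    """Return the catalog key whose audio embedding best matches the query.

    Assumption: the fused embedding is the mean of the unit-normalized
    visual and text embeddings. A real system would likely learn the fusion.
    """
    fused = [
        (v + t) / 2
        for v, t in zip(normalize(visual_emb), normalize(text_emb))
    ]
    return max(audio_catalog, key=lambda name: cosine(fused, audio_catalog[name]))
```

A usage example with toy two-dimensional embeddings: `recommend([0.0, 2.0], [0.1, 1.0], {"calm": [1.0, 0.0], "upbeat": [0.0, 1.0]})` selects the `"upbeat"` entry, because the fused query points almost entirely along the second axis.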

3.

Invention publication
MULTI-MODAL SOUND EFFECTS RECOMMENDATION
Status: Under examination (published)

Publication number: US20240220530A1

Publication date: 2024-07-04

Application number: US18089710

Application date: 2022-12-28

Applicant: ADOBE INC.

Inventor: Julia Lepley WILKINS, Oriol NIETO-CABALLERO, Justin SALAMON

IPC: G06F16/432, G06V20/40, G10L15/26

CPC classification number: G06F16/433, G06F16/434, G06V20/46, G10L15/26

Abstract: A sound effects system recommends sound effects using a multi-modal embedding space for projecting visuals, text, and audio. Given an input query comprising a visual (i.e., an image/video) and/or text, an encoder generates a query embedding in the multi-modal embedding space in which sound effects have been projected into sound effect embeddings. A relevant sound effect embedding in the multi-modal space is identified using the query embedding, and a recommendation is provided for a sound effect corresponding to the sound effect embedding.

4.

Invention publication
DETECTING AND CLASSIFYING FILLER WORDS IN AUDIO USING NEURAL NETWORKS
Status: Under examination (published)

Publication number: US20240161735A1

Publication date: 2024-05-16

Application number: US18055739

Application date: 2022-11-15

Applicant: Adobe Inc.

Inventor: Justin SALAMON, Juan-Pablo CACERES CHOMALI, Ge ZHU, Nicholas J. BRYAN

IPC: G10L15/16, G10L15/22, G10L25/78

CPC classification number: G10L15/16, G10L15/22, G10L25/78

Abstract: Embodiments are disclosed for performing a filler word detection process on input audio by a media editing system using trained neural networks. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an audio sequence, analyzing the audio sequence to determine filler word candidates, classifying, by a filler word classification model, each filler word candidate of the filler word candidates into one of a set of categories, and generating an output audio sequence, the output audio sequence including an identification of a subset of the filler word candidates in a filler words category of the set of categories as identified filler words.
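The two-stage flow in this abstract (find candidates, classify each into a category, keep those in the filler words category) can be outlined as a short Python sketch. It shows only the control flow: the classifier is passed in as a callable and stands in for the trained classification model, whose details the abstract does not give. `Candidate`, `identify_fillers`, and the category name `"filler"` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A hypothetical filler word candidate: a time span in the audio."""
    start: float  # seconds
    end: float    # seconds


def identify_fillers(
    candidates: List[Candidate],
    classify: Callable[[Candidate], str],
    filler_category: str = "filler",
) -> List[Candidate]:
    """Keep only the candidates that the classifier assigns to filler_category.

    `classify` is a placeholder for a trained model; any callable that maps
    a candidate to a category name works for this sketch.
    """
    return [c for c in candidates if classify(c) == filler_category]
```

For a quick check, a toy classifier such as `lambda c: "filler" if c.end - c.start < 0.5 else "word"` can be supplied; it labels spans by duration only and is not a realistic model.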

5.

Invention publication
SELF-SUPERVISED AUDIO-VISUAL LEARNING FOR CORRELATING MUSIC AND VIDEO
Status: Under examination (published)

Publication number: US20230368503A1

Publication date: 2023-11-16

Application number: US17742322

Application date: 2022-05-11

Applicant: Adobe Inc.

Inventor: Justin SALAMON, Bryan RUSSELL, Didac SURIS COLL-VINENT

IPC: G06V10/774, G06V20/40, G06V10/74, G10L25/57, G10L25/03

CPC classification number: G06V10/774, G06V20/49, G06V20/46, G06V10/761, G10L25/57, G10L25/03

Abstract: Embodiments are disclosed for correlating video sequences and audio sequences by a media recommendation system using a trained encoder network. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training input including a media sequence, including a video sequence paired with an audio sequence, segmenting the media sequence into a set of video sequence segments and a set of audio sequence segments, extracting visual features for each video sequence segment and audio features for each audio sequence segment, generating, by transformer networks, contextualized visual features from the extracted visual features and contextualized audio features from the extracted audio features, the transformer networks including a visual transformer and an audio transformer, generating predicted video and audio sequence segment pairings based on the contextualized visual and audio features, and training the visual transformer and the audio transformer to generate the contextualized visual and audio features.

6.

Invention application
SECTION-BASED MUSIC SIMILARITY SEARCHING
Status: Granted

Publication number: US20230129350A1

Publication date: 2023-04-27

Application number: US17742318

Application date: 2022-05-11

Applicant: Adobe Inc.

Inventor: Nicholas J. BRYAN, Justin SALAMON

IPC: G06F16/683, G06F16/632, G06F16/635, G06F16/64

Abstract: Embodiments are disclosed for performing a section-based, within-song music similarity search by an audio recommendation system. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an audio sequence and a request to determine similar audio sequences to the audio sequence from a pre-processed audio catalog, analyzing the audio sequence to generate an audio embedding for the audio sequence, querying a pre-processed audio catalog to retrieve audio embeddings for catalog audio sequences at different time resolutions, generating a set of candidate audio sequences from the pre-processed audio catalog based on the audio embedding for the audio sequence, and providing the set of candidate audio sequences.
