Patent search ap:("Adobe Inc.") AND inv:"Oriol NIETO-CABALLERO"

1.

Invention application
MULTI-LEVEL AUDIO SEGMENTATION USING DEEP EMBEDDINGS (Granted)

Publication No.: US20230115212A1

Publication date: 2023-04-13

Application No.: US17742313

Filing date: 2022-05-11

Applicant: Adobe Inc.

Inventor: Justin SALAMON, Oriol NIETO-CABALLERO, Nicholas J. BRYAN

IPC: G10H1/00

Abstract: Embodiments are disclosed for generating an audio segmentation of an audio sequence using deep embeddings. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an audio sequence and extracting features for each frame of the audio sequence, where each frame is associated with a beat of the audio sequence. The method may further comprise clustering frames of the audio sequence into one or more clusters based on the extracted features and generating segments of the audio sequence based on the clustered frames, where each segment includes frames of the audio sequence from a same cluster. The method may further comprise constructing a multi-level audio segmentation of the audio sequence and performing a segment fusioning process that merges shorter segments with neighboring segments based on cluster assignments.
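The segment generation and fusion steps in the abstract above can be illustrated with a minimal sketch. It assumes that per-beat cluster labels are already available, and it uses a hypothetical fusion rule (a run shorter than `min_len` beats takes the label of the preceding segment, or of the following one if it is the first). The function names, the `min_len` parameter, and the rule itself are assumptions for illustration, not the method claimed in the application.

```python
from itertools import groupby


def segments_from_labels(labels):
    """Group consecutive frames sharing a cluster label into
    (start, end, label) segments, with `end` exclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments


def fuse_short_segments(labels, min_len):
    """Hypothetical fusion rule: relabel any segment shorter than
    `min_len` frames with a neighboring segment's label, repeating
    until no short segment remains or only one segment is left."""
    labels = list(labels)
    while True:
        segments = segments_from_labels(labels)
        if len(segments) <= 1:
            break
        for i, (start, end, _) in enumerate(segments):
            if end - start < min_len:
                # Prefer the preceding segment; fall back to the next one.
                neighbor = segments[i - 1] if i > 0 else segments[i + 1]
                labels[start:end] = [neighbor[2]] * (end - start)
                break
        else:
            break  # No short segment found in this pass.
    return segments_from_labels(labels)
```

Each pass merges one short segment into a neighbor, so the number of segments strictly decreases and the loop terminates.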

2.

Invention publication
SPOKEN LANGUAGE RECOGNITION (Under examination, published)

Publication No.: US20240257798A1

Publication date: 2024-08-01

Application No.: US18104434

Filing date: 2023-02-01

Applicant: ADOBE INC.

Inventor: Oriol NIETO-CABALLERO, Zeyu JIN, Justin Jonathan SALAMON, Franck DERNONCOURT

IPC: G10L15/00, G10L25/30

CPC classification number: G10L15/005, G10L25/30

Abstract: Some aspects of the technology described herein employ a neural network with an efficient and lightweight architecture to perform spoken language recognition. Given an audio signal comprising speech, features are generated from the audio signal, for instance, by converting the audio signal to a normalized spectrogram. The features are input to the neural network, which has one or more convolutional layers and an output activation layer. Each neuron of the output activation layer corresponds to a language from a set of languages and generates an activation value. Based on the activation values, an indication of zero or more languages from the set of languages is provided for the audio signal.
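The final step of the abstract above, turning per-language activation values into an indication of zero or more languages, might look like the following sketch. It assumes a sigmoid output activation and a fixed decision threshold of 0.5. Both are illustrative assumptions, as are the function names and the language codes in the usage note; the abstract does not specify them.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def detect_languages(logits, languages, threshold=0.5):
    """Return every language whose activation meets the threshold.

    Each output neuron is scored independently (assumed sigmoid),
    so the result may be empty or contain several languages.
    """
    if len(logits) != len(languages):
        raise ValueError("expected one logit per language")
    return [
        lang
        for lang, z in zip(languages, logits)
        if sigmoid(z) >= threshold
    ]
```

For example, `detect_languages([2.0, -3.0, 0.1], ["en", "es", "fr"])` keeps the first and third entries, while all-negative logits yield an empty list. Independent per-neuron scoring is what allows a "zero languages" outcome, which a softmax followed by argmax would not.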

3.

Invention publication
MULTI-MODAL SOUND EFFECTS RECOMMENDATION (Under examination, published)

Publication No.: US20240220530A1

Publication date: 2024-07-04

Application No.: US18089710

Filing date: 2022-12-28

Applicant: ADOBE INC.

Inventor: Julia Lepley WILKINS, Oriol NIETO-CABALLERO, Justin SALAMON

IPC: G06F16/432, G06V20/40, G10L15/26

CPC classification number: G06F16/433, G06F16/434, G06V20/46, G10L15/26

Abstract: A sound effects system recommends sound effects using a multi-modal embedding space for projecting visuals, text, and audio. Given an input query comprising a visual (i.e., an image/video) and/or text, an encoder generates a query embedding in the multi-modal embedding space in which sound effects have been projected into sound effect embeddings. A relevant sound effect embedding in the multi-modal space is identified using the query embedding, and a recommendation is provided for a sound effect corresponding to the sound effect embedding.
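The retrieval step in the abstract above, identifying a relevant sound effect embedding from a query embedding, can be sketched as a nearest-neighbor lookup. The sketch assumes that the query and the sound effects have already been encoded into a shared space and that relevance is measured by cosine similarity. The similarity measure, the function names, and the toy two-dimensional embeddings in the usage note are assumptions, not details taken from the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as unrelated to everything.
    return dot / (norm_a * norm_b)


def recommend(query_embedding, sound_effect_embeddings, top_k=1):
    """Rank sound effects by similarity to the query embedding and
    return the names of the `top_k` closest ones."""
    ranked = sorted(
        sound_effect_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]
```

With a toy library `{"door_slam": [1.0, 0.0], "rain": [0.0, 1.0], "thunder": [0.6, 0.8]}` and the query `[0.1, 0.9]`, `recommend(..., top_k=2)` ranks `rain` first and `thunder` second. A production system would likely replace the linear scan with an approximate nearest-neighbor index.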
