Patent search ap:("Adobe Inc.") AND inv:"Fabian David Caba Heilbron" Page 1

1.

Granted invention patent
Face-aware speaker diarization for transcripts and text-based video editing (In force)

Publication (announcement) No.: US12125501B2

Publication (announcement) date: 2024-10-22

Application No.: US17967399

Application date: 2022-10-17

Applicant: Adobe Inc.

Inventor: Fabian David Caba Heilbron , Xue Bai , Aseem Omprakash Agarwala , Haoran Cai , Lubomira Assenova Dontcheva

IPC: G11B27/031 , G06V20/40

CPC classification number: G11B27/031 , G06V20/41

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for face-aware speaker diarization. In an example embodiment, an audio-only speaker diarization technique is applied to generate an audio-only speaker diarization of a video, an audio-visual speaker diarization technique is applied to generate a face-aware speaker diarization of the video, and the audio-only speaker diarization is refined using the face-aware speaker diarization to generate a hybrid speaker diarization that links detected faces to detected voices. In some embodiments, to accommodate videos with small faces that appear pixelated, a cropped image of any given face is extracted from each frame of the video, and the size of the cropped image is used to select a corresponding active speaker detection model to predict an active speaker score for the face in the cropped image.
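The size-based model selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give size thresholds, model names, or how size is measured; the bucket boundaries, the keys "tiny"/"small"/"large", and the `FaceCrop`, `select_model_key`, and `score_faces` names below are hypothetical, and the "models" are plain callables standing in for real active speaker detection networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical size buckets: (exclusive upper bound on the crop's shorter side
# in pixels, model key). The thresholds are illustrative assumptions only.
SIZE_BUCKETS: List[Tuple[int, str]] = [(32, "tiny"), (64, "small"), (10**9, "large")]


@dataclass
class FaceCrop:
    frame_index: int
    width: int   # crop width in pixels
    height: int  # crop height in pixels


def select_model_key(crop: FaceCrop) -> str:
    """Return the key of the active speaker detection model to use for this crop."""
    side = min(crop.width, crop.height)
    for upper, key in SIZE_BUCKETS:
        if side < upper:
            return key
    return SIZE_BUCKETS[-1][1]  # fall back to the model for the largest faces


def score_faces(
    crops: List[FaceCrop], models: Dict[str, Callable[[FaceCrop], float]]
) -> List[float]:
    """Predict an active speaker score per crop using the size-matched model."""
    return [models[select_model_key(crop)](crop) for crop in crops]
```

For example, with stub models `{"tiny": lambda c: 0.1, "small": lambda c: 0.5, "large": lambda c: 0.9}`, crops of 20x30, 50x80, and 200x200 pixels are routed to the tiny, small, and large models respectively.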

2.

Invention application
TEMPORALLY DISTRIBUTED NEURAL NETWORKS FOR VIDEO SEMANTIC SEGMENTATION (In force)

Publication (announcement) No.: US20210319232A1

Publication (announcement) date: 2021-10-14

Application No.: US16846544

Application date: 2020-04-13

Applicant: Adobe Inc.

Inventor: Federico Perazzi , Zhe Lin , Ping Hu , Oliver Wang , Fabian David Caba Heilbron

IPC: G06K9/00 , G06N3/04 , G06F17/15 , G06T7/11

Abstract: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks is used to extract the features to be used for video segmentation and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation representing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.
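The idea of distributing feature extraction across several networks over time and then aggregating their outputs per frame can be sketched as follows. The abstract does not say how frames are assigned to networks or how outputs are combined; this sketch assumes round-robin assignment (frame t goes to sub-network t mod N) and plain concatenation of the most recent N partial features. Both choices, and the `distributed_features` name, are illustrative assumptions, and the sub-networks are stand-in callables.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Feature = List[float]
SubNet = Callable[[Sequence[float]], Feature]


def distributed_features(frames: Sequence[Sequence[float]], subnets: Sequence[SubNet]) -> List[Feature]:
    """Compute one cheap partial feature per frame and aggregate recent ones.

    Assumptions: frame t is processed only by subnets[t % N], and the
    "strong" representation for frame t concatenates the partial features
    of the last N frames (fewer at the start of the video).
    """
    n = len(subnets)
    window: Deque[Feature] = deque(maxlen=n)  # most recent partial features
    strong: List[Feature] = []
    for t, frame in enumerate(frames):
        window.append(subnets[t % n](frame))  # one sub-network per frame
        strong.append([x for part in window for x in part])
    return strong
```

With two stand-in sub-networks `[lambda f: [sum(f)], lambda f: [max(f)]]` and frames `[[1, 2], [3, 4], [5, 6]]`, each frame pays for only one sub-network, yet from the second frame on its representation combines outputs from both.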

3.

Granted invention patent
Retiming digital videos utilizing deep learning (In force)

Publication (announcement) No.: US12112771B2

Publication (announcement) date: 2024-10-08

Application No.: US18185137

Application date: 2023-03-16

Applicant: Adobe Inc.

Inventor: Simon Jenni , Markus Woodson , Fabian David Caba Heilbron

IPC: G11B27/00 , H04N21/234 , H04N21/2343 , H04N21/24

CPC classification number: G11B27/005 , H04N21/23418 , H04N21/234381 , H04N21/2402

Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that generate a temporally remapped video that satisfies a desired target duration while preserving natural video dynamics. In certain instances, the disclosed systems utilize a playback speed prediction machine-learning model that recognizes and localizes temporally varying changes in video playback speed to re-time a digital video with varying frame-change speeds. For instance, to re-time the digital video, the disclosed systems utilize the playback speed prediction machine-learning model to infer the slowness of individual video frames. Subsequently, in certain embodiments, the disclosed systems determine, from frames of a digital video, a temporal frame sub-sampling that is consistent with the slowness predictions and fits within a target video duration. In certain implementations, the disclosed systems utilize the temporal frame sub-sampling to generate a speed-varying digital video that preserves natural video dynamics while fitting the target video duration.
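A frame sub-sampling that is "consistent with the slowness predictions" and fits a target length can be illustrated with a minimal sketch. The abstract does not describe the actual selection procedure. This sketch assumes a slowness value s > 0 means a frame plays s times slower than natural speed, so it contributes 1/s units of "natural time", and then picks frames evenly along cumulative natural time. Slowed-down stretches are therefore thinned more aggressively than normal-speed ones. The function name and this weighting scheme are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence


def subsample_by_slowness(slowness: Sequence[float], target_frames: int) -> List[int]:
    """Pick `target_frames` frame indices, sampling sparser where frames are slower.

    Assumption (illustrative): each frame contributes 1/slowness units of
    natural time; frames are chosen at the midpoints of equal slices of the
    cumulative natural time, so the output has exactly `target_frames` entries.
    """
    if target_frames <= 0 or not slowness:
        return []
    if target_frames >= len(slowness):
        return list(range(len(slowness)))  # nothing to drop
    weights = [1.0 / s for s in slowness]
    cumulative = list(accumulate(weights))  # natural time at the end of each frame
    total = cumulative[-1]
    picks = []
    for k in range(target_frames):
        midpoint = (k + 0.5) * total / target_frames  # centre of the k-th slice
        picks.append(bisect_left(cumulative, midpoint))
    return picks
```

For example, `subsample_by_slowness([1, 1, 2, 2, 2, 2, 1, 1], 6)` keeps every normal-speed frame but only every other frame of the 2x slow-motion stretch, returning `[0, 1, 2, 4, 6, 7]`.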

4.

Granted invention patent
Temporally distributed neural networks for video semantic segmentation (In force)

Publication (announcement) No.: US11354906B2

Publication (announcement) date: 2022-06-07

Application No.: US16846544

Application date: 2020-04-13

Applicant: Adobe Inc.

Inventor: Federico Perazzi , Zhe Lin , Ping Hu , Oliver Wang , Fabian David Caba Heilbron

IPC: G06V20/40 , G06N3/04 , G06T7/11 , G06F17/15

Abstract: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks is used to extract the features to be used for video segmentation and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation representing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.

5.

Granted invention patent
Active learning method for temporal action localization in untrimmed videos (In force)

Publication (announcement) No.: US10726313B2

Publication (announcement) date: 2020-07-28

Application No.: US15957419

Application date: 2018-04-19

Applicant: Adobe Inc.

Inventor: Joon-Young Lee , Hailin Jin , Fabian David Caba Heilbron

IPC: G06K9/00 , G06K9/66 , G06N3/08 , G06K9/62 , G06N3/04

Abstract: Various embodiments describe active learning methods for training temporal action localization models used to localize actions in untrimmed videos. A trainable active learning selection function is used to select unlabeled samples that can improve the temporal action localization model the most. The selected unlabeled samples are then annotated and used to retrain the temporal action localization model. In some embodiments, the trainable active learning selection function includes a trainable performance prediction model that maps a video sample and a temporal action localization model to a predicted performance improvement for the temporal action localization model.
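One round of the select, annotate, and retrain loop described above can be sketched generically. The performance prediction model, the annotation step, and retraining are passed in as plain callables; the function names, the fixed `budget`, and the top-k selection rule are illustrative assumptions rather than details taken from the patent.

```python
import heapq
from typing import Any, Callable, List, Sequence, Tuple


def select_indices(
    unlabeled: Sequence[Any],
    model_state: Any,
    predict_improvement: Callable[[Any, Any], float],
    budget: int,
) -> List[int]:
    """Indices of the `budget` samples with the highest predicted improvement."""
    scored = [(predict_improvement(s, model_state), i) for i, s in enumerate(unlabeled)]
    return [i for _, i in heapq.nlargest(budget, scored)]


def active_learning_round(
    labeled: List[Any],
    unlabeled: List[Any],
    model_state: Any,
    predict_improvement: Callable[[Any, Any], float],
    annotate: Callable[[Any], Any],
    retrain: Callable[[Any, List[Any]], Any],
    budget: int,
) -> Tuple[List[Any], List[Any], Any]:
    """Select, annotate, and retrain once; return the updated pools and model."""
    chosen = select_indices(unlabeled, model_state, predict_improvement, budget)
    new_labels = [annotate(unlabeled[i]) for i in chosen]  # e.g. human annotation
    chosen_set = set(chosen)
    remaining = [s for i, s in enumerate(unlabeled) if i not in chosen_set]
    new_labeled = labeled + new_labels
    return new_labeled, remaining, retrain(model_state, new_labeled)
```

In practice `predict_improvement` would be the learned performance prediction model; here any scoring function with the same shape can be plugged in, which keeps the selection step independent of how the localization model itself is trained.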

6.

Granted invention patent
Music-aware speaker diarization for transcripts and text-based video editing (In force)

Publication (announcement) No.: US12223962B2

Publication (announcement) date: 2025-02-11

Application No.: US17967502

Application date: 2022-10-17

Applicant: Adobe Inc.

Inventor: Justin Jonathan Salamon , Fabian David Caba Heilbron , Xue Bai , Aseem Omprakash Agarwala , Hijung Shin , Lubomira Assenova Dontcheva

IPC: G10L15/08 , G10L15/26 , G11B27/031

Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for music-aware speaker diarization. In an example embodiment, one or more audio classifiers detect speech and music independently of each other, which facilitates detecting regions in an audio track that contain music but do not contain speech. These music-only regions are compared to the transcript, and any transcription and speakers that overlap in time with the music-only regions are removed from the transcript. In some embodiments, rather than having the transcript display the text from this detected music, a visual representation of the audio waveform is included in the corresponding regions of the transcript.
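The transcript cleanup step (derive music-only regions from independent speech and music detections, then drop transcript words and their speaker labels that overlap those regions) can be sketched with simple time intervals. The interval representation, the `Word` record, and the function names are hypothetical; the audio classifiers themselves are out of scope and their outputs are given as lists of (start, end) times in seconds.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def subtract(intervals: List[Interval], holes: List[Interval]) -> List[Interval]:
    """Remove every `hole` from every interval, e.g. music minus speech."""
    result: List[Interval] = []
    for start, end in intervals:
        pieces = [(start, end)]
        for h_start, h_end in holes:
            next_pieces = []
            for p_start, p_end in pieces:
                if h_end <= p_start or h_start >= p_end:  # no overlap: keep as is
                    next_pieces.append((p_start, p_end))
                    continue
                if p_start < h_start:  # part before the hole survives
                    next_pieces.append((p_start, h_start))
                if h_end < p_end:  # part after the hole survives
                    next_pieces.append((h_end, p_end))
            pieces = next_pieces
        result.extend(pieces)
    return result


@dataclass
class Word:
    text: str
    start: float
    end: float
    speaker: str


def drop_music_only_words(words: List[Word], music: List[Interval], speech: List[Interval]) -> List[Word]:
    """Keep only words (with their speaker labels) outside music-only regions."""
    music_only = subtract(music, speech)
    return [
        w for w in words
        if not any(w.start < e and s < w.end for s, e in music_only)  # any time overlap
    ]
```

For example, with music detected over 0 to 10 s and speech over 2 to 4 s, the music-only regions are 0 to 2 s and 4 to 10 s, so only words spoken between 2 and 4 s remain in the transcript. A UI could then render a waveform in the removed regions instead of text.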

7.

Invention publication
RETIMING DIGITAL VIDEOS UTILIZING DEEP LEARNING (Pending, published)

Publication (announcement) No.: US20230276084A1

Publication (announcement) date: 2023-08-31

Application No.: US18185137

Application date: 2023-03-16

Applicant: Adobe Inc.

Inventor: Simon Jenni , Markus Woodson , Fabian David Caba Heilbron

IPC: H04N21/2343 , H04N21/234 , H04N21/24

CPC classification number: H04N21/234381 , H04N21/23418 , H04N21/2402

Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that generate a temporally remapped video that satisfies a desired target duration while preserving natural video dynamics. In certain instances, the disclosed systems utilize a playback speed prediction machine-learning model that recognizes and localizes temporally varying changes in video playback speed to re-time a digital video with varying frame-change speeds. For instance, to re-time the digital video, the disclosed systems utilize the playback speed prediction machine-learning model to infer the slowness of individual video frames. Subsequently, in certain embodiments, the disclosed systems determine, from frames of a digital video, a temporal frame sub-sampling that is consistent with the slowness predictions and fits within a target video duration. In certain implementations, the disclosed systems utilize the temporal frame sub-sampling to generate a speed-varying digital video that preserves natural video dynamics while fitting the target video duration.

8.

Granted invention patent
Retiming digital videos utilizing machine learning and temporally varying speeds (In force)

Publication (announcement) No.: US11610606B1

Publication (announcement) date: 2023-03-21

Application No.: US17652586

Application date: 2022-02-25

Applicant: Adobe Inc.

Inventor: Simon Jenni , Markus Woodson , Fabian David Caba Heilbron

IPC: H04N5/783 , G11B27/00

Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that generate a temporally remapped video that satisfies a desired target duration while preserving natural video dynamics. In certain instances, the disclosed systems utilize a playback speed prediction machine-learning model that recognizes and localizes temporally varying changes in video playback speed to re-time a digital video with varying frame-change speeds. For instance, to re-time the digital video, the disclosed systems utilize the playback speed prediction machine-learning model to infer the slowness of individual video frames. Subsequently, in certain embodiments, the disclosed systems determine, from frames of a digital video, a temporal frame sub-sampling that is consistent with the slowness predictions and fits within a target video duration. In certain implementations, the disclosed systems utilize the temporal frame sub-sampling to generate a speed-varying digital video that preserves natural video dynamics while fitting the target video duration.

9.

Invention application
Learning to Personalize Vision-Language Models through Meta-Personalization (In force)

Publication (announcement) No.: US20240419726A1

Publication (announcement) date: 2024-12-19

Application No.: US18210535

Application date: 2023-06-15

Applicant: Adobe Inc.

Inventor: Simon Jenni , Fabian David Caba Heilbron , Chun-Hsiao Yeh , Bryan Russell , Josef Sivic

IPC: G06F16/58 , G06F16/535 , G06F16/538

Abstract: Techniques for learning to personalize vision-language models through meta-personalization are described. In one embodiment, one or more processing devices lock a pre-trained vision-language model (VLM) during a training phase. The processing devices train the pre-trained VLM to augment a text encoder of the pre-trained VLM with a set of general named video instances to form a meta-personalized VLM, the meta-personalized VLM to include global category features. The processing devices test the meta-personalized VLM to adapt the text encoder with a set of personal named video instances to form a personal VLM, the personal VLM comprising the global category features personalized with a set of personal instance weights to form a personal instance token associated with the user. Other embodiments are described and claimed.

10.

Granted invention patent
Temporally distributed neural networks for video semantic segmentation (In force)

Publication (announcement) No.: US11854206B2

Publication (announcement) date: 2023-12-26

Application No.: US17735156

Application date: 2022-05-03

Applicant: Adobe Inc.

Inventor: Federico Perazzi , Zhe Lin , Ping Hu , Oliver Wang , Fabian David Caba Heilbron

IPC: G06T7/11 , G06F17/15 , G06V20/40 , G06N3/045 , G06V10/80

CPC classification number: G06T7/11 , G06F17/15 , G06N3/045 , G06V10/806 , G06V20/46 , G06V20/49 , G06T2207/10016 , G06T2207/20084

Abstract: A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks is used to extract the features to be used for video segmentation and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation representing the entirety of the features is produced for each video frame in the sequence of video frames by aggregating the output features extracted by the multiple neural networks.
