Patent search ap:("ADOBE INC.") AND inv:"Reuben Xin Hong Tan"

1.

Patent application
LOCALIZATION OF NARRATIONS IN IMAGE DATA (Status: In force)

Publication number: US20230115551A1

Publication date: 2023-04-13

Application number: US17499193

Filing date: 2021-10-12

Applicant: ADOBE INC.

Inventors: Hailin Jin, Bryan Russell, Reuben Xin Hong Tan

IPC: G06K9/00, G06K9/62, G10L15/26, G10L15/19, G10L15/16, G10L15/02, G06N3/04

Abstract: Methods, system, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
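The abstract's core mechanism is a set of attention layers that compare features only across the two modalities, never within one. It can be illustrated with a minimal pure-Python sketch. Everything below is an assumption for illustration and is not the patented implementation: the function names (`cross_attend`, `cross_modal_layer`, `localize`) are hypothetical, features are plain lists of floats, keys double as values, and there are no learned projections, multiple heads, or training. The final localization step simply picks the (frame, region) pair whose fused feature has the highest dot product with a phrase embedding.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], others: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention where each query (one modality) attends
    only over features of the other modality. Keys and values are the same
    vectors here (hypothetical simplification: no learned projections)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in others])
        out.append([sum(w * v[i] for w, v in zip(weights, others)) for i in range(d)])
    return out


def cross_modal_layer(visual: List[Vector], text: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """One hypothetical cross-modal layer: visual tokens attend to text tokens
    and text tokens attend to visual tokens. No visual-visual or text-text
    comparison is made. A residual connection keeps each token's own feature."""
    vis_att = cross_attend(visual, text)
    txt_att = cross_attend(text, visual)
    new_visual = [[a + b for a, b in zip(v, u)] for v, u in zip(visual, vis_att)]
    new_text = [[a + b for a, b in zip(t, u)] for t, u in zip(text, txt_att)]
    return new_visual, new_text


def localize(frames: List[List[Vector]], phrase: Vector) -> Tuple[int, int]:
    """Return (frame index, region index) of the best-matching region,
    i.e. a joint temporal and spatial localization of the phrase."""
    best, best_score = (-1, -1), -math.inf
    for t, regions in enumerate(frames):
        for r, feat in enumerate(regions):
            s = dot(feat, phrase)
            if s > best_score:
                best_score, best = s, (t, r)
    return best


if __name__ == "__main__":
    frames = [[[1.0, 0.0], [0.0, 1.0]], [[0.2, 0.9], [0.9, 0.1]]]  # 2 frames x 2 regions
    phrase = [[0.0, 1.0]]  # a single phrase token
    # Fuse each frame's regions with the phrase through one cross-modal layer.
    fused = [cross_modal_layer(regions, phrase)[0] for regions in frames]
    print(localize(fused, phrase[0]))  # prints (0, 1)
```

Restricting attention to cross-modal pairs means every update to a region feature is driven by the narration and vice versa, which is the property the abstract highlights. A real system would stack several such layers with learned weights.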

2.

Granted patent
Localization of narrations in image data (Status: In force)

Publication number: US12118787B2

Publication date: 2024-10-15

Application number: US17499193

Filing date: 2021-10-12

Applicant: ADOBE INC.

Inventors: Hailin Jin, Bryan Russell, Reuben Xin Hong Tan

IPC: G06K9/00, G06F18/214, G06F18/22, G06N3/04, G06V20/40, G10L15/02, G10L15/16, G10L15/19, G10L15/26

CPC classification number: G06V20/41, G06F18/214, G06F18/22, G06N3/04, G06V20/46, G10L15/02, G10L15/16, G10L15/19, G10L15/26

Abstract: Methods, system, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
