-
Publication No.: US12027184B2
Publication Date: 2024-07-02
Application No.: US17661614
Application Date: 2022-05-02
Applicant: ADOBE INC.
Inventor: Suryateja BV, Prateksha Udhayanan, Parth Satish Laturia, Chauhan Dev Girishchandra, Darshan Khandelwal, Stefano Petrangeli, Balaji Vasan Srinivasan
IPC: G11B27/036, G06F16/41, G06F40/20, G06F40/30, G06N3/0464, G06N3/0895, G06N5/01, G06V10/764, G06V10/774, G10L13/02, G11B27/031, G11B27/34, H04N21/234, H04N21/8405, H04N21/845
CPC classification number: G11B27/036, G06F16/41, G06F40/20, G06F40/30, G06N5/01, G06V10/764, G06V10/774, G10L13/02, G11B27/34
Abstract: Systems and methods for video processing are configured. Embodiments of the present disclosure receive a procedural document comprising a plurality of instructions; extract a plurality of key concepts for an instruction of the plurality of instructions; compute an information coverage distribution for each of a plurality of candidate multi-media assets, wherein the information coverage distribution indicates whether a corresponding multi-media asset relates to each of the plurality of key concepts; select a set of multi-media assets for the instruction based on the information coverage distribution; and generate a multi-media presentation describing the procedural document by combining the set of multi-media assets based on a presentation template.
-
Publication No.: US20250005296A1
Publication Date: 2025-01-02
Application No.: US18342954
Application Date: 2023-06-28
Applicant: Adobe Inc.
Inventor: Koustava Goswami, Srikrishna Karanam, Joseph Koonthanam Jose, Prateksha Udhayanan, Balaji Vasan Srinivasan
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that implements a vision language machine learning model to generate text representations of an input digital image from localized context tokens. In particular, in some embodiments, the disclosed systems generate image patch feature representations that represent patches from an input image. Further, in some embodiments, the disclosed systems generate localized context tokens from the image patch feature representations and prompt context tokens. Moreover, in some embodiments, by utilizing the localized context tokens, the disclosed systems generate a text representation by utilizing a text encoder of the vision language machine learning model.
-
Publication No.: US20250022263A1
Publication Date: 2025-01-16
Application No.: US18351211
Application Date: 2023-07-12
Applicant: Adobe Inc.
Inventor: Prateksha Udhayanan, Srikrishna Karanam, Balaji Vasan Srinivasan
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for conditioning images on modification texts to generate multi-modal gradient attention maps. In particular, in some embodiments, the disclosed systems generate, utilizing a vision-language neural network of an image-text comparison machine learning model, a reference text-image feature vector based on a reference image and a modification text. Additionally, in some embodiments, the disclosed systems generate, utilizing the vision-language neural network of the image-text comparison machine learning model, a target text-image feature vector based on a target image and the modification text. Moreover, in some implementations, the disclosed systems generate, from the reference text-image feature vector and the target text-image feature vector, a multi-modal gradient attention map reflecting a visual grounding of the image-text comparison machine learning model relative to the modification text.
-
Publication No.: US20230352055A1
Publication Date: 2023-11-02
Application No.: US17661614
Application Date: 2022-05-02
Applicant: ADOBE INC.
Inventor: Suryateja BV, Prateksha Udhayanan, Parth Satish Laturia, Chauhan Dev Girishchandra, Darshan Khandelwal, Stefano Petrangeli, Balaji Vasan Srinivasan
IPC: G11B27/036, G11B27/34, G06F40/20, G06F40/30, G06V10/774, G06V10/764, G10L13/02, G06F16/41, G06N5/00
CPC classification number: G11B27/036, G11B27/34, G06F40/20, G06F40/30, G06V10/774, G06V10/764, G10L13/02, G06F16/41, G06N5/003
Abstract: Systems and methods for video processing are configured. Embodiments of the present disclosure receive a procedural document comprising a plurality of instructions; extract a plurality of key concepts for an instruction of the plurality of instructions; compute an information coverage distribution for each of a plurality of candidate multi-media assets, wherein the information coverage distribution indicates whether a corresponding multi-media asset relates to each of the plurality of key concepts; select a set of multi-media assets for the instruction based on the information coverage distribution; and generate a multi-media presentation describing the procedural document by combining the set of multi-media assets based on a presentation template.
-