-
Publication number: US20240428468A1
Publication date: 2024-12-26
Application number: US18337634
Filing date: 2023-06-20
Applicant: Adobe Inc.
Inventor: Aishwarya Agarwal , Srikrishna Karanam , Joseph Koonthanam Jose , Apoorv Umang Saxena , Koustava Goswami , Balaji Vasan Srinivasan
IPC: G06T11/00 , G06N3/0455
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize attention segregation loss and/or attention retention loss at inference time of a diffusion neural network to generate a text-conditioned image. In particular, in some embodiments, the disclosed systems utilize the attention segregation loss to reduce overlap between concepts by comparing attention maps for multiple concepts of a text query corresponding to a denoising step. Further, in some embodiments, the disclosed systems utilize the attention retention loss to improve information retention for concepts across denoising steps by comparing attention maps between different denoising steps. Accordingly, in some embodiments, by utilizing the attention segregation loss and the attention retention loss, the disclosed systems accurately maintain multiple concepts from a text query when generating a text-conditioned image.
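Illustrative sketch (not taken from the patent): the abstract describes the two losses only in words, so the formulas below are assumptions chosen to match that description. Here the segregation loss is the mean pairwise soft overlap between concept attention maps at one denoising step, and the retention loss is one minus the soft overlap between each concept's maps at two different steps. The names `overlap`, `segregation_loss`, and `retention_loss` are hypothetical, and attention maps are represented as flat lists of non-negative weights.

```python
from itertools import combinations


def overlap(a, b):
    """Soft intersection-over-union of two non-negative attention maps."""
    inter = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return inter / union if union > 0 else 0.0


def segregation_loss(maps):
    """Mean pairwise overlap between concept maps at a single denoising step.

    Assumed form: lower values mean the concepts attend to more distinct regions.
    """
    pairs = list(combinations(maps, 2))
    if not pairs:
        return 0.0
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


def retention_loss(prev_maps, curr_maps):
    """Mean (1 - overlap) between each concept's maps at two denoising steps.

    Assumed form: lower values mean a concept keeps attending to the same region.
    """
    if not curr_maps:
        return 0.0
    total = sum(1.0 - overlap(p, c) for p, c in zip(prev_maps, curr_maps))
    return total / len(curr_maps)


# Toy 4-pixel attention maps for two concepts, e.g. "cat" and "dog".
cat = [0.9, 0.1, 0.0, 0.0]
dog = [0.0, 0.0, 0.2, 0.8]

disjoint = segregation_loss([cat, dog])    # concepts do not overlap
identical = segregation_loss([cat, cat])   # concepts fully overlap
kept = retention_loss([cat, dog], [cat, dog])   # maps unchanged across steps
lost = retention_loss([cat], [dog])             # concept moved entirely
```

In an inference-time setting such as the one the abstract describes, quantities like these would typically be differentiated with respect to the noisy latent to steer each denoising step; that update is omitted here because the abstract does not specify it.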
-
Publication number: US20250022263A1
Publication date: 2025-01-16
Application number: US18351211
Filing date: 2023-07-12
Applicant: Adobe Inc.
Inventor: Prateksha Udhayanan , Srikrishna Karanam , Balaji Vasan Srinivasan
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for conditioning images on modification texts to generate multi-modal gradient attention maps. In particular, in some embodiments, the disclosed systems generate, utilizing a vision-language neural network of an image-text comparison machine learning model, a reference text-image feature vector based on a reference image and a modification text. Additionally, in some embodiments, the disclosed systems generate, utilizing the vision-language neural network of the image-text comparison machine learning model, a target text-image feature vector based on a target image and the modification text. Moreover, in some implementations, the disclosed systems generate, from the reference text-image feature vector and the target text-image feature vector, a multi-modal gradient attention map reflecting a visual grounding of the image-text comparison machine learning model relative to the modification text.
-
Publication number: US12013883B1
Publication date: 2024-06-18
Application number: US18200856
Filing date: 2023-05-23
Applicant: Adobe Inc.
Inventor: Tripti Shukla , Vishwa Vinay , Srikrishna Karanam , Praneetha Vaddamanu , Balaji Vasan Srinivasan
IPC: G06F3/0484 , G06F16/31 , G06F16/332 , G06F40/106 , G06F40/109 , G06F40/186
CPC classification number: G06F16/3323 , G06F16/31 , G06F40/106 , G06F40/109 , G06F40/186
Abstract: An illustrator system determines, for each feature of a set of features, a feature representation for an electronic document displayed via a user interface, based on a plurality of elements of the electronic document. The system receives a selection from among the set of features of (1) a query feature and of (2) a target feature and determines, for each replacement template of a set of replacement templates, a compatibility score based on the feature representation for the electronic document determined for the query feature and a target feature representation of the replacement template determined for the target feature, the representations being determined in a joint representation space. The system selects one or more replacement electronic documents based on the determined compatibility scores. The system displays a preview for each replacement electronic document and displays a particular replacement electronic document responsive to receiving a selection of a preview.
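Illustrative sketch (not taken from the patent): the abstract says a compatibility score is computed between the document's representation for the query feature and each replacement template's representation for the target feature, both in a joint representation space. The example below assumes cosine similarity as the score and plain lists as embeddings; the function names `cosine` and `rank_templates` and the template names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_templates(query_vec, template_vecs, top_k=2):
    """Score every template against the query and return the top_k (name, score) pairs."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in template_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: the document's query-feature vector and
# each template's target-feature vector, assumed to share one space.
document_query_vec = [1.0, 0.0]
template_target_vecs = {
    "template_a": [2.0, 0.0],
    "template_b": [0.0, 1.0],
    "template_c": [1.0, 1.0],
}

previews = rank_templates(document_query_vec, template_target_vecs, top_k=2)
```

The returned pairs correspond to the replacement documents that would be offered as previews; how the actual system learns the joint space or computes its score is not stated in the abstract.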
-
Publication number: US20240420447A1
Publication date: 2024-12-19
Application number: US18336423
Filing date: 2023-06-16
Applicant: Adobe Inc.
Inventor: Aishwarya Agarwal , Srikrishna Karanam , Balaji Vasan Srinivasan
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing difference attention to evaluate and/or train machine learning models. In particular, in some embodiments, the disclosed systems generate, utilizing a machine learning model, a first feature vector from a digital image. In one or more implementations, the disclosed systems generate a masked digital image by masking a region from the digital image. Additionally, in some embodiments, the disclosed systems generate, utilizing the machine learning model, a second feature vector from the masked digital image. Moreover, in some implementations, the disclosed systems determine a difference feature vector between the first feature vector and the second feature vector. Furthermore, in some embodiments, the disclosed systems generate, from the difference feature vector, a difference attention map reflecting a visual grounding of the machine learning model relative to the region.
-
Publication number: US20250005296A1
Publication date: 2025-01-02
Application number: US18342954
Filing date: 2023-06-28
Applicant: Adobe Inc.
Inventor: Koustava Goswami , Srikrishna Karanam , Joseph Koonthanam Jose , Prateksha Udhayanan , Balaji Vasan Srinivasan
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that implement a vision language machine learning model to generate text representations of an input digital image from localized context tokens. In particular, in some embodiments, the disclosed systems generate image patch feature representations that represent patches from an input image. Further, in some embodiments, the disclosed systems generate localized context tokens from the image patch feature representations and prompt context tokens. Moreover, in some embodiments, by utilizing the localized context tokens, the disclosed systems generate a text representation by utilizing a text encoder of the vision language machine learning model.
-
Publication number: US20240394942A1
Publication date: 2024-11-28
Application number: US18323029
Filing date: 2023-05-24
Applicant: Adobe Inc.
Inventor: Anant Shankhdhar , Samyak Sanjay Mehta , Shreya Singh , K. V. Vikram , Tripti Shukla , Srikrishna Karanam , Balaji Vasan Srinivasan , Vishwa Vinay , Niyati Himanshu Chhaya
IPC: G06T11/60 , G06F16/58 , G06F40/211 , G06V30/418
Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for expanding a digital document including a sequence of informational data via supplemental multimodal digital content. In particular, the system expands digital documents with multimodal granular details to dynamically integrate supplemental in-depth information to the digital document. For example, in response to a selection of a specific portion of a digital document, the system generates expanded multimodal content (e.g., text and image content) for the selected portion of the digital document from external text and image sources. Indeed, the system uses existing content from the digital document to select images and combine the selected images with text into image-text pairs that are textually and visually consistent with the digital document. Moreover, the system expands the digital document by inserting the image-text pairs in connection with the selected portion of the digital document.