TEXT-TO-IMAGE SYNTHESIS UTILIZING DIFFUSION MODELS WITH TEST-TIME ATTENTION SEGREGATION AND RETENTION OPTIMIZATION

    Publication No.: US20240428468A1

    Publication Date: 2024-12-26

    Application No.: US18337634

    Filing Date: 2023-06-20

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize attention segregation loss and/or attention retention loss at inference time of a diffusion neural network to generate a text-conditioned image. In particular, in some embodiments, the disclosed systems utilize the attention segregation loss to reduce overlap between concepts by comparing attention maps for multiple concepts of a text query corresponding to a denoising step. Further, in some embodiments, the disclosed systems utilize the attention retention loss to improve information retention for concepts across denoising steps by comparing attention maps between different denoising steps. Accordingly, in some embodiments, by utilizing the attention segregation loss and the attention retention loss, the disclosed systems accurately maintain multiple concepts from a text query when generating a text-conditioned image.
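
The two losses named in this abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the soft intersection-over-union overlap used here, the flattened-list map layout, and the function names `soft_iou`, `segregation_loss`, and `retention_loss` are all assumptions made for illustration, not the patented method.

```python
def soft_iou(a, b):
    """Soft intersection-over-union of two non-negative attention maps (flattened lists)."""
    inter = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return inter / union if union > 0 else 0.0


def segregation_loss(concept_maps):
    """Assumed form: mean pairwise overlap between concept maps at ONE denoising step.

    Lower values mean the concepts attend to more distinct image regions.
    """
    total, pairs = 0.0, 0
    for i in range(len(concept_maps)):
        for j in range(i + 1, len(concept_maps)):
            total += soft_iou(concept_maps[i], concept_maps[j])
            pairs += 1
    return total / pairs if pairs else 0.0


def retention_loss(prev_maps, curr_maps):
    """Assumed form: mean (1 - overlap) between each concept's maps at TWO denoising steps.

    Lower values mean each concept keeps attending to the same region over time.
    """
    losses = [1.0 - soft_iou(p, c) for p, c in zip(prev_maps, curr_maps)]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical 2x2 attention maps, flattened, for the concepts "cat" and "dog".
cat_map = [1.0, 0.0, 0.0, 0.0]
dog_map = [0.0, 0.0, 0.0, 1.0]
print(segregation_loss([cat_map, dog_map]))       # disjoint maps: no overlap penalty
print(retention_loss([cat_map], [cat_map]))       # unchanged map: no retention penalty
```

In a test-time setting such scalars would be differentiated with respect to the latent at each denoising step; that update step is outside this sketch.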

    TEXT-CONDITIONED VISUAL ATTENTION FOR MULTIMODAL MACHINE LEARNING MODELS

    Publication No.: US20250022263A1

    Publication Date: 2025-01-16

    Application No.: US18351211

    Filing Date: 2023-07-12

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for conditioning images on modification texts to generate multi-modal gradient attention maps. In particular, in some embodiments, the disclosed systems generate, utilizing a vision-language neural network of an image-text comparison machine learning model, a reference text-image feature vector based on a reference image and a modification text. Additionally, in some embodiments, the disclosed systems generate, utilizing the vision-language neural network of the image-text comparison machine learning model, a target text-image feature vector based on a target image and the modification text. Moreover, in some implementations, the disclosed systems generate, from the reference text-image feature vector and the target text-image feature vector, a multi-modal gradient attention map reflecting a visual grounding of the image-text comparison machine learning model relative to the modification text.

    Cross view template recommendation system

    Publication No.: US12013883B1

    Publication Date: 2024-06-18

    Application No.: US18200856

    Filing Date: 2023-05-23

    Applicant: Adobe Inc.

    Abstract: An illustrator system determines, for each feature of a set of features, a feature representation for an electronic document displayed via a user interface, based on a plurality of elements of the electronic document. The system receives a selection from among the set of features of (1) a query feature and of (2) a target feature and determines, for each replacement template of a set of replacement templates, a compatibility score based on the feature representation for the electronic document determined for the query feature and a target feature representation of the replacement template determined for the target feature, the representations being determined in a joint representation space. The system selects one or more replacement electronic documents based on the determined compatibility scores. The system displays a preview for each replacement electronic document and displays a particular replacement electronic document responsive to receiving a selection of a preview.
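
The compatibility scoring described in this abstract can be sketched as a similarity ranking in a shared embedding space. The abstract does not say how the score is computed, so cosine similarity, the dictionary layout, and the names `cosine` and `rank_templates` are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors in the (assumed) joint representation space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_templates(query_repr, template_target_reprs, top_k=1):
    """Score each template's target-feature vector against the document's
    query-feature vector and return the names of the top_k templates."""
    scored = sorted(
        ((cosine(query_repr, vec), name) for name, vec in template_target_reprs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Hypothetical data: the document's representation for the query feature
# (e.g. its color palette) and each template's representation for the target
# feature (e.g. its layout), all embedded in the same space.
document_query = [1.0, 0.0]
templates = {"template_a": [1.0, 0.1], "template_b": [0.0, 1.0]}
print(rank_templates(document_query, templates))
```

The point of the joint space is that a query feature and a different target feature become directly comparable, which is what allows a "cross view" recommendation.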

    VISUAL GROUNDING OF SELF-SUPERVISED REPRESENTATIONS FOR MACHINE LEARNING MODELS UTILIZING DIFFERENCE ATTENTION

    Publication No.: US20240420447A1

    Publication Date: 2024-12-19

    Application No.: US18336423

    Filing Date: 2023-06-16

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing difference attention to evaluate and/or train machine learning models. In particular, in some embodiments, the disclosed systems generate, utilizing a machine learning model, a first feature vector from a digital image. In one or more implementations, the disclosed systems generate a masked digital image by masking a region from the digital image. Additionally, in some embodiments, the disclosed systems generate, utilizing the machine learning model, a second feature vector from the masked digital image. Moreover, in some implementations, the disclosed systems determine a difference feature vector between the first feature vector and the second feature vector. Furthermore, in some embodiments, the disclosed systems generate, from the difference feature vector, a difference attention map reflecting a visual grounding of the machine learning model relative to the region.
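
The pipeline in this abstract (encode the image, mask a region, encode again, take the difference, derive a map) can be sketched with a toy model. The abstract does not specify the model or how the map is produced from the difference vector, so the linear encoder, the gradient-of-squared-norm step, and the names `encode` and `difference_attention` are assumptions made for illustration.

```python
def encode(weights, pixels):
    """Toy stand-in for the machine learning model: a linear map W @ pixels."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]


def difference_attention(weights, pixels, region):
    """Return a per-pixel map, scaled to [0, 1], for the masked region.

    Steps: first feature vector from the image, second feature vector from the
    masked image, difference vector, then (assumed) the gradient of
    0.5 * ||difference||^2 with respect to the input pixels, which for a
    linear encoder equals W^T @ difference.
    """
    masked = [0.0 if i in region else p for i, p in enumerate(pixels)]
    first_vec = encode(weights, pixels)
    second_vec = encode(weights, masked)
    diff = [a - b for a, b in zip(first_vec, second_vec)]
    grad = [
        abs(sum(weights[i][j] * diff[i] for i in range(len(weights))))
        for j in range(len(pixels))
    ]
    peak = max(grad)
    return [g / peak if peak > 0 else 0.0 for g in grad]


# Hypothetical 4-pixel image and an identity encoder; pixels 1 and 3 are masked.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
image = [0.2, 0.9, 0.4, 0.7]
print(difference_attention(identity, image, region={1, 3}))
```

With the identity encoder the map is nonzero only at the masked pixels, which is the behavior one would expect from a model that is well grounded in that region.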
