USING INTRINSIC MULTIMODAL FEATURES OF IMAGE FOR DOMAIN GENERALIZED

    Publication No.: US20240153258A1

    Publication Date: 2024-05-09

    Application No.: US17976541

    Application Date: 2022-10-28

    Applicant: ADOBE INC.

    Abstract: Various embodiments classify one or more portions of an image based on deriving an “intrinsic” modality. Such an intrinsic modality acts as a substitute for a “text” modality in a multi-modal network. A text modality in image processing is typically natural language text that describes one or more portions of an image. However, explicit natural language text may not be available across one or more domains for training a multi-modal network. Accordingly, various embodiments described herein generate an intrinsic modality, which is also a description of one or more portions of an image, except that the description is not an explicit natural language description but rather a machine learning model representation. Some embodiments additionally leverage a visual modality obtained from a vision-only model or branch, which may learn domain characteristics that are not present in the multi-modal network. Some embodiments additionally fuse or integrate the intrinsic modality with the visual modality for better generalization.
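
The fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two modalities are fused, so everything here is an assumption for illustration: the function names (`l2_normalize`, `fuse`, `classify`), the choice of concatenating L2-normalized feature vectors, and the linear scoring over the fused vector are hypothetical and are not taken from the patent.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(intrinsic, visual):
    """Assumed fusion: concatenate the normalized intrinsic and visual features."""
    return l2_normalize(intrinsic) + l2_normalize(visual)

def classify(fused, class_weights):
    """Score the fused vector against one weight row per class; return the best index."""
    scores = [sum(w * f for w, f in zip(row, fused)) for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical features: 'intrinsic' stands in for the multi-modal branch output,
# 'visual' for the vision-only branch output.
fused = fuse([3.0, 4.0], [0.0, 2.0])
predicted = classify(fused, [[1, 0, 0, 0], [0, 0, 0, 1]])
```

Normalizing each branch before concatenation keeps either modality from dominating purely because of its scale; other fusion schemes (weighted sums, attention) would fit the abstract's wording equally well.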

    ATTRIBUTIONALLY ROBUST TRAINING FOR WEAKLY SUPERVISED LOCALIZATION AND SEGMENTATION

    Publication No.: US20220012530A1

    Publication Date: 2022-01-13

    Application No.: US16926511

    Application Date: 2020-07-10

    Applicant: Adobe Inc.

    Abstract: Embodiments are disclosed for training a neural network classifier to learn to more closely align an input image with its attribution map. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training image comprising a representation of one or more objects, the training image associated with at least one label for the representation of the one or more objects, generating a perturbed training image based on the training image using a neural network, and training the neural network using the perturbed training image by minimizing a combination of classification loss and attribution loss to learn to align an image with its corresponding attribution map.
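
The combined objective described in the abstract can be sketched as a classification term plus a weighted attribution term. The abstract does not define the attribution loss or how the perturbed image is produced, so this sketch makes assumptions: the attribution term is taken to be one minus the cosine similarity between the flattened (perturbed) image and its attribution map, the `weight` parameter is hypothetical, and the perturbation step itself is not shown.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def attribution_loss(image, attribution):
    """Assumed alignment term: 1 - cosine similarity of image and attribution map."""
    dot = sum(a * b for a, b in zip(image, attribution))
    norm = math.sqrt(sum(a * a for a in image)) * math.sqrt(sum(b * b for b in attribution))
    return 1.0 - dot / norm if norm > 0 else 1.0

def combined_loss(logits, label, image, attribution, weight=1.0):
    """Classification loss plus a weighted attribution loss (weight is hypothetical)."""
    return cross_entropy(logits, label) + weight * attribution_loss(image, attribution)

# Hypothetical values: a flattened perturbed image and its attribution map.
loss = combined_loss([0.0, 0.0], 0, [1.0, 2.0], [2.0, 4.0])
```

When the attribution map is a positive multiple of the image, the attribution term is zero up to rounding and only the classification term remains; orthogonal inputs give the maximum alignment penalty of 1 under this definition.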
