-
Publication No.: US20230139927A1
Publication Date: 2023-05-04
Application No.: US18148256
Filing Date: 2022-12-29
Applicant: Adobe Inc.
Inventor: Mayank SINGH, Balaji Krishnamurthy, Nupur KUMARI, Puneet MANGLA
IPC: G06T7/11, G06N3/08, G06N3/04, G06F18/214, G06F18/21, G06V10/774, G06V10/82
Abstract: Embodiments are disclosed for training a neural network classifier to learn to more closely align an input image with its attribution map. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training image comprising a representation of one or more objects, the training image associated with at least one label for the representation of the one or more objects, generating a perturbed training image based on the training image using a neural network, and training the neural network using the perturbed training image by minimizing a combination of classification loss and attribution loss to learn to align an image with its corresponding attribution map.
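The objective described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: a linear softmax classifier stands in for the neural network, the attribution map is taken to be the gradient of the true-class logit with respect to the input, the attribution loss is one minus the cosine similarity between image and attribution map, and the perturbation is a single signed-gradient step. The names `perturb`, `attribution_loss`, `combined_loss`, `eps`, and `lam` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classification_loss(weights, image, label):
    # Cross-entropy of a linear softmax classifier (stand-in for the network).
    probs = softmax([dot(w, image) for w in weights])
    return -math.log(probs[label])

def attribution_loss(weights, image, label):
    # Assumption: attribution map = gradient of the true-class logit w.r.t.
    # the input, which for a linear model is simply the weight row.
    attr = weights[label]
    norm = math.sqrt(dot(image, image)) * math.sqrt(dot(attr, attr))
    if norm == 0.0:
        return 1.0
    # 0 when image and attribution map are perfectly aligned, 2 when opposed.
    return 1.0 - dot(image, attr) / norm

def perturb(weights, image, label, eps):
    # Assumption: one signed-gradient step on the classification loss.
    probs = softmax([dot(w, image) for w in weights])
    grad = [0.0] * len(image)
    for k, w in enumerate(weights):
        coeff = probs[k] - (1.0 if k == label else 0.0)
        for i, wi in enumerate(w):
            grad[i] += coeff * wi
    sign = lambda g: (g > 0) - (g < 0)
    return [x + eps * sign(g) for x, g in zip(image, grad)]

def combined_loss(weights, image, label, eps=0.1, lam=0.5):
    # Classification loss plus weighted attribution loss on the perturbed image.
    adv = perturb(weights, image, label, eps)
    return (classification_loss(weights, adv, label)
            + lam * attribution_loss(weights, adv, label))

# Toy data: two classes, a three-"pixel" image.
weights = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
image = [1.0, 0.5, -0.5]
label = 0
perturbed = perturb(weights, image, label, eps=0.1)
total = combined_loss(weights, image, label)
```

In an actual training loop, `total` would be minimized with respect to the model parameters; the sketch only evaluates the objective for one example.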
-
Publication No.: US20240153258A1
Publication Date: 2024-05-09
Application No.: US17976541
Filing Date: 2022-10-28
Applicant: ADOBE INC.
Inventor: Puneet MANGLA, Milan AGGARWAL, Balaji KRISHNAMURTHY
IPC: G06V10/80, G06F40/40, G06V10/764, G06V10/77, G06V10/774, G06V10/82, G06V10/86
CPC classification number: G06V10/811, G06F40/40, G06V10/764, G06V10/7715, G06V10/774, G06V10/82, G06V10/86
Abstract: Various embodiments classify one or more portions of an image based on deriving an “intrinsic” modality. Such intrinsic modality acts as a substitute to a “text” modality in a multi-modal network. A text modality in image processing is typically a natural language text that describes one or more portions of an image. However, explicit natural language text may not be available across one or more domains for training a multi-modal network. Accordingly, various embodiments described herein generate an intrinsic modality, which is also a description of one or more portions of an image, except that such description is not an explicit natural language description, but rather a machine learning model representation. Some embodiments additionally leverage a visual modality obtained from a vision-only model or branch, which may learn domain characteristics that are not present in the multi-modal network. Some embodiments additionally fuse or integrate the intrinsic modality with the visual modality for better generalization.
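The fusion step mentioned at the end of this abstract can be pictured with a small sketch. The abstract does not say how the intrinsic and visual modalities are combined, so the following is purely an assumption for illustration: both modalities are given as embedding vectors of the same length, fusion is a weighted average of the normalized vectors, and classification picks the class prototype with the highest cosine similarity. The names `fuse`, `classify`, `alpha`, and `prototypes` are hypothetical.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def fuse(intrinsic, visual, alpha=0.5):
    # Assumption: weighted average of the two unit-length embeddings,
    # re-normalized so the fused vector also has unit length.
    a = l2_normalize(intrinsic)
    b = l2_normalize(visual)
    return l2_normalize([alpha * x + (1 - alpha) * y for x, y in zip(a, b)])

def classify(fused, prototypes):
    # Pick the class whose prototype has the highest cosine similarity.
    scores = {
        name: sum(x * y for x, y in zip(fused, l2_normalize(p)))
        for name, p in prototypes.items()
    }
    return max(scores, key=scores.get)

# Toy embeddings (made up for the example).
intrinsic = [0.9, 0.1, 0.0]   # stand-in for the learned "intrinsic" modality
visual = [0.7, 0.3, 0.1]      # stand-in for the vision-only branch output
prototypes = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}

fused = fuse(intrinsic, visual)
predicted = classify(fused, prototypes)  # "cat" for these toy values
```

Setting `alpha` closer to 1 would weight the intrinsic modality more heavily, and closer to 0 the visual branch.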
-
Publication No.: US20220012530A1
Publication Date: 2022-01-13
Application No.: US16926511
Filing Date: 2020-07-10
Applicant: Adobe Inc.
Inventor: Mayank SINGH, Balaji Krishnamurthy, Nupur KUMARI, Puneet MANGLA
Abstract: Embodiments are disclosed for training a neural network classifier to learn to more closely align an input image with its attribution map. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training image comprising a representation of one or more objects, the training image associated with at least one label for the representation of the one or more objects, generating a perturbed training image based on the training image using a neural network, and training the neural network using the perturbed training image by minimizing a combination of classification loss and attribution loss to learn to align an image with its corresponding attribution map.
-