Initializing a learned latent vector for neural-network projections of diverse images

    Publication No.: US11893717B2

    Publication date: 2024-02-06

    Application No.: US17187080

    Application date: 2021-02-26

    Applicant: Adobe Inc.

    Abstract: This disclosure describes one or more embodiments of systems, non-transitory computer-readable media, and methods that can learn or identify a learned-initialization-latent vector for an initialization digital image and reconstruct a target digital image using an image-generating-neural network based on a modified version of the learned-initialization-latent vector. For example, the disclosed systems learn a learned-initialization-latent vector from an initialization image utilizing a high number (e.g., thousands) of learning iterations on an image-generating-neural network (e.g., a GAN). Then, the disclosed systems can modify the learned-initialization-latent vector (of the initialization image) to generate modified or reconstructed versions of target images using the image-generating-neural network. For instance, the disclosed systems utilize the learned-initialization-latent vector as a starting point to learn a learned-latent vector for a target image that an image-generating-neural network converts into a high-fidelity reconstruction of the target image (with a reduced number of learning iterations).
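The warm-start idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the linear `generate` function is a hypothetical stand-in for an image-generating neural network, the 3-value "images" are made up, and `project`, `lr`, `tol`, and `max_steps` are names assumed for illustration. It only shows that a latent learned once for an initialization image can serve as the starting point for a target image and reach the same reconstruction tolerance in fewer gradient steps than a cold start.

```python
def generate(z):
    """Toy stand-in for a generator: maps a 2-D latent to a 3-value 'image'."""
    return [z[0] + z[1], z[0] - z[1], 2.0 * z[0]]


def loss(z, target):
    """Squared reconstruction error between generate(z) and the target."""
    return sum((g - t) ** 2 for g, t in zip(generate(z), target))


def grad(z, target):
    """Gradient of the loss with respect to z for the toy generator above."""
    r = [g - t for g, t in zip(generate(z), target)]
    return [2.0 * (r[0] + r[1] + 2.0 * r[2]), 2.0 * (r[0] - r[1])]


def project(target, z_start, lr=0.05, tol=1e-6, max_steps=5000):
    """Gradient-descent projection of a target into latent space.

    Returns the learned latent and the number of steps taken.
    """
    z = list(z_start)
    steps = 0
    while loss(z, target) > tol and steps < max_steps:
        g = grad(z, target)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        steps += 1
    return z, steps


# Hypothetical images: the target is close to the initialization image.
init_image = generate([1.0, 2.0])
target_image = generate([1.1, 2.1])

# One-off, generously budgeted run that learns the initialization latent.
z_init, _ = project(init_image, [0.0, 0.0])

# Projecting the target: cold start versus starting from the learned latent.
z_cold, cold_steps = project(target_image, [0.0, 0.0])
z_warm, warm_steps = project(target_image, z_init)

print("cold-start steps:", cold_steps)
print("warm-start steps:", warm_steps)
```

In this toy setting the saving comes purely from the target lying near the initialization image in latent space; how well that carries over to a real generator and diverse images is what the disclosure itself addresses.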

    DOMAIN ALIGNMENT FOR OBJECT DETECTION DOMAIN ADAPTATION TASKS

    Publication No.: US20210312232A1

    Publication date: 2021-10-07

    Application No.: US16885168

    Application date: 2020-05-27

    Applicant: Adobe Inc.

    Abstract: A domain alignment technique for cross-domain object detection tasks is introduced. During a preliminary pretraining phase, an object detection model is pretrained to detect objects in images associated with a source domain using a source dataset of images associated with the source domain. After completing the pretraining phase, a domain adaptation phase is performed using the source dataset and a target dataset to adapt the pretrained object detection model to detect objects in images associated with the target domain. The domain adaptation phase may involve the use of various domain alignment modules that, for example, perform multi-scale pixel/path alignment based on input feature maps or perform instance-level alignment based on input region proposals.

    INITIALIZING A LEARNED LATENT VECTOR FOR NEURAL-NETWORK PROJECTIONS OF DIVERSE IMAGES

    Publication No.: US20220277431A1

    Publication date: 2022-09-01

    Application No.: US17187080

    Application date: 2021-02-26

    Applicant: Adobe Inc.

    Abstract: This disclosure describes one or more embodiments of systems, non-transitory computer-readable media, and methods that can learn or identify a learned-initialization-latent vector for an initialization digital image and reconstruct a target digital image using an image-generating-neural network based on a modified version of the learned-initialization-latent vector. For example, the disclosed systems learn a learned-initialization-latent vector from an initialization image utilizing a high number (e.g., thousands) of learning iterations on an image-generating-neural network (e.g., a GAN). Then, the disclosed systems can modify the learned-initialization-latent vector (of the initialization image) to generate modified or reconstructed versions of target images using the image-generating-neural network. For instance, the disclosed systems utilize the learned-initialization-latent vector as a starting point to learn a learned-latent vector for a target image that an image-generating-neural network converts into a high-fidelity reconstruction of the target image (with a reduced number of learning iterations).

    UTILIZING A GENERATIVE NEURAL NETWORK TO INTERACTIVELY CREATE AND MODIFY DIGITAL IMAGES BASED ON NATURAL LANGUAGE FEEDBACK

    Publication No.: US20230230198A1

    Publication date: 2023-07-20

    Application No.: US17576091

    Application date: 2022-01-14

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback. Moreover, the disclosed systems can persist these semantically meaningful features throughout a refinement process and across generated images.

    ENHANCED DOCUMENT VISUAL QUESTION ANSWERING SYSTEM VIA HIERARCHICAL ATTENTION

    Publication No.: US20230153531A1

    Publication date: 2023-05-18

    Application No.: US17528972

    Application date: 2021-11-17

    Applicant: ADOBE INC.

    CPC classification number: G06F40/284 G06F16/24526 G06N3/04

    Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is provided.
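The graph-construction step described in this abstract can be sketched roughly as follows. This is an illustrative assumption, not the claimed system: the nested-list input format, the `build_graph` helper, and the choice to link the query node to every section node are all hypothetical, and the feature vectors and graph attention network are left out entirely.

```python
def build_graph(query, sections):
    """Turn a nested document (sections -> lines -> tokens) into nodes and edges.

    Node 0 is the query. Edges follow the nesting: section-line and line-token.
    Linking the query to each section is an assumption made for this sketch.
    """
    nodes = [("query", query)]
    edges = []
    for section in sections:
        section_id = len(nodes)
        nodes.append(("section", None))
        edges.append((0, section_id))
        for line in section:
            line_id = len(nodes)
            nodes.append(("line", None))
            edges.append((section_id, line_id))
            for token in line:
                token_id = len(nodes)
                nodes.append(("token", token))
                edges.append((line_id, token_id))
    return nodes, edges


# Hypothetical segmented document: 2 sections, 3 lines, 6 tokens.
document = [
    [["Invoice", "total"], ["42", "USD"]],
    [["Thank", "you"]],
]
nodes, edges = build_graph("What is the total?", document)
print(len(nodes), "nodes,", len(edges), "edges")
```

A real pipeline would attach a feature vector to each node and pass the edge list to a graph attention layer; the sketch stops at the structure so that the nesting-to-edges mapping stays visible.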

    UTILIZING A GENERATIVE NEURAL NETWORK TO INTERACTIVELY CREATE AND MODIFY DIGITAL IMAGES BASED ON NATURAL LANGUAGE FEEDBACK

    Publication No.: US20250078200A1

    Publication date: 2025-03-06

    Application No.: US18952023

    Application date: 2024-11-19

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback. Moreover, the disclosed systems can persist these semantically meaningful features throughout a refinement process and across generated images.

    Utilizing a generative neural network to interactively create and modify digital images based on natural language feedback

    Publication No.: US12148119B2

    Publication date: 2024-11-19

    Application No.: US17576091

    Application date: 2022-01-14

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback. Moreover, the disclosed systems can persist these semantically meaningful features throughout a refinement process and across generated images.
