-
1.
Publication No.: US20240355022A1
Publication date: 2024-10-24
Application No.: US18476504
Filing date: 2023-09-28
Applicant: Adobe Inc.
Inventor: Jing Shi , Wei Xiong , Zhe Lin , Hyun Joon Jung
CPC classification number: G06T11/60 , G06T7/194 , G06T9/00 , G06T2207/20081
Abstract: One or more aspects of a method, apparatus, and non-transitory computer readable medium include obtaining an input description and an input image depicting a subject, encoding the input description using a text encoder of an image generation model to obtain a text embedding, and encoding the input image using a subject encoder of the image generation model to obtain a subject embedding. A guidance embedding is generated by combining the subject embedding and the text embedding, and then an output image is generated based on the guidance embedding using a diffusion model of the image generation model. The output image depicts aspects of the subject and the input description.
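The abstract does not say how the subject embedding and the text embedding are combined. As a minimal sketch, assume both are sequences of token vectors with the same dimension and that combining means concatenating them along the token axis. The function name `combine_embeddings` and the toy values are hypothetical, not taken from the filing.

```python
from typing import List

Vector = List[float]


def combine_embeddings(text_tokens: List[Vector],
                       subject_tokens: List[Vector]) -> List[Vector]:
    """Build a guidance embedding by token-axis concatenation.

    Concatenation is an assumption made for illustration; the abstract
    only says the two embeddings are combined.
    """
    if not text_tokens or not subject_tokens:
        raise ValueError("both embeddings must contain at least one token")
    dim = len(text_tokens[0])
    for token in text_tokens + subject_tokens:
        if len(token) != dim:
            raise ValueError("all token vectors must share one dimension")
    # Copy each vector so the result does not alias the inputs.
    return [list(t) for t in text_tokens] + [list(s) for s in subject_tokens]


# Toy stand-ins for the encoder outputs.
text_embedding = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
subject_embedding = [[0.7, 0.8, 0.9]]
guidance = combine_embeddings(text_embedding, subject_embedding)
```

In this sketch the diffusion model would receive `guidance` as its conditioning sequence, so it can attend to the text tokens and the subject tokens together.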
-
2.
Publication No.: US20250131753A1
Publication date: 2025-04-24
Application No.: US18489681
Filing date: 2023-10-18
Applicant: Adobe Inc. , University of Surrey
Inventor: Yifei Fan , John Collomosse , Jing Shi , Alexander Black
IPC: G06V20/70 , G06V10/44 , G06V10/771 , G06V10/82
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating difference captions indicating detected differences in digital image pairs. The disclosed system generates a first feature map of a first digital image and a second feature map of a second digital image. The disclosed system converts, utilizing a linear projection neural network layer, the first feature map to a first modified feature map in a feature space corresponding to a large language machine-learning model. The disclosed system also converts, utilizing the linear projection neural network layer, the second feature map to a second modified feature map in the feature space corresponding to the large language machine-learning model. The disclosed system further generates, utilizing the large language machine-learning model, a difference caption indicating a difference between the first digital image and the second digital image from a combination of the first modified feature map and the second modified feature map.
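The projection step in this abstract can be sketched as one shared linear map applied to both feature maps, with the two results joined into a single input sequence. The sketch assumes each feature map is a list of feature vectors and that the combination is plain concatenation. The names `project` and `build_llm_input` and the toy weights are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def project(features: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Apply y = x @ W + b to every feature vector (row) in `features`."""
    out_dim = len(bias)
    projected = []
    for row in features:
        if len(row) != len(weight):
            raise ValueError("feature size must match the weight's row count")
        projected.append([
            sum(x * weight[i][j] for i, x in enumerate(row)) + bias[j]
            for j in range(out_dim)
        ])
    return projected


def build_llm_input(first_map: Matrix, second_map: Matrix,
                    weight: Matrix, bias: List[float]) -> Matrix:
    """Project both maps with the same layer, then concatenate them.

    Concatenation is an assumed form of the "combination" in the abstract.
    """
    return project(first_map, weight, bias) + project(second_map, weight, bias)


# Toy layer mapping 2-dimensional image features to a 3-dimensional space.
weight = [[1.0, 0.0, 1.0],
          [0.0, 1.0, 1.0]]
bias = [0.0, 0.0, 0.5]
first_map = [[1.0, 2.0]]
second_map = [[3.0, 4.0], [0.0, 1.0]]
llm_input = build_llm_input(first_map, second_map, weight, bias)
```

Sharing one projection layer places both images in the same feature space, which is what lets the language model compare them.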
-
3.
Publication No.: US20250046055A1
Publication date: 2025-02-06
Application No.: US18363980
Filing date: 2023-08-02
Applicant: Adobe Inc.
Inventor: Zhifei Zhang , Zhe Lin , Yixuan Ren , Yifei Fan , Jing Shi
Abstract: This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that train (and utilize) an image color editing diffusion neural network to generate one or more color-edited digital images for a digital image. In particular, in one or more implementations, the disclosed systems identify a digital image depicting content in a first color style. Moreover, the disclosed systems generate, from the digital image utilizing an image color editing diffusion neural network, a color-edited digital image depicting the content in a second color style different from the first color style. Further, the disclosed systems provide, for display within a graphical user interface, the color-edited digital image.
-
4.
Publication No.: US11570318B2
Publication date: 2023-01-31
Application No.: US17374103
Filing date: 2021-07-13
Applicant: Adobe Inc.
Inventor: Ning Xu , Jing Shi , Franck Dernoncourt , Trung Bui
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize a neural network having a long short-term memory encoder-decoder architecture to progressively modify a digital image in accordance with a natural language request. For example, in one or more embodiments, the disclosed systems utilize a language-to-operation decoding cell of a language-to-operation neural network to sequentially determine one or more image-modification operations to perform to modify a digital image in accordance with a natural language request. In some cases, the decoding cell determines an image-modification operation to perform partly based on the previously used image-modification operations. The disclosed systems further utilize the decoding cell to determine one or more operation parameters for each selected image-modification operation. The disclosed systems utilize the image-modification operation(s) and operation parameter(s) to modify the digital image (e.g., by generating one or more modified digital images) via the decoding cell.
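The control flow this abstract describes is: pick the next operation based partly on the operations already used, pick its parameter, apply it, and repeat. The sketch below shows only that loop. A hand-written rule (`toy_decoding_cell`) stands in for the learned LSTM decoding cell, and the two pixel operations are hypothetical examples, so none of this is the patented model.

```python
from typing import Callable, Dict, List, Optional, Tuple

Image = List[float]  # flat grayscale pixels in [0, 1]


def _clip(value: float) -> float:
    return min(1.0, max(0.0, value))


def brightness(image: Image, amount: float) -> Image:
    return [_clip(p + amount) for p in image]


def contrast(image: Image, factor: float) -> Image:
    return [_clip((p - 0.5) * factor + 0.5) for p in image]


OPERATIONS: Dict[str, Callable[[Image, float], Image]] = {
    "brightness": brightness,
    "contrast": contrast,
}


def toy_decoding_cell(request: str,
                      previous_ops: List[str]) -> Optional[Tuple[str, float]]:
    """Rule-based stand-in for the learned cell (an assumption).

    Returns the next (operation, parameter) pair, or None to stop.
    The choice depends on which operations were already used.
    """
    words = request.lower().split()
    if "brighter" in words and "brightness" not in previous_ops:
        return ("brightness", 0.25)
    if "punchier" in words and "contrast" not in previous_ops:
        return ("contrast", 2.0)
    return None


def edit(image: Image, request: str,
         max_steps: int = 5) -> Tuple[List[str], List[Image]]:
    """Decode and apply operations one at a time, keeping every step."""
    history: List[str] = []
    steps: List[Image] = [image]
    for _ in range(max_steps):
        decision = toy_decoding_cell(request, history)
        if decision is None:
            break
        name, parameter = decision
        steps.append(OPERATIONS[name](steps[-1], parameter))
        history.append(name)
    return history, steps


history, steps = edit([0.25, 0.5], "make it brighter and punchier")
```

Keeping every intermediate image in `steps` mirrors the "progressively modify" wording: each entry is the image after one more operation.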
-
5.
Publication No.: US20220067992A1
Publication date: 2022-03-03
Application No.: US17007693
Filing date: 2020-08-31
Applicant: Adobe Inc.
Inventor: Ning Xu , Trung Bui , Jing Shi , Franck Dernoncourt
Abstract: This disclosure involves executing artificial intelligence models that infer image editing operations from natural language requests spoken by a user. Further, this disclosure performs the inferred image editing operations using inferred parameters for the image editing operations. Systems and methods may be provided that infer one or more image editing operations from a natural language request associated with a source image, locate areas of the source image that are relevant to the one or more image editing operations to generate image masks, and perform the one or more image editing operations to generate a modified source image.
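The last step of this pipeline, applying an inferred operation only where the image mask marks the relevant area, can be sketched as follows. The mask here is written by hand as a stand-in for the output of the localization model, and `apply_with_mask` is a hypothetical helper name.

```python
from typing import Callable, List

Grid = List[List[float]]       # grayscale pixels in [0, 1]
Mask = List[List[bool]]        # True where the edit should apply


def apply_with_mask(image: Grid, mask: Mask,
                    operation: Callable[[float], float]) -> Grid:
    """Return a copy of `image` with `operation` applied to masked pixels."""
    if len(image) != len(mask) or any(
            len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [operation(pixel) if selected else pixel
         for pixel, selected in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(image, mask)
    ]


# Hand-written inputs; in the described system these would be inferred.
source = [[0.25, 0.5],
          [0.75, 0.25]]
region = [[True, False],
          [False, True]]
modified = apply_with_mask(source, region, lambda p: min(1.0, p + 0.5))
```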
-
6.
Publication No.: US11989923B2
Publication date: 2024-05-21
Application No.: US17483126
Filing date: 2021-09-23
Applicant: Adobe Inc.
IPC: G06K9/46 , G06F18/21 , G06F18/22 , G06F40/205 , G06K9/62 , G06N3/02 , G06V10/426
CPC classification number: G06V10/426 , G06F18/217 , G06F18/22 , G06F40/205 , G06N3/02
Abstract: This disclosure describes methods, non-transitory computer readable storage media, and systems that utilize weakly supervised graph matching to align an ungrounded label graph and a visual graph corresponding to a digital image. Specifically, the disclosed system utilizes a label embedding model to generate label graph embeddings from the ungrounded label graph and a visual embedding network to generate visual graph embeddings from the visual graph. Additionally, the disclosed system determines similarity metrics indicating the similarity of pairs of label graph embeddings and visual graph embeddings. The disclosed system then generates a semantic scene graph by utilizing a graph matching algorithm to align the ungrounded label graph and the visual graph based on the similarity metrics. In some embodiments, the disclosed system utilizes contrastive learning to modify the embedding models. Furthermore, in additional embodiments, the disclosed system utilizes the semantic scene graph to train a scene graph generation neural network.
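The alignment step can be sketched as scoring every label and visual embedding pair and then choosing the one-to-one assignment with the highest total similarity. The abstract does not name the similarity metric or the graph matching algorithm. This sketch assumes cosine similarity and a brute-force search over assignments, which is only practical for tiny inputs, and it matches nodes only, ignoring graph edges. All names and values are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align(label_embeddings: List[Vector],
          visual_embeddings: List[Vector]) -> List[Tuple[int, int]]:
    """Return (label_index, visual_index) pairs maximizing total similarity."""
    if len(label_embeddings) > len(visual_embeddings):
        raise ValueError("need at least as many visual nodes as labels")
    similarity = [[cosine(label, visual) for visual in visual_embeddings]
                  for label in label_embeddings]
    best_score, best_assignment = -math.inf, ()
    # Try every way of giving each label its own distinct visual node.
    for assignment in permutations(range(len(visual_embeddings)),
                                   len(label_embeddings)):
        score = sum(similarity[i][j] for i, j in enumerate(assignment))
        if score > best_score:
            best_score, best_assignment = score, assignment
    return list(enumerate(best_assignment))


labels = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.1, 0.9], [0.9, 0.1], [0.7, 0.7]]
matches = align(labels, regions)
```

For realistic graph sizes a polynomial-time assignment solver would replace the permutation loop; the brute-force version is used here only to keep the example dependency-free.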
-
7.
Publication No.: US11670023B2
Publication date: 2023-06-06
Application No.: US17007693
Filing date: 2020-08-31
Applicant: Adobe Inc.
Inventor: Ning Xu , Trung Bui , Jing Shi , Franck Dernoncourt
CPC classification number: G06T11/60 , G10L15/16 , G10L15/22 , G10L2015/223
Abstract: This disclosure involves executing artificial intelligence models that infer image editing operations from natural language requests spoken by a user. Further, this disclosure performs the inferred image editing operations using inferred parameters for the image editing operations. Systems and methods may be provided that infer one or more image editing operations from a natural language request associated with a source image, locate areas of the source image that are relevant to the one or more image editing operations to generate image masks, and perform the one or more image editing operations to generate a modified source image.
-
8.
Publication No.: US20220399017A1
Publication date: 2022-12-15
Application No.: US17374103
Filing date: 2021-07-13
Applicant: Adobe Inc.
Inventor: Ning Xu , Jing Shi , Franck Dernoncourt , Trung Bui
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize a neural network having a long short-term memory encoder-decoder architecture to progressively modify a digital image in accordance with a natural language request. For example, in one or more embodiments, the disclosed systems utilize a language-to-operation decoding cell of a language-to-operation neural network to sequentially determine one or more image-modification operations to perform to modify a digital image in accordance with a natural language request. In some cases, the decoding cell determines an image-modification operation to perform partly based on the previously used image-modification operations. The disclosed systems further utilize the decoding cell to determine one or more operation parameters for each selected image-modification operation. The disclosed systems utilize the image-modification operation(s) and operation parameter(s) to modify the digital image (e.g., by generating one or more modified digital images) via the decoding cell.