-
Publication No.: US10769495B2
Publication Date: 2020-09-08
Application No.: US16052246
Filing Date: 2018-08-01
Applicant: Adobe Inc.
Inventor: Trung Huu Bui , Zhe Lin , Walter Wei-Tuh Chang , Nham Van Le , Franck Dernoncourt
IPC: G06K9/62 , G06F3/16 , G06F3/0488 , G10L15/06 , G06F9/451 , G06F3/0482 , G06F16/54 , G06N3/08 , G06N20/00 , G06F3/0484
Abstract: In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair: a first image and a second image that includes at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that together describe the edit of the first image used to generate the second image. The user gesture and the voice command are recorded simultaneously and synchronized with timestamps. The voice command is played back, and the user transcribes it based on the playback, producing an exact transcription of the voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and the transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
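The packaged record can be pictured roughly as below: a minimal Python sketch, not the patent's actual schema, in which every class and field name (MultimodalIER, audio_samples, gesture_points, and so on) is a hypothetical placeholder for the structured data object that bundles the synchronized modalities.

```python
# Minimal sketch of packaging the collected modalities into one training
# record. All names here are hypothetical, not the patent's schema.
from dataclasses import dataclass, asdict
from typing import List, Tuple
import json

@dataclass
class MultimodalIER:
    image_pair_id: str                                  # identifies the (first, edited) image pair
    audio_samples: List[Tuple[float, float]]            # (timestamp_sec, sample_value)
    gesture_points: List[Tuple[float, float, float]]    # (timestamp_sec, x, y)
    transcription: str                                   # exact transcription of the voice command

    def to_json(self) -> str:
        """Serialize the record so it can be stored as training data."""
        return json.dumps(asdict(self))

# Example: a short "make the sky bluer" request with a small dragging gesture.
record = MultimodalIER(
    image_pair_id="pair_0001",
    audio_samples=[(0.00, 0.01), (0.02, -0.03)],
    gesture_points=[(0.00, 120.0, 45.0), (0.05, 130.0, 48.0)],
    transcription="make the sky bluer here",
)
print(record.to_json())
```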
-
Publication No.: US20200241574A1
Publication Date: 2020-07-30
Application No.: US16262448
Filing Date: 2019-01-30
Applicant: Adobe Inc.
Inventor: Zhe Lin , Xin Ye , Joon-Young Lee , Jianming Zhang
Abstract: Systems and techniques are described that provide for generalizable approach policy learning and implementation for robotic object approaching. Described techniques provide fast and accurate approaching of a specified object, or type of object, in many different environments. The described techniques enable a robot to receive an identification of an object or type of object from a user, and then navigate to the desired object, without further control from the user. Moreover, the approach of the robot to the desired object is performed efficiently, e.g., with a minimum number of movements. Further, the approach techniques may be used even when the robot is placed in a new environment, such as when the same type of object must be approached in multiple settings.
-
Publication No.: US20200184610A1
Publication Date: 2020-06-11
Application No.: US16791939
Filing Date: 2020-02-14
Applicant: Adobe Inc.
Inventor: Zhe Lin , Xin Lu , Xiaohui Shen , Jimei Yang , Jiahui Yu
Abstract: Digital image completion using deep learning is described. Initially, a digital image having at least one hole is received. This holey digital image is provided as input to an image completer formed with a framework that combines generative and discriminative neural networks based on the learning architecture of generative adversarial networks. From the holey digital image, the generative neural network generates a filled digital image having hole-filling content in place of the holes. The discriminative neural networks detect whether the filled digital image and the hole-filling content appear computer-generated or photo-realistic. The generating and detecting continue iteratively until the discriminative neural networks fail to detect computer-generated content in the filled digital image and hole-filling content, or until detection surpasses a threshold difficulty. Responsive to this, the image completer outputs the filled digital image with hole-filling content in place of the holey digital image's holes.
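A rough PyTorch-style sketch of the generator/discriminator interplay follows; the tiny layer stacks, names, and the mask-based compositing are illustrative assumptions, not the networks described above.

```python
# Illustrative sketch of hole filling with a generator and a discriminator.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Predicts hole-filling content from a holey image and its hole mask."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, holey, mask):
        x = torch.cat([holey, mask], dim=1)      # image (3 ch) + hole mask (1 ch)
        fill = self.net(x)
        # Keep known pixels; use generated content only inside the holes.
        return holey * (1 - mask) + fill * mask

class Discriminator(nn.Module):
    """Scores whether a completed image looks photo-realistic."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, image):
        return self.net(image)

holey = torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.9).float()      # 1 inside holes
filled = Generator()(holey * (1 - mask), mask)
realism_score = Discriminator()(filled)
```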
-
Publication No.: US20200151448A1
Publication Date: 2020-05-14
Application No.: US16189805
Filing Date: 2018-11-13
Applicant: Adobe Inc.
Inventor: Zhe Lin , Xiaohui Shen , Mingyang Ling , Jianming Zhang , Jason Wen Yong Kuen
Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
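The concept-conditioning idea, in which the attention map and the word embedding of the target concept enter the detection head as conditional inputs, might look roughly like the following PyTorch sketch; the shapes, fusion scheme, and head design are assumptions for illustration only.

```python
# Illustrative sketch of conditioning a detection head on an attention map
# and a concept word embedding.
import torch
import torch.nn as nn

class ConditionalDetectionHead(nn.Module):
    def __init__(self, feat_ch=256, embed_dim=300, num_anchors=9):
        super().__init__()
        self.fuse = nn.Conv2d(feat_ch + 1 + embed_dim, feat_ch, 1)
        self.cls = nn.Conv2d(feat_ch, num_anchors, 3, padding=1)       # objectness per anchor
        self.box = nn.Conv2d(feat_ch, num_anchors * 4, 3, padding=1)   # box offsets per anchor

    def forward(self, features, attention_map, word_embedding):
        b, _, h, w = features.shape
        # Tile the target concept's word embedding over every spatial location.
        emb = word_embedding.view(b, -1, 1, 1).expand(-1, -1, h, w)
        x = torch.cat([features, attention_map, emb], dim=1)
        x = torch.relu(self.fuse(x))
        return self.cls(x), self.box(x)

features = torch.rand(1, 256, 32, 32)    # backbone features of the input image
attention = torch.rand(1, 1, 32, 32)     # attention map from the image tagging network
embedding = torch.rand(1, 300)           # word embedding of the target concept
scores, boxes = ConditionalDetectionHead()(features, attention, embedding)
```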
-
Publication No.: US20200118253A1
Publication Date: 2020-04-16
Application No.: US16188479
Filing Date: 2018-11-13
Applicant: Adobe Inc.
Inventor: Jonathan Eisenmann , Zhe Lin , Matthew Fisher
Abstract: In some embodiments, an image manipulation application receives a two-dimensional background image and projects the background image onto a sphere to generate a sphere image. Based on the sphere image, an unfilled environment map containing a hole area lacking image content can be generated. A portion of the unfilled environment map can be projected to an unfilled projection image using a map projection. The unfilled projection image contains the hole area. A hole filling model is applied to the unfilled projection image to generate a filled projection image containing image content for the hole area. A filled environment map can be generated by applying an inverse projection of the map projection on the filled projection image and by combining the unfilled environment map with the generated image content for the hole area of the environment map.
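The ordering of the pipeline can be illustrated with the simplified NumPy sketch below; the placement logic, the stand-in mean-color fill, and the omission of the intermediate map projection and inverse projection are simplifying assumptions, not the described method.

```python
# Simplified sketch of pipeline order: build an environment map with a hole,
# then fill the hole and combine with the known content.
import numpy as np

def project_to_environment_map(background, pano_h=256, pano_w=512):
    """Place the 2D background into an equirectangular map; the rest is a hole."""
    env = np.zeros((pano_h, pano_w, 3), dtype=np.float32)
    hole = np.ones((pano_h, pano_w), dtype=bool)
    h, w, _ = background.shape
    top, left = (pano_h - h) // 2, (pano_w - w) // 2
    env[top:top + h, left:left + w] = background
    hole[top:top + h, left:left + w] = False
    return env, hole

def fill_holes(env_map, hole_mask):
    """Stand-in for the learned hole-filling model: fill holes with the mean color."""
    filled = env_map.copy()
    mean_color = env_map[~hole_mask].mean(axis=0)
    filled[hole_mask] = mean_color
    return filled

background = np.random.rand(128, 256, 3).astype(np.float32)
unfilled_env, hole = project_to_environment_map(background)
filled_env = fill_holes(unfilled_env, hole)   # combine known + generated content
```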
-
Publication No.: US20200042286A1
Publication Date: 2020-02-06
Application No.: US16052246
Filing Date: 2018-08-01
Applicant: Adobe Inc.
Inventor: Trung Huu Bui , Zhe Lin , Walter Wei-Tuh Chang , Nham Van Le , Franck Dernoncourt
IPC: G06F3/16 , G10L15/26 , G06F3/0488 , G06F3/0482 , G10L15/06 , G06F17/30 , G06F9/451
Abstract: In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair: a first image and a second image that includes at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that together describe the edit of the first image used to generate the second image. The user gesture and the voice command are recorded simultaneously and synchronized with timestamps. The voice command is played back, and the user transcribes it based on the playback, producing an exact transcription of the voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and the transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
-
Publication No.: US20200020108A1
Publication Date: 2020-01-16
Application No.: US16035410
Filing Date: 2018-07-13
Applicant: Adobe Inc.
Inventor: I-Ming Pao , Zhe Lin
Abstract: A digital medium environment is described to automatically generate a trimap and segment a digital image, independent of any user intervention. An image processing system receives an image and a low-resolution mask for the image, which provides a probability map indicating a likelihood that a pixel in the image mask corresponds to a foreground object in the image. The image processing system analyzes the image to identify content in the image's foreground and background portions, and adaptively generates a trimap for the image based on differences between the identified foreground and background content. By identifying content of the image prior to generating the trimap, the techniques described herein can be applied to a wide range of images, such as images where foreground content is visually similar to background content, and vice versa. Thus, the image processing system can automatically generate trimaps for images having diverse visual characteristics.
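One way to picture adaptive trimap generation from the probability map is the NumPy sketch below; the thresholds and the similarity-driven adjustment are illustrative assumptions rather than the system's actual logic.

```python
# Illustrative sketch of turning a foreground probability map into a trimap.
import numpy as np

def generate_trimap(prob_map, fg_bg_similarity):
    """Label pixels 255 = foreground, 0 = background, 128 = unknown.

    When foreground and background content look similar, widen the unknown
    band by relaxing both thresholds (an assumed adaptation rule).
    """
    hi = 0.9 - 0.2 * fg_bg_similarity   # confident-foreground threshold
    lo = 0.1 + 0.2 * fg_bg_similarity   # confident-background threshold
    trimap = np.full(prob_map.shape, 128, dtype=np.uint8)
    trimap[prob_map >= hi] = 255
    trimap[prob_map <= lo] = 0
    return trimap

prob_map = np.random.rand(64, 64).astype(np.float32)   # upsampled low-resolution mask
trimap = generate_trimap(prob_map, fg_bg_similarity=0.3)
```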
-
Publication No.: US10515443B2
Publication Date: 2019-12-24
Application No.: US15981166
Filing Date: 2018-05-16
Applicant: Adobe Inc.
Inventor: Xiaohui Shen , Zhe Lin , Shu Kong , Radomir Mech
Abstract: Systems and methods are disclosed for estimating aesthetic quality of digital images using deep learning. In particular, the disclosed systems and methods describe training a neural network to generate an aesthetic quality score for digital images. The neural network includes a training structure that compares relative rankings of pairs of training images to accurately predict a relative ranking of a digital image. Additionally, in training the neural network, an image rating system can utilize content-aware and user-aware sampling techniques to identify pairs of training images that have similar content and/or that have been rated by the same or different users. Using content-aware and user-aware sampling techniques, the neural network can be trained to accurately predict aesthetic quality ratings that reflect the subjective opinions of most users, as well as provide aesthetic scores for digital images that represent the wide spectrum of aesthetic preferences of various users.
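The pairwise-ranking training structure can be sketched with a margin ranking loss in PyTorch, as below; the stand-in scorer and the particular loss are assumptions, not the patent's network or sampling scheme.

```python
# Illustrative sketch of training an aesthetic scorer from ranked image pairs.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))   # stand-in aesthetic scorer

higher_rated = torch.rand(8, 3, 32, 32)   # image of each pair rated more aesthetic
lower_rated = torch.rand(8, 3, 32, 32)    # image of each pair rated less aesthetic

s_hi = scorer(higher_rated)
s_lo = scorer(lower_rated)

# Encourage the higher-rated image to score above the lower-rated one by a margin.
ranking_loss = nn.MarginRankingLoss(margin=0.5)
loss = ranking_loss(s_hi, s_lo, torch.ones_like(s_hi))
loss.backward()
```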
-
Publication No.: US20190361994A1
Publication Date: 2019-11-28
Application No.: US15986401
Filing Date: 2018-05-22
Applicant: Adobe Inc.
Inventor: Xiaohui Shen , Zhe Lin , Kalyan Krishna Sunkavalli , Hengshuang Zhao , Brian Lynn Price
Abstract: Compositing aware digital image search techniques and systems are described that leverage machine learning. In one example, a compositing aware image search system employs a two-stream convolutional neural network (CNN) to jointly learn feature embeddings from foreground digital images that capture a foreground object and background digital images that capture a background scene. In order to train models of the convolutional neural networks, triplets of training digital images are used. Each triplet may include a positive foreground digital image and a positive background digital image taken from the same digital image. The triplet also contains a negative foreground or background digital image that is dissimilar to the positive foreground or background digital image that is also included as part of the triplet.
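A rough PyTorch sketch of the two-stream embedding idea with a triplet loss follows; the tiny stream architecture and this particular triplet loss variant are assumptions for illustration, not the described CNNs.

```python
# Illustrative sketch of two-stream embeddings trained with triplets.
import torch
import torch.nn as nn

def make_stream():
    """Tiny stand-in for one CNN stream that maps an image to an embedding."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64),
    )

foreground_stream = make_stream()   # embeds foreground object images
background_stream = make_stream()   # embeds background scene images

pos_fg = torch.rand(4, 3, 64, 64)   # foreground taken from the same image as...
pos_bg = torch.rand(4, 3, 64, 64)   # ...this background (a compatible pair)
neg_fg = torch.rand(4, 3, 64, 64)   # dissimilar foreground (incompatible)

anchor = background_stream(pos_bg)
positive = foreground_stream(pos_fg)
negative = foreground_stream(neg_fg)

# Pull compatible foreground/background embeddings together, push incompatible apart.
loss = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)
loss.backward()
```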
-
Publication No.: US10467529B2
Publication Date: 2019-11-05
Application No.: US15177121
Filing Date: 2016-06-08
Applicant: Adobe Inc.
Inventor: Zhe Lin , Yufei Wang , Radomir Mech , Xiaohui Shen , Gavin Stuart Peter Miller
Abstract: In embodiments of convolutional neural network joint training, a computing system memory maintains different data batches of multiple digital image items, where the digital image items of the different data batches have some common features. A convolutional neural network (CNN) receives input of the digital image items of the different data batches, and classifier layers of the CNN are trained to recognize the common features in the digital image items of the different data batches. The recognized common features are input to fully-connected layers of the CNN that distinguish between the recognized common features of the digital image items of the different data batches. A scoring difference is determined between item pairs of the digital image items in a particular one of the different data batches. A piecewise ranking loss algorithm maintains the scoring difference between the item pairs, and the scoring difference is used to train CNN regression functions.
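One plausible form of a piecewise ranking loss that maintains a scoring difference between item pairs is sketched below in PyTorch; the thresholds and the exact piecewise terms are assumptions and may differ from the formulation used here.

```python
# Illustrative piecewise ranking loss over predicted score differences of item pairs.
import torch

def piecewise_ranking_loss(pred_diff, true_diff, theta1=0.1, theta2=0.5):
    """Penalize predicted score differences piecewise (assumed thresholds theta1 < theta2):

    - similar pairs (|true_diff| < theta1): push the predicted difference toward 0,
    - clearly ordered pairs (|true_diff| > theta2): enforce a margin in the true order,
    - pairs in between: no penalty.
    """
    sign = torch.sign(true_diff)
    similar = (true_diff.abs() < theta1).float()
    ordered = (true_diff.abs() > theta2).float()
    loss_similar = pred_diff.pow(2)
    loss_ordered = torch.clamp(theta2 - sign * pred_diff, min=0).pow(2)
    return (similar * loss_similar + ordered * loss_ordered).mean()

pred_diff = torch.randn(16, requires_grad=True)   # predicted score differences of item pairs
true_diff = torch.randn(16)                       # ground-truth score differences
loss = piecewise_ranking_loss(pred_diff, true_diff)
loss.backward()
```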