Joint Visual-Semantic Embedding and Grounding via Multi-Task Training for Image Searching

    Publication No.: US20210271707A1

    Publication Date: 2021-09-02

    Application No.: US16803480

    Application Date: 2020-02-27

    Applicant: Adobe Inc.

    Abstract: Certain embodiments involve a method for generating a search result. The method includes processing devices performing operations including receiving a query having a text input by a joint embedding model trained to generate an image result. Training the joint embedding model includes accessing a set of images and textual information. Training further includes encoding the images into image feature vectors based on spatial features. Further, training includes encoding the textual information into textual feature vectors based on semantic information. Training further includes generating a set of image-text pairs based on matches between image feature vectors and textual feature vectors. Further, training includes generating a visual grounding dataset based on spatial information. Training further includes generating a set of visual-semantic joint embeddings by grounding the image-text pairs with the visual grounding dataset. Additionally, operations include generating an image result for display by the joint embedding model based on the text input.
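The pair-generation step in this abstract (matching image feature vectors to textual feature vectors) can be sketched as a nearest-neighbor search over encoded features. This is a minimal illustration under assumed conventions, not the patented model: the function name `build_image_text_pairs` and the similarity threshold are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_image_text_pairs(image_feats, text_feats, threshold=0.5):
    """Pair each image vector with its best-matching text vector,
    keeping only matches above a similarity threshold (hypothetical value)."""
    pairs = []
    for i, img in enumerate(image_feats):
        sims = [cosine_sim(img, txt) for txt in text_feats]
        j = int(np.argmax(sims))
        if sims[j] >= threshold:
            pairs.append((i, j, sims[j]))
    return pairs
```

In the patent, the resulting pairs would then be grounded against the visual grounding dataset; that step is omitted here.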

    FRAME SELECTION BASED ON A TRAINED NEURAL NETWORK

    Publication No.: US20210201150A1

    Publication Date: 2021-07-01

    Application No.: US17204370

    Application Date: 2021-03-17

    Applicant: Adobe Inc.

    Abstract: Various embodiments describe frame selection based on training and using a neural network. In an example, the neural network is a convolutional neural network trained with training pairs. Each training pair includes two training frames from a frame collection. The loss function relies on the estimated quality difference between the two training frames. Further, the definition of the loss function varies based on the actual quality difference between these two frames. In a further example, the neural network is trained by incorporating facial heatmaps generated from the training frames and facial quality scores of faces detected in the training frames. In addition, the training involves using a feature mean that represents an average of the features of the training frames belonging to the same frame collection. Once the neural network is trained, a frame collection is input thereto and a frame is selected based on generated quality scores.
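The abstract's key idea, a pairwise loss whose definition varies with the actual quality difference between two training frames, can be sketched as follows. This is an assumed formulation for illustration: the function name, the margin choice, and the `small_gap` cutoff are hypothetical, not taken from the patent.

```python
def pairwise_frame_loss(score_a, score_b, true_gap, small_gap=0.1):
    """Loss on two frames' predicted quality scores.
    When the ground-truth quality gap is large, apply a margin ranking
    loss so the better frame must outscore the worse one; when the
    frames are nearly tied, penalize any predicted difference instead."""
    pred_gap = score_a - score_b
    if abs(true_gap) > small_gap:
        # Margin ranking branch: enforce the correct ordering by a margin.
        margin = abs(true_gap)
        sign = 1.0 if true_gap > 0 else -1.0
        return max(0.0, margin - sign * pred_gap)
    # Near-tie branch: squared difference of the predictions.
    return pred_gap ** 2
```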

    Utilizing a digital canvas to conduct a spatial-semantic search for digital visual media

    Publication No.: US10963759B2

    Publication Date: 2021-03-30

    Application No.: US16417115

    Application Date: 2019-05-20

    Applicant: Adobe Inc.

Abstract: The present disclosure includes methods and systems for searching for digital visual media based on semantic and spatial information. In particular, one or more embodiments of the disclosed systems and methods identify digital visual media displaying targeted visual content in a targeted region based on a query term and a query area provided via a digital canvas. Specifically, the disclosed systems and methods can receive user input of a query term and a query area and provide the query term and query area to a query neural network to generate a query feature set. Moreover, the disclosed systems and methods can compare the query feature set to digital visual media feature sets. Further, based on the comparison, the disclosed systems and methods can identify digital visual media portraying targeted visual content corresponding to the query term within a targeted region corresponding to the query area.
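The spatial-semantic comparison described above can be approximated very roughly with a grid overlap score. The patent compares learned neural feature sets; this sketch substitutes a simple tag-per-grid-cell representation, so the function name and data layout are illustrative assumptions only.

```python
def spatial_semantic_score(media_grid, query_term, query_cells):
    """Score a media item by how much of the query area contains
    content matching the query term. media_grid maps (row, col)
    grid cells to the set of concept tags detected there."""
    hits = sum(1 for cell in query_cells
               if query_term in media_grid.get(cell, set()))
    return hits / len(query_cells)
```

Ranking candidate media by this score mimics the comparison of a query feature set against stored media feature sets.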

    CONTEXT-BASED IMAGE TAG TRANSLATION
    Invention Application

    Publication No.: US20210064704A1

    Publication Date: 2021-03-04

    Application No.: US16553305

    Application Date: 2019-08-28

    Applicant: Adobe Inc.

    Inventors: Yang Yang; Zhe Lin

    Abstract: In some embodiments, a context-based translation application generates a co-occurrence data structure for a target language describing co-occurrences of target language words and source language words. The context-based translation application receives an input tag for an input image in the source language to be translated into the target language. The context-based translation application obtains multiple candidate translations in the target language for the input tag and determines a translated tag from the multiple candidate translations based on the co-occurrence data structure. The context-based translation application further associates the translated tag with the input image.
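The disambiguation step, choosing among candidate translations using the co-occurrence data structure, can be sketched as scoring each candidate against the image's other tags. A minimal sketch under assumed data shapes; the function name and the `(target_word, source_word) -> count` dictionary layout are hypothetical.

```python
def pick_translation(candidates, context_tags, cooccur):
    """Choose among candidate target-language translations by summing
    their co-occurrence counts with the image's other source-language
    tags. cooccur maps (target_word, source_word) -> count."""
    def score(cand):
        return sum(cooccur.get((cand, src), 0) for src in context_tags)
    return max(candidates, key=score)
```

For example, an English tag "bank" on an image also tagged "river" would favor the riverbank sense over the financial one, because that candidate co-occurs more often with "river" in the target-language corpus.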

    Super-resolution with reference images

    Publication No.: US10885608B2

    Publication Date: 2021-01-05

    Application No.: US16001656

    Application Date: 2018-06-06

    Applicant: Adobe Inc.

Abstract: In implementations of super-resolution with reference images, a super-resolution image is generated based on reference images. Reference images are not constrained to have the same or similar content as the low-resolution image being super-resolved. Texture features indicating high-frequency content are extracted into texture feature maps, and patches of texture feature maps of reference images are determined based on texture feature similarity. A content feature map indicating low-frequency content of an image is adaptively fused with a swapped texture feature map including patches of reference images with a neural network based on similarity of texture features. A user interface allows a user to select regions of multiple reference images to use for super-resolution. Hence, a super-resolution image can be generated with rich texture details incorporated from multiple reference images, even in the absence of reference images having similar content to an image being upscaled.
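The texture-swapping step, replacing each low-resolution patch feature with its most similar reference patch feature, can be sketched as a normalized inner-product search. This is an illustrative simplification: the function name is hypothetical, and the patent performs this inside a neural network with adaptive fusion, which is omitted here.

```python
import numpy as np

def swap_texture_patches(lr_feats, ref_feats):
    """For each low-resolution patch feature, substitute the most
    similar reference patch feature (by normalized inner product)."""
    def norm(v):
        # Small epsilon guards against division by zero.
        return v / (np.linalg.norm(v) + 1e-8)
    swapped = []
    for p in lr_feats:
        sims = [float(np.dot(norm(p), norm(r))) for r in ref_feats]
        swapped.append(ref_feats[int(np.argmax(sims))])
    return swapped
```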

    MULTI-MODULE AND MULTI-TASK MACHINE LEARNING SYSTEM BASED ON AN ENSEMBLE OF DATASETS

    Publication No.: US20200349464A1

    Publication Date: 2020-11-05

    Application No.: US16401548

    Application Date: 2019-05-02

    Applicant: Adobe Inc.

    Abstract: Techniques and systems are provided for training a machine learning model using different datasets to perform one or more tasks. The machine learning model can include a first sub-module configured to perform a first task and a second sub-module configured to perform a second task. The first sub-module can be selected for training using a first training dataset based on a format of the first training dataset. The first sub-module can then be trained using the first training dataset to perform the first task. The second sub-module can be selected for training using a second training dataset based on a format of the second training dataset. The second sub-module can then be trained using the second training dataset to perform the second task.
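The selection of a sub-module based on dataset format can be sketched as a simple dispatch table. A minimal illustration under assumed conventions; the function name, the `"format"`/`"data"` batch keys, and the string-keyed registry are all hypothetical.

```python
def route_batch(batch, submodules):
    """Dispatch a training batch to the sub-module registered for its
    dataset format, so each sub-module is trained only on matching data."""
    fmt = batch["format"]
    if fmt not in submodules:
        raise KeyError(f"no sub-module handles format {fmt!r}")
    return submodules[fmt], batch["data"]
```

Training would then iterate over the ensemble of datasets, routing each batch and running the selected sub-module's training step on it.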

    Custom Auto Tagging of Multiple Objects
    Invention Application

    Publication No.: US20200342255A1

    Publication Date: 2020-10-29

    Application No.: US16928949

    Application Date: 2020-07-14

    Applicant: Adobe Inc.

Abstract: There is described a computing device and method in a digital medium environment for custom auto tagging of multiple objects. The computing device includes an object detection network and multiple image classification networks. An image is received at the object detection network and includes multiple visual objects. First feature maps are applied to the image at the object detection network to generate object regions associated with the visual objects. The object regions are assigned to the multiple image classification networks, with each image classification network assigned to a particular object region. Second feature maps are applied to each object region at each image classification network, and each image classification network outputs one or more classes associated with the visual object corresponding to its object region.
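The region-to-classifier assignment described above can be sketched as pairing each detected region with one classification network. A minimal sketch with stand-in callables; the function name and the positional assignment scheme are assumptions for illustration.

```python
def auto_tag(regions, classifiers):
    """Assign each detected object region to one classification
    network and collect the classes each network outputs for its region."""
    if len(regions) > len(classifiers):
        raise ValueError("more regions than available classifiers")
    return [classify(region) for region, classify in zip(regions, classifiers)]
```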
