Enhanced semantic segmentation of images

    Publication number: US11676282B2

    Publication date: 2023-06-13

    Application number: US17479646

    Application date: 2021-09-20

    Applicant: ADOBE INC.

    CPC classification number: G06T7/11 G06N3/045 G06T2207/20081 G06T2207/20084

    Abstract: Enhanced methods and systems for the semantic segmentation of images are described. A refined segmentation mask for a specified object visually depicted in a source image is generated based on a coarse and/or raw segmentation mask. The refined segmentation mask is generated via a refinement process applied to the coarse segmentation mask. The refinement process corrects at least a portion of both type I and type II errors associated with the coarse segmentation mask, and also refines the boundaries of the specified object. Thus, the refined segmentation mask provides a more accurate segmentation of the object than the coarse segmentation mask. A segmentation refinement model is employed to generate the refined segmentation mask based on the coarse segmentation mask. That is, the segmentation refinement model is employed to refine the coarse segmentation mask to generate more accurate segmentations of the object. The refinement process is an iterative refinement process carried out via a trained neural network.
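To make the type I and type II errors mentioned in the abstract concrete, the following minimal sketch counts false-positive and false-negative pixels of a binary mask against a reference mask. It is an illustration only, not the patented refinement model: the function name `mask_errors` and the toy masks are hypothetical, and the "refined" mask is written by hand rather than produced by a neural network.

```python
def mask_errors(pred, truth):
    """Count type I (false positive) and type II (false negative) pixels.

    `pred` and `truth` are equally sized 2D lists of 0/1 values.
    Hypothetical helper for illustration; not taken from the patent.
    """
    false_pos = false_neg = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and not t:
                false_pos += 1  # type I: pixel marked as object, but it is background
            elif t and not p:
                false_neg += 1  # type II: object pixel missed by the mask
    return false_pos, false_neg


# Toy data (assumed): a reference mask, a coarse mask, and a hand-made refined mask.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
coarse = [[1, 1, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 1]]
refined = [[0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]

coarse_errors = mask_errors(coarse, truth)
refined_errors = mask_errors(refined, truth)
print("coarse  (type I, type II):", coarse_errors)
print("refined (type I, type II):", refined_errors)
```

A refinement step is useful in the sense of the abstract when both counts drop from the coarse mask to the refined mask.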

    Preserving regions of interest in automatic image cropping

    Publication number: US11663762B2

    Publication date: 2023-05-30

    Application number: US17083899

    Application date: 2020-10-29

    Applicant: Adobe Inc.

    Abstract: Embodiments of the present invention are directed to facilitating region of interest preservation. In accordance with some embodiments of the present invention, a region of interest preservation score using adaptive margins is determined. The region of interest preservation score indicates an extent to which at least one region of interest is preserved in a candidate image crop associated with an image. A region of interest positioning score is determined that indicates an extent to which a position of the at least one region of interest is preserved in the candidate image crop associated with the image. The region of interest preservation score and/or the region of interest positioning score are used to select a set of one or more candidate image crops as image crop suggestions.
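One plausible reading of a preservation score is the fraction of a margin-padded region of interest that survives inside a candidate crop. The sketch below implements that reading under stated assumptions: the `Box` type, the function names, and the rule that the margin is a fixed fraction of the region's own size (standing in for "adaptive margins") are all hypothetical and are not taken from the patent.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def area(box: Box) -> float:
    # Empty or inverted boxes have zero area.
    return max(0.0, box.right - box.left) * max(0.0, box.bottom - box.top)


def intersect(a: Box, b: Box) -> Box:
    return Box(max(a.left, b.left), max(a.top, b.top),
               min(a.right, b.right), min(a.bottom, b.bottom))


def pad(roi: Box, margin_frac: float, image: Box) -> Box:
    # Assumed margin rule: grow the ROI by a fraction of its own width and
    # height, then clamp the result to the image bounds.
    mx = (roi.right - roi.left) * margin_frac
    my = (roi.bottom - roi.top) * margin_frac
    return intersect(Box(roi.left - mx, roi.top - my,
                         roi.right + mx, roi.bottom + my), image)


def preservation_score(roi: Box, crop: Box, image: Box,
                       margin_frac: float = 0.25) -> float:
    """Fraction of the padded ROI that lies inside the crop (0.0 to 1.0)."""
    padded = pad(roi, margin_frac, image)
    padded_area = area(padded)
    if padded_area == 0.0:
        return 0.0
    return area(intersect(padded, crop)) / padded_area


image = Box(0, 0, 100, 100)
roi = Box(40, 40, 60, 60)
half_crop = Box(0, 0, 50, 100)    # cuts the padded ROI in half
full_crop = Box(30, 30, 70, 70)   # contains the padded ROI entirely

print(preservation_score(roi, half_crop, image))
print(preservation_score(roi, full_crop, image))
```

Candidate crops could then be ranked by this score, optionally combined with a separate positioning score, to pick crop suggestions.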

    Visually guided machine-learning language model

    Publication number: US11605019B2

    Publication date: 2023-03-14

    Application number: US16426298

    Application date: 2019-05-30

    Applicant: Adobe Inc.

    Abstract: Visually guided machine-learning language model and embedding techniques are described that overcome the challenges of conventional techniques in a variety of ways. In one example, a model is trained to support a visually guided machine-learning embedding space that supports visual intuition as to “what” is represented by text. The visually guided language embedding space supported by the model, once trained, may then be used to support visual intuition as part of a variety of functionality. In one such example, the visually guided language embedding space as implemented by the model may be leveraged as part of a multi-modal differential search to support search of digital images and other digital content with real-time focus adaptation which overcomes the challenges of conventional techniques.

    Generating scene graphs from digital images using external knowledge and image reconstruction

    Publication number: US20220309762A1

    Publication date: 2022-09-29

    Application number: US17805289

    Application date: 2022-06-03

    Applicant: Adobe Inc.

    Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.

    Generalizable robot approach control techniques

    Publication number: US11449079B2

    Publication date: 2022-09-20

    Application number: US16262448

    Application date: 2019-01-30

    Applicant: Adobe Inc.

    Abstract: Systems and techniques are described that provide for generalizable approach policy learning and implementation for robotic object approaching. Described techniques provide fast and accurate approaching of a specified object, or type of object, in many different environments. The described techniques enable a robot to receive an identification of an object or type of object from a user, and then navigate to the desired object, without further control from the user. Moreover, the approach of the robot to the desired object is performed efficiently, e.g., with a minimum number of movements. Further, the approach techniques may be used even when the robot is placed in a new environment, such as when the same type of object must be approached in multiple settings.

    Multimodal Sequential Recommendation with Window Co-Attention

    Publication number: US20220295149A1

    Publication date: 2022-09-15

    Application number: US17200691

    Application date: 2021-03-12

    Applicant: Adobe Inc.

    Abstract: A multimodal recommendation identification system analyzes data describing a sequence of past content item interactions to generate a recommendation for a content item for a user. An indication of the recommended content item is provided to a website hosting system or recommendation system so that the recommended content item is displayed or otherwise presented to the user. The multimodal recommendation identification system identifies a content item to recommend to the user by generating an encoding that encodes identifiers of the sequence of content items the user has interacted with and generating encodings that encode multimodal information for content items in the sequence of content items the user has interacted with. An aggregated information encoding for the user is generated based on these encodings; to generate it, the system analyzes the content item sequence encoding and the interaction between the content item sequence encoding and the multiple modality encodings.
