Collaborative feature learning from social media

    Publication No.: US10565518B2

    Publication Date: 2020-02-18

    Application No.: US14748059

    Application Date: 2015-06-23

    Applicant: Adobe Inc.

    Abstract: The present disclosure is directed to collaborative feature learning using social media data. For example, a machine learning system may identify social media data that includes user behavioral data, which indicates user interactions with content items. Using the identified user behavioral data, the machine learning system may determine latent representations of the content items. In some embodiments, the machine learning system may train a machine-learning model based on the latent representations. Further, the machine learning system may extract features of the content items from the trained machine-learning model.
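
    A minimal sketch of the pipeline this abstract outlines, under loose assumptions (not Adobe's implementation): factorize a user-item interaction matrix so each content item gets a latent vector, which could then supervise a content model whose activations serve as learned features. All names and shapes below are illustrative.

        # Illustrative sketch only: learn latent item representations from a
        # user-item interaction matrix; the latents become supervision targets
        # for a content model (not shown) whose hidden layers yield features.
        import numpy as np

        rng = np.random.default_rng(0)
        n_users, n_items, n_latent = 100, 50, 8

        # Binary behavioral data: interactions[u, i] = 1 if user u engaged with item i.
        interactions = (rng.random((n_users, n_items)) < 0.1).astype(float)

        # Factorize interactions ~= U @ V.T; rows of V are latent item vectors.
        U = rng.normal(scale=0.1, size=(n_users, n_latent))
        V = rng.normal(scale=0.1, size=(n_items, n_latent))
        lr, reg = 0.05, 0.01
        for _ in range(200):
            err = interactions - U @ V.T      # reconstruction error
            U += lr * (err @ V - reg * U)     # gradient step on user factors
            V += lr * (err.T @ U - reg * V)   # gradient step on item factors

        item_latents = V  # (n_items, n_latent): targets for a content model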

    Font recognition using triplet loss neural network training

    Publication No.: US10515295B2

    Publication Date: 2019-12-24

    Application No.: US15796213

    Application Date: 2017-10-27

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to a font recognition system that employs a multi-task learning framework to jointly improve font classification and remove negative side effects caused by intra-class variances of glyph content. For example, in one or more embodiments, the font recognition system can jointly train a font recognition neural network using a font classification loss model and a triplet loss model to generate a deep learning neural network that provides improved font classifications. In addition, the font recognition system can employ the trained font recognition neural network to efficiently recognize fonts within input images as well as provide other suggested fonts.
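
    The joint training this abstract describes can be sketched roughly as follows; the backbone, margin, and equal loss weighting are assumptions for illustration, not the patented design.

        # Hedged sketch of joint classification + triplet training; the
        # architecture and hyperparameters below are illustrative assumptions.
        import torch
        import torch.nn as nn

        class FontNet(nn.Module):
            def __init__(self, n_fonts: int, dim: int = 128):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(64, dim),
                )
                self.classifier = nn.Linear(dim, n_fonts)

            def forward(self, x):
                emb = self.encoder(x)            # shared font embedding
                return emb, self.classifier(emb)

        model = FontNet(n_fonts=10)
        cls_loss = nn.CrossEntropyLoss()
        tri_loss = nn.TripletMarginLoss(margin=0.2)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)

        # One illustrative step: anchor/positive share a font, negative differs.
        anchor, pos, neg = (torch.randn(16, 1, 64, 64) for _ in range(3))
        labels = torch.randint(0, 10, (16,))
        a_emb, a_logits = model(anchor)
        p_emb, _ = model(pos)
        n_emb, _ = model(neg)
        loss = cls_loss(a_logits, labels) + tri_loss(a_emb, p_emb, n_emb)
        opt.zero_grad(); loss.backward(); opt.step()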

    FONT RECOGNITION USING TRIPLET LOSS NEURAL NETWORK TRAINING

    Publication No.: US20190130231A1

    Publication Date: 2019-05-02

    Application No.: US15796213

    Application Date: 2017-10-27

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to a font recognition system that employs a multi-task learning framework to jointly improve font classification and remove negative side effects caused by intra-class variances of glyph content. For example, in one or more embodiments, the font recognition system can jointly train a font recognition neural network using a font classification loss model and a triplet loss model to generate a deep learning neural network that provides improved font classifications. In addition, the font recognition system can employ the trained font recognition neural network to efficiently recognize fonts within input images as well as provide other suggested fonts.

    Combined structure and style network

    Publication No.: US10268928B2

    Publication Date: 2019-04-23

    Application No.: US15616776

    Application Date: 2017-06-07

    Applicant: Adobe Inc.

    Abstract: A combined structure and style network is described. Initially, a large set of training images, having a variety of different styles, is obtained. Each of these training images is associated with one of multiple different predetermined style categories indicating the image's style and one of multiple different predetermined semantic categories indicating objects depicted in the image. Groups of these images are formed, such that each group includes an anchor image having one of the styles, a positive-style example image having the same style as the anchor image, and a negative-style example image having a different style. Based on those groups, an image style network is generated to identify images having desired styling by recognizing visual characteristics of the different styles. The image style network is further combined, according to a unifying training technique, with an image structure network configured to recognize desired objects in images irrespective of image style.
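
    The grouping step this abstract describes, sketched under an assumed data layout: each group pairs an anchor with a same-style positive example and a different-style negative example.

        # Illustrative sketch of forming (anchor, positive-style, negative-style)
        # groups from style-labeled images; the data layout and sampling policy
        # are assumptions, not the patented procedure.
        import random

        def make_style_triplets(images_by_style: dict[str, list[str]], n: int):
            """images_by_style maps a style name to image ids in that style."""
            styles = [s for s, imgs in images_by_style.items() if len(imgs) >= 2]
            triplets = []
            for _ in range(n):
                style = random.choice(styles)
                anchor, positive = random.sample(images_by_style[style], 2)
                neg_style = random.choice([s for s in styles if s != style])
                negative = random.choice(images_by_style[neg_style])
                triplets.append((anchor, positive, negative))
            return triplets

        corpus = {"watercolor": ["w1", "w2", "w3"],
                  "line-art":   ["l1", "l2"],
                  "photo":      ["p1", "p2", "p3"]}
        print(make_style_triplets(corpus, 4))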

    IDENTIFYING AND LOCALIZING EDITORIAL CHANGES TO IMAGES UTILIZING DEEP LEARNING

    Publication No.: US20230386054A1

    Publication Date: 2023-11-30

    Application No.: US17804376

    Application Date: 2022-05-27

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to identify regions of an image that have been editorially modified. For example, the image comparison system includes a deep image comparator model that compares a pair of images and localizes regions that have been editorially manipulated relative to an original or trusted image. More specifically, the deep image comparator model generates and surfaces visual indications of the location of such editorial changes on the modified image. The deep image comparator model is robust and ignores discrepancies due to benign image transformations that commonly occur during electronic image distribution. The image comparison system optionally includes an image retrieval model that utilizes a visual search embedding robust to minor manipulations or benign modifications, allowing it to reliably identify near-duplicate images.
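
    A rough sketch of the comparison idea (not the patented model): a shared encoder maps both images to feature grids, and per-location feature distances give a coarse map of suspected edits. In practice the encoder would be trained so that benign transformations produce small distances; the grid size below is an illustrative assumption.

        # Illustrative only: siamese-style comparison with an untrained encoder.
        import torch
        import torch.nn as nn

        encoder = nn.Sequential(                  # shared weights for both images
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )

        def change_heatmap(original: torch.Tensor, candidate: torch.Tensor):
            """Both inputs: (1, 3, H, W). Returns a coarse (h, w) difference map."""
            f_orig, f_cand = encoder(original), encoder(candidate)
            # Per-location feature distance; training on benign transforms
            # would teach the encoder to ignore innocuous discrepancies.
            return (f_orig - f_cand).pow(2).mean(dim=1).squeeze(0)

        orig = torch.rand(1, 3, 256, 256)
        edited = orig.clone()
        edited[:, :, 64:128, 64:128] = torch.rand(1, 3, 64, 64)  # simulated splice
        heat = change_heatmap(orig, edited)
        print("most-changed cell:", divmod(int(heat.argmax()), heat.shape[1]))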

    Utilizing voxel feature transformations for view synthesis

    Publication No.: US11823322B2

    Publication Date: 2023-11-21

    Application No.: US17807337

    Application Date: 2022-06-16

    Applicant: Adobe Inc.

    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels. Then, the disclosed systems can utilize a patch-based neural rendering approach to render images from frustum feature patches to display a view of the object from various viewpoints.
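
    One step of the aggregation this abstract describes, heavily simplified: per-view feature volumes (assumed already lifted into the grid) pass through a learned 3D transformation and are folded into a single voxel representation with a recurrent-style running update. The unprojection and rendering stages are elided, and every shape below is an assumption.

        # Loose sketch of view-feature aggregation into a voxel grid.
        import torch
        import torch.nn as nn

        D = H = W = 16      # voxel grid resolution (assumed)
        C = 8               # feature channels (assumed)
        n_views = 4

        # Learned 3D transformation kernel applied to each view's lifted features.
        transform = nn.Conv3d(C, C, kernel_size=3, padding=1)

        # Stand-ins for features already lifted from each viewpoint into the grid.
        view_volumes = [torch.randn(1, C, D, H, W) for _ in range(n_views)]

        # Recurrent-style running aggregation of transformed view features.
        state = torch.zeros(1, C, D, H, W)
        for i, vol in enumerate(view_volumes, start=1):
            state = state + (transform(vol) - state) / i   # running mean update

        voxel_representation = state  # (1, C, D, H, W), queried later for rendering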

    EQUIVARIANT MODELS FOR GENERATING VECTOR REPRESENTATIONS OF TEMPORALLY-VARYING CONTENT

    Publication No.: US20230075087A1

    Publication Date: 2023-03-09

    Application No.: US17466636

    Application Date: 2021-09-03

    Applicant: Adobe Inc.

    Abstract: The disclosed invention includes systems and methods for training and employing equivariant models for generating representations (e.g., vector representations) of temporally-varying content, such as but not limited to video content. The trained models are equivariant to temporal transformations applied to the input content (e.g., video content). The trained models are additionally invariant to non-temporal transformations (e.g., spatial and/or color-space transformations) applied to the input content. Such representations are employed in various machine learning tasks, such as but not limited to video retrieval (e.g., video search engine applications), identification of actions depicted in video, and temporally ordering clips of the video.
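
    The two properties this abstract targets can be written down as a pair of training residuals; the toy encoder and the learned embedding-space shift operator below are assumptions, not the disclosed architecture.

        # Sketch of the equivariance/invariance objectives, not the trained model.
        import torch
        import torch.nn as nn

        class ClipEncoder(nn.Module):
            def __init__(self, frames=8, dim=32):
                super().__init__()
                self.net = nn.Sequential(nn.Flatten(),
                                         nn.Linear(frames * 3 * 16 * 16, dim))

            def forward(self, clip):           # clip: (B, frames, 3, 16, 16)
                return self.net(clip)

        encoder = ClipEncoder()
        shift_op = nn.Linear(32, 32)           # learned embedding-space temporal shift

        video = torch.randn(1, 12, 3, 16, 16)
        clip = video[:, :8]                    # frames 0..7
        shifted = video[:, 2:10]               # same content shifted by 2 frames
        jittered = clip * 1.1 + 0.05           # crude color-space transform

        # Training would minimize both residuals: a temporal shift of the input
        # maps to a predictable shift of the embedding (equivariance), while a
        # color transform leaves the embedding unchanged (invariance).
        equivariance_loss = (shift_op(encoder(clip)) - encoder(shifted)).pow(2).mean()
        invariance_loss = (encoder(clip) - encoder(jittered)).pow(2).mean()
        (equivariance_loss + invariance_loss).backward()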

    GENERATING SCALABLE AND SEMANTICALLY EDITABLE FONT REPRESENTATIONS

    Publication No.: US20220414314A1

    Publication Date: 2022-12-29

    Application No.: US17362031

    Application Date: 2021-06-29

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. In particular, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using the glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).
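
    A minimal sketch of the iterative fitting and propagation steps, with an untrained placeholder generator standing in for the glyph appearance propagation model; all dimensions are illustrative assumptions.

        # Sketch: fit a font code to one observed glyph, then reuse it for others.
        import torch
        import torch.nn as nn

        n_glyphs, code_dim = 26, 16

        class GlyphGenerator(nn.Module):       # placeholder, not the patented model
            def __init__(self):
                super().__init__()
                self.label_emb = nn.Embedding(n_glyphs, code_dim)
                self.decoder = nn.Sequential(nn.Linear(2 * code_dim, 256), nn.Tanh(),
                                             nn.Linear(256, 32 * 32))

            def forward(self, font_code, label):
                z = torch.cat([font_code, self.label_emb(label)], dim=-1)
                return self.decoder(z).view(-1, 32, 32)

        gen = GlyphGenerator()
        target_glyph = torch.rand(1, 32, 32)   # the observed glyph, e.g. letter "A"
        label_a = torch.tensor([0])

        # Iteratively fit the font representation code to the observed glyph.
        font_code = torch.zeros(1, code_dim, requires_grad=True)
        opt = torch.optim.Adam([font_code], lr=0.05)
        for _ in range(100):
            opt.zero_grad()
            loss = (gen(font_code, label_a) - target_glyph).pow(2).mean()
            loss.backward()
            opt.step()

        # Propagate the fitted style to another glyph label, e.g. letter "B".
        glyph_b = gen(font_code, torch.tensor([1]))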

    GENERATING RESPONSES TO QUERIES ABOUT VIDEOS UTILIZING A MULTI-MODAL NEURAL NETWORK WITH ATTENTION

    Publication No.: US20220122357A1

    Publication Date: 2022-04-21

    Application No.: US17563901

    Application Date: 2021-12-28

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating a response to a question received from a user during display or playback of a video segment by utilizing a query-response-neural network. The disclosed systems can extract a query vector from a question corresponding to the video segment using the query-response-neural network. The disclosed systems further generate context vectors representing both visual cues and transcript cues corresponding to the video segment using context encoders or other layers from the query-response-neural network. By utilizing additional layers from the query-response-neural network, the disclosed systems generate (i) a query-context vector based on the query vector and the context vectors, and (ii) candidate-response vectors representing candidate responses to the question from a domain-knowledge base or other source. To respond to a user's question, the disclosed systems further select a response from the candidate responses based on a comparison of the query-context vector and the candidate-response vectors.
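
    The selection mechanism this abstract outlines, sketched with illustrative dimensions: attend over context vectors with the query, fuse the result into a query-context vector, and pick the best-matching candidate response. The fusion layer and all shapes are assumptions.

        # Illustrative sketch of query-context attention and response selection.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        dim = 64
        query_vec = torch.randn(1, dim)          # encoded question
        context_vecs = torch.randn(10, dim)      # visual + transcript cues
        candidate_vecs = torch.randn(5, dim)     # encoded candidate responses

        # Attention: weight each context vector by its relevance to the query.
        attn = F.softmax(query_vec @ context_vecs.T, dim=-1)   # (1, 10)
        context_summary = attn @ context_vecs                  # (1, dim)

        # Fuse query and attended context into a single query-context vector.
        fuse = nn.Linear(2 * dim, dim)
        query_context = fuse(torch.cat([query_vec, context_summary], dim=-1))

        # Select the response whose vector best matches the query-context vector.
        scores = (query_context @ candidate_vecs.T).squeeze(0)  # (5,)
        print("selected candidate:", int(scores.argmax()))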

    GENERATING ACTION TAGS FOR DIGITAL VIDEOS

    Publication No.: US20210409836A1

    Publication Date: 2021-12-30

    Application No.: US17470441

    Application Date: 2021-09-09

    Applicant: Adobe Inc.

    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for automatic tagging of videos. In particular, in one or more embodiments, the disclosed systems generate a set of tagged feature vectors (e.g., tagged feature vectors based on action-rich digital videos) to utilize to generate tags for an input digital video. For instance, the disclosed systems can extract a set of frames for the input digital video and generate feature vectors from the set of frames. In some embodiments, the disclosed systems generate aggregated feature vectors from the feature vectors. Furthermore, the disclosed systems can utilize the feature vectors (or aggregated feature vectors) to identify similar tagged feature vectors from the set of tagged feature vectors. Additionally, the disclosed systems can generate a set of tags for the input digital video by aggregating one or more tags corresponding to identified similar tagged feature vectors.
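
    A sketch of the retrieval-style tagging this abstract describes, with stand-in vectors in place of a real feature extractor: average per-frame features into one vector, find the most similar tagged vectors, and pool their tags. The neighbor count and similarity measure are assumptions.

        # Illustrative nearest-neighbor tag aggregation over stand-in features.
        import numpy as np

        rng = np.random.default_rng(1)
        dim = 128

        # Reference set: feature vectors from action-rich videos with known tags.
        tagged_vecs = rng.normal(size=(6, dim))
        tagged_tags = [{"run"}, {"run", "jump"}, {"swim"},
                       {"jump"}, {"swim", "dive"}, {"run"}]

        def tag_video(frame_features: np.ndarray, k: int = 3) -> set[str]:
            video_vec = frame_features.mean(axis=0)             # aggregate frames
            sims = tagged_vecs @ video_vec / (
                np.linalg.norm(tagged_vecs, axis=1) * np.linalg.norm(video_vec))
            nearest = np.argsort(sims)[-k:]                     # top-k neighbors
            return set().union(*(tagged_tags[i] for i in nearest))  # pool tags

        frames = rng.normal(size=(30, dim))    # stand-in per-frame features
        print(tag_video(frames))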
