Equivariant models for generating vector representations of temporally-varying content

    Publication No.: US12061668B2

    Publication Date: 2024-08-13

    Application No.: US17466636

    Application Date: 2021-09-03

    Applicant: ADOBE INC.

    CPC classification number: G06F18/213 G06F18/214 G06F18/2413 G06N3/045 G06N3/08

    Abstract: The disclosed invention includes systems and methods for training and employing equivariant models for generating representations (e.g., vector representations) of temporally-varying content, such as but not limited to video content. The trained models are equivariant to temporal transformations applied to the input content (e.g., video content). The trained models are additionally invariant to non-temporal transformations (e.g., spatial and/or color-space transformations) applied to the input content. Such representations are employed in various machine learning tasks, such as but not limited to video retrieval (e.g., video search engine applications), identification of actions depicted in video, and temporally ordering clips of the video.
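The equivariance/invariance distinction drawn in this abstract can be illustrated with a toy sketch. The "encoder" below (per-frame average pooling) and all shapes are purely illustrative assumptions, not the patented model; it merely shows what it means for a representation to be equivariant to a temporal transformation (a frame shift) while invariant to a spatial one (a horizontal flip).

```python
import numpy as np

# Hypothetical toy "encoder": maps a clip of T frames (each H x W) to a
# T x 1 representation by average-pooling each frame. Illustrative only.
def encode(clip):
    # clip: (T, H, W) -> representation: (T, 1)
    return clip.mean(axis=(1, 2)).reshape(-1, 1)

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 4, 4))

# Equivariance to a temporal transformation: encoding a temporally
# shifted clip equals applying the same shift to the encoding.
shifted = np.roll(clip, 1, axis=0)
assert np.allclose(encode(shifted), np.roll(encode(clip), 1, axis=0))

# Invariance to a non-temporal (spatial) transformation: a horizontal
# flip of every frame leaves the representation unchanged.
flipped = clip[:, :, ::-1]
assert np.allclose(encode(flipped), encode(clip))
```

A trained model would learn these properties from data rather than obtain them by construction, but the two assertions capture the contract the abstract describes.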

    Generating scalable and semantically editable font representations

    Publication No.: US11977829B2

    Publication Date: 2024-05-07

    Application No.: US17362031

    Application Date: 2021-06-29

    Applicant: Adobe Inc.

    CPC classification number: G06F40/109 G06N3/045 G06T11/203

Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating scalable and semantically editable font representations utilizing a machine learning approach. For example, the disclosed systems generate a font representation code from a glyph utilizing a particular neural network architecture. In particular, the disclosed systems utilize a glyph appearance propagation model and perform an iterative process to generate a font representation code from an initial glyph. Additionally, using the glyph appearance propagation model, the disclosed systems automatically propagate the appearance of the initial glyph from the font representation code to generate additional glyphs corresponding to respective glyph labels. In some embodiments, the disclosed systems propagate edits or other changes in appearance of a glyph to other glyphs within a glyph set (e.g., to match the appearance of the edited glyph).

    LOCALIZATION OF NARRATIONS IN IMAGE DATA

    Publication No.: US20230115551A1

    Publication Date: 2023-04-13

    Application No.: US17499193

    Application Date: 2021-10-12

    Applicant: ADOBE INC.

Abstract: Methods, systems, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
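The cross-modal attention described here (comparing features across modalities but never within one) can be sketched as a single attention step in which text tokens attend only to image-region features. All names, shapes, and the single-layer structure below are illustrative assumptions, not the claimed system.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical cross-modal attention layer: queries come from the text
# (phrase) features, keys/values from image-region features, so features
# are only ever compared across modalities, never within one.
def cross_modal_attention(text_feats, image_feats):
    # text_feats: (n_tokens, d); image_feats: (n_regions, d)
    scores = text_feats @ image_feats.T / np.sqrt(text_feats.shape[1])
    weights = softmax(scores, axis=-1)   # (n_tokens, n_regions)
    attended = weights @ image_feats     # text updated with image evidence
    return attended, weights

rng = np.random.default_rng(1)
text = rng.standard_normal((3, 16))    # e.g., 3 phrase tokens
image = rng.standard_normal((10, 16))  # e.g., 10 region features of a frame
out, w = cross_modal_attention(text, image)
assert out.shape == (3, 16) and np.allclose(w.sum(axis=-1), 1.0)
```

In this reading, the per-region attention weights `w` provide the spatial cue from which a localization indicator over pixels or frames could be derived.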

    UTILIZING VOXEL FEATURE TRANSFORMATIONS FOR VIEW SYNTHESIS

    Publication No.: US20220327767A1

    Publication Date: 2022-10-13

    Application No.: US17807337

    Application Date: 2022-06-16

    Applicant: Adobe Inc.

    Abstract: Systems, methods, and non-transitory computer-readable media are disclosed for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object using digital images of the object from multiple viewpoints to render novel views of the object. For instance, the disclosed systems can utilize patch-based image feature extraction to extract lifted feature representations from images corresponding to different viewpoints of an object. Furthermore, the disclosed systems can model view-dependent transformed feature representations using learned transformation kernels. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object. Furthermore, the disclosed systems can sample frustum features using the 3D voxel representation and transformation kernels. Then, the disclosed systems can utilize a patch-based neural rendering approach to render images from frustum feature patches to display a view of the object from various viewpoints.
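The core data flow of this abstract (per-view lifted features, view-dependent transformation kernels, and recurrent aggregation into a voxel grid) can be sketched minimally. The shapes, random "learned" kernels, and the running mean standing in for the recurrent aggregation step are all assumptions for illustration, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
V, d = 4 * 4 * 4, 8  # voxels in a hypothetical 4^3 grid, feature size d

# Lifted feature representations from three viewpoints, each with its own
# (here randomly initialized, in practice learned) transformation kernel.
views = [rng.standard_normal((V, d)) for _ in range(3)]
kernels = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]

# Recurrently aggregate the view-dependent transformed features; a running
# mean stands in for the learned recurrent aggregation described above.
agg = np.zeros((V, d))
for i, (feats, kernel) in enumerate(zip(views, kernels), start=1):
    transformed = feats @ kernel       # view-dependent transformation
    agg += (transformed - agg) / i     # running-mean update

voxel_grid = agg.reshape(4, 4, 4, d)   # 3D voxel representation
assert voxel_grid.shape == (4, 4, 4, d)
```

From such a voxel grid, frustum features along camera rays could then be sampled and rendered patch-by-patch, as the abstract outlines.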

    Deep learning tag-based font recognition utilizing font classification

    Publication No.: US11244207B2

    Publication Date: 2022-02-08

    Application No.: US17101778

    Application Date: 2020-11-23

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to a tag-based font recognition system that utilizes a multi-learning framework to develop and improve tag-based font recognition using deep learning neural networks. In particular, the tag-based font recognition system jointly trains a font tag recognition neural network with an implicit font classification attention model to generate font tag probability vectors that are enhanced by implicit font classification information. Indeed, the font recognition system weights the hidden layers of the font tag recognition neural network with implicit font information to improve the accuracy and predictability of the font tag recognition neural network, which results in improved retrieval of fonts in response to a font tag query. Accordingly, using the enhanced tag probability vectors, the tag-based font recognition system can accurately identify and recommend one or more fonts in response to a font tag query.
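The central mechanism here, weighting the hidden layers of the tag-recognition branch with implicit font-classification information, can be sketched in a few lines. All dimensions and the randomly initialized weights below are illustrative assumptions; they are not Adobe's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
d_hidden, n_fonts, n_tags = 32, 10, 5

# Hidden activations of the font-tag recognition branch (assumed given).
hidden = rng.standard_normal(d_hidden)

# Implicit font-classification branch produces per-unit attention values.
W_font = rng.standard_normal((n_fonts, d_hidden)) * 0.1
font_logits = W_font @ hidden                # implicit font classification
W_attn = rng.standard_normal((d_hidden, n_fonts)) * 0.1
attention = sigmoid(W_attn @ font_logits)    # attention in (0, 1) per unit

# Reweight the hidden layer with the implicit font information, then
# produce the enhanced font tag probability vector.
weighted = hidden * attention
W_tag = rng.standard_normal((n_tags, d_hidden)) * 0.1
tag_probs = sigmoid(W_tag @ weighted)
assert tag_probs.shape == (n_tags,)
```

In the described system both branches are trained jointly, so the attention vector learns to inject font-class cues that sharpen the tag probabilities used for retrieval.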

    Image captioning utilizing semantic text modeling and adversarial learning

    Publication No.: US11113599B2

    Publication Date: 2021-09-07

    Application No.: US15630604

    Application Date: 2017-06-22

    Applicant: Adobe Inc.

    Abstract: The present disclosure includes methods and systems for generating captions for digital images. In particular, the disclosed systems and methods can train an image encoder neural network and a sentence decoder neural network to generate a caption from an input digital image. For instance, in one or more embodiments, the disclosed systems and methods train an image encoder neural network (e.g., a character-level convolutional neural network) utilizing a semantic similarity constraint, training images, and training captions. Moreover, the disclosed systems and methods can train a sentence decoder neural network (e.g., a character-level recurrent neural network) utilizing training sentences and an adversarial classifier.

    Font recognition using text localization

    Publication No.: US10984295B2

    Publication Date: 2021-04-20

    Application No.: US16590121

    Application Date: 2019-10-01

    Applicant: Adobe Inc.

    Abstract: Font recognition and similarity determination techniques and systems are described. In a first example, localization techniques are described to train a model using machine learning (e.g., a convolutional neural network) using training images. The model is then used to localize text in a subsequently received image, and may do so automatically and without user intervention, e.g., without specifying any of the edges of a bounding box. In a second example, a deep neural network is directly learned as an embedding function of a model that is usable to determine font similarity. In a third example, techniques are described that leverage attributes described in metadata associated with fonts as part of font recognition and similarity determinations.

    TAG-BASED FONT RECOGNITION BY UTILIZING AN IMPLICIT FONT CLASSIFICATION ATTENTION NEURAL NETWORK

    Publication No.: US20200285916A1

    Publication Date: 2020-09-10

    Application No.: US16294417

    Application Date: 2019-03-06

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to a tag-based font recognition system that utilizes a multi-learning framework to develop and improve tag-based font recognition using deep learning neural networks. In particular, the tag-based font recognition system jointly trains a font tag recognition neural network with an implicit font classification attention model to generate font tag probability vectors that are enhanced by implicit font classification information. Indeed, the font recognition system weights the hidden layers of the font tag recognition neural network with implicit font information to improve the accuracy and predictability of the font tag recognition neural network, which results in improved retrieval of fonts in response to a font tag query. Accordingly, using the enhanced tag probability vectors, the tag-based font recognition system can accurately identify and recommend one or more fonts in response to a font tag query.
