1.
Publication No.: US12147771B2
Publication Date: 2024-11-19
Application No.: US17361878
Filing Date: 2021-06-29
Applicant: ADOBE INC.
Inventor: Sangwoo Cho, Franck Dernoncourt, Timothy Jeewun Ganter, Trung Huu Bui, Nedim Lipka, Varun Manjunatha, Walter Chang, Hailin Jin, Jonathan Brandt
IPC: G06F40/35, G06F40/279
Abstract: Systems and methods for a text summarization system are described. In one example, a text summarization system receives an input utterance and determines whether the utterance should be included in a summary of the text. The text summarization system includes an embedding network, a convolution network, an encoding component, and a summary component. The embedding network generates a semantic embedding of an utterance. The convolution network generates a plurality of feature vectors based on the semantic embedding. The encoding component identifies a plurality of latent codes respectively corresponding to the plurality of feature vectors. The summary component identifies a prominent code among the latent codes and selects the utterance as a summary utterance based on the prominent code.
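The summary component's selection step can be illustrated with a minimal Python sketch. It assumes the latent codes come from nearest-neighbor lookup in a codebook (as in vector quantization) and that the prominent code is simply the most frequent code among an utterance's feature vectors. The rule that selects an utterance when its prominent code falls in a given set (`summary_codes`) is hypothetical, as are all function names; the embedding and convolution networks are replaced by precomputed toy vectors.

```python
import math
from collections import Counter


def nearest_code(vector, codebook):
    """Index of the codebook entry closest (Euclidean distance) to vector."""
    return min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))


def prominent_code(feature_vectors, codebook):
    """Map every feature vector to its nearest latent code and return the most
    frequent code plus the share of feature vectors assigned to it."""
    codes = [nearest_code(v, codebook) for v in feature_vectors]
    code, count = Counter(codes).most_common(1)[0]
    return code, count / len(codes)


def select_summary_utterances(utterance_features, codebook, summary_codes):
    """Hypothetical selection rule: keep an utterance when its prominent code is
    one of the codes assumed to mark summary-worthy content."""
    selected = []
    for idx, features in enumerate(utterance_features):
        code, _share = prominent_code(features, codebook)
        if code in summary_codes:
            selected.append(idx)
    return selected


# Toy data: 2-D "feature vectors" stand in for the convolution network's output.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
utterances = [
    [(0.1, 0.0), (0.2, 0.1), (4.9, 5.1)],  # mostly code 0
    [(1.1, 0.9), (0.9, 1.0), (0.0, 0.1)],  # mostly code 1
]
selected = select_summary_utterances(utterances, codebook, summary_codes={1})
print(selected)  # [1]
```

In a trained system the codebook would be learned jointly with the encoders; here it is fixed by hand so the assignments are easy to check.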
2.
Publication No.: US12118787B2
Publication Date: 2024-10-15
Application No.: US17499193
Filing Date: 2021-10-12
Applicant: ADOBE INC.
Inventor: Hailin Jin, Bryan Russell, Reuben Xin Hong Tan
IPC: G06K9/00, G06F18/214, G06F18/22, G06N3/04, G06V20/40, G10L15/02, G10L15/16, G10L15/19, G10L15/26
CPC classification number: G06V20/41, G06F18/214, G06F18/22, G06N3/04, G06V20/46, G10L15/02, G10L15/16, G10L15/19, G10L15/26
Abstract: Methods, systems, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
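A cross-modal attention step can be sketched as follows, assuming standard scaled dot-product attention in which phrase tokens act as queries and image-region features act as keys and values, so no text-to-text or region-to-region comparison is ever computed. The averaging rule in `localize_phrase` and all names are hypothetical; the abstract does not give the layer design, the temporal dimension, or the training procedure.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one modality (phrase tokens)
    and keys/values from the other (image regions). Features of the same modality
    are never compared with each other."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        w = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return outputs, weights


def localize_phrase(phrase_features, region_features):
    """Hypothetical rule: average the attention that each phrase token pays to each
    region and return the index of the most-attended region."""
    _, weights = cross_modal_attention(phrase_features, region_features, region_features)
    n_regions = len(region_features)
    avg = [sum(w[r] for w in weights) / len(weights) for r in range(n_regions)]
    return max(range(n_regions), key=avg.__getitem__)


regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy per-region visual features
phrase = [[0.0, 2.0], [0.1, 1.5]]                # toy per-token text features
best_region = localize_phrase(phrase, regions)
print(best_region)  # 1
```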
3.
Publication No.: US20230386208A1
Publication Date: 2023-11-30
Application No.: US17804656
Filing Date: 2022-05-31
Applicant: ADOBE INC.
Inventor: Hailin Jin, Jielin Qiu, Zhaowen Wang, Trung Huu Bui, Franck Dernoncourt
IPC: G06V20/40, G06F16/683, G06V10/774, G06F16/34
CPC classification number: G06V20/47, G06V20/49, G06F16/685, G06V10/774, G06F16/345
Abstract: Systems and methods for video segmentation and summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video; generate visual features representing frames of the video using an image encoder; generate language features representing the transcript using a text encoder, wherein the image encoder and the text encoder are trained based on a correlation between training visual features and training language features; and segment the video into a plurality of video segments based on the visual features and the language features.
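The final segmentation step can be sketched as boundary detection over fused features. This sketch assumes per-time-step visual and language features are already available from the two encoders, fuses them by simple concatenation, and places a boundary wherever adjacent fused features fall below a cosine-similarity threshold. The fusion rule, the threshold value, and the function names are assumptions for illustration, not the disclosed method.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_video(visual_feats, language_feats, threshold=0.5):
    """Split time steps into segments. Each step's fused feature is the
    concatenation of its visual and language features (an assumption); a new
    segment starts wherever consecutive fused features have cosine similarity
    below threshold. Returns (start, end) pairs with end exclusive."""
    fused = [list(vis) + list(lang) for vis, lang in zip(visual_feats, language_feats)]
    if not fused:
        return []
    segments, start = [], 0
    for t in range(1, len(fused)):
        if cosine(fused[t - 1], fused[t]) < threshold:
            segments.append((start, t))
            start = t
    segments.append((start, len(fused)))
    return segments


visual = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
language = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.1, 1.0]]
segments = segment_video(visual, language)
print(segments)  # [(0, 2), (2, 4)]
```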
4.
Publication No.: US11810374B2
Publication Date: 2023-11-07
Application No.: US17240097
Filing Date: 2021-04-26
Applicant: Adobe Inc.
Inventor: Zhaowen Wang, Hailin Jin, Yang Liu
IPC: G06V20/62, G06V30/148, G06F18/214, G06V10/764
CPC classification number: G06V20/62, G06F18/214, G06V10/764, G06V20/63, G06V30/153, G06V2201/01
Abstract: In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.
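The two clean-image supervision signals can be written as extra loss terms added to the recognition loss. The sketch below assumes a squared-error distance for both terms and caller-supplied `encoder` and `decoder` functions; the weights `lambda_inv` and `lambda_comp`, the toy encoder/decoder, and all names are hypothetical, and the recognition loss is passed in as a precomputed number.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clean_supervised_loss(encoder, decoder, noisy_image, clean_image,
                          recognition_loss, lambda_inv=1.0, lambda_comp=1.0):
    """Total loss = recognition loss
        + lambda_inv  * ||enc(noisy) - enc(clean)||^2   (feature invariance)
        + lambda_comp * ||dec(enc(noisy)) - clean||^2   (feature completeness)
    encoder/decoder are caller-supplied callables; the weights are hypothetical."""
    noisy_feat = encoder(noisy_image)
    clean_feat = encoder(clean_image)
    invariance = squared_error(noisy_feat, clean_feat)
    completeness = squared_error(decoder(noisy_feat), clean_image)
    return recognition_loss + lambda_inv * invariance + lambda_comp * completeness


# Toy stand-ins: the "image" is a flat list of pixels, the encoder returns its mean,
# and the decoder broadcasts that mean back to four pixels.
def toy_encoder(image):
    return [sum(image) / len(image)]


def toy_decoder(features):
    return [features[0]] * 4


loss = clean_supervised_loss(toy_encoder, toy_decoder,
                             noisy_image=[1.0, 0.0, 1.0, 0.0],
                             clean_image=[1.0, 1.0, 1.0, 1.0],
                             recognition_loss=2.0)
print(loss)  # 3.25
```

The invariance term pulls noisy-image features toward clean-image features, while the completeness term requires those same features to carry enough information to reconstruct the clean image.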
5.
Publication No.: US11776180B2
Publication Date: 2023-10-03
Application No.: US16802440
Filing Date: 2020-02-26
Applicant: ADOBE INC.
Inventor: Ning Xu, Bayram Safa Cicek, Hailin Jin, Zhaowen Wang
IPC: G06N20/20, G06T11/60, G06N3/088, G06T11/00, G06F18/214, G06N3/045, G06V10/764, G06V10/774, G06V10/82, G06V10/44
CPC classification number: G06T11/60, G06F18/214, G06N3/045, G06N3/088, G06T11/00, G06V10/454, G06V10/764, G06V10/774, G06V10/82, G06T2210/36
Abstract: Embodiments of the present disclosure are directed towards improved models trained using unsupervised domain adaptation. In particular, a style-content adaptation system provides improved translation during unsupervised domain adaptation by controlling the alignment of conditional distributions of a model during training such that content (e.g., a class) from a target domain is correctly mapped to content (e.g., the same class) in a source domain. The style-content adaptation system improves unsupervised domain adaptation using independent control over content (e.g., related to a class) as well as style (e.g., related to a domain) to control alignment when translating between the source and target domain. This independent control over content and style can also allow for images to be generated using the style-content adaptation system that contain desired content and/or style.
6.
Publication No.: US11676060B2
Publication Date: 2023-06-13
Application No.: US15002206
Filing Date: 2016-01-20
Applicant: Adobe Inc.
Inventor: Anirban Roychowdhury, Hung H. Bui, Trung H. Bui, Hailin Jin
Abstract: Digital content interaction prediction and training techniques that address imbalanced classes are described. In one or more implementations, a digital medium environment is described to predict user interaction with digital content that addresses an imbalance of numbers included in first and second classes in training data used to train a model using machine learning. The training data is received that describes the first class and the second class. A model is trained using machine learning. The training includes sampling the training data to include at least one subset of the training data from the first class and at least one subset of the training data from the second class. Iterative selections are made of a batch from the sampled training data. The iteratively selected batches are iteratively processed by a classifier implemented using machine learning to train the model.
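One way to build batches from imbalanced training data is to sample a fixed share from each class for every batch. The sketch assumes a 50/50 split and sampling with replacement; the abstract does not state the proportions or the sampling scheme, and `balanced_batches` and the toy data are hypothetical.

```python
import random


def balanced_batches(first_class, second_class, batch_size, num_batches, seed=0):
    """Yield batches drawing half their examples from each class. Sampling with
    replacement keeps the minority class from running out; the 50/50 split is an
    assumption for illustration."""
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(num_batches):
        batch = (rng.choices(first_class, k=half)
                 + rng.choices(second_class, k=batch_size - half))
        rng.shuffle(batch)
        yield batch


# Toy imbalanced data: (example_id, label), 1000 negatives vs. 10 positives
# (e.g. "no interaction" vs. "interaction").
negatives = [(i, 0) for i in range(1000)]
positives = [(i, 1) for i in range(10)]

batches = list(balanced_batches(negatives, positives, batch_size=8, num_batches=5))
for batch in batches:
    # Each batch would be handed to the classifier's training step here.
    print(sum(label for _, label in batch), "positives out of", len(batch))
```

Sampling with replacement means minority examples repeat across batches; drawing without replacement once per pass over the minority class is a common alternative.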
7.
Publication No.: US11636147B2
Publication Date: 2023-04-25
Application No.: US17584962
Filing Date: 2022-01-26
Applicant: Adobe Inc.
Inventor: Zhaowen Wang, Tianlang Chen, Ning Xu, Hailin Jin
IPC: G06F16/906, G06F16/55, G06N3/084, G06F16/903, G06F40/109, G06V30/244, G06F18/28, G06F18/21, G06F18/214, G06F18/2415, G06V30/19, G06V30/226, G06V10/82, G06V10/44
Abstract: The present disclosure relates to a tag-based font recognition system that utilizes a multi-learning framework to develop and improve tag-based font recognition using deep learning neural networks. In particular, the tag-based font recognition system jointly trains a font tag recognition neural network with an implicit font classification attention model to generate font tag probability vectors that are enhanced by implicit font classification information. Indeed, the font recognition system weights the hidden layers of the font tag recognition neural network with implicit font information to improve the accuracy and predictability of the font tag recognition neural network, which results in improved retrieval of fonts in response to a font tag query. Accordingly, using the enhanced tag probability vectors, the tag-based font recognition system can accurately identify and recommend one or more fonts in response to a font tag query.
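The retrieval step, recommending fonts for a font tag query from tag probability vectors, can be sketched as a ranking over fonts. The sketch assumes the tag probabilities have already been produced by the recognition network and scores each font by the summed log-probability of the query tags, which treats tags as independent. The scoring rule, font names, and tags are hypothetical.

```python
import math


def rank_fonts(tag_probabilities, query_tags, top_k=3):
    """Rank fonts for a tag query. tag_probabilities maps font name -> {tag: probability},
    e.g. the output of a font tag recognition network. Each font is scored by the sum of
    log-probabilities of the query tags (tags treated as independent, an assumption)."""
    eps = 1e-9  # avoid log(0) for tags a font was never predicted to carry
    scores = {
        font: sum(math.log(probs.get(tag, 0.0) + eps) for tag in query_tags)
        for font, probs in tag_probabilities.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical fonts and tag probabilities.
fonts = {
    "FontA": {"serif": 0.9, "elegant": 0.7},
    "FontB": {"serif": 0.2, "bold": 0.9},
    "FontC": {"serif": 0.6, "elegant": 0.2},
}
print(rank_fonts(fonts, ["serif", "elegant"], top_k=2))  # ['FontA', 'FontC']
```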
8.
Publication No.: US11615308B2
Publication Date: 2023-03-28
Application No.: US17563901
Filing Date: 2021-12-28
Applicant: Adobe Inc.
Inventor: Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
IPC: G06K9/00, G06N3/02, G06F17/16, G06N3/08, G06V20/40, G06V30/18, G06V30/19, G06V10/82, G06V20/62, G06V30/10
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating a response to a question received from a user during display or playback of a video segment by utilizing a query-response-neural network. The disclosed systems can extract a query vector from a question corresponding to the video segment using the query-response-neural network. The disclosed systems further generate context vectors representing both visual cues and transcript cues corresponding to the video segment using context encoders or other layers from the query-response-neural network. By utilizing additional layers from the query-response-neural network, the disclosed systems generate (i) a query-context vector based on the query vector and the context vectors, and (ii) candidate-response vectors representing candidate responses to the question from a domain-knowledge base or other source. To respond to a user's question, the disclosed systems further select a response from the candidate responses based on a comparison of the query-context vector and the candidate-response vectors.
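The final selection step can be sketched with toy vectors. The sketch assumes the query-context vector is the query plus an attention-weighted sum of the context vectors and that candidates are compared with the query-context vector by dot product; the abstract does not specify the layers or the comparison, so the fusion rule and all names are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_context_vector(query, contexts):
    """Attend from the query over the context vectors (visual and transcript cues)
    and add the attention-weighted context to the query. This fusion rule is an
    assumption, not the disclosed layer design."""
    scores = [dot(query, c) for c in contexts]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    attended = [sum(w * c[i] for w, c in zip(weights, contexts)) / total
                for i in range(len(query))]
    return [q + a for q, a in zip(query, attended)]


def select_response(query, contexts, candidates):
    """Index of the candidate-response vector with the highest dot product
    against the query-context vector."""
    qc = query_context_vector(query, contexts)
    return max(range(len(candidates)), key=lambda i: dot(qc, candidates[i]))


query = [1.0, 0.0]
contexts = [[0.0, 3.0]]               # one toy context vector
candidates = [[1.0, 0.0], [0.5, 1.0]]
print(select_response(query, contexts, candidates))  # 1
```

With the context included, the second candidate wins; scoring against the raw query vector alone would pick the first, which is the kind of effect the video cues are meant to have.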
9.
Publication No.: US20220414338A1
Publication Date: 2022-12-29
Application No.: US17361878
Filing Date: 2021-06-29
Applicant: ADOBE INC.
Inventor: Sangwoo Cho, Franck Dernoncourt, Timothy Jeewun Ganter, Trung Huu Bui, Nedim Lipka, Varun Manjunatha, Walter Chang, Hailin Jin, Jonathan Brandt
IPC: G06F40/35, G06F40/279
Abstract: Systems and methods for a text summarization system are described. In one example, a text summarization system receives an input utterance and determines whether the utterance should be included in a summary of the text. The text summarization system includes an embedding network, a convolution network, an encoding component, and a summary component. The embedding network generates a semantic embedding of an utterance. The convolution network generates a plurality of feature vectors based on the semantic embedding. The encoding component identifies a plurality of latent codes respectively corresponding to the plurality of feature vectors. The summary component identifies a prominent code among the latent codes and selects the utterance as a summary utterance based on the prominent code.
10.
Publication No.: US20220092108A1
Publication Date: 2022-03-24
Application No.: US17025041
Filing Date: 2020-09-18
Applicant: Adobe Inc.
Inventor: John Collomosse, Zhe Lin, Saeid Motiian, Hailin Jin, Baldo Faieta, Alex Filipkowski
IPC: G06F16/583, G06F16/535, G06F16/532, G06N3/08
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly identifying digital images with similar style to a query digital image using fine-grain style determination via weakly supervised style extraction neural networks. For example, the disclosed systems can extract a style embedding from a query digital image using a style extraction neural network such as a novel two-branch autoencoder architecture or a weakly supervised discriminative neural network. The disclosed systems can generate a combined style embedding by combining complementary style embeddings from different style extraction neural networks. Moreover, the disclosed systems can search a repository of digital images to identify digital images with similar style to the query digital image. The disclosed systems can also learn parameters for one or more style extraction neural networks through weakly supervised training without a specifically labeled style ontology for sample digital images.
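Combining complementary style embeddings and searching a repository can be sketched as follows, assuming each network's embedding is L2-normalized, the results are concatenated and renormalized, and images are then compared by cosine similarity. The combination rule, the two-network setup, and the image keys are illustrative assumptions; the abstract does not say how the embeddings are combined.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def combine_style_embeddings(*embeddings):
    """Normalize each network's style embedding, concatenate, and renormalize.
    Concatenation is one possible way to combine complementary embeddings and is
    an assumption here."""
    combined = []
    for emb in embeddings:
        combined.extend(l2_normalize(emb))
    return l2_normalize(combined)


def search_similar_style(query, repository, top_k=2):
    """Repository keys ordered by cosine similarity to the query. All embeddings are
    unit length, so the dot product equals cosine similarity."""
    sims = {key: sum(q * r for q, r in zip(query, emb)) for key, emb in repository.items()}
    return sorted(sims, key=sims.get, reverse=True)[:top_k]


# Toy embeddings from two hypothetical style-extraction networks.
repository = {
    "img_watercolor": combine_style_embeddings([1.0, 0.0], [0.0, 1.0]),
    "img_sketch": combine_style_embeddings([0.0, 1.0], [1.0, 0.0]),
    "img_mixed": combine_style_embeddings([1.0, 1.0], [0.0, 1.0]),
}
query = combine_style_embeddings([0.9, 0.1], [0.0, 1.0])
print(search_similar_style(query, repository))  # ['img_watercolor', 'img_mixed']
```

Normalizing each embedding before concatenation keeps one network from dominating the combined similarity simply because its outputs have a larger scale.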