-
Publication Number: US20240037906A1
Publication Date: 2024-02-01
Application Number: US17814921
Filing Date: 2022-07-26
Applicant: ADOBE INC.
Inventor: Qiuyu Chen , Quan Hung Tran , Kushal Kafle , Trung Huu Bui , Franck Dernoncourt , Walter W. Chang
IPC: G06V10/764 , G06V10/56 , G06V10/774
CPC classification number: G06V10/764 , G06V10/56 , G06V10/774 , G06V2201/10
Abstract: Systems and methods for color prediction are described. Embodiments of the present disclosure receive an image that includes an object including a color, generate a color vector based on the image using a color classification network, where the color vector includes a color value corresponding to each of a set of colors, generate a bias vector by comparing the color vector to each of a set of center vectors, where each of the set of center vectors corresponds to a color of the set of colors, and generate an unbiased color vector based on the color vector and the bias vector, where the unbiased color vector indicates the color of the object.
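A minimal numpy sketch of the debiasing step the abstract describes. The abstract does not specify how the color vector is compared to the center vectors, so the dot-product similarity and the simple subtraction below are assumptions for illustration only:

```python
import numpy as np

def unbias_colors(color_vec, centers):
    """Subtract a center-similarity bias from a predicted color vector.

    color_vec: (K,) score per color class from the classifier.
    centers:   (K, K) one center vector per color class.
    """
    color_vec = np.asarray(color_vec, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Bias per class: similarity of the prediction to that class's center
    # (assumed comparison; the patent leaves the operation unspecified).
    bias = centers @ color_vec
    unbiased = color_vec - bias
    return unbiased, int(np.argmax(unbiased))
```

With two classes sharing the same center, the bias is identical for both, so the debiased argmax still follows the original prediction.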
-
Publication Number: US20230386208A1
Publication Date: 2023-11-30
Application Number: US17804656
Filing Date: 2022-05-31
Applicant: ADOBE INC.
Inventor: Hailin Jin , Jielin Qiu , Zhaowen Wang , Trung Huu Bui , Franck Dernoncourt
IPC: G06V20/40 , G06F16/683 , G06V10/774 , G06F16/34
CPC classification number: G06V20/47 , G06V20/49 , G06F16/685 , G06V10/774 , G06F16/345
Abstract: Systems and methods for video segmentation and summarization are described. Embodiments of the present disclosure receive a video and a transcript of the video; generate visual features representing frames of the video using an image encoder; generate language features representing the transcript using a text encoder, wherein the image encoder and the text encoder are trained based on a correlation between training visual features and training language features; and segment the video into a plurality of video segments based on the visual features and the language features.
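The segmentation step above can be illustrated with a toy boundary detector: cut wherever the similarity between consecutive frame embeddings drops. This is a simplified stand-in, not the patented method, which additionally fuses language features from the transcript:

```python
import numpy as np

def segment_video(frame_feats, threshold=0.5):
    """Split unit-normalized frame embeddings into segments wherever
    consecutive-frame cosine similarity falls below a threshold."""
    frame_feats = np.asarray(frame_feats, dtype=float)
    # Dot product of each frame with the next (cosine, since unit-norm).
    sims = np.sum(frame_feats[:-1] * frame_feats[1:], axis=1)
    starts = [0] + [i + 1 for i, s in enumerate(sims) if s < threshold]
    ends = starts[1:] + [len(frame_feats)]
    return list(zip(starts, ends))
```

Two runs of identical embeddings yield exactly one boundary between them.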
-
Publication Number: US20230297603A1
Publication Date: 2023-09-21
Application Number: US17655395
Filing Date: 2022-03-18
Applicant: ADOBE INC.
Inventor: Meryem M'hamdi , Doo Soon Kim , Franck Dernoncourt , Trung Huu Bui
IPC: G06F16/33 , G06F40/35 , G06F40/279 , G06N20/00
CPC classification number: G06F16/3344 , G06F40/35 , G06F40/279 , G06N20/00
Abstract: Systems and methods for natural language processing are described. Embodiments of the present disclosure identify a task set including a plurality of pseudo tasks, wherein each of the plurality of pseudo tasks includes a support set corresponding to a first natural language processing (NLP) task and a query set corresponding to a second NLP task; update a machine learning model in an inner loop based on the support set; update the machine learning model in an outer loop based on the query set; and perform the second NLP task using the machine learning model.
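The inner-loop/outer-loop structure in the abstract follows the general shape of meta-learning. Below is a first-order sketch on a scalar linear model y = w·x with squared error; the learning rates and the single-step adaptation are illustrative assumptions, not the patent's actual training recipe:

```python
import numpy as np

def meta_step(w, support, query, inner_lr=0.1, outer_lr=0.1):
    # Inner loop: one gradient step on the support set.
    xs, ys = support
    grad_inner = 2.0 * np.mean((w * xs - ys) * xs)
    w_adapted = w - inner_lr * grad_inner
    # Outer loop: update the original weight using the query-set
    # gradient evaluated at the adapted weight (first-order MAML style).
    xq, yq = query
    grad_outer = 2.0 * np.mean((w_adapted * xq - yq) * xq)
    return w - outer_lr * grad_outer
```

The key point the abstract makes is that the support set and query set come from *different* NLP tasks, so the outer update steers the model toward the second task.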
-
Publication Number: US20230259708A1
Publication Date: 2023-08-17
Application Number: US17650876
Filing Date: 2022-02-14
Applicant: ADOBE INC.
Inventor: Amir Pouran Ben Veyseh , Franck Dernoncourt , Walter W. Chang , Trung Huu Bui , Hanieh Deilamsalehy , Seunghyun Yoon , Rajiv Bhawanji Jain , Quan Hung Tran , Varun Manjunatha
IPC: G06F40/289 , G06F40/30 , G10L15/22 , G10L15/06 , G10L15/16
CPC classification number: G06F40/289 , G06F40/30 , G10L15/22 , G10L15/063 , G10L15/16 , G10L2015/0635
Abstract: Systems and methods for key-phrase extraction are described. The systems and methods include receiving a transcript including a text paragraph and generating key-phrase data for the text paragraph using a key-phrase extraction network. The key-phrase extraction network is trained to identify domain-relevant key-phrase data based on domain data obtained using a domain discriminator network. The systems and methods further include generating meta-data for the transcript based on the key-phrase data.
-
Publication Number: US20220414338A1
Publication Date: 2022-12-29
Application Number: US17361878
Filing Date: 2021-06-29
Applicant: ADOBE INC.
Inventor: SANGWOO CHO , Franck Dernoncourt , Timothy Jeewun Ganter , Trung Huu Bui , Nedim Lipka , Varun Manjunatha , Walter Chang , Hailin Jin , Jonathan Brandt
IPC: G06F40/35 , G06F40/279
Abstract: Systems and methods for a text summarization system are described. In one example, the text summarization system receives an input utterance and determines whether the utterance should be included in a summary of the text. The text summarization system includes an embedding network, a convolution network, an encoding component, and a summary component. The embedding network generates a semantic embedding of an utterance. The convolution network generates a plurality of feature vectors based on the semantic embedding. The encoding component identifies a plurality of latent codes respectively corresponding to the plurality of feature vectors. The summary component identifies a prominent code among the latent codes and selects the utterance as a summary utterance based on the prominent code.
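The "prominent code" selection can be sketched with vector quantization: map each feature vector to its nearest codebook entry, take the most frequent code as prominent, and keep utterances that hit it. The nearest-neighbor quantization and frequency criterion are assumptions; the abstract does not define how prominence is measured:

```python
from collections import Counter
import numpy as np

def select_summary_utterances(feature_vectors_per_utt, codebook):
    """Return indices of utterances whose feature vectors map to the
    most frequent ('prominent') latent code."""
    codebook = np.asarray(codebook, dtype=float)
    per_utt_codes, all_codes = [], []
    for feats in feature_vectors_per_utt:
        # Quantize each feature vector to its nearest codebook entry.
        codes = [int(np.argmin(np.linalg.norm(codebook - np.asarray(f), axis=1)))
                 for f in feats]
        per_utt_codes.append(codes)
        all_codes.extend(codes)
    prominent = Counter(all_codes).most_common(1)[0][0]
    return [i for i, codes in enumerate(per_utt_codes) if prominent in codes]
```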
-
Publication Number: US11468880B2
Publication Date: 2022-10-11
Application Number: US16198302
Filing Date: 2018-11-21
Applicant: Adobe Inc.
Inventor: Tzu-Hsiang Lin , Trung Huu Bui , Doo Soon Kim
Abstract: Dialog system training techniques using a simulated user system are described. In one example, a simulated user system supports multiple agents. The dialog system, for instance, may be configured for use with an application (e.g., a digital image editing application). The simulated user system may therefore simulate user actions involving both the application and the dialog system, which may be used to train the dialog system. Additionally, the simulated user system is not limited to simulation of user interactions by a single input mode (e.g., natural language inputs), but also supports multimodal inputs. Further, the simulated user system may also support use of multiple goals within a single dialog session.
-
Publication Number: US20210182662A1
Publication Date: 2021-06-17
Application Number: US16717698
Filing Date: 2019-12-17
Applicant: Adobe Inc.
Inventor: Tuan Manh Lai , Trung Huu Bui , Quan Hung Tran
IPC: G06N3/08 , G06N3/04 , G06F40/284
Abstract: Techniques for training a first neural network (NN) model using a pre-trained second NN model are disclosed. In an example, training data is input to the first and second models. The training data includes masked tokens and unmasked tokens. In response, the first model generates a first prediction associated with a masked token and a second prediction associated with an unmasked token, and the second model generates a third prediction associated with the masked token and a fourth prediction associated with the unmasked token. The first model is trained, based at least in part on the first, second, third, and fourth predictions. In another example, a prediction associated with a masked token, a prediction associated with an unmasked token, and a prediction associated with whether two sentences of training data are adjacent sentences are received from each of the first and second models. The first model is trained using the predictions.
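The training signal described above, where the first model's predictions are compared against both the ground truth and a pre-trained second model, resembles knowledge distillation. A minimal per-token sketch, assuming the standard cross-entropy plus KL-divergence mix (the exact loss in the patent is not given):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, target_id, alpha=0.5):
    """Cross-entropy on the true token blended with KL divergence
    toward the pre-trained teacher's distribution."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    ce = -np.log(p_s[target_id])                      # match ground truth
    kl = float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))  # match teacher
    return alpha * ce + (1 - alpha) * kl
```

Applying this to both masked and unmasked tokens mirrors the four predictions the abstract enumerates.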
-
Publication Number: US10769495B2
Publication Date: 2020-09-08
Application Number: US16052246
Filing Date: 2018-08-01
Applicant: Adobe Inc.
Inventor: Trung Huu Bui , Zhe Lin , Walter Wei-Tuh Chang , Nham Van Le , Franck Dernoncourt
IPC: G06K9/62 , G06F3/16 , G06F3/0488 , G10L15/06 , G06F9/451 , G06F3/0482 , G06F16/54 , G06N3/08 , G06N20/00 , G06F3/0484
Abstract: In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair including a first image and a second image including at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that describe an edit of the first image used to generate the second image. The user gesture and the voice command are simultaneously recorded and synchronized with timestamps. The voice command is played back, and the user transcribes their voice command based on the play back, creating an exact transcription of their voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and a transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
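The "structured data object" packaging step can be sketched as a JSON record pairing each modality with its timestamps. The field names below are illustrative assumptions; the patent does not specify a schema:

```python
import json

def package_ier_sample(audio_samples, gesture_points, transcription):
    """Bundle synchronized audio, gesture, and transcription into one
    structured training record.

    audio_samples:  list of (timestamp, sample) pairs
    gesture_points: list of (timestamp, x, y) tuples
    """
    record = {
        "audio": [{"t": t, "sample": s} for t, s in audio_samples],
        "gesture": [{"t": t, "x": x, "y": y} for t, x, y in gesture_points],
        "transcription": transcription,
    }
    return json.dumps(record)
```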
-
Publication Number: US10713519B2
Publication Date: 2020-07-14
Application Number: US15630779
Filing Date: 2017-06-22
Applicant: ADOBE INC.
Inventor: Trung Huu Bui , Hung Hai Bui , Shawn Alan Gaither , Walter Wei-Tuh Chang , Michael Frank Kraley , Pranjal Daga
Abstract: The present invention is directed towards providing automated workflows for identifying a reading order from text segments extracted from a document. Ordering the text segments is based on trained natural language models. In some embodiments, the workflows perform a method for identifying a sequence associated with a portable document. The method includes iteratively generating a probabilistic language model, receiving the portable document, and selectively extracting features (such as, but not limited to, text segments) from the document. The method may generate feature pairs from the extracted features. The method may further generate a score for each of the pairs based on the probabilistic language model and determine an order of the features based on the scores. The method may provide the extracted features in the determined order.
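The pairwise scoring and ordering described above can be sketched as a brute-force search over orderings, summing a language-model score for each adjacent pair. The exhaustive search is an illustrative simplification, workable for the handful of segments on a typical page but not the patent's actual algorithm:

```python
from itertools import permutations

def best_reading_order(segments, pair_score):
    """Return the ordering of text segments that maximizes the summed
    pairwise scores of adjacent segments.

    pair_score(a, b) plays the role of the probabilistic language
    model's score for segment b following segment a."""
    def total(order):
        return sum(pair_score(order[i], order[i + 1])
                   for i in range(len(order) - 1))
    return max(permutations(segments), key=total)
```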
-
Publication Number: US20200042286A1
Publication Date: 2020-02-06
Application Number: US16052246
Filing Date: 2018-08-01
Applicant: Adobe Inc.
Inventor: Trung Huu Bui , Zhe Lin , Walter Wei-Tuh Chang , Nham Van Le , Franck Dernoncourt
IPC: G06F3/16 , G10L15/26 , G06F3/0488 , G06F3/0482 , G10L15/06 , G06F17/30 , G06F9/451
Abstract: In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair including a first image and a second image including at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that describe an edit of the first image used to generate the second image. The user gesture and the voice command are simultaneously recorded and synchronized with timestamps. The voice command is played back, and the user transcribes their voice command based on the play back, creating an exact transcription of their voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and a transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
-