-
Publication No.: US12105767B2
Publication date: 2024-10-01
Application No.: US17735748
Filing date: 2022-05-03
Applicant: Adobe Inc.
Inventor: Zhaowen Wang, Yue Bai, John Philip Collomosse
IPC: G06F16/9537, G06F40/103, G06F40/30, G06V30/19, G06K15/02, G06N3/08, G06N20/00, G06V10/82, G06V30/412, G06V30/414
CPC classification number: G06F16/9537, G06F40/103, G06F40/30, G06V30/19127, G06K15/1885, G06N3/08, G06N20/00, G06V10/82, G06V30/412, G06V30/414
Abstract: Digital content layout encoding techniques for search are described. In these techniques, a layout representation is generated (using machine learning, automatically and without user intervention) that describes a layout of elements included within the digital content. In an implementation, the layout representation includes a description of both spatial and structural aspects of the elements in relation to each other. To do so, a two-pathway pipeline is configured to model layout from both spatial and structural aspects using a spatial pathway and a structural pathway, respectively. In one example, this is also performed through use of multi-level encoding and fusion to generate the layout representation.
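Illustrative sketch (not taken from the patent): the abstract describes a learned two-pathway encoder whose outputs are fused into one layout representation used for search. The toy below replaces both learned pathways with hand-crafted features purely to show the data flow. The element tuple format, the `TYPES` vocabulary, the grid size, fusion by concatenation, and cosine ranking are all assumptions made for illustration.

```python
import math

# Hypothetical element vocabulary; each element is (type, x0, y0, x1, y1)
# with coordinates normalized to [0, 1].
TYPES = ("text", "image", "button")

def spatial_features(elements, grid=4):
    """Spatial pathway stand-in: grid cells whose center lies inside any element box."""
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            cx, cy = (gx + 0.5) / grid, (gy + 0.5) / grid
            covered = any(x0 <= cx <= x1 and y0 <= cy <= y1
                          for _, x0, y0, x1, y1 in elements)
            cells.append(1.0 if covered else 0.0)
    return cells

def structural_features(elements):
    """Structural pathway stand-in: type counts plus type-to-type transitions in reading order."""
    counts = [float(sum(1 for e in elements if e[0] == t)) for t in TYPES]
    ordered = sorted(elements, key=lambda e: (e[2], e[1]))  # top-to-bottom, then left-to-right
    transitions = [0.0] * (len(TYPES) ** 2)
    for a, b in zip(ordered, ordered[1:]):
        transitions[TYPES.index(a[0]) * len(TYPES) + TYPES.index(b[0])] += 1.0
    return counts + transitions

def encode_layout(elements):
    """Fusion by simple concatenation (an assumption; the abstract does not say how fusion works)."""
    return spatial_features(elements) + structural_features(elements)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

layout_a = [("image", 0.0, 0.0, 1.0, 0.5), ("text", 0.0, 0.5, 1.0, 0.9),
            ("button", 0.4, 0.9, 0.6, 1.0)]
layout_b = [("text", 0.0, 0.0, 0.5, 1.0), ("image", 0.5, 0.0, 1.0, 1.0)]

sim_same = cosine(encode_layout(layout_a), encode_layout(layout_a))
sim_diff = cosine(encode_layout(layout_a), encode_layout(layout_b))
```

In a search setting, a query layout would be encoded the same way and candidates ranked by similarity; here `sim_same` exceeds `sim_diff` because the second layout differs both spatially and structurally.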
-
Publication No.: US20250013866A1
Publication date: 2025-01-09
Application No.: US18347877
Filing date: 2023-07-06
Applicant: ADOBE INC.
Inventor: Handong Zhao, Yue Bai, Zhe Lin, Ajinkya Gorakhnath Kale, Jiuxiang Gu, Tong Yu, Sungchul Kim
Abstract: Systems and methods for reducing inference time of vision-language models, as well as for multimodal search, are described herein. Embodiments are configured to obtain an embedding neural network. The embedding neural network is pretrained to embed inputs from a plurality of modalities into a multimodal embedding space. Embodiments are further configured to perform a first progressive pruning stage, where the first progressive pruning stage includes a first pruning of the embedding neural network and a first fine-tuning of the embedding neural network. Embodiments then perform a second progressive pruning stage based on an output of the first progressive pruning stage, where the second progressive pruning stage includes a second pruning of the embedding neural network and a second fine-tuning of the embedding neural network.
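Illustrative sketch (not taken from the application): the abstract describes two progressive stages, each consisting of a pruning step followed by a fine-tuning step, with the second stage starting from the output of the first. The toy below applies that schedule to a small linear model. The magnitude-based pruning criterion, the sparsity schedule, the squared-error objective, and all function names are assumptions; the abstract does not specify any of them.

```python
def prune_by_magnitude(weights, mask, sparsity):
    """Return a mask that zeroes the `sparsity` fraction of weights with smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Already-pruned weights have magnitude 0, so they stay pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i] * mask[i]))
    new_mask = list(mask)
    for i in order[:n_prune]:
        new_mask[i] = 0
    return new_mask

def fine_tune(weights, mask, data, lr=0.05, steps=200):
    """Gradient descent on mean squared error, keeping pruned weights at zero."""
    w = [wi * mi for wi, mi in zip(weights, mask)]
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        w = [(wi - lr * gi) * mi for wi, gi, mi in zip(w, grad, mask)]
    return w

def progressive_prune(weights, data, schedule=(0.25, 0.5)):
    """Run one prune + fine-tune stage per target sparsity, each stage building on the last."""
    mask = [1] * len(weights)
    w = list(weights)
    for sparsity in schedule:
        mask = prune_by_magnitude(w, mask, sparsity)
        w = fine_tune(w, mask, data)
    return w, mask

# Toy regression data: each sample isolates one weight.
data = [((1.0, 0.0, 0.0, 0.0), 2.0), ((0.0, 1.0, 0.0, 0.0), -1.5),
        ((0.0, 0.0, 1.0, 0.0), 0.1), ((0.0, 0.0, 0.0, 1.0), 0.4)]
init = [1.0, -1.0, 0.05, 0.3]
pruned, mask = progressive_prune(init, data)
```

Pruning in stages lets the remaining weights adapt before the next, more aggressive cut, which is the usual motivation for a progressive schedule over one-shot pruning.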
-
Publication No.: US20240419750A1
Publication date: 2024-12-19
Application No.: US18822367
Filing date: 2024-09-02
Applicant: Adobe Inc.
Inventor: Zhaowen Wang, Yue Bai, John Philip Collomosse
IPC: G06F16/9537, G06F40/103, G06F40/30, G06K15/02, G06N3/08, G06N20/00, G06V10/82, G06V30/19, G06V30/412, G06V30/414
Abstract: Digital content layout encoding techniques for search are described. In these techniques, a layout representation is generated (using machine learning, automatically and without user intervention) that describes a layout of elements included within the digital content. In an implementation, the layout representation includes a description of both spatial and structural aspects of the elements in relation to each other. To do so, a two-pathway pipeline is configured to model layout from both spatial and structural aspects using a spatial pathway and a structural pathway, respectively. In one example, this is also performed through use of multi-level encoding and fusion to generate the layout representation.
-
Publication No.: US20230359682A1
Publication date: 2023-11-09
Application No.: US17735748
Filing date: 2022-05-03
Applicant: Adobe Inc.
Inventor: Zhaowen Wang, Yue Bai, John Philip Collomosse
IPC: G06F16/9537, G06F40/30
CPC classification number: G06F16/9537, G06F40/30, G06N20/00
Abstract: Digital content layout encoding techniques for search are described. In these techniques, a layout representation is generated (using machine learning, automatically and without user intervention) that describes a layout of elements included within the digital content. In an implementation, the layout representation includes a description of both spatial and structural aspects of the elements in relation to each other. To do so, a two-pathway pipeline is configured to model layout from both spatial and structural aspects using a spatial pathway and a structural pathway, respectively. In one example, this is also performed through use of multi-level encoding and fusion to generate the layout representation.
-
Publication No.: US20240404243A1
Publication date: 2024-12-05
Application No.: US18328950
Filing date: 2023-06-05
Applicant: ADOBE INC.
Inventor: Handong Zhao, Yue Bai, Zhe Lin, Ajinkya Gorakhnath Kale, Jiuxiang Gu, Tong Yu, Sungchul Kim
IPC: G06V10/75, G06F16/332, G06V10/774
Abstract: Systems and methods for multimodal machine learning are provided. According to one aspect, a method for multimodal machine learning includes obtaining a prompt; encoding the prompt using a multimodal encoder to obtain a prompt embedding, wherein the encoding comprises generating a plurality of multi-head attention (MHA) outputs corresponding to a plurality of different scales, respectively, and combining the plurality of MHA outputs using a multi-scale aggregator; and generating a response to the prompt based on the prompt embedding.
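Illustrative sketch (not taken from the application): the abstract describes producing one multi-head attention (MHA) output per scale and combining them with a multi-scale aggregator. The toy below interprets "scale" as the window size used to average-pool the key/value tokens, and uses a plain mean as the aggregator. Both choices are assumptions, as are the function names; learned query/key/value projections are omitted to keep the example short.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pool_tokens(tokens, scale):
    """Average non-overlapping windows of `scale` tokens (assumed meaning of 'scale')."""
    pooled = []
    for start in range(0, len(tokens), scale):
        window = tokens[start:start + scale]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def multi_head_attention(queries, context, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate the results."""
    head_dim = len(queries[0]) // num_heads
    outputs = [[] for _ in queries]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q_h = [q[lo:hi] for q in queries]
        c_h = [c[lo:hi] for c in context]
        for row, head_out in zip(outputs, attention(q_h, c_h, c_h)):
            row.extend(head_out)
    return outputs

def multi_scale_encode(tokens, scales=(1, 2, 4), num_heads=2):
    """One MHA output per scale, combined by an element-wise mean (assumed aggregator)."""
    per_scale = [multi_head_attention(tokens, pool_tokens(tokens, s), num_heads)
                 for s in scales]
    return [[sum(vals) / len(scales) for vals in zip(*rows)]
            for rows in zip(*per_scale)]

tokens = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
encoded = multi_scale_encode(tokens)
```

A learned aggregator (for example, a weighted sum with trainable weights) would slot in where the mean is computed; the mean is used here only because it needs no parameters.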