EFFICIENT VISION-LANGUAGE RETRIEVAL USING STRUCTURAL PRUNING

    Publication number: US20250013866A1

    Publication date: 2025-01-09

    Application number: US18347877

    Filing date: 2023-07-06

    Applicant: ADOBE INC.

    Abstract: Systems and methods for reducing inference time of vision-language models, as well as for multimodal search, are described herein. Embodiments are configured to obtain an embedding neural network. The embedding neural network is pretrained to embed inputs from a plurality of modalities into a multimodal embedding space. Embodiments are further configured to perform a first progressive pruning stage, where the first progressive pruning stage includes a first pruning of the embedding neural network and a first fine-tuning of the embedding neural network. Embodiments then perform a second progressive pruning stage based on an output of the first progressive pruning stage, where the second progressive pruning stage includes a second pruning of the embedding neural network and a second fine-tuning of the embedding neural network.
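
The two-stage schedule in this abstract can be illustrated with a minimal sketch. The abstract does not specify the pruning criterion, the pruning ratios, or the fine-tuning procedure, so everything concrete here is an assumption: rows of a weight matrix stand in for prunable structures, the L2 norm of a row is used as its importance score, and `fine_tune` is a hypothetical callback that a real system would replace with a training loop.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]  # each row stands in for one prunable structure


def prune_rows(weights: Matrix, keep: int) -> Matrix:
    """Keep the `keep` rows with the largest L2 norm, preserving row order.

    Using the L2 norm as the importance score is an assumption of this sketch.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [weights[i] for i in kept]


def progressive_prune(
    weights: Matrix,
    keep_ratios: List[float],
    fine_tune: Callable[[Matrix], Matrix],
) -> Matrix:
    """Run one prune-then-fine-tune stage per entry in `keep_ratios`.

    Each ratio is relative to the original row count, and each stage
    starts from the output of the previous stage.
    """
    original_rows = len(weights)
    for ratio in keep_ratios:
        keep = max(1, int(original_rows * ratio))
        weights = prune_rows(weights, keep)
        weights = fine_tune(weights)  # hypothetical recovery step
    return weights


# Toy usage: 8 rows, pruned to 6 rows in stage one and 4 rows in stage two.
# The identity function is a placeholder for real fine-tuning.
toy_weights = [[float(i), 0.0] for i in range(8)]
pruned = progressive_prune(toy_weights, [0.75, 0.5], lambda w: w)
```

Pruning in stages rather than in one shot gives the network a chance to recover accuracy after each smaller cut, which is the motivation usually given for progressive schedules.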

    DIGITAL CONTENT LAYOUT ENCODING FOR SEARCH

    Publication number: US20240419750A1

    Publication date: 2024-12-19

    Application number: US18822367

    Filing date: 2024-09-02

    Applicant: Adobe Inc.

    Abstract: Digital content layout encoding techniques for search are described. In these techniques, a layout representation is generated (using machine learning, automatically and without user intervention) that describes a layout of elements included within the digital content. In an implementation, the layout representation includes a description of both spatial and structural aspects of the elements in relation to each other. To do so, a two-pathway pipeline is configured to model layout from both spatial and structural aspects using a spatial pathway and a structural pathway, respectively. In one example, this is also performed through use of multi-level encoding and fusion to generate a layout representation.
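
As a rough illustration of the two-pathway idea, the sketch below encodes a layout with hand-written features instead of the machine-learning model the abstract describes. All names and feature choices are assumptions made for illustration: the `Element` record, the `TYPES` vocabulary, mean box geometry as a stand-in for the spatial pathway, an element-type histogram as a stand-in for the structural pathway, concatenation as the fusion step, and cosine similarity as the search score.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical element record: (type, x, y, width, height), with coordinates
# normalized to [0, 1] relative to the page.
Element = Tuple[str, float, float, float, float]
TYPES = ["text", "image", "button"]  # assumed element vocabulary


def spatial_pathway(elements: List[Element]) -> List[float]:
    """Stand-in spatial features: mean box center and mean box size."""
    n = len(elements)
    cx = sum(x + w / 2 for _, x, _, w, _ in elements) / n
    cy = sum(y + h / 2 for _, _, y, _, h in elements) / n
    mean_w = sum(w for _, _, _, w, _ in elements) / n
    mean_h = sum(h for _, _, _, _, h in elements) / n
    return [cx, cy, mean_w, mean_h]


def structural_pathway(elements: List[Element]) -> List[float]:
    """Stand-in structural features: fraction of elements of each type."""
    counts = Counter(kind for kind, *_ in elements)
    return [counts[kind] / len(elements) for kind in TYPES]


def encode_layout(elements: List[Element]) -> List[float]:
    """Fuse both pathways by concatenation (an assumed fusion rule)."""
    return spatial_pathway(elements) + structural_pathway(elements)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two layout vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A search would encode the query layout and every candidate layout with `encode_layout`, then rank candidates by `cosine` against the query vector.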

    Digital Content Layout Encoding for Search

    Document type: Published patent application

    Publication number: US20230359682A1

    Publication date: 2023-11-09

    Application number: US17735748

    Filing date: 2022-05-03

    Applicant: Adobe Inc.

    CPC classification number: G06F16/9537 G06F40/30 G06N20/00

    Abstract: Digital content layout encoding techniques for search are described. In these techniques, a layout representation is generated (using machine learning, automatically and without user intervention) that describes a layout of elements included within the digital content. In an implementation, the layout representation includes a description of both spatial and structural aspects of the elements in relation to each other. To do so, a two-pathway pipeline is configured to model layout from both spatial and structural aspects using a spatial pathway and a structural pathway, respectively. In one example, this is also performed through use of multi-level encoding and fusion to generate a layout representation.
