Patent search: ap:("Adobe Inc.") AND inv:"Denil Pareshbhai Mehta"

1.

Granted invention patent
Modality adaptive information retrieval [In force]

Publication (announcement) No.: US12198048B2

Publication (announcement) date: 2025-01-14

Application No.: US17153130

Filing date: 2021-01-20

Applicant: Adobe Inc.

Inventors: Hrituraj Singh, Jatin Lamba, Denil Pareshbhai Mehta, Balaji Vasan Srinivasan, Anshul Nasery, Aishwarya Agarwal

IPC: G06F16/24, G06F16/242, G06F18/214, G06F18/22, G06F40/20, G06N3/045, G06N3/08, G06V30/40

Abstract: In some embodiments, a multimodal computing system receives a query and identifies, from source documents, text passages and images that are relevant to the query. The multimodal computing system accesses a multimodal question-answering model that includes a textual stream of language models and a visual stream of language models. Each of the textual stream and the visual stream contains a set of transformer-based models and each transformer-based model includes a cross-attention layer using data generated by both the textual stream and visual stream of language models as an input. The multimodal computing system identifies text relevant to the query by applying the textual stream to the text passages and computes, using the visual stream, relevance scores of the images to the query, respectively. The multimodal computing system further generates a response to the query by including the text and/or an image according to the relevance scores.
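
The two-stream design described in the abstract can be illustrated with a minimal sketch. It assumes single-head scaled dot-product cross-attention with no learned projections, a sigmoid of a dot product as the image relevance score, and a 0.5 inclusion threshold. None of these details come from the patent, and all function names (co_attention_layer, image_relevance, build_response) are hypothetical. The sketch shows only the general idea: each stream attends to the other stream's states, and the response includes text and/or an image depending on the relevance scores.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    """Single-head scaled dot-product attention: each vector in `queries`
    attends over the vectors in `context` (the other stream).
    Learned Q/K/V projections are omitted (simplifying assumption)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in context])
        attended = [sum(w * c[j] for w, c in zip(weights, context)) for j in range(d)]
        out.append([qi + ai for qi, ai in zip(q, attended)])  # residual connection
    return out

def co_attention_layer(text_states, image_states):
    """One hypothetical layer: the textual stream attends to visual states,
    and the visual stream attends to textual states from the same layer input."""
    new_text = cross_attention(text_states, image_states)
    new_image = cross_attention(image_states, text_states)
    return new_text, new_image

def image_relevance(query_vec, image_vec):
    """Hypothetical relevance score in (0, 1): sigmoid of a dot product."""
    return 1.0 / (1.0 + math.exp(-dot(query_vec, image_vec)))

def build_response(answer_text, images, scores, threshold=0.5):
    """Include the answer text and/or the highest-scoring image; the image
    is added only if its relevance score reaches `threshold` (assumed value)."""
    response = {}
    if answer_text:
        response["text"] = answer_text
    if images:
        best = max(range(len(images)), key=lambda i: scores[i])
        if scores[best] >= threshold:
            response["image"] = images[best]
    return response

if __name__ == "__main__":
    text = [[1.0, 0.0], [0.5, 0.5]]  # toy text-token states
    imgs = [[0.0, 1.0]]              # toy image-region states
    text, imgs = co_attention_layer(text, imgs)
    query = [1.0, 1.0]
    scores = [image_relevance(query, v) for v in imgs]
    print(build_response("Use the Crop tool.", ["crop.png"], scores))
```

Thresholding each image independently lets the response be text only, image only, or both, which mirrors the "text and/or an image" wording. A real system would instead learn the projections and the scoring head.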

2.

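Invention patent application
MODALITY ADAPTIVE INFORMATION RETRIEVAL [In force]

Publication (announcement) No.: US20220230061A1

Publication (announcement) date: 2022-07-21

Application No.: US17153130

Filing date: 2021-01-20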

Applicant: Adobe Inc.

Inventors: Hrituraj Singh, Jatin Lamba, Denil Pareshbhai Mehta, Balaji Vasan Srinivasan, Anshul Nasery, Aishwarya Agarwal

IPC: G06N3/08, G06F16/242, G06N3/04, G06K9/62, G06K9/00, G06F40/20

Abstract: In some embodiments, a multimodal computing system receives a query and identifies, from source documents, text passages and images that are relevant to the query. The multimodal computing system accesses a multimodal question-answering model that includes a textual stream of language models and a visual stream of language models. Each of the textual stream and the visual stream contains a set of transformer-based models and each transformer-based model includes a cross-attention layer using data generated by both the textual stream and visual stream of language models as an input. The multimodal computing system identifies text relevant to the query by applying the textual stream to the text passages and computes, using the visual stream, relevance scores of the images to the query, respectively. The multimodal computing system further generates a response to the query by including the text and/or an image according to the relevance scores.
