Invention Application
- Patent Title: Interpretable Counting in Visual Question Answering
- Application No.: US16781179
- Application Date: 2020-02-04
- Publication No.: US20200175305A1
- Publication Date: 2020-06-04
- Inventor: Alexander Richard TROTT, Caiming XIONG, Richard SOCHER
- Applicant: salesforce.com, inc.
- Main IPC: G06K9/46
- IPC: G06K9/46; G06N3/04; G06K9/00; G06N5/04; G06F16/332

Abstract:
Approaches for interpretable counting for visual question answering include a digital image processor, a language processor, a scorer, and a counter. The digital image processor identifies objects in an image, maps the identified objects into an embedding space, generates bounding boxes for each of the identified objects, and outputs the embedded objects paired with their bounding boxes. The language processor embeds a question into the embedding space. The scorer determines scores for the identified objects. Each respective score indicates how well a corresponding one of the identified objects is responsive to the question. The counter determines a count of the objects in the digital image that are responsive to the question based on the scores. The count and a corresponding bounding box for each object included in the count are output. In some embodiments, the counter determines the count interactively based on interactions between counted and uncounted objects.
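
The data flow described in the abstract (embedded objects with bounding boxes, a question embedding, per-object scores, then a count with its boxes) can be illustrated with a minimal Python sketch. This is not the patented method: the dot-product scorer, the fixed `threshold`, and the IoU-based `overlap_penalty` standing in for interactions between counted and uncounted objects are all assumptions made for illustration, and every name below (`DetectedObject`, `count_objects`, and so on) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


@dataclass
class DetectedObject:
    embedding: List[float]  # object mapped into the shared embedding space
    box: Box                # bounding box paired with the embedding


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_objects(
    objects: List[DetectedObject],
    question_embedding: List[float],
    threshold: float = 0.5,        # assumed cutoff, not from the source
    overlap_penalty: float = 1.0,  # assumed interaction strength
) -> Tuple[int, List[Box]]:
    # Scorer (assumed form): similarity between each object and the question.
    scores = [dot(o.embedding, question_embedding) for o in objects]
    remaining = list(range(len(objects)))
    counted: List[int] = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < threshold:
            break
        counted.append(best)
        remaining.remove(best)
        # Interaction (assumed form): a counted object lowers the scores of
        # uncounted objects that overlap it, to discourage double counting.
        for i in remaining:
            scores[i] -= overlap_penalty * iou(objects[best].box, objects[i].box)
    # Output the count together with the box of every counted object.
    return len(counted), [objects[i].box for i in counted]


objects = [
    DetectedObject([1.0, 0.0], (0.0, 0.0, 10.0, 10.0)),
    DetectedObject([0.9, 0.0], (1.0, 1.0, 10.0, 10.0)),    # near-duplicate of the first
    DetectedObject([0.8, 0.0], (20.0, 20.0, 30.0, 30.0)),
    DetectedObject([0.0, 1.0], (40.0, 40.0, 50.0, 50.0)),  # unrelated to the question
]
count, boxes = count_objects(objects, question_embedding=[1.0, 0.0])
```

Returning the boxes alongside the count is what makes the result inspectable: each unit of the count can be traced back to a region of the image.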
Public/Granted literature
- US11270145B2: Interpretable counting in visual question answering (Public/Granted day: 2022-03-08)