Multi-granularity alignment for visual question answering

    Publication No.: US12210835B2

    Publication Date: 2025-01-28

    Application No.: US17946400

    Filing Date: 2022-09-16

    Abstract: In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.
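
    The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how features are extracted, how alignment is computed, or how the two outputs are combined. The sketch assumes that features are already extracted as small vectors, that "alignment" is a softmax cross-attention of text features over image features, and that the answer is chosen by summing the two outputs and scoring them against candidate answer embeddings. The names `align` and `answer`, the "coarse"/"fine" labels, and all toy values are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def align(image_feats, text_feats):
    """Assumed alignment step: each text feature attends over the image
    features of the same granularity; the attended vectors are mean-pooled
    into one output vector."""
    dim = len(image_feats[0])
    pooled = [0.0] * dim
    for t in text_feats:
        weights = softmax([dot(t, v) for v in image_feats])
        for w, v in zip(weights, image_feats):
            for i in range(dim):
                pooled[i] += w * v[i] / len(text_feats)
    return pooled


def answer(first_output, second_output, answer_embeddings):
    """Assumed answer step: sum the two alignment outputs and pick the
    candidate answer whose embedding scores highest."""
    fused = [a + b for a, b in zip(first_output, second_output)]
    scores = {name: dot(fused, emb) for name, emb in answer_embeddings.items()}
    return max(scores, key=scores.get)


# Toy, hand-written features (hypothetical values, two dimensions each).
coarse_image = [[1.0, 0.0], [0.0, 1.0]]              # first granularity
fine_image = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]    # second granularity
coarse_text = [[1.0, 0.0]]                           # first granularity
fine_text = [[2.0, 0.0], [1.0, 0.0]]                 # second granularity
answers = {"red": [1.0, 0.0], "blue": [0.0, 1.0]}

first = align(coarse_image, coarse_text)    # first output
second = align(fine_image, fine_text)       # second output
prediction = answer(first, second, answers)
print(prediction)
```

    In this sketch each granularity is aligned independently, and the answer depends on both outputs, mirroring the order of steps in the abstract. A real system would presumably use learned encoders and a trained answer classifier in place of the hand-written vectors.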
