Publication (Announcement) No.: US12210835B2
Publication (Announcement) Date: 2025-01-28
Application No.: US17946400
Filing Date: 2022-09-16
Applicant: Samsung Electronics Co., Ltd.
Inventors: Peixi Xiong, Yilin Shen, Hongxia Jin
IPC: G06F40/30, G06F40/284, G06F40/289, G06V10/40, G06V10/70, G06V10/80, G06F16/33, G06V10/426, G06V10/762, G06V10/764, G06V10/82
Abstract: In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.
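The abstract does not say how the features are extracted, how an "alignment" is computed, or how the two outputs are combined. The following is a minimal sketch of the two-level flow under explicitly assumed choices. It assumes feature vectors have already been extracted for each granularity level. It uses scaled dot-product cross-attention as the alignment step: each text feature attends over the image features, and the attended vectors are mean-pooled. The per-level outputs are fused by concatenation, and the answer is picked by dot-product score against hypothetical answer embeddings. The names `align` and `answer` and all toy data are hypothetical illustrations, not the patented method.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(image_feats: List[Vector], text_feats: List[Vector]) -> Vector:
    """Hypothetical alignment for one granularity level: each text feature
    attends over the image features (scaled dot-product cross-attention),
    and the attended vectors are mean-pooled into a single output vector."""
    dim = len(image_feats[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = [0.0] * dim
    for t in text_feats:
        weights = softmax([dot(t, v) * scale for v in image_feats])
        for i in range(dim):
            pooled[i] += sum(w * v[i] for w, v in zip(weights, image_feats))
    return [p / len(text_feats) for p in pooled]


def answer(
    image_levels: List[List[Vector]],
    text_levels: List[List[Vector]],
    answer_embeddings: Dict[str, Vector],
) -> str:
    """Align image and text features level by level, concatenate the
    per-level outputs (assumed fusion), and return the best-scoring answer."""
    outputs = [align(img, txt) for img, txt in zip(image_levels, text_levels)]
    fused = [x for out in outputs for x in out]
    scores = {a: dot(fused, e) for a, e in answer_embeddings.items()}
    return max(scores, key=scores.get)


# Toy usage: level 1 (coarse, e.g. whole image / whole question),
# level 2 (fine, e.g. image regions / question words).
image_levels = [[[1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
text_levels = [[[1.0, 0.0]], [[0.0, 1.0]]]
answers = {"yes": [1.0, 0.0, 0.0, 1.0], "no": [0.0, 1.0, 1.0, 0.0]}
print(answer(image_levels, text_levels, answers))  # yes
```

Concatenation keeps the coarse-level and fine-level outputs separate, so a downstream scorer can weight each level differently. Summation or a learned gate would be equally plausible fusion choices, and the abstract does not specify one.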