Publication number: US20230368499A1
Publication date: 2023-11-16
Application number: US18318159
Application date: 2023-05-16
Inventors: Young Wan LEE, Jong Hee KIM, Jin Young MOON, Kang Min BAE, Yu Seok BAE, Je Seok HAM
CPC classification numbers: G06V10/7715, G06V10/82, G06V10/761, G06V10/42
Abstract: The disclosure relates to a method of extracting image features based on a vision transformer, in which an input image is embedded in units of patches and visual features are extracted through global attention. An apparatus for extracting an image feature based on a vision transformer according to an embodiment of the disclosure includes a memory configured to store data and a processor configured to control the memory. The processor is configured to perform embedding on multi-patches of an input image, extract feature maps for the embedded multi-patches, perform transformer encoding based on a neural network using the extracted feature maps, and extract a feature of the input image from the final feature map produced by the transformer encoding, wherein the patches have different sizes.
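
The multi-patch embedding step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how patches are projected or how the resulting feature maps are combined, so everything below is an assumption for illustration: the function names (`split_into_patches`, `make_projection`, `embed_patches`, `multi_patch_embed`), the patch sizes 2 and 4, the embedding width 8, and the fixed random linear projection standing in for learned weights are all hypothetical. The transformer encoding stage is omitted.

```python
import random


def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches.

    Each patch is returned flattened in row-major order.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[top + r][left + c]
                            for r in range(patch_size)
                            for c in range(patch_size)])
    return patches


def make_projection(in_dim, out_dim, seed=0):
    """Hypothetical stand-in for a learned linear layer: fixed random weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)]
            for _ in range(in_dim)]


def embed_patches(patches, projection):
    """Project each flattened patch to an embedding vector (patch @ projection)."""
    out_dim = len(projection[0])
    return [[sum(value * projection[i][j] for i, value in enumerate(patch))
             for j in range(out_dim)]
            for patch in patches]


def multi_patch_embed(image, patch_sizes=(2, 4), embed_dim=8):
    """Embed the same image once per patch size (assumed sizes, not from the source)."""
    embeddings = {}
    for size in patch_sizes:
        patches = split_into_patches(image, size)
        projection = make_projection(size * size, embed_dim, seed=size)
        embeddings[size] = embed_patches(patches, projection)
    return embeddings


# Toy 8x8 single-channel image.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = multi_patch_embed(image)
```

With the toy 8x8 image, the size-2 branch yields 16 tokens and the size-4 branch yields 4 tokens, each of width 8. A sharing of one embedding width across branches is a design choice of this sketch that would make the branches easy to feed into a common encoder; the disclosure may handle this differently.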