Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Weitong Ruan"

1.

Granted invention patent
Natural language selection of objects in image data (status: active)

Publication number: US12045288B1

Publication date: 2024-07-23

Application number: US17031062

Filing date: 2020-09-24

Applicant: Amazon Technologies, Inc.

Inventor: Ahmet Emre Barut, Chengwei Su, Weitong Ruan, Wael Hamza

IPC: G06F16/30, G06F16/532, G06F16/583, G06F16/9032, G06V20/20, G06N20/00

CPC classification number: G06F16/90332, G06F16/532, G06F16/583, G06V20/20, G06N20/00

Abstract: Devices and techniques are generally described for selection of objects in image data using natural language input. In various examples, first image data representing at least a first object and first natural language data may be received. In some examples, first embedding data representing the first natural language data may be generated. Second embedding data representing the first image data may be generated. Relative location data indicating a location of the first object in the first image data relative to at least one other object may be generated. The first embedding data, the second embedding data, and the relative location data may be input into a multi-modal transformer model. The multi-modal transformer model may determine that the first natural language data relates to the first object.
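The abstract does not say how the relative location data is computed. As a purely illustrative sketch, one simple possibility is to take the offset between bounding-box centers, normalized by image size. The `Box` class, the `relative_locations` function, and the center-offset encoding are all assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical representation)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relative_locations(
    boxes: List[Box], width: float, height: float
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map each ordered pair (a, b) to the offset of a's center from b's center.

    Offsets are divided by the image width and height, so a negative first
    component means a lies to the left of b. This encoding is an assumption,
    not the method claimed in the patent.
    """
    out: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for a in boxes:
        ax, ay = a.center
        for b in boxes:
            if a is b:
                continue
            bx, by = b.center
            out[(a.label, b.label)] = ((ax - bx) / width, (ay - by) / height)
    return out


# Two objects on the same horizontal line in a 400x200 image.
boxes = [Box("cup", 0, 0, 100, 100), Box("lamp", 300, 0, 400, 100)]
rel = relative_locations(boxes, width=400, height=200)
```

In a pipeline like the one the abstract outlines, features of this kind would be supplied to the multi-modal transformer together with the text and image embeddings, so that a phrase such as "the cup to the left of the lamp" could be matched to a specific object.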
