-
Publication No.: US20250028321A1
Publication date: 2025-01-23
Application No.: US18907880
Filing date: 2024-10-07
Applicant: Amazon Technologies, Inc.
Inventor: Gunnar Atli Sigurdsson, Robinson Piramuthu, Gokhan Tur
Abstract: Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As a device moves about the environment, it may capture image data and, based on its position and/or the configuration of its components, determine updated locations of objects that move in the environment. Upon receiving a query from a user, the system can interpret gestures and voice commands, based on the locations of the objects relative to the device/user, to infer which object the voice command specifies. To build the environment map, the system performs object detection to generate bounding boxes associated with an object, then clusters the bounding boxes into a three-dimensional (3D) object associated with 3D coordinates. Because the system tracks the object using the 3D coordinates while maintaining two-dimensional (2D) information (e.g., bounding boxes and other features), it can use existing 2D models to process objects in 3D.
-
Publication No.: US12117838B1
Publication date: 2024-10-15
Application No.: US17218621
Filing date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Gunnar Atli Sigurdsson, Robinson Piramuthu, Gokhan Tur
CPC classification number: G05D1/0219, G05D1/0088, G05D1/0251, G05D1/0274, G06T7/73, G10L13/08, G10L15/1807, G10L15/22, G10L2015/223
Abstract: Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As a device moves about the environment, it may capture image data and, based on its position and/or the configuration of its components, determine updated locations of objects that move in the environment. Upon receiving a query from a user, the system can interpret gestures and voice commands, based on the locations of the objects relative to the device/user, to infer which object the voice command specifies. To build the environment map, the system performs object detection to generate bounding boxes associated with an object, then clusters the bounding boxes into a three-dimensional (3D) object associated with 3D coordinates. Because the system tracks the object using the 3D coordinates while maintaining two-dimensional (2D) information (e.g., bounding boxes and other features), it can use existing 2D models to process objects in 3D.
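Illustrative sketch (not taken from the patent text): the pipeline in the abstract, which detects bounding boxes, clusters them into 3D objects, and then resolves a spoken reference by location, could look roughly like the following. Everything here is an assumption made for illustration: the `Detection` and `TrackedObject` types, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), a camera whose axes are aligned with the map frame, a per-detection depth value, the greedy distance-threshold clustering, and nearest-to-user resolution. The abstract does not specify any of these details.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Detection:
    label: str
    box: tuple      # 2D bounding box (x_min, y_min, x_max, y_max) in pixels
    depth: float    # assumed distance along the optical axis, in metres
    cam_pos: tuple  # assumed camera position (x, y, z) in the map frame


@dataclass
class TrackedObject:
    label: str
    center: tuple                                   # 3D coordinates in the map
    detections: list = field(default_factory=list)  # retained 2D information


def back_project(det, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Lift a box centre to 3D with a pinhole model (hypothetical intrinsics)."""
    u = (det.box[0] + det.box[2]) / 2
    v = (det.box[1] + det.box[3]) / 2
    x = (u - cx) * det.depth / fx
    y = (v - cy) * det.depth / fy
    return (det.cam_pos[0] + x, det.cam_pos[1] + y, det.cam_pos[2] + det.depth)


def cluster(detections, radius=0.5):
    """Greedily merge same-label detections whose 3D points lie within radius."""
    objects = []
    for det in detections:
        p = back_project(det)
        match = next(
            (o for o in objects
             if o.label == det.label and dist(o.center, p) <= radius),
            None,
        )
        if match is None:
            objects.append(TrackedObject(det.label, p, [det]))
        else:
            # Update the running mean of the cluster centre.
            n = len(match.detections)
            match.center = tuple(
                (c * n + q) / (n + 1) for c, q in zip(match.center, p)
            )
            match.detections.append(det)
    return objects


def resolve(objects, label, user_pos):
    """Return the object with the given label nearest to the user, or None."""
    candidates = [o for o in objects if o.label == label]
    return min(candidates, key=lambda o: dist(o.center, user_pos), default=None)


# Two views of one cup near the origin, and a second cup seen from 5 m away.
dets = [
    Detection("cup", (300, 220, 340, 260), 2.0, (0.0, 0.0, 0.0)),
    Detection("cup", (310, 230, 350, 270), 2.0, (0.0, 0.0, 0.0)),
    Detection("cup", (300, 220, 340, 260), 2.0, (5.0, 0.0, 0.0)),
]
objects = cluster(dets)
target = resolve(objects, "cup", user_pos=(4.5, 0.0, 2.0))
```

Each `TrackedObject` keeps its original 2D boxes next to its 3D centre, mirroring the abstract's point that 2D information is maintained so existing 2D models can still be applied. A real system would likely use full camera poses and a more robust clustering method than this greedy threshold.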
-