-
Publication number: US20240095987A1
Publication date: 2024-03-21
Application number: US18081076
Application date: 2022-12-14
Applicant: Amazon Technologies, Inc.
Inventor: Robinson Piramuthu, Sanqiang Zhao, Yadunandana Rao, Zhiyuan Fang
IPC: G06T13/00, G06F40/166, G06F40/279, G06F40/40, G06T11/00, G10L13/02, G10L15/18, G10L15/22
CPC classification number: G06T13/00, G06F40/166, G06F40/279, G06F40/40, G06T11/00, G10L13/02, G10L15/18, G10L15/22
Abstract: Techniques for generating content associated with a user input or a system-generated response are described. Natural language data associated with a user input may be generated. For each portion of the natural language data, ambiguous references to entities in the portion may be replaced with the corresponding entity. Entities included in the portion may be extracted, and image data representing each entity may be determined. Background image data associated with the entities and the portion may be determined, and attributes which modify the entities in the natural language sentence may be extracted. Spatial relationships between two or more of the entities may further be extracted. Image data representing the natural language data may be generated based on the background image data, the entities, the attributes, and the spatial relationships. Video data may be generated based on the image data, where the video data includes animations of the entities moving.
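The first stages described in this abstract (replacing ambiguous references, then collecting entities, attributes, and spatial relationships into something an image generator could consume) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the pronoun list, the "most recently mentioned entity" rule, and the dictionary layout are hypothetical and are not taken from the patent.

```python
# Toy sketch only: names and heuristics below are illustrative assumptions.
PRONOUNS = {"it", "he", "she", "they"}


def resolve_references(sentences, known_entities):
    """Replace each pronoun with the most recently mentioned known entity."""
    resolved, last_entity = [], None
    for sentence in sentences:
        words = []
        for word in sentence.lower().split():
            if word in known_entities:
                last_entity = word
            elif word in PRONOUNS and last_entity is not None:
                word = last_entity
            words.append(word)
        resolved.append(" ".join(words))
    return resolved


def build_scene(background, entities, attributes, relations):
    """Bundle the extracted pieces into one scene description."""
    return {
        "background": background,
        "entities": [
            {"name": name, "attributes": attributes.get(name, [])}
            for name in entities
        ],
        # Each relation is a (subject, relation, object) triple.
        "relations": list(relations),
    }


sentences = resolve_references(["a dog ran", "it barked"], {"dog"})
scene = build_scene(
    background="park",
    entities=["dog", "tree"],
    attributes={"dog": ["brown"]},
    relations=[("dog", "next to", "tree")],
)
```

A real system would rely on a trained coreference model rather than the last-mentioned rule used above; the sketch only shows how the extracted pieces fit together.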
-
Publication number: US12205577B1
Publication date: 2025-01-21
Application number: US17217031
Application date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Taehwan Kim, Sanqiang Zhao, Robinson Piramuthu, Seokhwan Kim, Yang Liu, Gokhan Tur, Eshan Bhatnagar
Abstract: Techniques for rendering visual content, in response to one or more utterances, are described. A device receives one or more utterances that define a parameter(s) for desired output content. A system (or the device) identifies natural language data corresponding to the desired content, and uses natural language generation processes to update the natural language data based on the parameter(s). The system (or the device) then generates an image based on the updated natural language data. The system (or the device) also generates video data of an avatar. The device displays the image and the avatar, and synchronizes movements of the avatar with output of synthesized speech of the updated natural language data. The device may also display subtitles of the updated natural language data, and cause a word of the subtitles to be emphasized when synthesized speech of the word is being output.
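The subtitle behavior in the last sentence of this abstract (emphasizing a word while its synthesized speech is being output) could be sketched as a lookup from playback time to word index. This assumes the speech synthesizer reports a start and end time for each word; that timing format, the function names, and the use of upper case as the emphasis are all hypothetical choices, not details from the patent.

```python
from bisect import bisect_right


def emphasized_word_index(start_times, end_times, t):
    """Return the index of the word being spoken at time t, or None in a gap.

    start_times must be sorted in ascending order (assumed timing format).
    """
    i = bisect_right(start_times, t) - 1
    if i >= 0 and t < end_times[i]:
        return i
    return None


def render_subtitle(words, index):
    """Render the subtitle, upper-casing the emphasized word (if any)."""
    return " ".join(
        word.upper() if i == index else word for i, word in enumerate(words)
    )


words = ["once", "upon", "a"]
starts = [0.0, 0.4, 0.9]   # seconds, hypothetical values
ends = [0.35, 0.85, 1.3]

current = emphasized_word_index(starts, ends, 0.5)
line = render_subtitle(words, current)
```

The binary search keeps the lookup cheap enough to run on every display frame, which matters if the subtitle is redrawn continuously during playback.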
-
Publication number: US12117838B1
Publication date: 2024-10-15
Application number: US17218621
Application date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Gunnar Atli Sigurdsson, Robinson Piramuthu, Gokhan Tur
CPC classification number: G05D1/0219, G05D1/0088, G05D1/0251, G05D1/0274, G06T7/73, G10L13/08, G10L15/1807, G10L15/22, G10L2015/223
Abstract: Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As a device moves about the environment, it may capture image data and, based on its position and/or the configuration of its components, may determine updated locations of objects that move in the environment. Upon receiving a query from a user, based on the location of the objects relative to the device/user, the system can interpret gestures and voice commands to infer which object is specified by the voice command. To build the environment map, the system performs object detection to generate bounding boxes associated with an object, then clusters the bounding boxes into a three-dimensional (3D) object associated with 3D coordinates. As the system tracks the object using the 3D coordinates while maintaining two-dimensional (2D) information (e.g., bounding boxes and other features), the system can use existing 2D models to process objects in 3D.
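The map-building step (clustering 2D bounding boxes into a 3D object while keeping the 2D information) might look roughly like the sketch below. It assumes each detection has already been projected to an approximate 3D point, and it uses a simple greedy distance threshold; the detection format, the `radius` parameter, and the clustering rule are illustrative assumptions rather than the method claimed in the patent.

```python
import math


def cluster_detections(detections, radius=0.5):
    """Greedily group detections into 3D objects.

    detections: iterable of (label, (x, y, z), bbox) tuples, where the 3D
    point is an assumed projection of the 2D bounding box into the map.
    A detection joins the first cluster with the same label whose center
    is within `radius`; otherwise it starts a new cluster.
    """
    clusters = []
    for label, point, bbox in detections:
        for cluster in clusters:
            if (cluster["label"] == label
                    and math.dist(cluster["center"], point) <= radius):
                cluster["boxes"].append(bbox)  # keep the 2D information
                n = len(cluster["boxes"])
                # Running mean of the 3D points assigned to this cluster.
                cluster["center"] = tuple(
                    m + (p - m) / n for m, p in zip(cluster["center"], point)
                )
                break
        else:
            clusters.append({"label": label, "center": point, "boxes": [bbox]})
    return clusters


detections = [
    ("cup", (1.0, 0.0, 0.0), (10, 10, 50, 50)),
    ("cup", (1.2, 0.0, 0.0), (12, 11, 52, 49)),
    ("cup", (4.0, 0.0, 0.0), (200, 80, 240, 120)),
    ("chair", (1.0, 0.0, 0.0), (0, 0, 300, 300)),
]
objects = cluster_detections(detections)
```

Because each cluster retains its original bounding boxes, a downstream 2D model could still be run on those boxes while the object itself is tracked by its 3D center.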
-
Publication number: US20250028321A1
Publication date: 2025-01-23
Application number: US18907880
Application date: 2024-10-07
Applicant: Amazon Technologies, Inc.
Inventor: Gunnar Atli Sigurdsson, Robinson Piramuthu, Gokhan Tur
Abstract: Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As a device moves about the environment, it may capture image data and, based on its position and/or the configuration of its components, may determine updated locations of objects that move in the environment. Upon receiving a query from a user, based on the location of the objects relative to the device/user, the system can interpret gestures and voice commands to infer which object is specified by the voice command. To build the environment map, the system performs object detection to generate bounding boxes associated with an object, then clusters the bounding boxes into a three-dimensional (3D) object associated with 3D coordinates. As the system tracks the object using the 3D coordinates while maintaining two-dimensional (2D) information (e.g., bounding boxes and other features), the system can use existing 2D models to process objects in 3D.
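The query-time step (combining a gesture and a voice command to infer which mapped object the user means) can be illustrated with a small geometric sketch. It assumes the gesture has been reduced to a pointing direction from the user's position and the voice command to an optional object label; the object records, the function name, and the smallest-angle rule are hypothetical and not drawn from the patent.

```python
import math


def resolve_entity(objects, user_pos, pointing_dir, spoken_label=None):
    """Pick the object whose direction from the user best matches the gesture.

    objects: list of dicts with "id", "label", and 3D "center" (assumed format).
    If spoken_label is given, only objects with that label are considered.
    """
    best, best_angle = None, math.inf
    dir_norm = math.hypot(*pointing_dir)
    if dir_norm == 0:
        return None
    for obj in objects:
        if spoken_label is not None and obj["label"] != spoken_label:
            continue
        to_obj = [c - u for c, u in zip(obj["center"], user_pos)]
        obj_norm = math.hypot(*to_obj)
        if obj_norm == 0:
            continue
        cosine = sum(a * b for a, b in zip(to_obj, pointing_dir))
        cosine /= obj_norm * dir_norm
        # Clamp to guard against rounding slightly outside [-1, 1].
        angle = math.acos(max(-1.0, min(1.0, cosine)))
        if angle < best_angle:
            best, best_angle = obj, angle
    return best


mapped = [
    {"id": "cup-1", "label": "cup", "center": (2.0, 0.0, 0.0)},
    {"id": "cup-2", "label": "cup", "center": (0.0, 2.0, 0.0)},
    {"id": "lamp-1", "label": "lamp", "center": (2.0, 0.2, 0.0)},
]
gesture_only = resolve_entity(mapped, (0.0, 0.0, 0.0), (1.0, 0.1, 0.0))
with_voice = resolve_entity(mapped, (0.0, 0.0, 0.0), (1.0, 0.1, 0.0), "cup")
```

In this example the gesture alone points most directly at the lamp, while adding the spoken label "cup" narrows the candidates so that the nearer-to-the-ray cup is chosen instead.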
-