-
Publication No.: US11368652B1
Publication Date: 2022-06-21
Application No.: US17084347
Filing Date: 2020-10-29
Applicant: Amazon Technologies, Inc.
Inventor: Gregory Johnson, Pragyana K. Mishra, Mohammed Khalilia, Wenbin Ouyang, Naveen Sudhakaran Nair
Abstract: Audio content and played frames may be received. The audio content may correspond to first video content. The played frames may be included in the first video content. The first video content may further include a replaced frame. The played frames and the replaced frame may include a face of a person. Location data may also be received that indicates locations of facial features of the face of the person within the replaced frame. A replacement frame may be generated, such as by rendering the facial features in the replacement frame based at least in part on the locations indicated by the location data and positions indicated by a portion of the audio content that is associated with the replaced frame. Second video content may be played including the played frames and the replacement frame. The replacement frame may replace the replaced frame in the second video content.
-
Publication No.: US12149757B1
Publication Date: 2024-11-19
Application No.: US18216164
Filing Date: 2023-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Wenbin Ouyang, Naveen Sudhakaran Nair, Baris Gecer, Ali Abdool
IPC: H04N21/234, H04N21/235
Abstract: A computer-implemented method is disclosed. The method includes selecting one or more target surfaces portrayed in at least one video frame, generating a video data latent space representation of the at least one video frame, accessing a plurality of supplemental data latent space representations of a plurality of supplemental data sets, identifying a particular supplemental data latent space representation based at least in part on the video data latent space representation, selecting a particular supplemental data set in response to identifying the particular supplemental data latent space representation, the particular supplemental data set corresponding with the particular supplemental data latent space representation, and inserting the particular supplemental data set into the at least one video frame.
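Illustrative note: the selection step described in this abstract (identifying a supplemental data latent representation based on the video data latent representation) can be sketched as a nearest-neighbor lookup. The abstract does not state how the match is made, so the cosine-similarity criterion, the function names, and the toy vectors below are all assumptions made for illustration, not the patented method.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_supplemental(
    video_latent: Sequence[float],
    supplemental_latents: Dict[str, Sequence[float]],
) -> str:
    """Return the key of the supplemental data set whose latent vector is
    most similar to the video frame's latent vector.

    Assumption: "identifying ... based at least in part on the video data
    latent space representation" is modeled here as a cosine-similarity argmax.
    """
    if not supplemental_latents:
        raise ValueError("no supplemental data sets to choose from")
    return max(
        supplemental_latents,
        key=lambda name: cosine_similarity(video_latent, supplemental_latents[name]),
    )


# Hypothetical latent vectors; a real system would obtain these from an encoder.
frame_latent = [1.0, 0.0]
candidates = {"set_a": [0.0, 1.0], "set_b": [0.9, 0.1]}
chosen = select_supplemental(frame_latent, candidates)
```

In this toy case `set_b` points in nearly the same direction as the frame vector, so it is the set that would be inserted into the frame's target surface.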
-
Publication No.: US12087268B1
Publication Date: 2024-09-10
Application No.: US17541996
Filing Date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Wenbin Ouyang, Naveen Sudhakaran Nair
IPC: G10L13/02, G06N3/08, G10L17/18, G10L21/013, G10L21/10
CPC classification number: G10L13/02, G06N3/08, G10L17/18, G10L21/013, G10L21/10
Abstract: Systems, devices, and methods are provided for training and/or inferencing using machine-learning models. In at least one embodiment, a user selects a source media (e.g., video or audio file) and a target identity. A content embedding may be extracted from the source media, and an identity embedding may be obtained for the target identity. The content embedding of the source media and the identity embedding of the target identity may be provided to a transfer model that generates synthesized media. For example, a user may select a song that is sung by a first artist and then select a second artist as the target identity to produce a cover of the song in the voice of the second artist.
-
-