-
Publication No.: US12149757B1
Publication Date: 2024-11-19
Application No.: US18216164
Application Date: 2023-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Wenbin Ouyang , Naveen Sudhakaran Nair , Baris Gecer , Ali Abdool
IPC: H04N21/234 , H04N21/235
Abstract: A computer-implemented method is disclosed. The method includes selecting one or more target surfaces portrayed in at least one video frame, generating a video data latent space representation of the at least one video frame, accessing a plurality of supplemental data latent space representations of a plurality of supplemental data sets, identifying a particular supplemental data latent space representation based at least in part on the video data latent space representation, selecting a particular supplemental data set in response to identifying the particular supplemental data latent space representation, the particular supplemental data set corresponding with the particular supplemental data latent space representation, and inserting the particular supplemental data set into the at least one video frame.
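The selection step described in the abstract can be illustrated as a nearest-neighbor lookup in latent space: embed the video frame, then pick the supplemental data set whose precomputed embedding is most similar. The sketch below is a minimal stand-in, assuming the encoders are already trained; the random vectors and the cosine-similarity criterion are illustrative assumptions, not the patent's actual models.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two latent-space embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_supplemental_set(frame_embedding, supplemental_embeddings):
    """Return the index of the supplemental data set whose latent-space
    representation best matches the video frame's representation."""
    scores = [cosine_similarity(frame_embedding, e) for e in supplemental_embeddings]
    return int(np.argmax(scores))

# Toy latent vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
frame_z = rng.normal(size=8)                     # video data latent representation
candidates = [rng.normal(size=8) for _ in range(5)]  # supplemental data representations
best = select_supplemental_set(frame_z, candidates)
```

The selected index identifies the supplemental data set that would then be inserted onto the chosen target surface in the frame.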
-
Publication No.: US12272383B1
Publication Date: 2025-04-08
Application No.: US18412549
Application Date: 2024-01-14
Applicant: Amazon Technologies, Inc.
Inventor: Rohun Tripathi , Angshuman Saha , Naveen Sudhakaran Nair
IPC: G11B27/031 , G06F40/58 , G06V10/774 , G06V10/82 , G06V20/40 , G10L17/02 , G10L17/04 , G10L17/06 , G10L17/18 , G10L21/013 , G10L25/57
Abstract: Systems and techniques for validation and generation of localized content for audio and video are described herein. The systems and techniques provide for training of twin neural networks to evaluate performance characteristics, sometimes referred to as content-auxiliary characteristics, of a localized performance. The localized performance may be validated or improved by identifying misalignment in the performance characteristics to ensure that localized content preserves content as well as creative intent and performance ability in the final product. The machine learning models trained using the techniques described herein may be used in connection with auto-localization processes to automatically generate high quality localized audio and video content.
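The defining property of a twin (Siamese) network is that both inputs pass through the identical encoder, so the distance between the two embeddings can serve as a misalignment score. The sketch below uses a random linear-plus-tanh encoder purely as a placeholder for the trained branches; all weights and features are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 4))  # shared encoder weights used by both twin branches

def encode(x):
    # Both branches apply the *same* encoder -- the defining property
    # of a twin (Siamese) network.
    return np.tanh(x @ W)

def misalignment(original_features, localized_features):
    """Euclidean distance between twin embeddings; larger values flag
    performance characteristics that drifted during localization."""
    return float(np.linalg.norm(encode(original_features) - encode(localized_features)))

source = rng.normal(size=16)                      # original performance features
good_dub = source + 0.01 * rng.normal(size=16)    # close localized performance
bad_dub = rng.normal(size=16)                     # unrelated performance
```

A validation pipeline would threshold this score to accept a localized take or flag it for regeneration.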
-
Publication No.: US11875822B1
Publication Date: 2024-01-16
Application No.: US17748990
Application Date: 2022-05-19
Applicant: Amazon Technologies, Inc.
Inventor: Rohun Tripathi , Angshuman Saha , Naveen Sudhakaran Nair
IPC: G11B27/031 , G06V20/40 , G06V10/82 , G06V10/774 , G10L25/57 , G10L17/02 , G06F40/58 , G10L17/18 , G10L17/04 , G10L21/013 , G10L17/06
CPC classification number: G11B27/031 , G06F40/58 , G06V10/774 , G06V10/82 , G06V20/41 , G10L17/02 , G10L17/04 , G10L17/06 , G10L17/18 , G10L21/013 , G10L25/57 , G10L2021/0135
Abstract: Systems and techniques for validation and generation of localized content for audio and video are described herein. The systems and techniques provide for training of twin neural networks to evaluate performance characteristics, sometimes referred to as content-auxiliary characteristics, of a localized performance. The localized performance may be validated or improved by identifying misalignment in the performance characteristics to ensure that localized content preserves content as well as creative intent and performance ability in the final product. The machine learning models trained using the techniques described herein may be used in connection with auto-localization processes to automatically generate high quality localized audio and video content.
-
Publication No.: US11368652B1
Publication Date: 2022-06-21
Application No.: US17084347
Application Date: 2020-10-29
Applicant: Amazon Technologies, Inc.
Inventor: Gregory Johnson , Pragyana K. Mishra , Mohammed Khalilia , Wenbin Ouyang , Naveen Sudhakaran Nair
Abstract: Audio content and played frames may be received. The audio content may correspond to first video content. The played frames may be included in the first video content. The first video content may further include a replaced frame. The played frames and the replaced frame may include a face of a person. Location data may also be received that indicates locations of facial features of the face of the person within the replaced frame. A replacement frame may be generated, such as by rendering the facial features in the replacement frame based at least in part on the locations indicated by the location data and positions indicated by a portion of the audio content that is associated with the replaced frame. Second video content may be played including the played frames and the replacement frame. The replacement frame may replace the replaced frame in the second video content.
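The rendering step can be pictured as displacing facial landmarks from their known locations by an amount driven by the associated audio. The toy function below shifts mouth landmarks along a per-landmark opening direction in proportion to audio energy; the landmark values and the energy-to-opening mapping are hypothetical stand-ins for a learned audio-driven renderer.

```python
import math

def render_mouth_landmarks(base_landmarks, audio_frame):
    """Shift mouth landmarks along their opening direction in proportion
    to the audio frame's energy -- a crude stand-in for a learned
    audio-driven facial renderer."""
    energy = sum(s * s for s in audio_frame) / len(audio_frame)
    opening = min(1.0, math.sqrt(energy))  # clamp mouth opening to [0, 1]
    return [(x, y + opening * dy) for (x, y, dy) in base_landmarks]

# (x, y) landmark locations from the replaced frame plus a per-landmark
# opening direction dy (hypothetical values).
mouth = [(40.0, 80.0, 0.0), (50.0, 85.0, 3.0), (60.0, 80.0, 0.0)]
silent = render_mouth_landmarks(mouth, [0.0] * 160)  # no speech: mouth closed
loud = render_mouth_landmarks(mouth, [0.8] * 160)    # speech: mouth opens
```

With silence the landmarks stay at their source locations; louder audio moves the lower-lip landmark downward, producing the replacement frame's mouth pose.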
-
Publication No.: US12087268B1
Publication Date: 2024-09-10
Application No.: US17541996
Application Date: 2021-12-03
Applicant: Amazon Technologies, Inc.
Inventor: Wenbin Ouyang , Naveen Sudhakaran Nair
IPC: G10L13/02 , G06N3/08 , G10L17/18 , G10L21/013 , G10L21/10
CPC classification number: G10L13/02 , G06N3/08 , G10L17/18 , G10L21/013 , G10L21/10
Abstract: Systems, devices, and methods are provided for training and/or inferencing using machine-learning models. In at least one embodiment, a user selects a source media (e.g., video or audio file) and a target identity. A content embedding may be extracted from the source media, and an identity embedding may be obtained for the target identity. The content embedding of the source media and the identity embedding of the target identity may be provided to a transfer model that generates synthesized media. For example, a user may select a song that is sung by a first artist and then select a second artist as the target identity to produce a cover of the song in the voice of the second artist.
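The transfer model's interface can be sketched as a decoder over the concatenation of a content embedding ("what is performed") and an identity embedding ("who performs it"). The linear-plus-tanh decoder and all dimensions below are illustrative assumptions standing in for the trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
D_CONTENT, D_IDENTITY, D_OUT = 8, 4, 16
W_dec = rng.normal(size=(D_CONTENT + D_IDENTITY, D_OUT))  # toy decoder weights

def transfer(content_z, identity_z):
    """Decode a (content, identity) embedding pair into synthesized
    media features: same content, different identity -> a 'cover'."""
    z = np.concatenate([content_z, identity_z])
    return np.tanh(z @ W_dec)

song_z = rng.normal(size=D_CONTENT)        # content embedding from the source song
artist_a = rng.normal(size=D_IDENTITY)     # first artist's identity embedding
artist_b = rng.normal(size=D_IDENTITY)     # target artist's identity embedding
cover = transfer(song_z, artist_b)         # the song in the second artist's voice
```

Holding the content embedding fixed while swapping the identity embedding is what yields the cover-song behavior the abstract describes.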
-
Publication No.: US11531887B1
Publication Date: 2022-12-20
Application No.: US16750381
Application Date: 2020-01-23
Applicant: Amazon Technologies, Inc.
Inventor: Naveen Sudhakaran Nair , Pragyana K. Mishra
Abstract: Prediction of outcomes of disruptive treatments are enabled utilizing sequenced training of a machine learning model over ordered bins of treatment candidates. Treatment candidates may be assigned to candidate characterization bins with an ordering, and the model may be trained with a sequence of training steps corresponding to the ordering of the candidate characterization bins, in each training step the model having untreated candidate features from a corresponding bin and aggregate metrics from one or more previous steps as input. The predicted outcome for a selected bin may be generated with the trained model having treated candidate features and aggregate metrics from one or more previous steps as input. The predicted outcome may be a counterfactual prediction for a bin with insufficient control candidates, and may represent a nonlinear extrapolation from control data in prior bins in the bin ordering.
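The sequenced-training loop can be sketched as follows: iterate over the bins in their assigned order, augment each bin's candidate features with an aggregate metric carried forward from earlier steps, and fit a per-step model. The running-mean aggregate and least-squares fit below are simplifying assumptions in place of the actual model and metrics.

```python
import numpy as np

def train_sequenced(bins):
    """Train across ordered candidate characterization bins; each step sees
    the bin's candidate features plus an aggregate metric (here, the running
    mean outcome) carried forward from the previous bins."""
    aggregate = 0.0
    models = []
    for features, outcomes in bins:
        # Append the carried-forward aggregate as an extra input column.
        x = np.column_stack([features, np.full(len(features), aggregate)])
        # Least-squares fit standing in for a per-step model update.
        w, *_ = np.linalg.lstsq(x, outcomes, rcond=None)
        models.append(w)
        aggregate = float(np.mean(outcomes))  # metric passed to the next bin
    return models

# Three ordered bins of toy (features, outcomes) candidate data.
rng = np.random.default_rng(3)
bins = []
for _ in range(3):
    features = rng.normal(size=(20, 2))
    outcomes = features @ np.array([1.5, -0.5]) + rng.normal(scale=0.1, size=20)
    bins.append((features, outcomes))
models = train_sequenced(bins)
```

Because later steps condition on aggregates from earlier bins, a prediction for a bin with few control candidates can extrapolate nonlinearly from the control data accumulated in prior bins, as the abstract notes.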
-
Publication No.: US10740778B1
Publication Date: 2020-08-11
Application No.: US15707768
Application Date: 2017-09-18
Applicant: Amazon Technologies, Inc.
Inventor: Naveen Sudhakaran Nair , Pragyana K. Mishra , Chittaranjan Tripathy
Abstract: A content provider may cause a client device of a user to output a personalized puzzle in response to receiving a request from the client device to access electronic content of the content provider. The puzzle may include a theme that corresponds to a determined predilection of the user, and/or the puzzle may be a type of puzzle that corresponds to the user's predilection. The client device may also output, with the puzzle, an incentive for completing (e.g., solving) the puzzle. Upon receiving data indicating that the user has completed his/her personalized puzzle, the content provider may provide the incentive to the user.
-
Publication No.: US12003825B1
Publication Date: 2024-06-04
Application No.: US17949822
Application Date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Naveen Sudhakaran Nair
IPC: H04N21/488
CPC classification number: H04N21/4884
Abstract: Devices, systems, and methods are provided for presenting on-screen text during video playback. A method may include detecting a user request to determine first times at which to activate, and second times at which to deactivate, presentation of on-screen text during playback of a video; inputting, to a machine learning model, text data of video titles, audio data of the video titles, video frames of the video titles, and user data associated with users of a streaming video application; generating, using the machine learning model, based on the text data, the audio data, the video frames, and the user data, the first times and the second times; sending a bitstream comprising streaming video and indications of the first times and the second times; activating, based on the first times, presentation of the on-screen text during presentation of the streaming video; and deactivating, based on the second times, presentation of the on-screen text during presentation of the streaming video.
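Once the model has produced a per-segment score for whether on-screen text should be shown, those scores must be converted into activation and deactivation times. The thresholding scheme below is a simplifying assumption about how such intervals could be derived; the scores and segment length are illustrative.

```python
def text_intervals(scores, threshold=0.5, seconds_per_segment=1.0):
    """Convert per-segment model scores into (start, end) times at which
    on-screen text is activated and later deactivated."""
    intervals, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i * seconds_per_segment       # first time: activate
        elif s < threshold and start is not None:
            intervals.append((start, i * seconds_per_segment))  # second time: deactivate
            start = None
    if start is not None:  # text still active at end of video
        intervals.append((start, len(scores) * seconds_per_segment))
    return intervals

# Toy per-second scores from a hypothetical model.
intervals = text_intervals([0.1, 0.9, 0.8, 0.2, 0.7])
```

The resulting interval endpoints are what a bitstream would carry as indications of the first (activation) and second (deactivation) times.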
-
Publication No.: US11514948B1
Publication Date: 2022-11-29
Application No.: US16738951
Application Date: 2020-01-09
Applicant: Amazon Technologies, Inc.
Inventor: Naveen Sudhakaran Nair , Pragyana K. Mishra
IPC: G11B27/036 , G06F16/23 , G06F16/783 , H04N7/15 , H04N21/2343 , G06N3/08 , G10L17/00 , G06N3/04
Abstract: Model-based dubbing techniques are implemented to generate a translated version of a source video. Spoken audio portions of a source video may be extracted and semantic graphs generated that represent the spoken audio portions. The semantic graphs may be used to produce translations of the spoken portions. A machine learning model may be implemented to generate replacement audio for the spoken portions using the translation of the spoken portion. A machine learning model may be implemented to generate modifications to facial image data for a speaker of the replacement audio.
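The semantic-graph translation stage can be illustrated in miniature: parse a spoken segment into a structured representation, then realize that structure in the target language. The trivial subject-verb-object parse and toy English-to-Spanish lexicon below are illustrative assumptions; the actual system uses learned models for parsing, translation, speech synthesis, and facial modification.

```python
def dub_segment(text, lexicon):
    """Pipeline sketch for the translation stage: parse a spoken segment
    into a minimal subject-verb-object 'semantic graph', then realize it
    in the target language via a toy lexicon."""
    subject, verb, obj = text.split()  # trivially parsed semantic graph
    graph = {"subject": subject, "verb": verb, "object": obj}
    return " ".join(lexicon[graph[role]] for role in ("subject", "verb", "object"))

# Toy EN -> ES lexicon standing in for a learned translation model.
lexicon = {"I": "yo", "see": "veo", "stars": "estrellas"}
dubbed = dub_segment("I see stars", lexicon)
```

In the full pipeline the translated text would then drive replacement-audio generation and, in turn, the facial-image modifications for the on-screen speaker.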
-
Publication No.: US11295229B1
Publication Date: 2022-04-05
Application No.: US15132959
Application Date: 2016-04-19
Applicant: Amazon Technologies, Inc.
Inventor: Pooja Ashok Kumar , Naveen Sudhakaran Nair , Rajeev Ramnarain Rastogi
Abstract: An approximate count of a subset of records of a data set is obtained using one or more transformation functions. The subset comprises records which contain a first value of one input variable, a second value of another input variable, and a particular value of a target variable. Using the approximate count, an approximate correlation metric for a multidimensional feature and the target variable is obtained. Based on the correlation metric, the multidimensional feature is included in a candidate feature set to be used to train a machine learning model.
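One common way to obtain such approximate counts with transformation functions is a hash-based sketch, where each record combination is hashed into several counter rows and the minimum counter never undercounts the true frequency. The count-min-sketch sketch below is an assumed concrete instance, not necessarily the patent's exact transformation functions.

```python
import hashlib

class CountMinSketch:
    """Hash-based transformation functions giving approximate counts of
    (value1, value2, target) record combinations in a data set."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent hash transformation per counter row.
        digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += 1

    def count(self, key):
        # The minimum across rows never undercounts the true frequency.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))

# Records combining two input-variable values and a target-variable value.
cms = CountMinSketch()
for record in [("red", "small", 1)] * 5 + [("blue", "large", 0)] * 2:
    cms.add(record)
```

The approximate counts returned by `count` would feed the approximate correlation metric used to decide whether the multidimensional feature joins the candidate feature set.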
-