-
Publication No.: US20220114702A1
Publication Date: 2022-04-14
Application No.: US17406902
Application Date: 2021-08-19
Applicant: Nvidia Corporation
Inventor: Shiqiu Liu, Robert Pottorff, Guilin Liu, Karan Sapra, Jon Barker, David Tarjan, Pekka Janis, Edvard Fagerholm, Lei Yang, Kevin Jonathan Shih, Marco Salvi, Timo Roman, Andrew Tao, Bryan Catanzaro
Abstract: Apparatuses, systems, and techniques are presented to generate images. In at least one embodiment, one or more neural networks are used to generate one or more images using one or more pixel weights.
-
Publication No.: US20220222778A1
Publication Date: 2022-07-14
Application No.: US17710643
Application Date: 2022-03-31
Applicant: NVIDIA Corporation
Inventor: Shiqiu Liu, Robert Thomas Pottorff, Guilin Liu, Karan Sapra, Jon Barker, David Tarjan, Pekka Janis, Edvard Olav Valter Fagerholm, Lei Yang, Kevin Jonathan Shih, Marco Salvi, Timo Roman, Andrew Tao, Bryan Christopher Catanzaro
Abstract: Apparatuses, systems, and techniques are presented to generate images. In at least one embodiment, one or more neural networks are used to generate one or more images using one or more pixel weights determined based, at least in part, on one or more sub-pixel offset values.
-
Publication No.: US20250118286A1
Publication Date: 2025-04-10
Application No.: US18483342
Application Date: 2023-10-09
Applicant: NVIDIA Corporation
IPC: G10L13/047, G10L13/08, G10L13/10, G10L17/02, G10L25/18
Abstract: In various examples, synthesizing speech in multiple languages in conversational AI systems and applications is described herein. Systems and methods are disclosed that use one or more models to synthesize speech from a first language spoken by a speaker to a second, target language selected by the speaker. In some examples, to perform the translation, the model(s) may disentangle one or more attributes associated with speech from speakers, such as speakers' identities, speakers' accents, and text associated with the speech. Additionally, the model(s) may allow for fine-grained control of additional attributes associated with output speech, such as one or more frequencies, one or more energies, and one or more phoneme durations. Furthermore, the model(s) may be configured to use the accent associated with the target language when generating text, such as when aligning text encodings with one or more phonemes.
-
Publication No.: US20230035306A1
Publication Date: 2023-02-02
Application No.: US17382027
Application Date: 2021-07-21
Applicant: Nvidia Corporation
Inventor: Ming-Yu Liu, Koki Nagano, Yeongho Seol, Jose Rafael Valle Gomes da Costa, Jaewoo Seo, Ting-Chun Wang, Arun Mallya, Sameh Khamis, Wei Ping, Rohan Badlani, Kevin Jonathan Shih, Bryan Catanzaro, Simon Yuen, Jan Kautz
Abstract: Apparatuses, systems, and techniques are presented to generate media content. In at least one embodiment, a first neural network is used to generate first video information based, at least in part, upon voice information corresponding to one or more users, and a second neural network is used to generate second video information corresponding to the one or more users based, at least in part, upon the first video information and one or more images corresponding to the one or more users.
-
Publication No.: US20220180528A1
Publication Date: 2022-06-09
Application No.: US17678666
Application Date: 2022-02-23
Applicant: NVIDIA Corporation
Inventor: Aysegul Dundar, Kevin Jonathan Shih, Animesh Garg, Robert Thomas Pottorff, Andrew Tao, Bryan Christopher Catanzaro
Abstract: Apparatuses, systems, and techniques to perform unsupervised keypoint or landmark learning using one or more neural networks. In at least one embodiment, one or more neural networks use pose and appearance information to construct a foreground and a background, which are then used to reconstruct an input image and determine loss values to train the one or more neural networks.
-