-
Publication No.: US11869483B2
Publication date: 2024-01-09
Application No.: US17496636
Filing date: 2021-10-07
Applicant: Nvidia Corporation
Inventor: Kevin Shih , Jose Rafael Valle Gomes da Costa , Rohan Badlani , Adrian Lancucki , Wei Ping , Bryan Catanzaro
IPC: G10L13/00 , G10L13/08 , G10L13/10 , G10L13/047 , G10L25/90 , G06N3/045 , G06N3/08 , G10L13/033
CPC classification number: G10L13/047 , G06N3/045 , G06N3/08 , G10L13/0335 , G10L13/08 , G10L25/90
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
-
Publication No.: US20230402028A1
Publication date: 2023-12-14
Application No.: US18457221
Filing date: 2023-08-28
Applicant: Nvidia Corporation
Inventor: Kevin Shih , Jose Rafael Valle Gomes da Costa , Rohan Badlani , Adrian Lancucki , Wei Ping , Bryan Catanzaro
IPC: G10L13/047 , G10L13/033 , G10L13/08 , G06N3/08 , G06N3/045 , G10L25/90
CPC classification number: G10L13/047 , G10L13/0335 , G10L13/08 , G06N3/08 , G06N3/045 , G10L25/90
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
-
Publication No.: US20220114700A1
Publication date: 2022-04-14
Application No.: US17066282
Filing date: 2020-10-08
Applicant: Nvidia Corporation
Inventor: Shiqiu Liu , Robert Pottorff , Guilin Liu , Karan Sapra , Jon Barker , David Tarjan , Pekka Janis , Edvard Fagerholm , Lei Yang , Kevin Shih , Marco Salvi , Timo Roman , Andrew Tao , Bryan Catanzaro
Abstract: Apparatuses, systems, and techniques are presented to generate images. In at least one embodiment, one or more neural networks are used to generate one or more images using one or more pixel weights determined based, at least in part, on one or more sub-pixel offset values.
-
Publication No.: US20210064925A1
Publication date: 2021-03-04
Application No.: US16558620
Filing date: 2019-09-03
Applicant: Nvidia Corporation
Inventor: Kevin Shih , Aysegul Dundar , Animesh Garg , Robert Pottorff , Andrew Tao , Bryan Catanzaro
Abstract: Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
-
Publication No.: US20190297326A1
Publication date: 2019-09-26
Application No.: US16360853
Filing date: 2019-03-21
Applicant: NVIDIA Corporation
Inventor: Fitsum A. Reda , Guilin Liu , Kevin Shih , Robert Kirby , Jonathan Barker , David Tarjan , Andrew Tao , Bryan Catanzaro
IPC: H04N19/139 , G06N3/08 , G06N20/10 , G06N3/04 , G06N20/20 , H04N19/587 , H04N19/132 , H04N19/172
Abstract: A neural network architecture is disclosed for performing video frame prediction using a sequence of video frames and corresponding pairwise optical flows. The neural network processes the sequence of video frames and optical flows utilizing three-dimensional convolution operations, where time (or multiple video frames in the sequence of video frames) provides the third dimension in addition to the two-dimensional pixel space of the video frames. The neural network generates a set of parameters used to predict a next video frame in the sequence of video frames by sampling a previous video frame utilizing spatially-displaced convolution operations. In one embodiment, the set of parameters includes a displacement vector and at least one convolution kernel per pixel. Generating a pixel value in the next video frame includes applying the convolution kernel to a corresponding patch of pixels in the previous video frame based on the displacement vector.
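Illustrative sketch (not taken from the patent text): the final step of the abstract, applying a per-pixel convolution kernel to a patch of the previous frame located by a per-pixel displacement vector, might look roughly like the following. The function name `sdc_predict`, the single-channel frame, the integer displacements, and the clamped borders are all assumptions made for brevity. In the abstract, the displacement vectors and kernels are produced by the neural network; here they are simply passed in.

```python
def sdc_predict(prev, flow, kernels):
    """Hypothetical sketch of a spatially-displaced convolution.

    prev    : H x W list of floats (previous frame, one channel; an assumption)
    flow    : H x W list of (dy, dx) integer displacement vectors (an assumption;
              a real system would likely allow fractional displacements)
    kernels : H x W list of K x K weight lists, one kernel per pixel
    """
    h, w = len(prev), len(prev[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            kernel = kernels[y][x]
            r = len(kernel) // 2  # kernel radius (K assumed odd)
            acc = 0.0
            for i, row in enumerate(kernel):
                for j, weight in enumerate(row):
                    # Patch is centered on the displaced location; indices are
                    # clamped to the frame border (an assumption).
                    sy = min(max(y + dy + i - r, 0), h - 1)
                    sx = min(max(x + dx + j - r, 0), w - 1)
                    acc += weight * prev[sy][sx]
            out[y][x] = acc
    return out


# Usage: a 1x1 identity kernel with displacement (0, 1) copies each pixel
# from its right-hand neighbor in the previous frame.
prev = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
flow = [[(0, 1)] * 3 for _ in range(3)]
kernels = [[[[1.0]] for _ in range(3)] for _ in range(3)]
next_frame = sdc_predict(prev, flow, kernels)
```

With a 1x1 kernel this reduces to pure displacement-based sampling, and with zero displacement it reduces to a per-pixel adaptive convolution. The abstract combines the two.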
-