-
Publication number: US20190295228A1
Publication date: 2019-09-26
Application number: US16360895
Filing date: 2019-03-21
Applicant: NVIDIA Corporation
Inventor: Guilin Liu, Fitsum A. Reda, Kevin Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
Abstract: A neural network architecture is disclosed for performing image in-painting using partial convolution operations. The neural network processes an image and a corresponding mask that identifies holes in the image utilizing partial convolution operations, where the mask is used by the partial convolution operation to zero out coefficients of the convolution kernel corresponding to invalid pixel data for the holes. The mask is updated after each partial convolution operation is performed in an encoder section of the neural network. In one embodiment, the neural network is implemented using an encoder-decoder framework with skip links to forward representations of the features at different sections of the encoder to corresponding sections of the decoder.
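The masking and mask-update steps in this abstract can be illustrated with a minimal single-channel sketch. It assumes a stride-1, unpadded window, a binary mask (1 = valid pixel, 0 = hole), and the sum(1)/sum(M) re-scaling used in the published partial-convolution formulation; the abstract itself states only the masking and the mask update. The function name partial_conv2d is hypothetical, and this is not NVIDIA's implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]


def partial_conv2d(image: Grid, mask: Grid, kernel: Grid,
                   bias: float = 0.0) -> Tuple[Grid, Grid]:
    """Hypothetical single-channel, stride-1, unpadded partial convolution.

    Computes cross-correlation, as most deep-learning "convolution" layers do.
    Returns (output, updated_mask).
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    window_size = kh * kw
    out = [[0.0] * ow for _ in range(oh)]
    new_mask = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            valid = 0.0
            for u in range(kh):
                for v in range(kw):
                    m = mask[i + u][j + v]
                    # Multiplying by m zeroes the contribution of kernel
                    # taps that fall on hole pixels.
                    acc += kernel[u][v] * image[i + u][j + v] * m
                    valid += m
            if valid > 0:
                # Assumed re-scaling: window size / number of valid pixels,
                # so outputs near holes keep a comparable magnitude.
                out[i][j] = acc * window_size / valid + bias
                # Mask update: any valid input makes this output valid.
                new_mask[i][j] = 1.0
            # Otherwise the output stays 0 and the location remains a hole.
    return out, new_mask
```

On a constant image of ones with a one-pixel hole in the centre, every 2x2 window sees three valid pixels and still produces 4.0, the same value as with no hole. When a window covers only hole pixels, its output stays 0 and its mask entry stays 0, so each layer's updated mask shrinks the hole.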
-
Publication number: US20240038212A1
Publication date: 2024-02-01
Application number: US18099840
Filing date: 2023-01-20
Applicant: NVIDIA Corporation
Inventor: Kevin Shih, José Rafael Valle Gomes da Costa, Rohan Badlani, João Felipe Santos, Bryan Catanzaro
IPC: G10L13/027, G10L13/08, G10L25/30
CPC classification number: G10L13/027, G10L13/08, G10L25/30
Abstract: Disclosed are apparatuses, systems, and techniques that may use machine learning for implementing generative text-to-speech models. The techniques include identifying a mapping of speech characteristics (SC) on a target distribution of a latent variable using a non-linear transformation for at least a subset of the SC. Parameters of the non-linear transformation are determined using a neural network that approximates statistics of the SC with statistics predicted for the SC based on the identified mapping and the target distribution of the latent variable.
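As a rough illustration of mapping a speech characteristic onto a latent target distribution, the sketch uses a single invertible non-linear map (a log followed by an affine shift and scale). It sends a positive-valued characteristic such as pitch to a standard normal latent and evaluates its likelihood with the change-of-variables formula. Everything specific here is an assumption: the standard normal target, the log-affine form of the transformation, and the closed-form estimate of mu and sigma, which stands in for the neural network the abstract describes. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_log_affine_map(values: List[float]) -> Tuple[float, float]:
    """Estimate (mu, sigma) of log(values).

    Stand-in for a learned predictor; the abstract uses a neural network.
    """
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    var = sum((l - mu) ** 2 for l in logs) / len(logs)
    return mu, math.sqrt(var)


def to_latent(x: float, mu: float, sigma: float) -> float:
    # Invertible non-linear map from a positive characteristic x to latent z.
    return (math.log(x) - mu) / sigma


def from_latent(z: float, mu: float, sigma: float) -> float:
    # Inverse map: at inference, draw z ~ N(0, 1) and decode it to x.
    return math.exp(mu + sigma * z)


def log_likelihood(x: float, mu: float, sigma: float) -> float:
    # Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|,
    # where dz/dx = 1 / (sigma * x).
    z = to_latent(x, mu, sigma)
    log_std_normal = -0.5 * (z * z + math.log(2 * math.pi))
    return log_std_normal - math.log(sigma) - math.log(x)
```

Because the map is invertible, the same parameters serve both directions. Training would maximize log_likelihood on observed values, and sampling would decode fresh latent draws with from_latent.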
-
Publication number: US20210067735A1
Publication date: 2021-03-04
Application number: US16559312
Filing date: 2019-09-03
Applicant: Nvidia Corporation
Inventor: Fitsum Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
Abstract: Apparatuses, systems, and techniques to enhance video. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having a higher frame rate, higher resolution, or reduced number of missing or corrupt video frames.
-
Publication number: US20230419947A1
Publication date: 2023-12-28
Application number: US18449969
Filing date: 2023-08-15
Applicant: Nvidia Corporation
Inventor: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
IPC: G10L13/047, G10L25/90, G06N3/045, G06N3/08, G10L13/033, G10L13/08
CPC classification number: G10L13/047, G10L25/90, G10L13/08, G06N3/08, G10L13/0335, G06N3/045
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
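Sampling phoneme durations from a separate generative distribution can be sketched as follows. Several details are assumed rather than taken from the abstract: a Gaussian over log-durations, per-phoneme (mu, sigma) values that a trained model would normally supply, a temperature that scales the sampling noise, and a frame-level expansion of each phoneme by its sampled duration. The distribution family and the names sample_durations and expand are illustrative, not from the patent.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def sample_durations(log_dur_params: Sequence[Tuple[float, float]],
                     temperature: float = 1.0,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Draw one duration (in frames) per phoneme from an assumed
    Gaussian over log-duration with parameters (mu, sigma)."""
    if rng is None:
        rng = random.Random()
    durations = []
    for mu, sigma in log_dur_params:
        # Temperature 0 gives the mean; larger values give more varied rhythm.
        log_d = mu + temperature * sigma * rng.gauss(0.0, 1.0)
        # Every phoneme gets at least one frame.
        durations.append(max(1, round(math.exp(log_d))))
    return durations


def expand(phonemes: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each phoneme for its sampled number of frames, so the
    frame-level sequence can then be processed in parallel."""
    frames: List[str] = []
    for p, d in zip(phonemes, durations):
        frames.extend([p] * d)
    return frames
```

For example, expand(["a", "b"], [2, 1]) yields ["a", "a", "b"]. Under the same assumptions, per-phoneme pitch or energy values could be drawn in the same way to add diversity.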
-
Publication number: US11769481B2
Publication date: 2023-09-26
Application number: US17496569
Filing date: 2021-10-07
Applicant: Nvidia Corporation
Inventor: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
IPC: G10L13/00, G10L13/10, G10L13/06, G10L13/07, G10L13/047, G10L25/90, G06N3/045, G06N3/08, G10L13/033, G10L13/08
CPC classification number: G10L13/047, G06N3/045, G06N3/08, G10L13/0335, G10L13/08, G10L25/90
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
-
Publication number: US11902705B2
Publication date: 2024-02-13
Application number: US16558620
Filing date: 2019-09-03
Applicant: Nvidia Corporation
Inventor: Kevin Shih, Aysegul Dundar, Animesh Garg, Robert Pottorff, Andrew Tao, Bryan Catanzaro
CPC classification number: H04N7/0135, G06F18/214, G06F18/217, G06N3/044, G06N3/045, G06N3/08
Abstract: Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
-
Publication number: US20230113950A1
Publication date: 2023-04-13
Application number: US17496569
Filing date: 2021-10-07
Applicant: Nvidia Corporation
Inventor: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
IPC: G10L13/047, G10L25/90
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
-
Publication number: US20230110905A1
Publication date: 2023-04-13
Application number: US17496636
Filing date: 2021-10-07
Applicant: Nvidia Corporation
Inventor: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
IPC: G10L13/08, G10L13/047, G10L13/033, G06N3/08, G06N3/04
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
-
Publication number: US20220114701A1
Publication date: 2022-04-14
Application number: US17172330
Filing date: 2021-02-10
Applicant: Nvidia Corporation
Inventor: Shiqiu Liu, Robert Pottorff, Guilin Liu, Karan Sapra, Jon Barker, David Tarjan, Pekka Janis, Edvard Fagerholm, Lei Yang, Kevin Shih, Marco Salvi, Timo Roman, Andrew Tao, Bryan Catanzaro
Abstract: Apparatuses, systems, and techniques are presented to generate images. In at least one embodiment, one or more neural networks are used to generate one or more images using one or more pixel weights determined based, at least in part, on one or more sub-pixel offset values.
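The abstract does not say how a sub-pixel offset becomes a pixel weight. One common fixed-function choice, used here purely as an assumption, is a Gaussian reconstruction filter, which gives more weight to a sample that landed closer to the pixel centre. The neural networks from the abstract are omitted. The names offset_weights and resolve_pixel and the default sigma are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Offset = Tuple[float, float]  # (dx, dy) from the pixel centre, in pixels


def offset_weights(offsets: Sequence[Offset], sigma: float = 0.5) -> List[float]:
    """Assumed Gaussian weighting of samples by their sub-pixel offset,
    normalized so the weights sum to 1."""
    raw = [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
           for dx, dy in offsets]
    total = sum(raw)
    return [w / total for w in raw]


def resolve_pixel(samples: Sequence[float], offsets: Sequence[Offset],
                  sigma: float = 0.5) -> float:
    """Blend jittered samples into one pixel value using offset-based weights."""
    weights = offset_weights(offsets, sigma)
    return sum(w * s for w, s in zip(weights, samples))
```

The fixed Gaussian only shows how a weight can depend on a sub-pixel offset. In the disclosed system, neural networks take part in producing the image, and their role is not modelled here.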
-
Publication number: US11869483B2
Publication date: 2024-01-09
Application number: US17496636
Filing date: 2021-10-07
Applicant: Nvidia Corporation
Inventor: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
IPC: G10L13/00, G10L13/08, G10L13/10, G10L13/047, G10L25/90, G06N3/045, G06N3/08, G10L13/033
CPC classification number: G10L13/047, G06N3/045, G06N3/08, G10L13/0335, G10L13/08, G10L25/90
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.