-
Publication No.: US11645530B2
Publication date: 2023-05-09
Application No.: US17325024
Filing date: 2021-05-19
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
CPC classification number: G06N3/082, G06F18/24, G06N3/044, G06N3/045, G06N3/048, G06V10/764, G06V10/82, G06V20/41
Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
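The weight transfer described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a fully connected layer, a plain copy as the "transformation" of the feedforward weights, zeros as the initial hidden-to-hidden values, and a ReLU activation. The names `dense`, `relu`, and `RecurrentFromDense` are hypothetical.

```python
def dense(W, b, x):
    """Affine map W @ x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


class RecurrentFromDense:
    """Recurrent layer initialized from a trained feedforward (dense) layer."""

    def __init__(self, W_ff, b_ff):
        n = len(W_ff)
        # Input-to-hidden weights: copied from the feedforward weights
        # (assumption: the transformation is an identity copy).
        self.W_ih = [row[:] for row in W_ff]
        self.b = b_ff[:]
        # Hidden-to-hidden weights: set to initial values (assumption: zeros).
        self.W_hh = [[0.0] * n for _ in range(n)]

    def run(self, frames):
        """Process a sequence of per-frame feature vectors, one hidden state per frame."""
        n = len(self.b)
        h = [0.0] * n
        outputs = []
        for x in frames:
            pre = dense(self.W_ih, self.b, x)       # contribution of the current frame
            rec = dense(self.W_hh, [0.0] * n, h)    # contribution of the previous state
            h = relu([p + r for p, r in zip(pre, rec)])
            outputs.append(h)
        return outputs
```

Under these assumptions the converted layer initially reproduces the feedforward layer frame by frame, so fine-tuning on video starts from the pretrained behavior and only gradually learns temporal dependencies through `W_hh`.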
-
Publication No.: US11631239B2
Publication date: 2023-04-18
Application No.: US17237728
Filing date: 2021-04-22
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground-truth.
-
Publication No.: US20230070514A1
Publication date: 2023-03-09
Application No.: US17584213
Filing date: 2022-01-25
Applicant: NVIDIA Corporation
Inventor: Ye Yuan, Umar Iqbal, Pavlo Molchanov, Jan Kautz
Abstract: In order to determine accurate three-dimensional (3D) models for objects within a video, the objects are first identified and tracked within the video, and a pose and shape are estimated for these tracked objects. A translation and global orientation are removed from the tracked objects to determine local motion for the objects, and motion infilling is performed to fill in any missing portions for the object within the video. A global trajectory is then determined for the objects within the video, and the infilled motion and global trajectory are then used to determine infilled global motion for the object within the video. This enables the accurate depiction of each object as a 3D pose sequence for that model that accounts for occlusions and global factors within the video.
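The step of removing translation and global orientation can be illustrated with a small sketch. It assumes each frame provides 3D joint positions, a root position, and a heading angle (yaw about the vertical y axis); the abstract does not specify this representation, and the function names `to_local` and `to_global` are hypothetical.

```python
import math


def rotate_yaw(p, angle):
    """Rotate point p = (x, y, z) about the vertical y axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def to_local(joints, root, yaw):
    """Remove global translation (root) and global orientation (yaw) from one frame."""
    centered = [(x - root[0], y - root[1], z - root[2]) for x, y, z in joints]
    return [rotate_yaw(p, -yaw) for p in centered]


def to_global(local_joints, root, yaw):
    """Re-apply a global trajectory sample (root, yaw) to local motion."""
    rotated = [rotate_yaw(p, yaw) for p in local_joints]
    return [(x + root[0], y + root[1], z + root[2]) for x, y, z in rotated]
```

In this sketch, infilling would operate on the output of `to_local`, and `to_global` would combine the infilled local motion with a separately estimated trajectory.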
-
Publication No.: US11593661B2
Publication date: 2023-02-28
Application No.: US16389832
Filing date: 2019-04-19
Applicant: NVIDIA Corporation
Inventor: Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Jan Kautz
Abstract: A neural network is trained to identify one or more features of an image. The neural network is trained using a small number of original images, from which a plurality of additional images are derived. The additional images are generated by rotating and decoding embeddings of the image in a latent space generated by an autoencoder. The images generated by the rotation and decoding exhibit changes to a feature that are in proportion to the amount of rotation.
-
Publication No.: US20230035306A1
Publication date: 2023-02-02
Application No.: US17382027
Filing date: 2021-07-21
Applicant: Nvidia Corporation
Inventor: Ming-Yu Liu, Koki Nagano, Yeongho Seol, Jose Rafael Valle Gomes da Costa, Jaewoo Seo, Ting-Chun Wang, Arun Mallya, Sameh Khamis, Wei Ping, Rohan Badlani, Kevin Jonathan Shih, Bryan Catanzaro, Simon Yuen, Jan Kautz
Abstract: Apparatuses, systems, and techniques are presented to generate media content. In at least one embodiment, a first neural network is used to generate first video information based, at least in part, upon voice information corresponding to one or more users, and a second neural network is used to generate second video information corresponding to the one or more users based, at least in part, upon the first video information and one or more images corresponding to the one or more users.
-
Publication No.: US20230015989A1
Publication date: 2023-01-19
Application No.: US17365877
Filing date: 2021-07-01
Applicant: Nvidia Corporation
Inventor: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
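The edge-controlled smoothing in step (3) can be sketched in one dimension. This is an illustration under stated assumptions, not the disclosed layer: it uses a single left-to-right pass over scalar features, and a gate computed as affinity times (1 - edge), so that a strong edge blocks propagation. The function name `propagate_1d` and the gating formula are hypothetical.

```python
def propagate_1d(features, affinity, edges):
    """One left-to-right pass of edge-gated smoothing over a row of pixels.

    features: per-pixel semantic feature values
    affinity: per-pixel affinity to the left neighbor, in [0, 1]
    edges:    per-pixel semantic edge strength, in [0, 1]
    """
    refined = [features[0]]
    for i in range(1, len(features)):
        # Assumed gating: high affinity passes messages, a strong edge closes the gate.
        gate = affinity[i] * (1.0 - edges[i])
        refined.append((1.0 - gate) * features[i] + gate * refined[i - 1])
    return refined
```

With an edge of strength 1 at a pixel, the gate closes and the value to its left does not leak across the boundary; with no edges, neighboring values blend according to the affinity alone.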
-
Publication No.: US20230004760A1
Publication date: 2023-01-05
Application No.: US17361202
Filing date: 2021-06-28
Applicant: NVIDIA Corporation
Inventor: Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar Iqbal, Sifei Liu, Jan Kautz
IPC: G06K9/62
Abstract: Apparatuses, systems, and techniques to identify objects within an image using self-supervised machine learning. In at least one embodiment, a machine learning system is trained to recognize objects by training a first network to recognize objects within images that are generated by a second network. In at least one embodiment, the second network is a controllable network.
-
Publication No.: US11508076B2
Publication date: 2022-11-22
Application No.: US17156406
Filing date: 2021-01-22
Applicant: NVIDIA Corporation
Inventor: Zhaoyang Lv, Kihwan Kim, Deqing Sun, Alejandro Jose Troccoli, Jan Kautz
IPC: G06T7/254, G06T7/90, G06T7/50, G06N3/08, G06T7/194, G06T3/00, G06T7/70, G06T7/60, G06T7/11, G06N5/04, G06T7/285, G06T7/215
Abstract: A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene. The two components are information identifying dynamic and static portions of each image and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera.
-
Publication No.: US11488418B2
Publication date: 2022-11-01
Application No.: US17135697
Filing date: 2020-12-28
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal, Pavlo Molchanov, Thomas Michael Breuel, Jan Kautz
Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
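Once a depth value is available for each keypoint, a 3D pose can be recovered from the 2D coordinates. The sketch below assumes a pinhole camera model with known intrinsics (`fx`, `fy`, `cx`, `cy`), which the abstract does not specify; the function name `backproject` is hypothetical.

```python
def backproject(keypoints_2d, depths, fx, fy, cx, cy):
    """Lift 2D keypoints (pixel coordinates) to 3D camera coordinates.

    keypoints_2d: list of (u, v) pixel positions
    depths:       predicted depth z for each keypoint, same order
    fx, fy:       focal lengths in pixels (assumed known)
    cx, cy:       principal point in pixels (assumed known)
    """
    points_3d = []
    for (u, v), z in zip(keypoints_2d, depths):
        # Pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy, solved for X and Y.
        points_3d.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points_3d
```

A keypoint at the principal point maps onto the optical axis at its predicted depth, and keypoints farther from the image center move proportionally farther from the axis as depth grows.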
-
Publication No.: US20220270318A1
Publication date: 2022-08-25
Application No.: US17734244
Filing date: 2022-05-02
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.