Inverse rendering of a scene from a single image

    Publication No.: US11295514B2

    Publication date: 2022-04-05

    Application No.: US16685538

    Application date: 2019-11-15

    Abstract: Inverse rendering estimates physical scene attributes (e.g., reflectance, geometry, and lighting) from image(s) and is used for gaming, virtual reality, augmented reality, and robotics. An inverse rendering network (IRN) receives a single input image of a 3D scene and generates the physical scene attributes for the image. The IRN is trained by using the estimated physical scene attributes generated by the IRN to reproduce the input image and updating parameters of the IRN to reduce differences between the reproduced input image and the input image. A direct renderer and a residual appearance renderer (RAR) reproduce the input image. The RAR predicts a residual image representing complex appearance effects of the real (not synthetic) image based on features extracted from the image and the reflectance and geometry properties. The residual image represents near-field illumination, cast shadows, inter-reflections, and realistic shading that are not provided by the direct renderer.
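    Illustrative sketch (not from the patent): the self-supervised training signal described above can be pictured as "render the estimated attributes, add a residual, compare with the input." The sketch below assumes a Lambertian direct renderer under a single directional light and treats the residual image as a given array standing in for the RAR output. The function names and the L1 loss are hypothetical choices, not the patent's actual renderer or loss.

```python
import numpy as np

def direct_render(albedo, normals, light_dir, light_intensity=1.0):
    """Assumed direct renderer: Lambertian shading under one directional light.

    albedo:    (H, W, 3) reflectance in [0, 1]
    normals:   (H, W, 3) unit surface normals
    light_dir: (3,) direction pointing toward the light
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    # Cosine term n . l, clamped so surfaces facing away from the light stay dark.
    shading = np.clip(normals @ l, 0.0, None)[..., None] * light_intensity
    return albedo * shading

def reconstruct(albedo, normals, light_dir, residual):
    # Direct rendering plus a residual image; the residual stands in for the
    # effects the direct renderer misses (cast shadows, inter-reflections, ...).
    return direct_render(albedo, normals, light_dir) + residual

def reconstruction_loss(input_image, albedo, normals, light_dir, residual):
    # Mean absolute (L1) difference; training would update the network's
    # parameters so that this value decreases.
    recon = reconstruct(albedo, normals, light_dir, residual)
    return float(np.mean(np.abs(recon - input_image)))

# Usage: a flat gray surface facing the light is reproduced exactly.
H, W = 4, 4
albedo = np.full((H, W, 3), 0.5)
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0  # every normal points along +z
residual = np.zeros((H, W, 3))
image = np.full((H, W, 3), 0.5)
print(reconstruction_loss(image, albedo, normals, [0.0, 0.0, 1.0], residual))  # 0.0
```

    In a real system the albedo, normals, lighting, and residual would all be network outputs, so the gradient of this loss reaches the IRN and the RAR together.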

    Transforming convolutional neural networks for visual sequence learning

    Publication No.: US11049018B2

    Publication date: 2021-06-29

    Application No.: US15880472

    Application date: 2018-01-25

    Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
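    Illustrative sketch (not from the patent): the weight transformation can be shown on a single fully connected ReLU layer. The trained feedforward weights become the input-to-hidden weights of a simple recurrent layer. The abstract does not say which initial values the hidden-to-hidden weights receive; this sketch assumes zeros, which makes the new layer reproduce the original layer frame by frame before any training on video. All function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def feedforward_layer(x, W, b):
    """Original non-recurrent layer applied per frame: x (T, d_in) -> (T, d_out)."""
    return relu(x @ W.T + b)

def to_recurrent(W, b):
    """Reuse the trained feedforward weights as input-to-hidden weights and set the
    hidden-to-hidden weights to zeros (one possible choice of initial values)."""
    W_xh = W.copy()
    W_hh = np.zeros((W.shape[0], W.shape[0]))
    return W_xh, W_hh, b.copy()

def recurrent_layer(frames, W_xh, W_hh, b):
    """Simple (Elman-style) recurrence over a sequence of frame features."""
    h = np.zeros(W_xh.shape[0])
    outputs = []
    for x_t in frames:
        h = relu(W_xh @ x_t + W_hh @ h + b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
b = rng.standard_normal(8)
frames = rng.standard_normal((5, 16))  # 5 video frames, 16 features each

W_xh, W_hh, b_r = to_recurrent(W, b)
# With W_hh = 0 the transformed layer initially matches the original layer.
print(np.allclose(recurrent_layer(frames, W_xh, W_hh, b_r), feedforward_layer(frames, W, b)))  # True
```

    Fine-tuning on video would then move W_hh away from its initial values so that each frame's output can depend on earlier frames.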

    THREE-DIMENSIONAL (3D) POSE ESTIMATION FROM A MONOCULAR CAMERA

    Publication No.: US20210117661A1

    Publication date: 2021-04-22

    Application No.: US17135697

    Application date: 2020-12-28

    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x, y) represent spatial displacement and a third coordinate represents the depth of each point with respect to the camera. A monocular camera is used to capture an image of the 3D pose but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded or the orientation of the object is ambiguous. Generating the depth values enables estimation of the 3D pose of the object.
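    Illustrative sketch (not from the patent): once a depth value exists for each keypoint, the 2D keypoints can be lifted into 3D. The sketch assumes (x, y) are pixel coordinates, depth is measured along the camera's optical axis, and the camera follows a pinhole model with known intrinsics (fx, fy, cx, cy). These assumptions and the function name are hypothetical; the patent's own keypoint representation may differ.

```python
import numpy as np

def backproject_keypoints(uv, depth, fx, fy, cx, cy):
    """Lift 2D keypoints to 3D camera coordinates with a pinhole camera model.

    uv:    (N, 2) pixel coordinates (x, y) of each keypoint
    depth: (N,) depth of each keypoint along the optical axis
    fx, fy, cx, cy: focal lengths and principal point, in pixels
    """
    uv = np.asarray(uv, dtype=float)
    z = np.asarray(depth, dtype=float)
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Two hypothetical keypoints with network-predicted depths (in metres).
uv = [[320.0, 240.0], [420.0, 140.0]]
depth = [0.5, 1.0]
points = backproject_keypoints(uv, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# Rows are (0.0, 0.0, 0.5) and (0.2, -0.2, 1.0).
print(points)
```

    Without the predicted depth, every point along the ray through a pixel would be equally valid, which is why a monocular image alone cannot fix the 3D pose.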

    Learning-based camera pose estimation from images of an environment

    Publication No.: US10964061B2

    Publication date: 2021-03-30

    Application No.: US16872752

    Application date: 2020-05-12

    Abstract: A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indoor, outdoor, natural, synthetic, etc.). The DNN system receives images of an environment captured by a camera (observations) and outputs an estimated camera pose within the environment. The estimated camera pose is used to perform camera localization, i.e., recover the three-dimensional (3D) position and orientation of a moving camera, which is a fundamental task in computer vision with a wide variety of applications in robot navigation, car localization for autonomous driving, device localization for mobile navigation, and augmented/virtual reality.

    DUAL RECURRENT NEURAL NETWORK ARCHITECTURE FOR MODELING LONG-TERM DEPENDENCIES IN SEQUENTIAL DATA

    Publication No.: US20210089867A1

    Publication date: 2021-03-25

    Application No.: US16581099

    Application date: 2019-09-24

    Abstract: Learning the dynamics of an environment and predicting future consequences is a recent technical advancement that can be applied to video prediction and speech recognition, among other applications. Generally, machine learning models, such as deep learning models, neural networks, or other artificial intelligence algorithms, are used to make the predictions. However, current artificial intelligence algorithms used for making predictions are typically limited to short-term future predictions, mainly as a result of 1) the presence of complex dynamics in high-dimensional video data, 2) prediction error propagation over time, and 3) the inherent uncertainty of the future. The present disclosure enables the modeling of long-term dependencies in sequential data for use in making long-term predictions by providing a dual (i.e., two-part) recurrent neural network architecture.

    Equivariant landmark transformation for landmark localization

    Publication No.: US10783394B2

    Publication date: 2020-09-22

    Application No.: US16006728

    Application date: 2018-06-12

    Abstract: A method, computer readable medium, and system are disclosed to generate coordinates of landmarks within images. The landmark locations may be identified on an image of a human face and used for emotion recognition, face identity verification, eye gaze tracking, pose estimation, etc. A transform is applied to input image data to produce transformed input image data. The transform is also applied to predicted coordinates for landmarks of the input image data to produce transformed predicted coordinates. A neural network model processes the transformed input image data to generate additional landmarks of the transformed input image data and additional predicted coordinates for each one of the additional landmarks. Parameters of the neural network model are updated to reduce differences between the transformed predicted coordinates and the additional predicted coordinates.
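    Illustrative sketch (not from the patent): the training signal compares two routes to the same answer, namely transforming the predicted landmarks versus predicting landmarks on the transformed image. The sketch assumes the transform is a circular image translation (the abstract does not name one) and uses two toy stand-ins for the neural network: one that is perfectly equivariant and one that ignores its input. All names are hypothetical.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Transform applied to the image: a circular translation (an illustrative choice)."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def shift_coords(coords, dy, dx, shape):
    """The same transform applied to (row, col) landmark coordinates."""
    h, w = shape
    return np.stack([(coords[:, 0] + dy) % h, (coords[:, 1] + dx) % w], axis=1)

def equivariance_loss(model, image, dy, dx):
    # Route 1: predict on the original image, then transform the predictions.
    transformed_pred = shift_coords(model(image), dy, dx, image.shape)
    # Route 2: transform the image, then predict on it.
    pred_on_transformed = model(shift_image(image, dy, dx))
    # Training would update the model parameters to shrink this difference.
    return float(np.mean((transformed_pred - pred_on_transformed) ** 2))

def argmax_model(image):
    """Toy model: one landmark at the brightest pixel (translation-equivariant)."""
    r, c = np.unravel_index(np.argmax(image), image.shape)
    return np.array([[r, c]], dtype=float)

def fixed_model(image):
    """Toy model that ignores its input, so it is not equivariant."""
    return np.array([[2.0, 2.0]])

img = np.zeros((8, 8))
img[3, 5] = 1.0
print(equivariance_loss(argmax_model, img, dy=2, dx=1))  # 0.0
print(equivariance_loss(fixed_model, img, dy=2, dx=1))   # 2.5
```

    Because this loss needs no ground-truth landmarks for the transformed image, it can be computed on unlabeled images, which is one reason such consistency terms are attractive when labels are scarce.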
