Abstract:
Apparatuses, systems, and techniques are presented to modify media content using inferred attention. In at least one embodiment, a network is trained to predict a gaze of one or more users on one or more image features based, at least in part, on one or more prior gazes of the one or more users, wherein the prediction is to be used to modify at least one of the one or more image features.
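The trained network described above maps one or more prior gazes to a predicted future gaze. As a minimal structural illustration only (not the patented network), a constant-velocity extrapolation from the last two gaze samples captures the same input/output shape; the function name and coordinate convention are assumptions:

```python
def predict_next_gaze(prior_gazes):
    """Extrapolate the next 2-D gaze point from the last two samples.

    A constant-velocity stand-in for the trained predictor described in
    the abstract; `prior_gazes` is a list of (x, y) screen coordinates.
    """
    (x0, y0), (x1, y1) = prior_gazes[-2], prior_gazes[-1]
    # Assume the gaze continues with the same displacement per step.
    return (2 * x1 - x0, 2 * y1 - y0)
```

The predicted point could then be used to select which image features to modify (e.g., to render in higher detail near the expected gaze).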
Abstract:
When a computer image is generated from a real-world scene having a semi-reflective surface (e.g., a window), the image will capture, at the semi-reflective surface from the viewpoint of the camera, both a reflection of the scene in front of the semi-reflective surface and a transmission of the scene located behind it. Just as for a person viewing the real-world scene from different locations, angles, etc., the reflection and transmission may change, and also move relative to each other, as the viewpoint of the camera changes. Unfortunately, the dynamic nature of the reflection and transmission negatively impacts the performance of many computer applications, but performance can generally be improved if the reflection and transmission are separated. The present disclosure uses deep learning to separate reflection and transmission at a semi-reflective surface of a computer image generated from a real-world scene.
Abstract:
A system, computer-readable medium, and method are provided for generating images based on adaptations of the human visual system. An input image is received, an effect-provoking change is received, and an afterimage resulting from a cumulative effect of human visual adaptation is computed based on the effect-provoking change and a per-photoreceptor-type physiological adaptation of the human visual system. The computed afterimage may include a bleaching afterimage effect and/or a local adaptation afterimage effect. The computed afterimage is then accumulated into an output image for display.
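The cumulative adaptation described above can be caricatured as a per-photoreceptor exponential filter: each receptor's adapted state chases the stimulus, and the lag behind a sudden change is what surfaces as an afterimage. A toy sketch, assuming a single time constant per receptor type (the names and the one-parameter model are illustrative, not the patented physiological model):

```python
import math

def adapt_step(state, stimulus, dt, tau):
    """Advance one photoreceptor's adaptation state by dt seconds.

    The state relaxes exponentially toward the stimulus with time
    constant tau; the returned residual (state minus stimulus) is a
    crude proxy for this receptor's afterimage contribution.
    """
    alpha = 1.0 - math.exp(-dt / tau)
    new_state = state + alpha * (stimulus - state)
    return new_state, new_state - stimulus
```

Running this per photoreceptor type (with different tau per type) and accumulating the residuals into the output image mirrors the structure, though not the detail, of the computation in the abstract.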
Abstract:
A target image corresponding to a novel view may be synthesized from two source images, corresponding source camera poses, and pixel attribute correspondences between the two source images. A particular object in the target image need only be visible in one of the two source images for successful synthesis. Each pixel in the target image is defined according to an identified pixel in one of the two source images. The identified source pixel provides attributes such as color, texture, and feature descriptors for the target pixel. The source and target camera poses are used to define geometric relationships for identifying the source pixels. In an embodiment, the pixel attribute correspondences are optical flow that defines movement of attributes from a first image of the two source images to a second image of the two source images.
Abstract:
Apparatuses, systems, and techniques are presented to identify objects in view of a camera associated with a vehicle. In at least one embodiment, objects with which a vehicle may collide are identified, based on, for example, a difference between a size of an image of the objects detected at a first point in time and a size of an image of the objects detected at a subsequent point in time.
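The size-difference cue above relates directly to time-to-collision: apparent size scales inversely with distance, so under a constant closing speed the ratio of sizes at the two time points determines how long until contact. A hedged sketch of that geometry (the function and its units are illustrative, not the claimed system):

```python
def time_to_collision(size_t0, size_t1, dt):
    """Estimate seconds until collision from apparent-size growth.

    Apparent size scales inversely with distance, so size_t1 / size_t0
    equals d0 / d1; with constant closing speed the remaining time is
    dt / (ratio - 1). Sizes may be in pixels; only their ratio matters.
    """
    if size_t1 <= size_t0:
        return float("inf")  # image not growing, so object not closing in
    return dt / (size_t1 / size_t0 - 1.0)
```

For example, an object whose image doubles in size over half a second would, at constant speed, be reached half a second later.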
Abstract:
Missing image content is generated using a neural network. In an embodiment, a high resolution image and associated high resolution semantic label map are generated from a low resolution image and associated low resolution semantic label map. The input image/map pair (low resolution image and associated low resolution semantic label map) lacks detail and is therefore missing content. Rather than simply enhancing the input image/map pair, data missing in the input image/map pair is improvised or hallucinated by a neural network, creating plausible content while maintaining spatio-temporal consistency. Missing content is hallucinated to generate a detailed zoomed-in portion of an image. Missing content is hallucinated to generate different variations of an image, such as different seasons or weather conditions for a driving video.
Abstract:
A computer implemented method of determining a latent image from an observed image is disclosed. The method comprises implementing a plurality of image processing operations within a single optimization framework, wherein the single optimization framework comprises solving a linear minimization expression. The method further comprises mapping the linear minimization expression onto at least one non-linear solver. Further, the method comprises using the non-linear solver, iteratively solving the linear minimization expression in order to extract the latent image from the observed image, wherein the linear minimization expression comprises: a data term, and a regularization term, and wherein the regularization term comprises a plurality of non-linear image priors.
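As a toy instance of the "data term plus regularization term" energy named above, consider a 1-D quadratic version solved by plain gradient descent. The claim's non-linear image priors and non-linear solver are replaced here by a smooth quadratic prior, so this is only a structural sketch of the iterative minimization, under assumed names and parameters:

```python
def denoise_1d(y, lam=1.0, step=0.1, iters=500):
    """Iteratively minimize a data term plus a regularization term:

        E(x) = sum_i (x_i - y_i)^2 + lam * sum_i (x_{i+1} - x_i)^2

    by gradient descent. A quadratic stand-in for the non-linear
    priors of the abstract; the latent signal x is extracted from
    the observed signal y.
    """
    x = list(y)
    n = len(x)
    for _ in range(iters):
        # Gradient of the data term.
        g = [2.0 * (x[i] - y[i]) for i in range(n)]
        # Gradient of the smoothness (regularization) term.
        for i in range(n - 1):
            d = 2.0 * lam * (x[i + 1] - x[i])
            g[i] -= d
            g[i + 1] += d
        x = [x[i] - step * g[i] for i in range(n)]
    return x
```

Larger `lam` trades fidelity to the observation for smoothness of the recovered latent signal, the same trade-off the data and regularization terms control in the claimed framework.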
Abstract:
A system, method, and computer program product are provided for performing fast, non-rigid registration for at least two images of a high-dynamic range image stack. The method includes the steps of generating a warped image based on a set of corresponding pixels, analyzing the warped image to detect unreliable pixels in the warped image, and generating a corrected pixel value for each unreliable pixel in the warped image. The set of corresponding pixels includes a plurality of pixels in a source image, each pixel in the plurality of pixels associated with a potential feature in the source image and paired with a corresponding pixel in a reference image that substantially matches the pixel in the source image.
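The unreliable-pixel detection step above can be illustrated with a simple exposure-compensated consistency check between the warped image and the reference image. This is a hedged sketch; the threshold, names, and scalar exposure model are assumptions, not the patented analysis:

```python
def find_unreliable_pixels(warped, reference, exposure_ratio, tol=0.1):
    """Return a per-pixel True/False flag marking disagreement.

    Each reference pixel is mapped to the warped image's exposure by a
    scalar ratio; a pixel is flagged unreliable when the warped value
    deviates from that prediction by more than tol. Flagged pixels
    would then receive corrected values (e.g., the prediction itself).
    """
    flags = []
    for w, r in zip(warped, reference):
        predicted = min(1.0, r * exposure_ratio)  # clamp to sensor range
        flags.append(abs(w - predicted) > tol)
    return flags
```

Pixels that survive the check are trusted for HDR merging; flagged pixels are the ones the method replaces with corrected values.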
Abstract:
One embodiment of a method includes predicting one or more three-dimensional (3D) mesh representations based on a plurality of digital images, wherein the one or more 3D mesh representations are refined by minimizing at least one difference between the one or more 3D mesh representations and the plurality of digital images.