VARIATIONAL INFERENCING BY A DIFFUSION MODEL

    Publication Number: US20250045892A1

    Publication Date: 2025-02-06

    Application Number: US18593742

    Filing Date: 2024-03-01

    Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. For example, they can be trained in the image domain to perform specific image restoration tasks, such as inpainting (e.g., completing an incomplete image), deblurring (e.g., removing blurring from an image), and super-resolution (e.g., increasing the resolution of an image), or they can be trained to perform image rendering tasks, including 2D-to-3D image generation tasks. However, current approaches to training diffusion models only allow the models to be optimized for a specific task, such that they will not achieve high-quality results when used for other tasks. The present disclosure provides a diffusion model that uses variational inferencing to approximate a distribution of data, which allows the diffusion model to universally solve different tasks without having to be re-trained specifically for each task.
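
    The flavor of such task-agnostic restoration can be sketched in a few lines of PyTorch. The sketch below guides reverse diffusion with a data-fidelity gradient (a posterior-guidance stand-in, not necessarily the patented variational scheme); ToyDenoiser, the average-pool degradation, the noise schedule, and all sizes are illustrative assumptions.

        import torch
        import torch.nn.functional as F

        class ToyDenoiser(torch.nn.Module):
            """Stand-in for a pretrained noise predictor eps_theta(x_t, t)."""
            def forward(self, x_t, t):
                return torch.zeros_like(x_t)  # placeholder prediction

        def degrade(x):
            # Known degradation operator A; 2x average pooling stands in
            # for the downsampling of a super-resolution task.
            return F.avg_pool2d(x, 2)

        def sample(y, eps_theta, steps=50, guidance=1.0):
            """One pretrained prior solves any task whose degradation A is
            known at inference time; no task-specific re-training."""
            betas = torch.linspace(1e-4, 0.02, steps)
            alphas = 1.0 - betas
            abar = torch.cumprod(alphas, dim=0)
            x = torch.randn(1, 3, y.shape[-2] * 2, y.shape[-1] * 2)
            for t in reversed(range(steps)):
                x = x.detach().requires_grad_(True)
                eps = eps_theta(x, t)
                # Clean-image estimate implied by the current iterate.
                x0_hat = (x - (1 - abar[t]).sqrt() * eps) / abar[t].sqrt()
                # Data fidelity: does the estimate explain the observation?
                fidelity = F.mse_loss(degrade(x0_hat), y)
                grad = torch.autograd.grad(fidelity, x)[0]
                with torch.no_grad():
                    mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
                    noise = torch.randn_like(x) if t > 0 else 0.0
                    x = mean + betas[t].sqrt() * noise - guidance * grad
            return x.detach()

        y = torch.rand(1, 3, 16, 16)         # low-resolution observation
        restored = sample(y, ToyDenoiser())  # 32x32 restored image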

    SYNTHETIC DATA GENERATION USING MORPHABLE MODELS WITH IDENTITY AND EXPRESSION EMBEDDINGS

    Publication Number: US20240371096A1

    Publication Date: 2024-11-07

    Application Number: US18312102

    Filing Date: 2023-05-04

    Abstract: Approaches presented herein provide systems and methods for disentangling identity from expression in input models. One or more machine learning systems may be trained directly on three-dimensional (3D) points to develop unique latent codes for expressions associated with different identities. These codes may then be mapped to different identities to independently model an object, such as a face, to generate a new mesh including an expression for an independent identity. A pipeline may include a set of machine learning systems to determine model parameters and also adjust input expression codes using gradient backpropagation, in order to train models for incorporation into a content development pipeline.
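
    A minimal PyTorch sketch of the code-fitting step follows; the decoder architecture, its dimensions, and the training loop are hypothetical stand-ins, and only the pattern (a fixed identity code, an expression code refined by gradient backpropagation, then recombined with a new identity) mirrors the abstract.

        import torch

        class MorphableDecoder(torch.nn.Module):
            """Hypothetical decoder: (identity code, expression code) -> 3D vertices."""
            def __init__(self, id_dim=64, expr_dim=32, n_verts=5023):
                super().__init__()
                self.n_verts = n_verts
                self.net = torch.nn.Sequential(
                    torch.nn.Linear(id_dim + expr_dim, 256),
                    torch.nn.ReLU(),
                    torch.nn.Linear(256, n_verts * 3),
                )

            def forward(self, z_id, z_expr):
                verts = self.net(torch.cat([z_id, z_expr], dim=-1))
                return verts.view(-1, self.n_verts, 3)

        decoder = MorphableDecoder()
        for p in decoder.parameters():       # the trained model stays frozen
            p.requires_grad_(False)

        target = torch.randn(1, 5023, 3)     # scanned 3D points (placeholder)
        z_id = torch.randn(1, 64)            # identity code stays fixed
        z_expr = torch.zeros(1, 32, requires_grad=True)

        # Adjust only the expression code by backpropagating a point-wise
        # loss, mirroring the gradient-based code adjustment described above.
        opt = torch.optim.Adam([z_expr], lr=1e-2)
        for _ in range(200):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(decoder(z_id, z_expr), target)
            loss.backward()
            opt.step()

        # The fitted expression can now be paired with a different identity:
        new_face = decoder(torch.randn(1, 64), z_expr.detach())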

    LEARNING DENSE CORRESPONDENCES FOR IMAGES
    Invention Publication

    Publication Number: US20230252692A1

    Publication Date: 2023-08-10

    Application Number: US17929182

    Filing Date: 2022-09-01

    CPC classification number: G06T11/001 G06T3/0093

    Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN-synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined, and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In other words, the structure associated with the latent code is mapped into a shared coordinate space (the canonical coordinate space), thereby establishing correspondences in that shared space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
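
    The label-propagation step can be pictured with the short PyTorch sketch below; propagate_labels and the random coordinate map are illustrative assumptions, with grid_sample standing in for a lookup through the learned dense correspondence.

        import torch
        import torch.nn.functional as F

        def propagate_labels(canon_labels, coord_map):
            """Pull semantic labels from the canonical coordinate space into a
            synthesized image via its per-pixel canonical coordinates.

            canon_labels: (1, C, H, W) one-hot labels painted in canonical
                          space (e.g., annotated once on a reference image).
            coord_map:    (1, H, W, 2) canonical (x, y) in [-1, 1] per output
                          pixel, i.e., the learned dense correspondence map.
            """
            # grid_sample reads coord_map as sampling locations in canonical space.
            return F.grid_sample(canon_labels, coord_map, mode='nearest',
                                 align_corners=True)

        # Toy sizes; in practice the coordinates come from the warped structure code.
        labels = F.one_hot(torch.randint(0, 4, (1, 64, 64)), 4).permute(0, 3, 1, 2).float()
        coords = torch.rand(1, 64, 64, 2) * 2 - 1      # placeholder correspondences
        propagated = propagate_labels(labels, coords)  # (1, 4, 64, 64)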

    PERFORMING SEMANTIC SEGMENTATION TRAINING WITH IMAGE/TEXT PAIRS

    Publication Number: US20230177810A1

    Publication Date: 2023-06-08

    Application Number: US17853631

    Filing Date: 2022-06-29

    CPC classification number: G06V10/774 G06V10/26

    Abstract: Semantic segmentation is the task of providing pixel-wise annotations for a given image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. Each image/caption pair includes an image and an associated textual caption. The image portion of each pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and converted to text prompts, which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine, for each noun of each caption, the extracted feature that most closely matches the extracted features for the associated image.
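
    The matching objective can be sketched as a symmetric contrastive loss in PyTorch; grouping_contrastive_loss, the hard group-to-noun assignment, and all shapes below are illustrative assumptions rather than the patented implementation.

        import torch
        import torch.nn.functional as F

        def grouping_contrastive_loss(group_feats, text_feats, temperature=0.07):
            """Symmetric InfoNCE between pooled pixel-group features and noun
            text features; row i of each tensor comes from the same pair."""
            g = F.normalize(group_feats, dim=-1)
            t = F.normalize(text_feats, dim=-1)
            logits = g @ t.T / temperature       # (B, B) similarity matrix
            target = torch.arange(g.size(0))     # matching pairs on the diagonal
            return (F.cross_entropy(logits, target) +
                    F.cross_entropy(logits.T, target)) / 2

        # For each caption noun, pick the pixel grouping it matches best.
        pixel_groups = torch.randn(8, 16, 512)   # 8 images, 16 candidate groupings
        noun_feats = torch.randn(8, 512)         # encoded "a photo of a {noun}" prompts
        sims = torch.einsum('bgd,bd->bg',
                            F.normalize(pixel_groups, dim=-1),
                            F.normalize(noun_feats, dim=-1))
        best = pixel_groups[torch.arange(8), sims.argmax(dim=1)]   # (8, 512)
        loss = grouping_contrastive_loss(best, noun_feats)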

    FUTURE OBJECT TRAJECTORY PREDICTIONS FOR AUTONOMOUS MACHINE APPLICATIONS

    Publication Number: US20230088912A1

    Publication Date: 2023-03-23

    Application Number: US17952866

    Filing Date: 2022-09-26

    Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
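
    Arranged as code, the pipeline might look like the PyTorch sketch below; TrajectoryPredictor, the maneuver classes, and all sizes are assumptions that only follow the abstract's structure (per-object LSTM state feature, bi-directional LSTM spatial feature, maneuver heads, future-location decoder).

        import torch
        import torch.nn as nn

        class TrajectoryPredictor(nn.Module):
            def __init__(self, state_dim=4, hidden=64, horizon=10):
                super().__init__()
                # Encodes each object's past states into a state feature.
                self.state_enc = nn.LSTM(state_dim, hidden, batch_first=True)
                # Bi-directional LSTM over neighboring objects -> spatial feature.
                self.spatial_enc = nn.LSTM(hidden, hidden, batch_first=True,
                                           bidirectional=True)
                self.lateral = nn.Linear(3 * hidden, 3)       # keep / left / right
                self.longitudinal = nn.Linear(3 * hidden, 2)  # maintain / brake
                self.decoder = nn.Linear(3 * hidden + 5, horizon * 2)
                self.horizon = horizon

            def forward(self, histories):
                # histories: (num_objects, T, state_dim) past states per object
                _, (h, _) = self.state_enc(histories)
                state_feat = h[-1]                            # (N, hidden)
                spatial, _ = self.spatial_enc(state_feat.unsqueeze(0))
                feat = torch.cat([state_feat, spatial.squeeze(0)], dim=-1)
                lat = self.lateral(feat).softmax(-1)          # lateral maneuver
                lon = self.longitudinal(feat).softmax(-1)     # longitudinal maneuver
                xy = self.decoder(torch.cat([feat, lat, lon], dim=-1))
                return lat, lon, xy.view(-1, self.horizon, 2) # future locations

        model = TrajectoryPredictor()
        lat, lon, xy = model(torch.randn(6, 20, 4))  # 6 objects, 20 past steps each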

    PRUNING A VISION TRANSFORMER
    Invention Application

    Publication Number: US20230080247A1

    Publication Date: 2023-03-16

    Application Number: US17551005

    Filing Date: 2021-12-14

    Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but also use more memory than required. In response, parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces the size of the resulting vision transformer, which reduces unnecessary memory usage and increases performance.
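
    A minimal sketch of the thresholding step in PyTorch, using weight magnitude as a stand-in for the patent's parameter score; prune_by_score and the toy block are assumptions for illustration.

        import torch
        import torch.nn as nn

        def prune_by_score(model, threshold=0.02):
            """Zero every weight whose score falls below the threshold,
            shrinking the effective size of each transformer block."""
            total = kept = 0
            for p in model.parameters():
                if p.dim() < 2:              # skip biases and norm parameters
                    continue
                score = p.abs()              # simplistic per-parameter score
                mask = (score >= threshold).to(p.dtype)
                p.data.mul_(mask)
                total += mask.numel()
                kept += int(mask.sum())
            print(f"kept {kept}/{total} weights ({kept / total:.1%})")

        # One illustrative transformer-style block: projection + MLP.
        block = nn.Sequential(nn.Linear(384, 384), nn.GELU(),
                              nn.Linear(384, 1536), nn.GELU(),
                              nn.Linear(1536, 384))
        prune_by_score(block)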
