Learning dense correspondences for images

    Publication (Announcement) No.: US12169882B2

    Publication (Announcement) Date: 2024-12-17

    Application No.: US17929182

    Filing Date: 2022-09-01

    Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN-synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined, and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (the canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
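
    A minimal sketch, in PyTorch-style Python, of the idea described in the abstract above: a structure latent code is mapped to per-pixel coordinates in a shared canonical frame, and semantic labels defined in that frame are propagated to a synthesized image by sampling at those coordinates. The module names, layer sizes, and the use of grid_sample are illustrative assumptions, not the patented architecture.

        # Illustrative sketch only: structure latent code -> per-pixel canonical
        # coordinates, plus label propagation through the shared coordinate space.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class StructureToCanonical(nn.Module):
            """Predicts, for every output pixel, its (x, y) location in a shared
            canonical coordinate frame from a structure latent code (assumed design)."""
            def __init__(self, z_dim=128, size=32):
                super().__init__()
                self.size = size
                self.net = nn.Sequential(
                    nn.Linear(z_dim, 256), nn.ReLU(),
                    nn.Linear(256, size * size * 2), nn.Tanh(),  # coords in [-1, 1]
                )

            def forward(self, z_struct):
                b = z_struct.shape[0]
                return self.net(z_struct).view(b, self.size, self.size, 2)

        def propagate_labels(canonical_coords, ref_labels):
            """Propagates labels defined on the canonical frame to a synthesized image
            by sampling the reference label map at each pixel's canonical coordinates."""
            # ref_labels: (B, C, H, W) one-hot semantic labels in canonical space
            return F.grid_sample(ref_labels, canonical_coords, align_corners=False)

        if __name__ == "__main__":
            z_struct = torch.randn(2, 128)                 # structure latent codes
            coords = StructureToCanonical()(z_struct)      # (2, 32, 32, 2)
            ref_labels = torch.rand(2, 5, 64, 64)          # hypothetical canonical label map
            print(propagate_labels(coords, ref_labels).shape)  # (2, 5, 32, 32)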

    DIFFUSION-BASED OPEN-VOCABULARY SEGMENTATION
    Invention Publication

    Publication (Announcement) No.: US20240153093A1

    Publication (Announcement) Date: 2024-05-09

    Application No.: US18310414

    Filing Date: 2023-05-01

    CPC classification number: G06T7/10; G06V10/40; G06T2207/20081; G06T2207/20084

    Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training; it can also successfully segment object categories not seen during training and encountered only during testing and inference. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of each object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
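
    A hedged sketch of the classification step described above: mask-pooled visual features taken from the model's internal representation are matched against text embeddings of category labels by cosine similarity. The tensors and the classify_masks helper are stand-ins; extracting real features and masks from a text-conditioned diffusion model is not shown.

        # Illustrative sketch only: classify object masks by matching mask-pooled
        # visual features against text embeddings of open-vocabulary category labels.
        import torch
        import torch.nn.functional as F

        def classify_masks(features, masks, label_embeds):
            """features:     (C, H, W) internal features for one image
               masks:        (M, H, W) binary object masks
               label_embeds: (K, C) text embeddings of K category labels
               returns:      (M,) index of the best-matching label per mask"""
            m = masks.flatten(1).float()                                    # (M, H*W)
            f = features.flatten(1)                                         # (C, H*W)
            pooled = (m @ f.t()) / m.sum(dim=1, keepdim=True).clamp(min=1)  # (M, C)
            sims = F.cosine_similarity(pooled.unsqueeze(1),                 # (M, K)
                                       label_embeds.unsqueeze(0), dim=-1)
            return sims.argmax(dim=1)

        if __name__ == "__main__":
            feats = torch.randn(64, 32, 32)        # stand-in for diffusion-internal features
            masks = torch.rand(3, 32, 32) > 0.5    # stand-in object masks
            labels = torch.randn(10, 64)           # stand-in text embeddings of 10 labels
            print(classify_masks(feats, masks, labels))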

    Few-shot viewpoint estimation
    Invention Grant

    Publication (Announcement) No.: US11375176B2

    Publication (Announcement) Date: 2022-06-28

    Application No.: US16780738

    Filing Date: 2020-02-03

    Abstract: When a 3D scene is projected into an image, the viewpoint of the objects in the image, relative to the camera, must be determined. Since the image itself does not contain sufficient information to determine the viewpoint of the various objects it depicts, techniques to estimate the viewpoint must be employed. To date, neural networks have been used to infer such viewpoint estimates on a per-object-category basis, but they must first be trained with numerous manually created examples. The present disclosure provides a neural network that is trained to learn, from just a few example images, a unique viewpoint estimation network capable of inferring viewpoint estimates for a new object category.
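
    The sketch below illustrates one plausible reading of learning a viewpoint estimator from just a few examples: fine-tuning a copy of a meta-trained network on a handful of labeled support images of an unseen category. The fine-tuning-style adaptation, the (sin, cos) azimuth output, and all sizes are assumptions, not necessarily the disclosed method.

        # Illustrative sketch only: few-shot adaptation of a viewpoint regressor
        # on a handful of labeled support images of an unseen object category.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ViewpointNet(nn.Module):
            """Tiny regressor from image features to a viewpoint angle as (sin, cos)."""
            def __init__(self, feat_dim=64):
                super().__init__()
                self.head = nn.Linear(feat_dim, 2)

            def forward(self, feats):
                return F.normalize(self.head(feats), dim=-1)

        def adapt_to_new_category(net, support_feats, support_angles, lr=0.1, steps=5):
            """Fine-tunes a copy of a meta-trained network on a few support examples,
            producing a viewpoint estimator for the new category."""
            adapted = ViewpointNet(support_feats.shape[-1])
            adapted.load_state_dict(net.state_dict())
            opt = torch.optim.SGD(adapted.parameters(), lr=lr)
            target = torch.stack([support_angles.sin(), support_angles.cos()], dim=-1)
            for _ in range(steps):
                opt.zero_grad()
                loss = ((adapted(support_feats) - target) ** 2).mean()
                loss.backward()
                opt.step()
            return adapted

        if __name__ == "__main__":
            meta_net = ViewpointNet()
            feats = torch.randn(5, 64)               # features of 5 support images
            angles = torch.rand(5) * 2 * torch.pi    # their known azimuth angles
            estimator = adapt_to_new_category(meta_net, feats, angles)
            print(estimator(torch.randn(1, 64)))     # (sin, cos) viewpoint estimate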

    SELF-SUPERVISED HIERARCHICAL MOTION LEARNING FOR VIDEO ACTION RECOGNITION

    Publication (Announcement) No.: US20210064931A1

    Publication (Announcement) Date: 2021-03-04

    Application No.: US16998914

    Filing Date: 2020-08-20

    Abstract: Numerous features in video, such as objects and/or motion, can be detected using computer-based systems. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, and object tracking. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
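
    As a hedged illustration of a self-supervised motion objective in the spirit of the abstract above: the sketch below predicts a cheap, label-free motion cue (frame differences) from features at several levels of a convolutional hierarchy. The choice of cue, the layer sizes, and the loss are assumptions rather than the disclosed method.

        # Illustrative sketch only: predict a label-free motion cue (frame differences)
        # from features at several levels of a convolutional hierarchy.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class HierarchicalMotionLearner(nn.Module):
            def __init__(self, levels=(8, 16, 32)):
                super().__init__()
                chans = [3] + list(levels)
                self.encoders = nn.ModuleList(
                    [nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1)
                     for i in range(len(levels))]
                )
                self.motion_heads = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in levels])

            def forward(self, frame):
                feats, x = [], frame
                for enc in self.encoders:
                    x = F.relu(enc(x))
                    feats.append(x)
                return feats

            def self_supervised_loss(self, frame_t, frame_t1):
                # Self-supervised target: absolute frame difference, no labels required.
                cue = (frame_t1 - frame_t).abs().mean(dim=1, keepdim=True)
                loss = 0.0
                for feat, head in zip(self.forward(frame_t), self.motion_heads):
                    pred = head(feat)
                    target = F.interpolate(cue, size=pred.shape[-2:], mode="bilinear",
                                           align_corners=False)
                    loss = loss + F.mse_loss(pred, target)   # one term per hierarchy level
                return loss

        if __name__ == "__main__":
            model = HierarchicalMotionLearner()
            f0, f1 = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
            print(model.self_supervised_loss(f0, f1))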
