41. DIFFUSION-BASED OPEN-VOCABULARY SEGMENTATION
    Invention Publication

    Publication (Announcement) Number: US20240153093A1

    Publication (Announcement) Date: 2024-05-09

    Application Number: US18310414

    Application Date: 2023-05-01

    CPC classification number: G06T7/10 G06V10/40 G06T2207/20081 G06T2207/20084

    Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training; it can also successfully segment object categories that are never seen during training and appear only at test and inference time. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, and in doing so computes internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of each object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
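    Read as a pipeline, the abstract describes pooling the diffusion model's internal features under each predicted mask and matching the pooled embeddings against text embeddings of arbitrary category labels. The sketch below illustrates only that open-vocabulary classification step; the feature extractor, mask proposal head, and text encoder are hypothetical placeholders (features and masks are taken as inputs), not the patent's actual components.

```python
# Hedged sketch of the open-vocabulary classification step described in the
# abstract: internal diffusion features are mask-pooled into a semantic visual
# embedding per object and associated with text embeddings of category labels.
# All names (image_feats, mask proposals, text_encoder) are illustrative
# assumptions, not the patented implementation.
import torch
import torch.nn.functional as F

def classify_masks(image_feats, masks, label_texts, text_encoder, temperature=0.07):
    """
    image_feats: (C, H, W) internal features from a frozen text-conditioned
                 diffusion model (assumed spatially well-differentiated).
    masks:       (N, H, W) soft/binary object masks proposed from those features.
    label_texts: list of K category names, possibly unseen during training.
    text_encoder: callable mapping a list of strings to (K, C) embeddings
                  in the same space as the visual features.
    Returns (N, K) class probabilities for each mask.
    """
    C, H, W = image_feats.shape
    # Mask-pool the diffusion features: one semantic visual embedding per object.
    flat_feats = image_feats.view(C, H * W)               # (C, HW)
    flat_masks = masks.view(masks.shape[0], H * W)        # (N, HW)
    weights = flat_masks / (flat_masks.sum(dim=1, keepdim=True) + 1e-6)
    mask_embeds = weights @ flat_feats.t()                # (N, C)

    # Embed the category labels as text representations.
    text_embeds = text_encoder(label_texts)               # (K, C)

    # Associate each mask's visual embedding with the closest label embedding.
    mask_embeds = F.normalize(mask_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = mask_embeds @ text_embeds.t() / temperature  # (N, K)
    return logits.softmax(dim=-1)
```

    Because classification reduces to similarity against text embeddings, adding a new category at inference time only requires encoding its label, which is what makes the segmentation open-vocabulary.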

    IMAGE SEGMENTATION USING A NEURAL NETWORK TRANSLATION MODEL

    Publication (Announcement) Number: US20220254029A1

    Publication (Announcement) Date: 2022-08-11

    Application Number: US17500338

    Application Date: 2021-10-13

    Abstract: A neural network for image segmentation includes an encoder, a common decoder, and a residual decoder. The encoder encodes input images into a latent space that disentangles unique features from other, common features. The common decoder decodes the common features resident in the latent space to generate translated images that lack the unique features. The residual decoder decodes the unique features resident in the latent space to generate image deltas corresponding to those unique features. The neural network combines the translated images with the image deltas to generate combined images that may include both common and unique features. The combined images can be used to drive autoencoding. Once training is complete, the residual decoder can be modified to generate segmentation masks that indicate any regions of a given input image where a unique feature resides.
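    The abstract spells out a concrete encoder/two-decoder layout, sketched briefly below. The convolutional block sizes, the additive combination of the two decoder outputs, and the class and variable names are illustrative assumptions under a plain-PyTorch reading, not the patented design.

```python
# Minimal sketch, assuming simple convolutional blocks, of the encoder /
# common-decoder / residual-decoder layout described in the abstract.
import torch
import torch.nn as nn

class TranslationSegmenter(nn.Module):
    def __init__(self, channels=3, latent=64):
        super().__init__()
        # Encoder maps the input image into a latent space meant to
        # disentangle unique features from common features.
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, latent, 3, padding=1), nn.ReLU(),
            nn.Conv2d(latent, latent, 3, padding=1), nn.ReLU(),
        )
        # Common decoder reconstructs a "translated" image lacking unique features.
        self.common_decoder = nn.Sequential(
            nn.Conv2d(latent, latent, 3, padding=1), nn.ReLU(),
            nn.Conv2d(latent, channels, 3, padding=1),
        )
        # Residual decoder produces an image delta holding only the unique features.
        self.residual_decoder = nn.Sequential(
            nn.Conv2d(latent, latent, 3, padding=1), nn.ReLU(),
            nn.Conv2d(latent, channels, 3, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        translated = self.common_decoder(z)   # common features only
        delta = self.residual_decoder(z)      # unique features only
        combined = translated + delta         # used to drive autoencoding
        return translated, delta, combined

# Training signal: the combined image should reconstruct the input. After
# training, thresholding the residual decoder's output (or swapping its last
# layer for a 1-channel head) is one assumed way to obtain a mask of regions
# where a unique feature resides.
model = TranslationSegmenter()
image = torch.randn(1, 3, 128, 128)
translated, delta, combined = model(image)
reconstruction_loss = ((combined - image) ** 2).mean()
```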

    47. Few-shot viewpoint estimation
    Invention Grant

    Publication (Announcement) Number: US11375176B2

    Publication (Announcement) Date: 2022-06-28

    Application Number: US16780738

    Application Date: 2020-02-03

    Abstract: When an image is projected from a 3D scene, the viewpoint of each object in the image, relative to the camera, must be determined. Since the image itself does not contain sufficient information to determine the viewpoint of the various objects it depicts, techniques to estimate the viewpoint must be employed. To date, neural networks have been used to infer such viewpoint estimates on an object-category basis, but they must first be trained with numerous manually created examples. The present disclosure provides a neural network that is trained to learn, from just a few example images, a unique viewpoint estimation network capable of inferring viewpoint estimates for a new object category.
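    One plausible reading of learning a category-specific viewpoint estimator from "just a few example images" is a meta-trained network that is briefly fine-tuned on a small support set of the new category before inference. The sketch below shows that adaptation loop under those assumptions; the network layout, the three-angle output, and the fine-tuning procedure are illustrative, not the patent's disclosed method.

```python
# Hedged sketch of few-shot adaptation for viewpoint estimation: clone a
# meta-trained network and take a few gradient steps on the support examples
# of an unseen category. All names and hyperparameters are assumptions.
import copy
import torch
import torch.nn as nn

class ViewpointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3)  # e.g. azimuth, elevation, in-plane rotation

    def forward(self, x):
        return self.head(self.backbone(x))

def adapt_to_new_category(meta_model, support_images, support_viewpoints,
                          steps=5, lr=1e-2):
    """Clone the meta-trained network and fine-tune it on the few labeled
    support examples of an unseen category, returning a category-specific
    viewpoint estimator."""
    model = copy.deepcopy(meta_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(support_images), support_viewpoints)
        loss.backward()
        optimizer.step()
    return model

# Usage: a handful of labeled examples specialize the meta-trained network.
meta_model = ViewpointNet()
support_images = torch.randn(4, 3, 64, 64)   # "just a few example images"
support_viewpoints = torch.randn(4, 3)       # their known viewpoints
category_model = adapt_to_new_category(meta_model, support_images, support_viewpoints)
query_estimate = category_model(torch.randn(1, 3, 64, 64))
```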
