-
Publication number: US20240153093A1
Publication date: 2024-05-09
Application number: US18310414
Filing date: 2023-05-01
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Arash Vahdat , Wonmin Byeon
CPC classification number: G06T7/10 , G06V10/40 , G06T2207/20081 , G06T2207/20084
Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training; it can also successfully segment object categories seen only during testing and inference. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of each object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
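The classification step described above — matching each mask's semantic visual representation against text representations of candidate category labels — can be sketched as a cosine-similarity scoring followed by a softmax over labels. This is a minimal illustration, not the patented implementation: the function name, the softmax temperature, and the use of plain NumPy arrays are all assumptions for clarity.

```python
import numpy as np

def classify_masks(mask_features, text_features, temperature=0.1):
    """Assign each object mask a category label by comparing its visual
    embedding (extracted from the diffusion model's internal
    representations) with text embeddings of candidate category labels.

    mask_features: (num_masks, dim) array of per-mask visual embeddings.
    text_features: (num_labels, dim) array of label-text embeddings.
    Returns (predicted label index per mask, per-mask label probabilities).
    Hypothetical sketch; the temperature value is an assumption.
    """
    # L2-normalize both embedding sets so the dot product is cosine similarity.
    m = mask_features / np.linalg.norm(mask_features, axis=1, keepdims=True)
    t = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    logits = m @ t.T / temperature  # (num_masks, num_labels)
    # Softmax over candidate labels, computed stably per mask.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```

Because scoring is against arbitrary label-text embeddings rather than a fixed classifier head, the same routine handles category labels never seen during training — the open-vocabulary property the abstract describes.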
-
Publication number: US20230177810A1
Publication date: 2023-06-08
Application number: US17853631
Filing date: 2022-06-29
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Wonmin Byeon , Thomas Breuel , Jan Kautz
IPC: G06V10/774 , G06V10/26
CPC classification number: G06V10/774 , G06V10/26
Abstract: Semantic segmentation is the task of providing pixel-wise annotations for a given image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases; each pair includes an image and an associated textual caption. The image portion of each pair is passed to an image encoder of the machine learning environment, which outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and converted to text prompts, which are then passed to a text encoder that outputs corresponding text representations. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine, for each noun of each caption, the extracted image feature that most closely matches it.
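The contrastive loss operation described above can be sketched as a symmetric InfoNCE-style objective over a batch of image/caption pairs, where matched image and text features share the same row index and act as positives while all other pairings are negatives. This is a hedged sketch under stated assumptions — the function name, the temperature, and the NumPy formulation are illustrative, not the patent's implementation.

```python
import numpy as np

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss between pooled image
    (pixel-grouping) features and caption text features.

    image_feats, text_feats: (batch, dim) arrays; row i of each is a
    matched image/caption pair. Hypothetical sketch; the temperature
    value is an assumption.
    """
    # Normalize so similarities are cosine similarities.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def xent(l):
        # Cross-entropy with the diagonal (matched pairs) as targets.
        l = l - l.max(axis=1, keepdims=True)
        logprob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logprob[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each pixel-grouping feature toward the text representation of the noun it depicts and pushes it away from the other captions in the batch, which is the matching behavior the abstract describes.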
-