LEVERAGING SEMANTIC INFORMATION FOR A MULTI-DOMAIN VISUAL AGENT

    Publication No.: US20250148766A1

    Publication Date: 2025-05-08

    Application No.: US18934756

    Filing Date: 2024-11-01

    Abstract: Systems and methods for leveraging semantic information for a multi-domain visual agent. Semantic information can be leveraged to obtain a multi-domain visual agent. To train the multi-domain visual agent, questions can be sampled from question templates for domain-specific label spaces to obtain a unified label space. The domain-specific labels from the domain-specific label spaces can be mapped into natural language descriptions (NLD) to obtain mapped NLD. The mapped NLD can be converted into prompts by combining the questions sampled from the unified label space and the annotations. The semantic information can be learned by iteratively generating outputs from tokens extracted from the prompts using a large-language model (LLM). The multi-domain visual agent (MDVA) can be trained using the semantic information.
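The label-to-prompt step in this abstract can be illustrated with a minimal sketch. It assumes that domain-specific labels are mapped to natural language descriptions through a lookup table and that a question is drawn at random from a small template pool. The templates, the label names, the `LABEL_TO_NLD` table, and the `build_prompts` helper are all hypothetical and do not come from the patent.

```python
import random

# Hypothetical question templates; the patent does not list any.
QUESTION_TEMPLATES = [
    "Where is {nld} in the image?",
    "Locate {nld}.",
]

# Hypothetical mapping from domain-specific labels to natural language
# descriptions (NLD). Labels from different domains share one description space.
LABEL_TO_NLD = {
    "driving": {"ped": "a person walking", "veh": "a motor vehicle"},
    "retail": {"sku_17": "a cereal box"},
}


def build_prompts(annotations, rng):
    """Turn (domain, label, box) annotations into question/answer prompts."""
    prompts = []
    for domain, label, box in annotations:
        nld = LABEL_TO_NLD[domain][label]  # domain label -> description
        question = rng.choice(QUESTION_TEMPLATES).format(nld=nld)
        prompts.append({"question": question, "answer": f"{nld} at {box}"})
    return prompts


annotations = [
    ("driving", "ped", (10, 20, 50, 80)),
    ("retail", "sku_17", (5, 5, 40, 60)),
]
prompts = build_prompts(annotations, random.Random(0))
```

Each resulting prompt pairs a sampled question with an answer built from the annotation, which is the kind of text an LLM could then tokenize. How the patent actually combines questions and annotations is not specified in the abstract.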

    AUTOMATIC DATA SYSTEMS FOR NOVEL OBJECT DETECTION

    Publication No.: US20250118044A1

    Publication Date: 2025-04-10

    Application No.: US18891590

    Filing Date: 2024-09-20

    Abstract: Systems and methods for identifying novel objects in an image include detecting one or more objects in an image and generating one or more captions for the image. One or more predicted categories of the one or more objects detected in the image and the one or more captions are matched to identify, from the one or more predicted categories, a category of a novel object in the image. An image feature and a text description feature are generated using a description of the novel object. A relevant image is selected using a similarity score between the image feature and the text description feature. A model is updated using the relevant image and associated description of the novel object.
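The image selection step can be sketched as follows, assuming the similarity score is cosine similarity between an image feature vector and a text description feature vector, and that an image counts as relevant when the score reaches a threshold. The abstract names neither the metric nor a threshold, so both are assumptions, as are the helper names `cosine` and `select_relevant`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_relevant(image_features, text_feature, threshold=0.8):
    """Return the names of images whose feature is close to the text feature.

    The threshold is a hypothetical parameter, not a value from the patent.
    """
    return [
        name
        for name, feature in image_features.items()
        if cosine(feature, text_feature) >= threshold
    ]


image_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
relevant = select_relevant(image_features, text_feature=[1.0, 0.0])
```

In this toy input, images `a` and `c` point in nearly the same direction as the text feature and are kept, while `b` is orthogonal to it and is dropped.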

    Face recognition from unseen domains via learning of semantic features

    Publication No.: US11947626B2

    Publication Date: 2024-04-02

    Application No.: US17519950

    Filing Date: 2021-11-05

    CPC classification number: G06F18/214 G06N3/04 G06V40/161

    Abstract: A method for improving face recognition from unseen domains by learning semantically meaningful representations is presented. The method includes obtaining face images with associated identities from a plurality of datasets, randomly selecting two datasets of the plurality of datasets to train a model, sampling batch face images and their corresponding labels, sampling triplet samples including one anchor face image, a sample face image from a same identity, and a sample face image from a different identity than that of the one anchor face image, performing a forward pass by using the samples of the selected two datasets, finding representations of the face images by using a backbone convolutional neural network (CNN), generating covariances from the representations of the face images and the backbone CNN, the covariances made in different spaces by using positive pairs and negative pairs, and employing the covariances to compute a cross-domain similarity loss function.
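The sampling steps in this abstract (pick two datasets at random, then draw an anchor, a positive from the same identity, and a negative from a different identity) can be sketched as below. The covariance computation and the cross-domain similarity loss are not shown. The function names and the `(image_id, identity)` data layout are hypothetical.

```python
import random
from collections import defaultdict


def pick_two_datasets(datasets, rng):
    """Randomly choose two distinct dataset names."""
    return rng.sample(sorted(datasets), 2)


def sample_triplet(samples, rng):
    """Draw (anchor, positive, negative) image ids from (image_id, identity) pairs."""
    by_identity = defaultdict(list)
    for image_id, identity in samples:
        by_identity[identity].append(image_id)

    # The anchor identity needs at least two images to supply a positive.
    eligible = sorted(k for k, v in by_identity.items() if len(v) >= 2)
    if not eligible or len(by_identity) < 2:
        raise ValueError("need one identity with two images and a second identity")

    anchor_identity = rng.choice(eligible)
    anchor, positive = rng.sample(by_identity[anchor_identity], 2)

    # The negative comes from any other identity.
    others = sorted(k for k in by_identity if k != anchor_identity)
    negative = rng.choice(by_identity[rng.choice(others)])
    return anchor, positive, negative


rng = random.Random(0)
chosen = pick_two_datasets({"A": [], "B": [], "C": []}, rng)
triplet = sample_triplet([("a1", "alice"), ("a2", "alice"), ("b1", "bob")], rng)
```

In a training loop, triplets like this would be embedded by the backbone CNN before any loss is computed. That part depends on details the abstract does not give.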

    Human detection in scenes
    Patent type: Invention grant

    Publication No.: US11610420B2

    Publication Date: 2023-03-21

    Application No.: US17128565

    Filing Date: 2020-12-21

    Abstract: Systems and methods for human detection are provided. The system aligns image level features between a source domain and a target domain based on an adversarial learning process while training a domain discriminator. The target domain includes humans in one or more different scenes. The system selects, using the domain discriminator, unlabeled samples from the target domain that are far away from existing annotated samples from the target domain. The system selects, based on a prediction score of each of the unlabeled samples, samples with lower prediction scores. The system annotates the samples with the lower prediction scores.
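The two-stage sample selection can be sketched as follows. The sketch assumes that "far away from existing annotated samples" is measured as the gap between a sample's domain discriminator score and the nearest annotated sample's score, and that a fixed annotation budget is spent on the lowest prediction scores. The `disc` and `pred` fields, `min_gap`, and `budget` are hypothetical names and parameters.

```python
def select_for_annotation(unlabeled, annotated_scores, min_gap, budget):
    """Pick unlabeled target samples to send for annotation.

    Stage 1 keeps samples whose discriminator score is at least `min_gap`
    away from every annotated sample's score. Stage 2 keeps the `budget`
    samples with the lowest detector prediction scores.
    """
    far = [
        sample
        for sample in unlabeled
        if min(abs(sample["disc"] - score) for score in annotated_scores) >= min_gap
    ]
    far.sort(key=lambda sample: sample["pred"])
    return [sample["id"] for sample in far[:budget]]


unlabeled = [
    {"id": "u1", "disc": 0.90, "pred": 0.2},
    {"id": "u2", "disc": 0.52, "pred": 0.1},
    {"id": "u3", "disc": 0.10, "pred": 0.7},
    {"id": "u4", "disc": 0.95, "pred": 0.4},
]
to_annotate = select_for_annotation(
    unlabeled, annotated_scores=[0.50, 0.55], min_gap=0.3, budget=2
)
```

Here `u2` is dropped in the first stage even though it has the lowest prediction score, because it sits close to already annotated samples. This is the diversity-before-uncertainty ordering the abstract describes.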

    Parametric top-view representation of scenes

    Publication No.: US11373067B2

    Publication Date: 2022-06-28

    Application No.: US16526073

    Filing Date: 2019-07-30

    Abstract: A method for implementing parametric models for scene representation to improve autonomous task performance includes generating an initial map of a scene based on at least one image corresponding to a perspective view of the scene, the initial map including a non-parametric top-view representation of the scene, implementing a parametric model to obtain a scene element representation based on the initial map, the scene element representation providing a description of one or more scene elements of the scene and corresponding to an estimated semantic layout of the scene, identifying one or more predicted locations of the one or more scene elements by performing three-dimensional localization based on the at least one image, and obtaining an overlay for performing an autonomous task by placing the one or more scene elements with the one or more respective predicted locations onto the scene element representation.
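To make the difference between a parametric scene description and a pixel-level top-view map concrete, the sketch below represents a road with a few named parameters and places localized scene elements onto it to form an overlay. The parameters (`num_lanes`, `lane_width_m`, `has_sidewalk`), the coordinate convention, and the lane assignment rule are illustrative assumptions, not the patent's model.

```python
from dataclasses import dataclass, field


@dataclass
class SceneParams:
    """Hypothetical parametric description of a straight road."""
    num_lanes: int
    lane_width_m: float
    has_sidewalk: bool


@dataclass
class Overlay:
    scene: SceneParams
    elements: list = field(default_factory=list)


def build_overlay(scene, detections):
    """Place (name, (x, z)) detections, in metres, onto the parametric scene.

    x is the lateral offset from the left road edge and z is the distance
    ahead. Elements outside the road get lane None.
    """
    overlay = Overlay(scene)
    road_width = scene.num_lanes * scene.lane_width_m
    for name, (x, z) in detections:
        lane = int(x // scene.lane_width_m) if 0 <= x < road_width else None
        overlay.elements.append({"name": name, "x": x, "z": z, "lane": lane})
    return overlay


scene = SceneParams(num_lanes=2, lane_width_m=3.5, has_sidewalk=True)
overlay = build_overlay(
    scene,
    [("car", (1.0, 12.0)), ("truck", (5.0, 30.0)), ("person", (8.0, 4.0))],
)
```

A downstream planner could query this overlay by lane index rather than by scanning pixels, which is one practical appeal of a parametric representation.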

    FACE RECOGNITION FROM UNSEEN DOMAINS VIA LEARNING OF SEMANTIC FEATURES

    Publication No.: US20220147765A1

    Publication Date: 2022-05-12

    Application No.: US17519950

    Filing Date: 2021-11-05

    Abstract: A method for improving face recognition from unseen domains by learning semantically meaningful representations is presented. The method includes obtaining face images with associated identities from a plurality of datasets, randomly selecting two datasets of the plurality of datasets to train a model, sampling batch face images and their corresponding labels, sampling triplet samples including one anchor face image, a sample face image from a same identity, and a sample face image from a different identity than that of the one anchor face image, performing a forward pass by using the samples of the selected two datasets, finding representations of the face images by using a backbone convolutional neural network (CNN), generating covariances from the representations of the face images and the backbone CNN, the covariances made in different spaces by using positive pairs and negative pairs, and employing the covariances to compute a cross-domain similarity loss function.

    Human action recognition in drone videos

    Publication No.: US11250573B2

    Publication Date: 2022-02-15

    Application No.: US16515713

    Filing Date: 2019-07-18

    Abstract: A method is provided for drone-video-based action recognition. The method learns a transformation for each of target video clips taken from a set of target videos, responsive to original features extracted from the target video clips. The transformation corrects differences between a target drone domain corresponding to the target video clips and a source non-drone domain corresponding to source video clips taken from a set of source videos. The method adapts the target to the source domain by applying the transformation to the original features to obtain transformed features for the target video clips. The method converts the original and transformed features of same ones of the target video clips into a single classification feature for each of the target videos. The method classifies a human action in a new target video relative to the set of source videos using the single classification feature for each of the target videos.
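The feature pipeline in this abstract can be sketched under strong simplifying assumptions: each clip's learned transformation is taken to be a per-clip scale and shift, original and transformed features are fused by concatenation, and clips are combined into one video-level feature by averaging. The abstract specifies none of these choices, and `video_feature` is a hypothetical helper.

```python
def video_feature(clip_features, transforms):
    """Fuse original and transformed clip features into one vector per video.

    clip_features: list of equal-length feature lists, one per clip.
    transforms: list of (scale, shift) pairs, one per clip (assumed form).
    """
    fused = []
    for feature, (scale, shift) in zip(clip_features, transforms):
        transformed = [scale * x + shift for x in feature]
        fused.append(feature + transformed)  # concatenate original and transformed
    count = len(fused)
    # Average each dimension over the clips of the video.
    return [sum(column) / count for column in zip(*fused)]


clips = [[1.0, 2.0], [3.0, 4.0]]
transforms = [(2.0, 0.0), (1.0, 1.0)]
feature = video_feature(clips, transforms)
```

The resulting vector has twice the clip feature dimension and could be passed to any classifier trained on source-domain features.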
