-
Publication No.: US20250148766A1
Publication Date: 2025-05-08
Application No.: US18934756
Filing Date: 2024-11-01
Applicant: NEC Laboratories America, Inc.
Inventor: Vijay Kumar Baikampady Gopalkrishna, Masoud Faraki, Yumin Suh, Manmohan Chandraker
IPC: G06V10/774, B60W60/00, G06F40/284, G06F40/30, G06V10/86, G06V20/56
Abstract: Systems and methods for leveraging semantic information for a multi-domain visual agent. Semantic information can be leveraged to obtain a multi-domain visual agent. To train the multi-domain visual agent, questions can be sampled from question templates for domain-specific label spaces to obtain a unified label space. The domain-specific labels from the domain-specific label spaces can be mapped into natural language descriptions (NLD) to obtain mapped NLD. The mapped NLD can be converted into prompts by combining the questions sampled from the unified label space and the annotations. The semantic information can be learned by iteratively generating outputs from tokens extracted from the prompts using a large-language model (LLM). The multi-domain visual agent (MDVA) can be trained using the semantic information.
-
Publication No.: US20250118044A1
Publication Date: 2025-04-10
Application No.: US18891590
Filing Date: 2024-09-20
Applicant: NEC Laboratories America, Inc.
Inventor: Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Manmohan Chandraker, Mingfu Liang
Abstract: Systems and methods for identifying novel objects in an image include detecting one or more objects in an image and generating one or more captions for the image. One or more predicted categories of the one or more objects detected in the image and the one or more captions are matched to identify, from the one or more predicted categories, a category of a novel object in the image. An image feature and a text description feature are generated using a description of the novel object. A relevant image is selected using a similarity score between the image feature and the text description feature. A model is updated using the relevant image and associated description of the novel object.
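The image-selection step in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity score and a fixed cutoff for relevance; the abstract specifies neither. The names `cosine_similarity`, `select_relevant_images`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def select_relevant_images(image_features, text_feature, threshold=0.5):
    """Return (name, score) pairs scoring at least `threshold`, best first.

    Assumption: relevance is decided by a fixed cosine-similarity cutoff.
    The abstract only says a similarity score is used.
    """
    scored = [
        (name, cosine_similarity(feature, text_feature))
        for name, feature in image_features.items()
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy 2-D features standing in for image and text embeddings.
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
selected = select_relevant_images(candidates, [1.0, 0.0], threshold=0.5)
```

In this toy setup the images whose features point in roughly the same direction as the text description feature are kept, and the orthogonal one is dropped.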
-
Publication No.: US20240354921A1
Publication Date: 2024-10-24
Application No.: US18616396
Filing Date: 2024-03-26
Applicant: NEC Laboratories America, Inc.
Inventor: Sparsh Garg, Bingbing Zhuang, Samuel Schulter, Manmohan Chandraker
CPC classification number: G06T7/0002, G06T7/10, G06T7/50, G06V20/588, G06T2207/10028, G06T2207/20081, G06T2207/20084, G06T2207/30256
Abstract: Systems and methods for road defect level prediction. A depth map is obtained from an image dataset received from input peripherals by employing a vision transformer model. A plurality of semantic maps is obtained from the image dataset by employing a semantic segmentation model to give pixel-wise segmentation results of road scenes to detect road pixels. Regions of interest (ROI) are detected by utilizing the road pixels. Road defect levels are predicted by fitting the ROI and the depth map into a road surface model to generate road points classified into road defect levels. The predicted road defect levels are visualized on a road map.
-
Publication No.: US11947626B2
Publication Date: 2024-04-02
Application No.: US17519950
Filing Date: 2021-11-05
Applicant: NEC Laboratories America, Inc.
Inventor: Masoud Faraki, Xiang Yu, Yi-Hsuan Tsai, Yumin Suh, Manmohan Chandraker
IPC: G06F18/214, G06N3/04, G06V40/16
CPC classification number: G06F18/214, G06N3/04, G06V40/161
Abstract: A method for improving face recognition from unseen domains by learning semantically meaningful representations is presented. The method includes obtaining face images with associated identities from a plurality of datasets, randomly selecting two datasets of the plurality of datasets to train a model, sampling batch face images and their corresponding labels, sampling triplet samples including one anchor face image, a sample face image from a same identity, and a sample face image from a different identity than that of the one anchor face image, performing a forward pass by using the samples of the selected two datasets, finding representations of the face images by using a backbone convolutional neural network (CNN), generating covariances from the representations of the face images and the backbone CNN, the covariances made in different spaces by using positive pairs and negative pairs, and employing the covariances to compute a cross-domain similarity loss function.
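The triplet-sampling step named in this abstract (one anchor, one sample of the same identity, one sample of a different identity) can be sketched as follows. This is an illustrative sketch only: the function name `sample_triplet` and the uniform random choice of identities are assumptions, and the covariance and cross-domain similarity loss steps are not modeled.

```python
import random
from collections import defaultdict


def sample_triplet(labels, rng):
    """Return (anchor, positive, negative) indices into `labels`.

    `labels` holds one identity label per face image. Uniform sampling is
    an assumption; the abstract does not state a sampling distribution.
    """
    by_identity = defaultdict(list)
    for index, identity in enumerate(labels):
        by_identity[identity].append(index)

    # An anchor identity needs at least two images to form a positive pair.
    eligible = [i for i, idxs in by_identity.items() if len(idxs) >= 2]
    if not eligible or len(by_identity) < 2:
        raise ValueError("need one identity with 2+ images and a second identity")

    anchor_id = rng.choice(eligible)
    anchor, positive = rng.sample(by_identity[anchor_id], 2)
    negative_id = rng.choice([i for i in by_identity if i != anchor_id])
    negative = rng.choice(by_identity[negative_id])
    return anchor, positive, negative


# Toy batch: five images covering three identities.
batch_labels = ["x", "x", "y", "z", "z"]
anchor, positive, negative = sample_triplet(batch_labels, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when comparing runs across dataset pairs.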
-
Publication No.: US11610420B2
Publication Date: 2023-03-21
Application No.: US17128565
Filing Date: 2020-12-21
Applicant: NEC Laboratories America, Inc.
Inventor: Yi-Hsuan Tsai, Kihyuk Sohn, Buyu Liu, Manmohan Chandraker, Jong-Chyi Su
Abstract: Systems and methods for human detection are provided. The system aligns image level features between a source domain and a target domain based on an adversarial learning process while training a domain discriminator. The target domain includes humans in one or more different scenes. The system selects, using the domain discriminator, unlabeled samples from the target domain that are far away from existing annotated samples from the target domain. The system selects, based on a prediction score of each of the unlabeled samples, samples with lower prediction scores. The system annotates the samples with the lower prediction scores.
-
Publication No.: US11468585B2
Publication Date: 2022-10-11
Application No.: US16987705
Filing Date: 2020-08-07
Applicant: NEC Laboratories America, Inc.
Inventor: Quoc-Huy Tran, Pan Ji, Manmohan Chandraker, Lokender Tiwari
Abstract: A method for improving geometry-based monocular structure from motion (SfM) by exploiting depth maps predicted by convolutional neural networks (CNNs) is presented. The method includes capturing a sequence of RGB images from an unlabeled monocular video stream obtained by a monocular camera, feeding the RGB images into a depth estimation/refinement module, outputting depth maps, feeding the depth maps and the RGB images to a pose estimation/refinement module, the depth maps and the RGB images collectively defining pseudo RGB-D images, outputting camera poses and point clouds, and constructing a 3D map of a surrounding environment displayed on a visualization device.
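To make the link between a depth map and a point cloud concrete, here is a minimal sketch of back-projecting one depth map into camera-frame 3D points. It assumes a standard pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`; the abstract does not state the camera model, and the function name `backproject` is hypothetical. Pose estimation and map fusion are not shown.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map (rows of depths) into camera-frame 3D points.

    Assumes a pinhole model: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy. Non-positive depths
    are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth prediction at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one invalid pixel.
toy_depth = [[2.0, 0.0], [4.0, 2.0]]
cloud = backproject(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

Pairing each returned point with the RGB value at the same pixel would give a colored point cloud for one pseudo RGB-D frame.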
-
Publication No.: US11373067B2
Publication Date: 2022-06-28
Application No.: US16526073
Filing Date: 2019-07-30
Applicant: NEC Laboratories America, Inc.
Inventor: Samuel Schulter, Ziyan Wang, Buyu Liu, Manmohan Chandraker
IPC: G06K9/62, B60R11/04, G05D1/02, G06N3/02, G06V20/56, H04N5/32, B60W50/14, B60W60/00, G06N3/04, G06N3/08, G06V10/82, H04N5/232
Abstract: A method for implementing parametric models for scene representation to improve autonomous task performance includes generating an initial map of a scene based on at least one image corresponding to a perspective view of the scene, the initial map including a non-parametric top-view representation of the scene, implementing a parametric model to obtain a scene element representation based on the initial map, the scene element representation providing a description of one or more scene elements of the scene and corresponding to an estimated semantic layout of the scene, identifying one or more predicted locations of the one or more scene elements by performing three-dimensional localization based on the at least one image, and obtaining an overlay for performing an autonomous task by placing the one or more scene elements with the one or more respective predicted locations onto the scene element representation.
-
Publication No.: US20220147765A1
Publication Date: 2022-05-12
Application No.: US17519950
Filing Date: 2021-11-05
Applicant: NEC Laboratories America, Inc.
Inventor: Masoud Faraki, Xiang Yu, Yi-Hsuan Tsai, Yumin Suh, Manmohan Chandraker
Abstract: A method for improving face recognition from unseen domains by learning semantically meaningful representations is presented. The method includes obtaining face images with associated identities from a plurality of datasets, randomly selecting two datasets of the plurality of datasets to train a model, sampling batch face images and their corresponding labels, sampling triplet samples including one anchor face image, a sample face image from a same identity, and a sample face image from a different identity than that of the one anchor face image, performing a forward pass by using the samples of the selected two datasets, finding representations of the face images by using a backbone convolutional neural network (CNN), generating covariances from the representations of the face images and the backbone CNN, the covariances made in different spaces by using positive pairs and negative pairs, and employing the covariances to compute a cross-domain similarity loss function.
-
Publication No.: US20220111869A1
Publication Date: 2022-04-14
Application No.: US17494927
Filing Date: 2021-10-06
Applicant: NEC Laboratories America, Inc.
Inventor: Buyu Liu, Pan Ji, Bingbing Zhuang, Manmohan Chandraker, Uday Kusupati
IPC: B60W60/00, G06T7/50, G06K9/72, G06T7/10, G06K9/00, G06K9/62, G06T7/70, G06N3/04, G06N3/08
Abstract: Methods and systems for determining a path include detecting objects within a perspective image that shows a scene. Depth is predicted within the perspective image. Semantic segmentation is performed on the perspective image. An attention map is generated using the detected objects and the predicted depth. A refined top-down view of the scene is generated using the predicted depth and the semantic segmentation. A parametric top-down representation of the scene is determined using a relational graph model. A path through the scene is determined using the parametric top-down representation.
-
Publication No.: US11250573B2
Publication Date: 2022-02-15
Application No.: US16515713
Filing Date: 2019-07-18
Applicant: NEC Laboratories America, Inc.
Inventor: Gaurav Sharma, Manmohan Chandraker, Jinwoo Choi
Abstract: A method is provided for drone-video-based action recognition. The method learns a transformation for each of target video clips taken from a set of target videos, responsive to original features extracted from the target video clips. The transformation corrects differences between a target drone domain corresponding to the target video clips and a source non-drone domain corresponding to source video clips taken from a set of source videos. The method adapts the target to the source domain by applying the transformation to the original features to obtain transformed features for the target video clips. The method converts the original and transformed features of same ones of the target video clips into a single classification feature for each of the target videos. The method classifies a human action in a new target video relative to the set of source videos using the single classification feature for each of the target videos.