Patent search ap:("NEC Laboratories America Inc.") AND inv:"Zaid Khan"

1.

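Invention publication
VISUAL QUESTION ANSWERING WITH UNLABELED IMAGE AUGMENTATION (status: under examination, published)

Publication (announcement) number: US20240152767A1

Publication (announcement) date: 2024-05-09

Application number: US18497079

Application date: 2023-10-30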

Applicant: NEC Laboratories America, Inc.

Inventor: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Xiang Yu, Zaid Khan, Manmohan Chandraker

IPC: G06N3/096, G06N3/006, G06N3/047

CPC classification number: G06N3/096, G06N3/006, G06N3/047

Abstract: Systems and methods for training a visual question answer model include training a teacher model by performing image conditional visual question generation on a visual language model (VLM) and a targeted visual question answer dataset using images to generate question and answer pairs. Unlabeled images are pseudolabeled using the teacher model to decode synthetic question and answer pairs for the unlabeled images. The synthetic question and answer pairs for the unlabeled images are merged with real data from the targeted visual question answer dataset to generate a self-augmented training set. A student model is trained using the VLM and the self-augmented training set to return visual answers to text queries.
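The data flow described in this abstract (pseudolabel unlabeled images with a teacher, then merge the synthetic pairs with the real dataset) can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: `QAExample`, `pseudolabel`, `build_self_augmented_set`, and `toy_teacher` are hypothetical names, and the teacher here is a stand-in function rather than a trained visual language model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class QAExample:
    """One training example: an image reference plus a question/answer pair."""
    image_id: str
    question: str
    answer: str
    synthetic: bool = False  # True when the pair came from the teacher


# Assumption: a teacher maps an image identifier to generated (question, answer) pairs.
Teacher = Callable[[str], List[Tuple[str, str]]]


def pseudolabel(teacher: Teacher, unlabeled_image_ids: Iterable[str]) -> List[QAExample]:
    """Ask the teacher for question/answer pairs on each unlabeled image."""
    return [
        QAExample(image_id, question, answer, synthetic=True)
        for image_id in unlabeled_image_ids
        for question, answer in teacher(image_id)
    ]


def build_self_augmented_set(
    real: Iterable[QAExample], synthetic: Iterable[QAExample]
) -> List[QAExample]:
    """Merge real and synthetic examples into a single training set."""
    return list(real) + list(synthetic)


def toy_teacher(image_id: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a trained teacher model."""
    return [(f"What is shown in {image_id}?", "unknown")]


real_data = [QAExample("img_001", "What color is the car?", "red")]
synthetic_data = pseudolabel(toy_teacher, ["img_100", "img_101"])
train_set = build_self_augmented_set(real_data, synthetic_data)
```

A student model would then be trained on `train_set`. Keeping the `synthetic` flag on each example is a design choice of this sketch, so that real and pseudolabeled pairs can still be told apart after merging.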

2.

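Invention application
SELF-IMPROVING MODELS FOR AGENTIC VISUAL PROGRAM SYNTHESIS (status: in force)

Publication (announcement) number: US20250139527A1

Publication (announcement) date: 2025-05-01

Application number: US18930402

Application date: 2024-10-29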

Applicant: NEC Laboratories America, Inc.

Inventor: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Manmohan Chandraker, Zaid Khan

IPC: G06N20/00

Abstract: Systems and methods for a self-improving model for agentic visual program synthesis. An agent can be continuously trained using an optimal training tuple to perform a corrective action to a monitored entity which in turn generates new input data for the training. To train the agent, an input question can be decomposed into vision model tasks to generate task outputs. The task outputs can be corrected based on feedback to obtain corrected task outputs. The optimal training tuple can be generated by comparing an optimal tuple threshold with a similarity score of the input image, the input question, and the corrected task outputs.
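The final step of this abstract, comparing a similarity score against a threshold to pick training tuples, might look something like the sketch below. Everything here is an assumption made for illustration: the abstract does not say how the similarity score is computed or in which direction the comparison runs, so the scoring function is a lookup table of made-up values and tuples are kept when the score is at or above the threshold. `TrainingTuple` and `select_tuples` are hypothetical names.

```python
from typing import Callable, List, NamedTuple, Tuple


class TrainingTuple(NamedTuple):
    """Candidate tuple: input image, input question, corrected task outputs."""
    image_id: str
    question: str
    corrected_outputs: Tuple[str, ...]


def select_tuples(
    candidates: List[TrainingTuple],
    score_fn: Callable[[TrainingTuple], float],
    threshold: float,
) -> List[TrainingTuple]:
    """Keep candidates whose similarity score meets the threshold.

    Assumption: a higher score is better and the comparison is >=.
    """
    return [c for c in candidates if score_fn(c) >= threshold]


# Made-up scores standing in for a real similarity model.
toy_scores = {"img_001": 0.9, "img_002": 0.4}

candidates = [
    TrainingTuple("img_001", "How many dogs are there?", ("detect: dog", "count: 2")),
    TrainingTuple("img_002", "Is the door open?", ("detect: door", "state: open")),
]

selected = select_tuples(candidates, lambda t: toy_scores[t.image_id], threshold=0.5)
```

With these toy values only the first candidate passes the threshold and would be fed back into training.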
