-
Publication No.: US12051243B2
Publication date: 2024-07-30
Application No.: US17516119
Filing date: 2021-11-01
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Zhenfang Chen, Dakuo Wang
IPC: G06F40/284, G06F18/21, G06F40/205, G06N3/044, G06N3/045, G06V20/40
CPC classification number: G06V20/41, G06F18/21, G06F40/205, G06F40/284, G06N3/044, G06N3/045, G06V20/46
Abstract: A processor may receive a video including a plurality of video frames in sequence and a question regarding the video. For a video frame in the plurality of video frames, a processor may parse the video frame into objects and relationships between the objects, and create a subgraph of nodes representing objects and edges representing the relationships, where parsing and creating are performed for each video frame in the plurality of video frames, where a plurality of subgraphs can be created. A processor may create a hypergraph connecting subgraphs by learning relationships between the nodes of the subgraphs, where a hyper-edge is created to represent a relationship between at least one node of one subgraph and at least one node of another subgraph in the plurality of subgraphs. A processor may generate an answer to the question based on the hypergraph.
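The data structures described in this abstract can be sketched as follows. This is only an illustration, not the patented method: the names `Subgraph`, `build_subgraph`, and `link_subgraphs` are hypothetical, the per-frame triples are hand-written instead of parsed from video, and the learned hyper-edge step is replaced by a simple stand-in rule (connect nodes with the same label in consecutive frames).

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class Subgraph:
    """Objects (nodes) and relationships (edges) for one video frame."""
    frame: int
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Triple] = field(default_factory=set)


def build_subgraph(frame: int, triples: List[Triple]) -> Subgraph:
    """Create one subgraph from the relationships found in a frame."""
    graph = Subgraph(frame)
    for subject, relation, obj in triples:
        graph.nodes.update((subject, obj))
        graph.edges.add((subject, relation, obj))
    return graph


def link_subgraphs(subgraphs: List[Subgraph]):
    """Assumed stand-in for the learned step: connect nodes that share
    a label in consecutive frames. Each hyper-edge pairs a
    (frame, label) node with a node of another subgraph."""
    hyper_edges = []
    for earlier, later in zip(subgraphs, subgraphs[1:]):
        for label in sorted(earlier.nodes & later.nodes):
            hyper_edges.append(((earlier.frame, label), (later.frame, label)))
    return hyper_edges


# Hand-written triples standing in for the output of a frame parser.
frames = [
    [("person", "holds", "cup")],
    [("person", "drinks_from", "cup")],
    [("person", "next_to", "table")],
]
subgraphs = [build_subgraph(i, triples) for i, triples in enumerate(frames)]
hyper_edges = link_subgraphs(subgraphs)
```

A question-answering step would then operate on `subgraphs` together with `hyper_edges`; that part is left out because the abstract does not describe how the answer is generated.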
-
Publication No.: US11741722B2
Publication date: 2023-08-29
Application No.: US17012463
Filing date: 2020-09-04
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Yang Zhang, Dakuo Wang
IPC: G06V20/58, G06V30/194
CPC classification number: G06V20/584, G06V30/194
Abstract: A vehicle light signal detection and recognition method, system, and computer program product include bounding, using a coarse attention module, one or more regions of an image of an automobile including at least one of a brake light and a signal light generated by automobile signals which include illuminated sections to generate one or more bounded regions, removing, using a fine attention module, noise from the one or more bounded regions to generate one or more noise-free bounded regions, and identifying the at least one of the brake light and the signal light from the one or more noise-free bounded regions.
-
Publication No.: US11736423B2
Publication date: 2023-08-22
Application No.: US17307175
Filing date: 2021-05-04
Applicant: International Business Machines Corporation
Inventor: Dakuo Wang, Mo Yu, Chuang Gan, Bo Wu
CPC classification number: H04L51/04, G06F11/302, G06F18/2178, G06F40/30
Abstract: Systems, computer-implemented methods, and/or computer program products facilitating a process to identify and respond to a primary electronic message are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can include a determination component that can determine that a primary electronic message has not received a response electronic message. An analysis component can generate a generated electronic message addressing the informational or emotional content of the primary electronic message. In one or more embodiments, an updating component can update the analytical model based on one or more feedbacks to the generated electronic message, where the analytical model can remain active while being updated. The one or more feedbacks can comprise a feedback from an entity-in-the-loop monitoring outputs of the analytical model including the generated electronic message.
-
Publication No.: US20230027713A1
Publication date: 2023-01-26
Application No.: US17381408
Filing date: 2021-07-21
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Dakuo Wang, Zhenfang Chen
IPC: G06N5/04, G06K9/00, G06F40/205, G06F40/284, G06N5/02, G06N20/20
Abstract: Mechanisms are provided for performing artificial intelligence-based video question answering. A video parser parses an input video data sequence to generate situation data structure(s), each situation data structure comprising data elements corresponding to entities, and first relationships between entities, identified by the video parser as present in images of the input video data sequence. First machine learning computer model(s) operate on the situation data structure(s) to predict second relationship(s) between the situation data structure(s). Second machine learning computer model(s) execute on a received input question to predict an executable program to execute to answer the received question. The program is executed on the situation data structure(s) and predicted second relationship(s). An answer to the question is output based on results of executing the program.
-
Publication No.: US20230401435A1
Publication date: 2023-12-14
Application No.: US17838722
Filing date: 2022-06-13
Inventor: Pin-Yu Chen, Tejaswini Pedapati, Bo Wu, Chuang Gan, Chunheng Jiang, Jianxi Gao
CPC classification number: G06N3/0635, G06N3/08, G01R27/2605
Abstract: An output layer is removed from a pre-trained neural network model and a neural capacitance probe unit with multiple layers is incorporated on top of one or more bottom layers of the pre-trained neural network model. The neural capacitance probe unit is randomly initialized and a modified neural network model is trained by fine-tuning the one or more bottom layers on a target dataset for a maximum number of epochs, the modified neural network model comprising the neural capacitance probe unit incorporated with multiple layers on top of the one or more bottom layers of the pre-trained neural network model. An adjacency matrix is obtained from the initialized neural capacitance probe unit and a neural capacitance metric is computed using the adjacency matrix. An active model is selected using the neural capacitance metric and a machine learning system is configured using the active model.
-
Publication No.: US20230360364A1
Publication date: 2023-11-09
Application No.: US17737535
Filing date: 2022-05-05
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Pin-Yu Chen, Xin Zhang
IPC: G06V10/764, G06V10/774, G06V10/80
CPC classification number: G06V10/764, G06V10/7753, G06V10/806
Abstract: Mechanisms are provided for performing machine learning (ML) training of a ML action recognition computer model which involves processing an original input dataset to generate an object feature bank comprising object feature data structures for a plurality of different objects. For an input video, a verb data structure and an original object data structure are generated and a candidate object feature data structure is selected from the object feature bank for generation of pseudo composition (PC) training data. The PC training data is generated based on the selected candidate object feature data structure and comprises a combination of the verb data structure and the candidate object feature data structure. The PC training data represents a combination of an action and an object not represented in the original input dataset. ML training of the ML action recognition computer model is performed based on an unseen combination comprising the PC training data.
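The pseudo composition step in this abstract can be illustrated with a small sketch. Everything here is assumed for illustration: the function name `make_pseudo_compositions`, the toy feature vectors, the selection rule (any object in the bank that was never paired with the verb in the original dataset), and the use of list concatenation to combine verb and object features. The abstract does not specify how the candidate is selected or how the two data structures are combined.

```python
from typing import Dict, List, Set, Tuple


def make_pseudo_compositions(
    verb: str,
    verb_features: List[float],
    object_bank: Dict[str, List[float]],
    seen_pairs: Set[Tuple[str, str]],
) -> List[dict]:
    """Pair a verb with bank objects it was never seen with.

    Assumed selection rule: every object whose (verb, object) pair is
    absent from the original dataset becomes a candidate.
    """
    samples = []
    for obj, obj_features in object_bank.items():
        if (verb, obj) in seen_pairs:
            continue  # combination already exists in the original data
        samples.append({
            "label": (verb, obj),
            # Assumed combination: concatenate verb and object features.
            "features": verb_features + obj_features,
        })
    return samples


# Toy object feature bank and the (verb, object) pairs of the original dataset.
object_bank = {"cup": [0.1, 0.9], "door": [0.8, 0.2], "book": [0.5, 0.5]}
seen_pairs = {("open", "door"), ("hold", "cup"), ("open", "book")}

pc_samples = make_pseudo_compositions("open", [1.0, 0.0], object_bank, seen_pairs)
```

In this toy setup the only unseen combination for the verb "open" is ("open", "cup"), so `pc_samples` holds a single training sample.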
-
Publication No.: US20240412074A1
Publication date: 2024-12-12
Application No.: US18331211
Filing date: 2023-06-08
Inventor: Pin-Yu Chen, I-Hsin Chung, Bo Wu, Chuang Gan, Lei Hsiung, Yun-Yun Tsai, Tsung-Yi Ho
IPC: G06N3/094
Abstract: Some embodiments of the present disclosure are directed to systems, computer-readable media, and computer-implemented methods for neural network training. Some embodiments are directed to determining an attack order schedule for a data sample that includes a plurality of adversarial perturbation attacks associated with the data sample, and performing a composite adversarial attack process against the data set using the determined attack order schedule to generate a perturbed data sample for the data sample. Other embodiments may be disclosed or claimed.
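The idea of applying several perturbations in a scheduled order can be sketched as below. This is a toy illustration under stated assumptions, not the disclosed method: the fixed transforms `shift_brightness` and `scale_contrast` merely stand in for adversarial perturbation attacks (a real attack would optimize each perturbation against a model), the schedule is given by hand instead of being determined, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def clip01(value: float) -> float:
    """Keep a pixel-like value inside the valid range [0, 1]."""
    return min(1.0, max(0.0, value))


def shift_brightness(x: List[float]) -> List[float]:
    """Stand-in perturbation: add a fixed offset to every value."""
    return [clip01(v + 0.1) for v in x]


def scale_contrast(x: List[float]) -> List[float]:
    """Stand-in perturbation: stretch values around their mean."""
    mean = sum(x) / len(x)
    return [clip01(mean + 1.5 * (v - mean)) for v in x]


# Hypothetical registry of perturbations the schedule can refer to.
ATTACKS: Dict[str, Callable[[List[float]], List[float]]] = {
    "brightness": shift_brightness,
    "contrast": scale_contrast,
}


def composite_attack(sample: Sequence[float], schedule: Sequence[str]) -> List[float]:
    """Apply the perturbations one after another in schedule order."""
    perturbed = list(sample)
    for name in schedule:
        perturbed = ATTACKS[name](perturbed)
    return perturbed


sample = [0.1, 0.95]
first = composite_attack(sample, ["brightness", "contrast"])
second = composite_attack(sample, ["contrast", "brightness"])
```

In this toy example the two schedules produce different perturbed samples because of the clipping step, which is one reason the order of a composite attack can matter.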
-
Publication No.: US20240404106A1
Publication date: 2024-12-05
Application No.: US18327608
Filing date: 2023-06-01
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Yada Zhu, Pin-Yu Chen
Abstract: Provided are a computer program product, system, and method for training a pose estimation model to determine anatomy keypoints in images. A teacher network, implementing machine learning, processes images representing anatomies to produce heatmaps representing keypoints of the anatomies. An anatomy parsing network, implementing machine learning, processes the images to produce segmentation representations labeling anatomies represented in the images. The segmentation representations from the anatomy parsing network and the heatmaps from the teacher network are concatenated to produce mixed heatmaps. A pose estimation model, implementing machine learning, is trained to process the images to output predicted heatmaps to minimize a loss function of the output predicted heatmaps from the pose estimation model and the mixed heatmaps.
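The training target described in this abstract can be sketched with plain Python lists. This is a minimal illustration under assumptions the abstract does not confirm: concatenation is taken to mean stacking along the channel axis, the loss is taken to be mean squared error, and the names `mix_heatmaps` and `mse_loss` as well as the 2x2 toy maps are hypothetical.

```python
from typing import List

Channel = List[List[float]]  # one 2D map (rows of values)


def mix_heatmaps(teacher_heatmaps: List[Channel],
                 segmentation_maps: List[Channel]) -> List[Channel]:
    """Assumed mixing: stack keypoint heatmaps and segmentation maps
    along the channel axis."""
    return list(teacher_heatmaps) + list(segmentation_maps)


def mse_loss(predicted: List[Channel], target: List[Channel]) -> float:
    """Assumed loss: mean squared error over all channels and pixels."""
    total, count = 0.0, 0
    for pred_channel, target_channel in zip(predicted, target):
        for pred_row, target_row in zip(pred_channel, target_channel):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count


# Toy outputs of the teacher network and the anatomy parsing network.
teacher = [[[0.0, 1.0], [0.0, 0.0]]]       # one keypoint heatmap
segmentation = [[[1.0, 1.0], [0.0, 0.0]]]  # one segmentation map
mixed = mix_heatmaps(teacher, segmentation)

# Toy prediction of the pose estimation model with the same shape as `mixed`.
predicted = [[[0.0, 0.5], [0.0, 0.0]], [[1.0, 1.0], [0.0, 0.0]]]
loss = mse_loss(predicted, mixed)
```

Training would adjust the pose estimation model to reduce `loss`; the optimizer and network are omitted because the abstract does not describe them.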
-
Publication No.: US12087064B2
Publication date: 2024-09-10
Application No.: US18230775
Filing date: 2023-08-07
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Yang Zhang, Dakuo Wang
IPC: G06V20/58, G06V30/194
CPC classification number: G06V20/584, G06V30/194
Abstract: A vehicle light signal detection and recognition method, system, and computer program product include bounding, using a coarse attention module, one or more regions of an image of an automobile including at least one of a brake light and a signal light generated by automobile signals which include illuminated sections to generate one or more bounded regions, removing, using a fine attention module, noise from the one or more bounded regions to generate one or more noise-free bounded regions, and identifying the at least one of the brake light and the signal light from the one or more noise-free bounded regions.
-
Publication No.: US20240127001A1
Publication date: 2024-04-18
Application No.: US17964633
Filing date: 2022-10-12
Applicant: International Business Machines Corporation
Inventor: Kaizhi Qian, Yang Zhang, Chuang Gan, Bo Wu, Zhenfang Chen
Abstract: Techniques for audio understanding using fixed language models are provided. In one aspect, a system for performing audio understanding tasks includes: a fixed text embedder for, on receipt of a prompt sequence having demonstrations (e.g., from 0-10) of an audio understanding task followed by a new question, converting the prompt sequence into text embeddings; a pretrained audio encoder for converting the prompt sequence into audio embeddings; and a fixed autoregressive language model for answering the new question using the text embeddings and the audio embeddings. A method for performing audio understanding tasks is also provided.