-
Publication No.: US20240403629A1
Publication date: 2024-12-05
Application No.: US18327963
Filing date: 2023-06-02
Inventor: Pin-Yu Chen, I-Hsin Chung, Bo Wu, Chuang Gan, Tsung-Yi Ho, Yung-Chen Tang
IPC: G06N3/08
Abstract: Some embodiments of the present disclosure are directed to systems, computer-readable media, and computer-implemented methods for neural network calibration. Some embodiments are directed to determining a universal perturbation value and temperature scaling parameter based on a training data set, and processing a testing data set using a neural network by applying the universal perturbation value to the testing data set, and applying the temperature scaling parameter to a plurality of logits determined by the neural network based on the testing data set. Other embodiments may be disclosed or claimed.
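The two calibration steps named in this abstract (a universal perturbation added to every input, then temperature scaling of the logits) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and not taken from the patent: `toy_model` is a hypothetical stand-in network, and the values of `delta` and `temperature` are simply given rather than fitted on a training set.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def calibrated_probs(model, x, delta, temperature):
    """Add the shared perturbation to the input, then temperature-scale the logits."""
    perturbed = [xi + di for xi, di in zip(x, delta)]
    logits = model(perturbed)
    return softmax([z / temperature for z in logits])

def toy_model(x):
    # Hypothetical stand-in for a trained network: a fixed linear map to 2 logits.
    return [2.0 * x[0] - x[1], -x[0] + 3.0 * x[1]]

# Assumed values; in the abstract both are determined from a training data set.
delta = [0.1, -0.1]
temperature = 2.0

sample = [1.0, 0.5]
raw = softmax(toy_model(sample))
cal = calibrated_probs(toy_model, sample, delta, temperature)
```

In this toy setting a temperature above 1 flattens the output distribution, so the top-class confidence in `cal` is lower than in `raw`.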
-
Publication No.: US20240394508A1
Publication date: 2024-11-28
Application No.: US18323364
Filing date: 2023-05-24
Inventor: Shun Zhang, Xin Zhang, Shaoze Fan, Jing Li, Ningyuan Cao, Xiaoxiao Guo, Chuang Gan
IPC: G06N3/0442, G06N3/084
Abstract: A computing device includes a processor and a storage device coupled to the processor. The storage device stores instructions to cause the processor to perform acts to provide a circuit performance modeling. The acts include identifying and extracting paths of an electric circuit between a plurality of designated components that represent the electric circuit; converting at least one of the extracted paths to a path embedding comprising a vector of a fixed length; and predicting, by a circuit representation-learning model, characteristics of the designated components that represent the electric circuit based on an input of circuit parameters of the electric circuit.
-
Publication No.: US20240256894A1
Publication date: 2024-08-01
Application No.: US18162894
Filing date: 2023-02-01
Inventor: Pin-Yu Chen, Bo Wu, Zhenfang Chen, Chuang Gan, Huzaifa Arif
IPC: G06N3/098
CPC classification number: G06N3/098
Abstract: Systems and techniques that facilitate reprogrammable federated learning are provided. In various embodiments, a server device can share a pre-trained and frozen neural network with a set of client devices. In various aspects, the server device can orchestrate reprogrammable federated learning of the pre-trained and frozen neural network among the set of client devices. In various instances, the pre-trained and frozen neural network can be positioned between at least one trainable input layer and at least one trainable output layer, and the reprogrammable federated learning can involve the at least one trainable input layer and the at least one trainable output layer, but not the pre-trained and frozen neural network, being locally adjusted by the set of client devices.
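A minimal Python sketch of the arrangement described in this abstract: a frozen model sits between a trainable input layer and a trainable output layer, and only those two layers are adjusted locally by clients. The scalar "layers", the constant `FROZEN_WEIGHT`, the squared-error loss, and the plain averaging of client updates on the server are all assumptions chosen to keep the example small; the abstract does not specify how the server aggregates.

```python
FROZEN_WEIGHT = 3.0  # hypothetical pre-trained parameter; never updated

def frozen_network(h):
    """Stand-in for the pre-trained, frozen neural network."""
    return FROZEN_WEIGHT * h

def predict(params, x):
    w_in, w_out = params  # trainable input layer and trainable output layer
    return w_out * frozen_network(w_in * x)

def local_update(params, data, lr=0.005, steps=20):
    """One client's local gradient descent on w_in and w_out only."""
    w_in, w_out = params
    for _ in range(steps):
        g_in = g_out = 0.0
        for x, y in data:
            err = predict((w_in, w_out), x) - y
            # Gradients of the squared error; the frozen weight is a constant.
            g_in += 2 * err * w_out * FROZEN_WEIGHT * x
            g_out += 2 * err * FROZEN_WEIGHT * w_in * x
        w_in -= lr * g_in / len(data)
        w_out -= lr * g_out / len(data)
    return (w_in, w_out)

def federated_round(params, client_datasets):
    """Server step (assumed): average the clients' trainable-layer parameters."""
    updates = [local_update(params, d) for d in client_datasets]
    n = len(updates)
    return (sum(u[0] for u in updates) / n, sum(u[1] for u in updates) / n)

# Two hypothetical clients whose private data follow y = 6x.
clients = [[(1.0, 6.0), (2.0, 12.0)], [(0.5, 3.0), (1.5, 9.0)]]
params = (1.0, 1.0)
for _ in range(3):
    params = federated_round(params, clients)
```

Only the pair `(w_in, w_out)` ever travels between server and clients in this sketch; the frozen model is shared once and left untouched.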
-
Publication No.: US12020480B2
Publication date: 2024-06-25
Application No.: US17662663
Filing date: 2022-05-10
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Pin-Yu Chen, Zhenfang Chen, Dakuo Wang
CPC classification number: G06V20/41, G06V10/806, G06V20/46
Abstract: One or more computer processors improve action recognition by removing inference introduced by visual appearances of objects within a received video segment. The one or more computer processors extract appearance information and structure information from a received video segment. The one or more computer processors calculate a factual inference (TE) for the received video segment utilizing the extracted appearance information and structure information. The one or more computer processors calculate a counterfactual debiasing inference (NDE) for the received video segment. The one or more computer processors calculate a total indirect effect (TIE) by subtracting the calculated counterfactual debiasing inference from the calculated factual inference. The one or more computer processors recognize the action in the received video segment by selecting a classification result associated with a highest calculated TIE.
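The final two steps of this abstract (TIE obtained by subtracting NDE from TE, then choosing the class with the highest TIE) reduce to simple arithmetic, sketched below in Python. The class labels and per-class scores are hypothetical numbers invented for illustration; how TE and NDE are actually computed from the video is not covered here.

```python
def total_indirect_effect(factual, counterfactual):
    """TIE per class: factual inference (TE) minus counterfactual inference (NDE)."""
    return [te - nde for te, nde in zip(factual, counterfactual)]

def recognize_action(labels, factual, counterfactual):
    """Return the label with the highest TIE, together with all TIE values."""
    tie = total_indirect_effect(factual, counterfactual)
    best = max(range(len(labels)), key=lambda i: tie[i])
    return labels[best], tie

# Hypothetical per-class scores, not taken from the patent.
labels = ["swimming", "running", "cycling"]
te = [2.5, 2.0, 0.5]     # factual inference scores
nde = [2.0, 0.5, 0.25]   # counterfactual debiasing inference scores

label, tie = recognize_action(labels, te, nde)
# label == "running", tie == [0.5, 1.5, 0.25]
```

Note that "swimming" has the highest factual score, yet "running" is selected once the counterfactual score is subtracted.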
-
Publication No.: US12001950B2
Publication date: 2024-06-04
Application No.: US16299828
Filing date: 2019-03-12
Applicant: International Business Machines Corporation
Inventor: Yang Zhang, Chuang Gan
IPC: G06N3/08, G06N3/088, G10L21/0208
CPC classification number: G06N3/08, G06N3/088, G10L21/0208
Abstract: Mechanisms are provided for implementing a generative adversarial network (GAN) based restoration system. A first neural network of a generator of the GAN based restoration system is trained to generate an artificial audio spectrogram having a target damage characteristic based on an input audio spectrogram and a target damage vector. An original audio recording spectrogram is input to the trained generator, where the original audio recording spectrogram corresponds to an original audio recording and an input target damage vector. The trained generator processes the original audio recording spectrogram to generate an artificial audio recording spectrogram having a level of damage corresponding to the input target damage vector. A spectrogram inversion module converts the artificial audio recording spectrogram to an artificial audio recording waveform output.
-
Publication No.: US20240037940A1
Publication date: 2024-02-01
Application No.: US17875566
Filing date: 2022-07-28
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Pin-Yu Chen, Yang Zhang, Xin Zhang
IPC: G06V20/40
Abstract: A computer vision temporal action localization (TAL) computing tool and operations are provided. The TAL computing tool receives a coarse temporal bounding box, having a first start point and a first end point, for an action in the input video data, and a first set of logits, where each logit corresponds to a potential classification of the action in the input video data. The TAL computing tool executes a first engine on the coarse temporal bounding box to generate a second set of logits, and a second engine on the first set of logits to generate a refined temporal bounding box having a second start point and a second end point. The TAL computing tool performs the computer vision temporal action localization operation based on the second set of logits and the refined temporal bounding box to specify a temporal segment of the input video data corresponding to an action represented in the input video data, and a corresponding classification of the action represented in the temporal segment.
-
Publication No.: US20230360642A1
Publication date: 2023-11-09
Application No.: US17662435
Filing date: 2022-05-09
Applicant: International Business Machines Corporation
Inventor: Cheng-I Lai, Yang Zhang, Kaizhi Qian, Chuang Gan, James R. Glass, Alexander Haojan Liu
Abstract: One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork, comprising: the one or more computer processors zero out one or more masked weights in the initial subnetwork specified by the initial pruning mask; the one or more computer processors train a new subnetwork from the zeroed out subnetwork; the one or more computer processors prune one or more weights of lowest magnitude in the new subnetwork regardless of network structure to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
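The finetuning loop in this abstract (zero out masked weights, train, then prune the lowest-magnitude weights regardless of structure until the target sparsity is met) can be sketched in Python as follows. The flat weight list, the `toy_train` function, and its fixed update values are hypothetical stand-ins; a real SSL speech model and its training step are outside the scope of this sketch.

```python
def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is False."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]

def magnitude_mask(weights, sparsity):
    """Unstructured pruning mask: drop the `sparsity` fraction with smallest |w|."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(weights))]

def finetune_step(weights, mask, train_fn, sparsity):
    zeroed = apply_mask(weights, mask)            # zero out masked weights
    trained = train_fn(zeroed)                    # train; zeroed weights may regrow
    new_mask = magnitude_mask(trained, sparsity)  # prune back to target sparsity
    return apply_mask(trained, new_mask), new_mask

def toy_train(weights):
    # Hypothetical stand-in for a training pass: adds a fixed update per weight.
    update = [0.0, 0.6, -0.3, 0.0, 0.0, 0.0]
    return [w + u for w, u in zip(weights, update)]

pretrained = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]  # flat list of all weights
target_sparsity = 0.5
initial_mask = magnitude_mask(pretrained, target_sparsity)
subnetwork, new_mask = finetune_step(pretrained, initial_mask, toy_train, target_sparsity)
```

In this toy run the weight at index 1 starts out pruned, regrows during training, and survives the next pruning pass, while the weight at index 2 shrinks and is pruned instead. The mask is therefore free to change between passes while the sparsity stays fixed.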
-
Publication No.: US11790181B2
Publication date: 2023-10-17
Application No.: US16997494
Filing date: 2020-08-19
Applicant: International Business Machines Corporation
Inventor: Xiaoxiao Guo, Mo Yu, Yupeng Gao, Chuang Gan, Shiyu Chang, Murray Scott Campbell
IPC: G06F40/35, G06N3/08, G06F40/295, G06F40/253, G06F40/284
CPC classification number: G06F40/35, G06F40/253, G06F40/284, G06F40/295, G06N3/08
Abstract: A current observation expressed in natural language is received. Entities in the current observation are extracted. A relevant historical observation is retrieved, which has at least one of the entities in common with the current observation. The current observation and the relevant historical observation are combined as observations. The observations and a template list specifying a list of verb phrases to be filled-in with at least some of the entities are input to a neural network, which can output the template list of the verb phrases filled-in with said at least some of the entities. The neural network can include an attention mechanism. A reward associated with the neural network's output can be received and fed back to the neural network for retraining the neural network.
-
Publication No.: US11663823B2
Publication date: 2023-05-30
Application No.: US16989387
Filing date: 2020-08-10
Applicant: International Business Machines Corporation
Inventor: Chuang Gan, Dakuo Wang, Yang Zhang, Bo Wu, Xiaoxiao Guo
Abstract: Dual-modality relation networks for audio-visual event localization can be provided. A video feed for audio-visual event localization can be received. Based on a combination of extracted audio features and video features of the video feed, informative features and regions in the video feed can be determined by running a first neural network. Based on the informative features and regions in the video feed determined by the first neural network, relation-aware video features can be determined by running a second neural network. Based on the informative features and regions in the video feed, relation-aware audio features can be determined by running a third neural network. A dual-modality representation can be obtained based on the relation-aware video features and the relation-aware audio features by running a fourth neural network. The dual-modality representation can be input to a classifier to identify an audio-visual event in the video feed.
-
Publication No.: US20230136515A1
Publication date: 2023-05-04
Application No.: US17516119
Filing date: 2021-11-01
Applicant: International Business Machines Corporation
Inventor: Bo Wu, Chuang Gan, Zhenfang Chen, Dakuo Wang
IPC: G06K9/00, G06K9/62, G06F40/284, G06F40/205, G06N3/04
Abstract: A processor may receive a video including a plurality of video frames in sequence and a question regarding the video. For a video frame in the plurality of video frames, a processor may parse the video frame into objects and relationships between the objects, and create a subgraph of nodes representing objects and edges representing the relationships, where parsing and creating are performed for each video frame in the plurality of video frames, where a plurality of subgraphs can be created. A processor may create a hypergraph connecting subgraphs by learning relationships between the nodes of the subgraphs, where a hyper-edge is created to represent a relationship between at least one node of one subgraph and at least one node of another subgraph in the plurality of subgraphs. A processor may generate an answer to the question based on the hypergraph.