-
Publication number: US11996083B2
Publication date: 2024-05-28
Application number: US17337518
Application date: 2021-06-03
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Kaizhi Qian , Yang Zhang , Shiyu Chang , Jinjun Xiong , Chuang Gan , David Cox
IPC: G10L13/10 , G06N20/00 , G10L17/04 , G10L21/013 , G10L25/63
CPC classification number: G10L13/10 , G06N20/00 , G10L17/04 , G10L21/013 , G10L25/63
Abstract: A computer-implemented method is provided of using a machine learning model for disentanglement of prosody in spoken natural language. The method includes encoding, by a computing device, the spoken natural language to produce content code. The method further includes resampling, by the computing device without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model to generate prosody-obscured content code. The method additionally includes decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.
-
Publication number: US11989068B2
Publication date: 2024-05-21
Application number: US17852699
Application date: 2022-06-29
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Xin Zhang , Shun Zhang , Shaoze Fan , Xiaoxiao Guo , Chuang Gan
IPC: G06F1/26 , G06F1/20 , G06F1/28 , G06F1/3203 , G06N20/00
CPC classification number: G06F1/26 , G06F1/206 , G06F1/28 , G06F1/3203 , G06N20/00
Abstract: Described aspects include a system for optimizing performance of a functional circuit unit, a method of optimizing performance of a functional circuit unit, and a computer program product. In one embodiment, the system may include a functional circuit unit having an associated cooling device and power converter, one or more sensors for the functional circuit unit, the one or more sensors including a power sensor and a temperature sensor, and a first machine learning model. The first machine learning model may be adapted to receive temperature data and power data from the one or more sensors, and to generate control signals for the cooling device and the power converter to optimize performance of the functional circuit unit.
-
Publication number: US20240127001A1
Publication date: 2024-04-18
Application number: US17964633
Application date: 2022-10-12
Applicant: International Business Machines Corporation
Inventor: Kaizhi Qian , Yang Zhang , Chuang Gan , Bo Wu , Zhenfang Chen
Abstract: Techniques for audio understanding using fixed language models are provided. In one aspect, a system for performing audio understanding tasks includes: a fixed text embedder for, on receipt of a prompt sequence having (e.g., from 0-10) demonstrations of an audio understanding task followed by a new question, converting the prompt sequence into text embeddings; a pretrained audio encoder for converting the prompt sequence into audio embeddings; and a fixed autoregressive language model for answering the new question using the text embeddings and the audio embeddings. A method for performing audio understanding tasks is also provided.
-
Publication number: US20240095435A1
Publication date: 2024-03-21
Application number: US17932538
Application date: 2022-09-15
Inventor: Shun Zhang , Xin Zhang , Shaoze Fan , Ningyuan Cao , Jing Li , Xiaoxiao Guo , Chuang Gan
IPC: G06F30/398
CPC classification number: G06F30/398 , G06F2119/02
Abstract: A method, system, and computer program product for circuit design automation. The method identifies a set of circuit components for a proposed circuit design. A subset of circuit components is selected to generate an initial topology for the proposed circuit design. A set of subsequent topologies are iteratively generated by a heuristic search algorithm based on the subset of circuit components and the initial topology. A set of valid topologies of the set of subsequent topologies are determined by a circuit simulator based on the subset of circuit components and a set of connections within the set of subsequent topologies. The method generates the proposed circuit design from the set of valid topologies.
-
Publication number: US11854305B2
Publication date: 2023-12-26
Application number: US17315319
Application date: 2021-05-09
Applicant: International Business Machines Corporation
Inventor: Bo Wu , Chuang Gan , Dakuo Wang , Kaizhi Qian
IPC: G06V40/20 , G06T7/246 , G06F3/01 , G06V20/40 , G06F18/2133
CPC classification number: G06V40/23 , G06F3/011 , G06F18/2133 , G06T7/246 , G06V20/46 , G06T2207/10016 , G06T2207/20076 , G06T2207/20084 , G06T2207/30196 , G06V2201/00
Abstract: A bi-directional spatial-temporal transformer neural network (BDSTT) is trained to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames. Obtain a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints. Produce a spatially masked frame by masking the original coordinates of the skeletal joint. Provide the specific frame, the spatially masked frame, and at least one more frame to a coordinate prediction head of the BDSTT. Obtain, from the coordinate prediction head, a prediction of coordinates for the skeletal joint. Adjust parameters of the BDSTT until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
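The masking and error steps in this abstract can be illustrated with a small sketch. It is not the BDSTT: the function `neighbor_baseline` is a hypothetical stand-in for the coordinate prediction head that simply averages the joint's position in the adjacent frames, and the names `mask_joint`, `mse`, and the zero mask value are assumptions made for illustration only.

```python
def mask_joint(frame, joint, mask_value=(0.0, 0.0)):
    """Return a copy of the frame with one joint's coordinates masked out."""
    masked = list(frame)
    masked[joint] = mask_value  # assumed mask value, not from the source
    return masked


def mse(pred, target):
    """Mean-squared error between predicted and original coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def neighbor_baseline(frames, t, joint):
    """Hypothetical stand-in for the prediction head: average the joint's
    coordinates in the previous and next frames."""
    prev, nxt = frames[t - 1][joint], frames[t + 1][joint]
    return tuple((a + b) / 2 for a, b in zip(prev, nxt))


# Three frames, two joints each, (x, y) coordinates.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (2.0, 1.0)],
    [(0.0, 0.0), (3.0, 1.0)],
]
t, joint = 1, 1
original = frames[t][joint]
masked_frame = mask_joint(frames[t], joint)
prediction = neighbor_baseline(frames, t, joint)
loss = mse(prediction, original)
```

In a trained model, a loss of this kind would drive the parameter updates until it converges; here it only measures how well the toy baseline recovers the masked joint.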
-
Publication number: US20230368529A1
Publication date: 2023-11-16
Application number: US17662663
Application date: 2022-05-10
Applicant: International Business Machines Corporation
Inventor: Bo Wu , Chuang Gan , Pin-Yu Chen , Zhenfang Chen , Dakuo Wang
CPC classification number: G06V20/41 , G06V20/46 , G06V10/806
Abstract: One or more computer processors improve action recognition by removing inference introduced by visual appearances of objects within a received video segment. The one or more computer processors extract appearance information and structure information from a received video segment. The one or more computer processors calculate a factual inference (TE) for the received video segment utilizing the extracted appearance information and structure information. The one or more computer processors calculate a counterfactual debiasing inference (NDE) for the received video segment. The one or more computer processors calculate a total indirect effect (TIE) by subtracting the calculated counterfactual debiased inference from the calculated factual inference. The one or more computer processors action recognize the received video segment by selecting a classification result associated with a highest calculated TIE.
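The subtraction and selection steps described in this abstract amount to simple arithmetic per class. The sketch below assumes the factual (TE) and counterfactual (NDE) inferences are already available as one score per candidate label; the function names, labels, and score values are hypothetical and chosen only to show the calculation.

```python
def total_indirect_effect(factual_scores, counterfactual_scores):
    """TIE per class: factual inference (TE) minus counterfactual inference (NDE)."""
    return [te - nde for te, nde in zip(factual_scores, counterfactual_scores)]


def recognize_action(labels, factual_scores, counterfactual_scores):
    """Select the label whose TIE is highest."""
    tie = total_indirect_effect(factual_scores, counterfactual_scores)
    best = max(range(len(labels)), key=lambda i: tie[i])
    return labels[best], tie


# Illustrative scores only, not taken from the source.
labels = ["pick up", "put down", "push"]
te = [2.0, 1.5, 0.5]   # factual scores (appearance + structure)
nde = [1.8, 0.4, 0.3]  # counterfactual scores
label, tie = recognize_action(labels, te, nde)
```

With these numbers "pick up" has the highest factual score, but most of it is also present in the counterfactual score, so after subtraction "put down" is selected instead.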
-
Publication number: US20230368510A1
Publication date: 2023-11-16
Application number: US17743661
Application date: 2022-05-13
Applicant: International Business Machines Corporation
Inventor: Zhenfang Chen , Chuang Gan , Bo Wu , Pin-Yu Chen
IPC: G06V10/82 , G06V10/422 , G06V30/262
CPC classification number: G06V10/82 , G06V10/422 , G06V30/262
Abstract: A system may include a memory and a processor in communication with the memory. The processor may be configured to perform operations. The operations may include receiving an input, extracting features from the input, and mining object relations using the features. The operations may include determining feature vectors using the object relations and generating, using the feature vectors, an output indicating a target region, wherein the target region corresponds to the input.
-
Publication number: US11816889B2
Publication date: 2023-11-14
Application number: US17216605
Application date: 2021-03-29
Applicant: International Business Machines Corporation
Inventor: Chuang Gan , Dakuo Wang , Antonio Jose Jimeno Yepes , Bo Wu
IPC: G06V20/40 , G06V10/40 , G06N3/04 , G06N3/088 , G06F16/735 , G06F18/24 , G06F18/214
CPC classification number: G06V20/41 , G06F16/735 , G06F18/214 , G06F18/24 , G06N3/04 , G06N3/088 , G06V10/40 , G06V20/46
Abstract: Unsupervised learning for video classification. One or more features from one or more video clips are extracted using a spatial-temporal encoder. The one or more extracted features are processed, using a video instance discrimination task, to generate a classification label, the classification label indicating whether two of the video clips are from a same video. The one or more extracted features are processed, using a pair-wise speed discrimination task, to generate a comparison label, the comparison label indicating a relative playback speed between two given video clips. A search is performed in a video database for a video that is similar to a given video based on the comparison label.
-
Publication number: US11687777B2
Publication date: 2023-06-27
Application number: US17005144
Application date: 2020-08-27
Inventor: Ao Liu , Sijia Liu , Bo Wu , Lirong Xia , Qi Cheng Li , Chuang Gan
CPC classification number: G06N3/08 , G06F16/56 , G06F18/21 , G06T3/4046 , G06T5/002 , G06T2207/20084
Abstract: Interpretation maps of convolutional neural networks having certifiable robustness using Rényi differential privacy are provided. In one aspect, a method for generating an interpretation map includes: adding generalized Gaussian noise to an image x to obtain T noisy images, wherein the generalized Gaussian noise constitutes perturbations to the image x; providing the T noisy images as input to a convolutional neural network; calculating T noisy interpretations of output from the convolutional neural network corresponding to the T noisy images; re-scaling the T noisy interpretations using a scoring vector υ to obtain T re-scaled noisy interpretations; and generating the interpretation map using the T re-scaled noisy interpretations, wherein the interpretation map is robust against the perturbations.
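The noise, interpret, re-scale, and aggregate pipeline in this abstract can be sketched as follows, under several loudly stated assumptions. Ordinary Gaussian noise is used in place of generalized Gaussian noise, `toy_interpretation` is a hypothetical stand-in for a CNN-based interpretation, the re-scaling assigns the k-th entry of the scoring vector to the pixel with the k-th largest interpretation value, and the final map is a plain average. None of these choices is confirmed by the abstract.

```python
import random


def toy_interpretation(image):
    """Hypothetical stand-in for a CNN interpretation (e.g. a saliency map)."""
    return [abs(p) for p in image]


def rescale(interpretation, scoring_vector):
    """Assumed re-scaling: the pixel with the k-th largest value gets scoring_vector[k]."""
    order = sorted(range(len(interpretation)),
                   key=lambda i: interpretation[i], reverse=True)
    rescaled = [0.0] * len(interpretation)
    for rank, idx in enumerate(order):
        rescaled[idx] = scoring_vector[rank]
    return rescaled


def interpretation_map(image, scoring_vector, T=50, sigma=0.1, seed=0):
    """Average T re-scaled interpretations of noisy copies of the image."""
    rng = random.Random(seed)
    total = [0.0] * len(image)
    for _ in range(T):
        # Ordinary Gaussian noise, swapped in for generalized Gaussian noise.
        noisy = [p + rng.gauss(0.0, sigma) for p in image]
        scores = rescale(toy_interpretation(noisy), scoring_vector)
        total = [t + s for t, s in zip(total, scores)]
    return [t / T for t in total]


image = [0.9, 0.1, 0.5, 0.0]      # flattened toy "image" x
upsilon = [4.0, 3.0, 2.0, 1.0]    # scoring vector, largest score first
robust_map = interpretation_map(image, upsilon)
```

Because each re-scaled interpretation depends only on the ranking of pixels, small perturbations that do not change the ranking leave the map unchanged, which is the intuition behind aggregating over noisy copies.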
-
Publication number: US11443069B2
Publication date: 2022-09-13
Application number: US16559161
Application date: 2019-09-03
Applicant: International Business Machines Corporation
Inventor: Sijia Liu , Quanfu Fan , Gaoyuan Zhang , Chuang Gan
Abstract: An illustrative embodiment includes a method for protecting a machine learning model. The method includes: determining concept-level interpretability of respective units within the model; determining sensitivity of the respective units within the model to an adversarial attack; identifying units within the model which are both interpretable and sensitive to the adversarial attack; and enhancing defense against the adversarial attack by masking at least a portion of the units identified as both interpretable and sensitive to the adversarial attack.