-
Publication No.: US20220326982A1
Publication date: 2022-10-13
Application No.: US17225427
Filing date: 2021-04-08
Applicant: International Business Machines Corporation
Inventor: A Peng Zhang, Lei Gao, Jin Wang, Jing James Xu, Jun Wang, Dong Hai Yu
Abstract: Mechanisms are provided for intelligently identifying an execution environment to execute a computing job. An execution time of the computing job in each execution environment of a plurality of execution environments is predicted by applying a set of existing machine learning models matching execution context information and key parameters of the computing job and execution environment information of the execution environment. The predicted execution time of the machine learning models is aggregated. The aggregated predicted execution times of the computing job are summarized for the plurality of execution environments. Responsive to a selection of an execution environment from the plurality of execution environments based on the summary of the aggregated predicted execution times of the computing job, the computing job is executed in the selected execution environment. Related data during the execution of the computing job in the selected execution environment is collected.
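The aggregate-then-summarize flow in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how predictions are aggregated or how the selection is made, so averaging and picking the fastest environment are assumptions, and `summarize_predictions` and the environment names are hypothetical.

```python
from statistics import mean

def summarize_predictions(predictions):
    """Aggregate per-model predicted times for each environment.

    predictions maps an environment name to the execution times (seconds)
    predicted by several models. Averaging is an assumption; the abstract
    only says the predictions are aggregated.
    """
    aggregated = {env: mean(times) for env, times in predictions.items()}
    # Summary: environments ordered from fastest to slowest predicted time.
    return sorted(aggregated.items(), key=lambda item: item[1])

# Hypothetical predictions from three models for three environments.
predictions = {
    "env_a": [120.0, 130.0, 110.0],
    "env_b": [90.0, 100.0, 95.0],
    "env_c": [200.0, 180.0, 190.0],
}
summary = summarize_predictions(predictions)
# Assumed selection rule: take the environment with the lowest aggregate.
selected_env = summary[0][0]
```

In the abstract the selection is made on the basis of the summary, so it could equally be a user choice; the automatic minimum here is only one possibility.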
-
Publication No.: US20170147675A1
Publication date: 2017-05-25
Application No.: US14945853
Filing date: 2015-11-19
Applicant: International Business Machines Corporation
Inventor: Sier Han, Zhiyuan Wang, Ji Hui Yang, A Peng Zhang, Xueying Zhang, Xiu Fang Zhu
IPC: G06F17/30
CPC classification number: G06F16/35
Abstract: Refining cluster definition: (i) receiving data items, each characterized by values respectively corresponding to a set of dimension(s); (ii) receiving initial cluster identification that divides the set of data items into multiple initial clusters; (iii) determining a distribution curve, with respect to a first dimension, of data items of a first initial cluster; (iv) determining a distribution curve, with respect to the first dimension, of data items of a second initial cluster; and (v) determining a first-dimension-first-cluster-second-cluster cut-off value such that the following two proportions are substantially equal: (a) a proportion of the area under the first distribution curve and below the first-dimension-first-cluster-second-cluster cut-off value to the total area under the first distribution curve, and (b) a proportion of the area under the second distribution curve and above the first-dimension-first-cluster-second-cluster cut-off value to the total area under the second distribution curve.
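The cut-off condition in step (v) can be worked through numerically. The sketch below assumes that each cluster's distribution curve along the first dimension is Gaussian, which the abstract does not state; `balanced_cutoff` is a hypothetical helper. It searches for the value c at which the share of the first curve's area below c equals the share of the second curve's area above c.

```python
from statistics import NormalDist

def balanced_cutoff(first, second, tol=1e-9):
    """Return c where P_first(X <= c) equals P_second(X >= c)."""
    # g(c) = F1(c) - (1 - F2(c)) is non-decreasing in c, so bisection applies.
    def g(c):
        return first.cdf(c) - (1.0 - second.cdf(c))

    # Bracket the root far out in both tails, where g is about -1 and +1.
    lo = min(first.mean - 10 * first.stdev, second.mean - 10 * second.stdev)
    hi = max(first.mean + 10 * first.stdev, second.mean + 10 * second.stdev)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical clusters: same spread, means four units apart.
cluster_a = NormalDist(mu=0.0, sigma=1.0)
cluster_b = NormalDist(mu=4.0, sigma=1.0)
cutoff = balanced_cutoff(cluster_a, cluster_b)
```

With equal spreads the cut-off falls midway between the two means; with unequal spreads it shifts toward the narrower cluster.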
-
Publication No.: US20250165492A1
Publication date: 2025-05-22
Application No.: US18513579
Filing date: 2023-11-19
Applicant: International Business Machines Corporation
Inventor: A Peng Zhang, Si Er Han, Lei Gao, Jin Wang
Abstract: An example operation may include one or more of storing an original data set in memory, splitting the original data set into a subset of continuous-type data values and a subset of discrete-type data values based on variable types in the original data set, converting the subset of continuous-type data values into a second subset of discrete-type data values based on a data binning operation, generating a new subset of continuous-type data values based on the subset of continuous-type data values in the original data set, and combining a subset of discrete-type data values from a conditional contingency table within the new subset of continuous-type data values to generate a new data set.
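The data binning step, which converts continuous values into discrete ones, might look like the following sketch. Equal-width binning is an assumption, since the abstract does not name a binning scheme, and `equal_width_bins` is a hypothetical helper.

```python
def equal_width_bins(values, bin_count):
    """Map each continuous value to a bin index in 0..bin_count-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    if width == 0:
        # All values are identical, so everything lands in the first bin.
        return [0] * len(values)
    # The maximum value would compute to index bin_count; clamp it into
    # the last bin.
    return [min(int((v - lo) / width), bin_count - 1) for v in values]

# Hypothetical continuous column split into three bins of width 3.0.
continuous = [1.0, 2.5, 4.0, 7.5, 10.0]
discrete = equal_width_bins(continuous, 3)
```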
-
Publication No.: US12099628B2
Publication date: 2024-09-24
Application No.: US17661780
Filing date: 2022-05-03
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Jin Wang, Lei Gao, A Peng Zhang, Kai Li, Jun Wang, Xiao Ming Ma, Xin Feng Zhu, Geng Wu Yang
CPC classification number: G06F21/6245, G06F16/35, G06F18/23
Abstract: The present disclosure relates to privacy protection in a search process. According to a method, a target emotion vector is extracted from a search interaction, the target emotion vector representing emotional information in the search interaction. Respective emotion distances between the target emotion vector and respective emotion vectors associated with a plurality of text clusters are determined. The plurality of text clusters is clustered from a dictionary of text elements. A first number of text clusters are selected from the plurality of text clusters based on the determined respective emotion distances. The first number of text clusters have emotion distances larger than at least one unselected text cluster among the plurality of text clusters. A plurality of confused search interactions are constructed for the search interaction based on the first number of text clusters, and the plurality of confused search interactions are performed.
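The cluster selection step can be sketched as a ranking by emotion distance. The Euclidean metric, the two-dimensional vectors, and the cluster names are all assumptions made for illustration; the abstract only requires that the selected clusters lie farther from the target emotion vector than at least one unselected cluster.

```python
import math

def select_far_clusters(target, cluster_vectors, count):
    """Return the names of the `count` clusters farthest from the target."""
    # Euclidean distance is an assumed metric; the abstract names none.
    distances = {
        name: math.dist(target, vector)
        for name, vector in cluster_vectors.items()
    }
    ranked = sorted(distances, key=distances.get, reverse=True)
    return ranked[:count]

# Hypothetical emotion vectors for the search interaction and three clusters.
target_emotion = (1.0, 0.0)
clusters = {
    "joy": (0.9, 0.1),
    "anger": (-1.0, 0.0),
    "calm": (0.0, 1.0),
}
selected = select_far_clusters(target_emotion, clusters, 2)
```

The confused search interactions would then be built from text elements in the selected clusters, a step this sketch leaves out.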
-
Publication No.: US20230119654A1
Publication date: 2023-04-20
Application No.: US17451495
Filing date: 2021-10-20
Applicant: International Business Machines Corporation
Inventor: Jin Wang, Lei Gao, Kai Li, A Peng Zhang, Yan Liu, Jia Xing Tang, Xin Feng Zhu
IPC: G06N20/00
Abstract: Identifying node importance in a machine learning pipeline is provided. Changes in accuracy of the machine learning pipeline are recorded for each respective node setting change in a randomly generated group of node settings inputted into each corresponding node included in the machine learning pipeline. A regression model is generated to determine a relationship between each respective node setting change in the randomly generated group of node settings inputted into each corresponding node and the changes in the accuracy of the machine learning pipeline. A node of importance is identified in the machine learning pipeline using the regression model based on the relationship between each respective node setting change in the randomly generated group of node settings inputted into each corresponding node and the changes in the accuracy of the machine learning pipeline.
-
Publication No.: US11520757B2
Publication date: 2022-12-06
Application No.: US17019383
Filing date: 2020-09-14
Applicant: International Business Machines Corporation
Inventor: Jing James Xu, Jing Xu, Xiao Ming Ma, Jian Jun Wang, Jun Wang, A Peng Zhang, Xing Wei
IPC: G06F16/00, G06F16/215, G06F16/21, G06N5/04, G06F16/2457, G16H10/60
Abstract: Embodiments relate to a system, computer program product, and method for determining missing values in respective data records with an explanatory analysis to provide a context of the determined values. Such method includes receiving a dataset including incomplete data records that are missing predictors and complete data records. A model is trained with the complete data records and candidate predictors for the missing predictors are generated. A predictor importance value is generated for each candidate predictor and the candidate predictors that have a predictor importance value in excess of a first threshold value are promoted. Respective promoted candidate predictors are inserted into the respective incomplete data records, thereby creating tentative data records. The tentative data records are injected into the model, a fit value is determined for each of the tentative data records, and a tentative data record with a fit value exceeding a second threshold value is selected.
-
Publication No.: US20220327418A1
Publication date: 2022-10-13
Application No.: US17225800
Filing date: 2021-04-08
Applicant: International Business Machines Corporation
Inventor: Dong Hai Yu, Jun Wang, Si Er Han, Xiao Ming Ma, Lei Gao, A Peng Zhang
Abstract: Feature importance is critical to understanding how predictive models produce accurate results, and can change significantly for different models. The present invention is used to achieve a good ranking for stable feature importance. An optimized technique is presented which considers feature importance value variation within different groups of cross-trained models. Feature importance is computed for all group models with this optimized method, and then a best set of models can be selected based on classification error as well as optimized stable feature importance values.
-
Publication No.: US11288173B1
Publication date: 2022-03-29
Application No.: US17027780
Filing date: 2020-09-22
Applicant: International Business Machines Corporation
Inventor: Jin Wang, Lei Gao, A Peng Zhang, Si Er Han, Jing James Xu, Kai Li
Abstract: Test case selection methods are disclosed. A feature of a candidate test case and respective features of a set of test cases are extracted. The set of test cases is clustered into a plurality of clusters based on the respective features of the set of test cases. At least one cluster related to the candidate test case is determined from the plurality of clusters based on the feature of the candidate test case. At least one test case similar to the candidate test case is selected from a plurality of test cases included in the at least one cluster.
-
Publication No.: US11150630B2
Publication date: 2021-10-19
Application No.: US15787732
Filing date: 2017-10-19
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Lei Fan, Sier Han, Xiao Ming Ma, A Peng Zhang
IPC: G05B19/4065, G05B23/02
Abstract: Statistically significant event patterns predict the timing for performing entity maintenance. Event patterns are determined based on a target variable having an undesired value for a given entity when the event pattern occurs. Event patterns are filtered based on distributions of the event patterns across multiple entities and distributions of event patterns during desired operation of the entities and undesired operation of the entities. A predictive maintenance process is established having significant event patterns as the basis for maintenance tasks.
-
Publication No.: US12225011B2
Publication date: 2025-02-11
Application No.: US17809563
Filing date: 2022-06-29
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Jin Wang, Lei Gao, A Peng Zhang, Dan Sun, Jing Zhang, Na Liu, Xun Pan, Zi Yun Kang
Abstract: Computer technology for protecting data security in a computerized system for recommending content to users, where a processing unit generates an identifier for a first data record relating to a user device based on a first machine learning model. The processing unit then sends the identifier to a service provider, and the service provider uses the identifier to determine one or more contents to be sent to the user device. The technology includes creating and using a decision tree machine learning (ML) model and a cluster ML model with training records and transformed records.