-
Publication No.: US10831535B2
Publication Date: 2020-11-10
Application No.: US16237747
Filing Date: 2019-01-01
Applicant: International Business Machines Corporation
Inventor: Jingwen Leng , Alper Buyuktosunoglu , Pradip Bose , Ramon Bertran Monfort
Abstract: Preferred embodiments of systems and methods are disclosed to reduce a minimal working voltage, Vmin, and/or increase the frequency of Vmin while executing multithreaded computer programs with better reliability, efficiency, and performance. A computer compiler compiles multiple copies of high-level code, each with a different set of resource allocators, so that system resources are allocated during simultaneous execution of multiple threads in a way that allows reducing Vmin at a given reference voltage frequency and/or increasing the frequency of Vmin at a given Vmin value.
-
Publication No.: US20200241954A1
Publication Date: 2020-07-30
Application No.: US16262832
Filing Date: 2019-01-30
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Swagath Venkataramani , Schuyler Eldridge , Karthik V. Swaminathan , Alper Buyuktosunoglu , Pradip Bose
Abstract: A coarse error correction system for detecting, predicting, and correcting errors in neural networks is provided. The coarse error correction system receives a first set of statistics that are computed from values collected from a neural network during a training phase of the neural network. The coarse error correction system computes a second set of statistics based on values collected from the neural network during a run-time phase of the neural network. The coarse error correction system detects an error in the neural network during the run-time phase of the neural network by comparing the first set of statistics with the second set of statistics. The coarse error correction system increases a voltage setting to the neural network based on the detected error.
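The detect-and-correct loop in this abstract can be illustrated with a minimal sketch. It is not the patented method: the choice of mean and standard deviation as the statistics, the `tolerance` threshold, the millivolt step sizes, and every function name here are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def summarize(values):
    """Return (mean, population standard deviation) of the collected values."""
    return mean(values), pstdev(values)


def detect_error(train_stats, runtime_values, tolerance=3.0):
    """Compare training-phase statistics with run-time statistics.

    Assumed rule: flag an error when the run-time mean drifts more than
    `tolerance` training standard deviations from the training mean.
    """
    train_mean, train_std = train_stats
    run_mean, _ = summarize(runtime_values)
    return abs(run_mean - train_mean) > tolerance * train_std


def next_voltage(current_mv, error_detected, step_mv=10, max_mv=1000):
    """Raise the (hypothetical) voltage setting by one step after an error."""
    if error_detected:
        return min(current_mv + step_mv, max_mv)
    return current_mv


# First set of statistics, computed from values collected during training.
train_stats = summarize([0.9, 1.0, 1.1, 1.0, 0.95, 1.05])

# Second set of statistics, computed from values collected at run time.
healthy = detect_error(train_stats, [1.0, 0.98, 1.03])
faulty = detect_error(train_stats, [1.0, 64.0, 1.02])  # one corrupted value

voltage_after_healthy = next_voltage(800, healthy)
voltage_after_faulty = next_voltage(800, faulty)
```

Comparing summary statistics rather than individual values keeps the check cheap, which is presumably why the abstract calls the approach "coarse"; the specific statistics used in the patent are not stated in the abstract.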
-
Publication No.: US20200210229A1
Publication Date: 2020-07-02
Application No.: US16237747
Filing Date: 2019-01-01
Applicant: International Business Machines Corporation
Inventor: Jingwen Leng , Alper Buyuktosunoglu , Pradip Bose , Ramon Bertran Monfort
Abstract: Preferred embodiments of systems and methods are disclosed to reduce a minimal working voltage, Vmin, and/or increase the frequency of Vmin while executing multithreaded computer programs with better reliability, efficiency, and performance. A computer compiler compiles multiple copies of high-level code, each with a different set of resource allocators, so that system resources are allocated during simultaneous execution of multiple threads in a way that allows reducing Vmin at a given reference voltage frequency and/or increasing the frequency of Vmin at a given Vmin value.
-
Publication No.: US20200159691A1
Publication Date: 2020-05-21
Application No.: US16194247
Filing Date: 2018-11-16
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Karthik V. Swaminathan , Ramon Bertran Monfort , Alper Buyuktosunoglu , Pradip Bose , Nandhini Chandramoorthy , Chen-Yong Cher
Abstract: A system and method for determining reliability-aware runtime optimal processor configuration can integrate soft and hard error data into a single metric, referred to as the balanced reliability metric (BRM), by using statistical dimensionality reduction techniques. The BRM can be used to not only adjust processor voltage to optimize overall reliability but also to adjust the number of on-cores to further optimize overall processor reliability. In some implementations, both coarse-grained actuations, based on optimal core count, and fine-grained actuations, based on optimal processor voltage (Vdd), may be used, where feedback control can recursively re-compute soft and hard error data based on a new configuration, until convergence at an optimal configuration.
-
Publication No.: US20190204884A1
Publication Date: 2019-07-04
Application No.: US16294095
Filing Date: 2019-03-06
Applicant: International Business Machines Corporation
Inventor: Pradip Bose , Alper Buyuktosunoglu , Timothy Joseph Chainer , Pritish Ranjan Parida , Augusto Javier Vega
IPC: G06F1/20 , G06F1/324 , G06F1/3206
CPC classification number: G06F1/206 , G06F1/3206 , G06F1/324 , G06F2200/201 , Y02D10/126 , Y02D10/16
Abstract: Techniques for inducing heterogeneous microprocessor behavior using non-uniform cooling are described. According to an embodiment, a device is provided that comprises an IC chip comprising a plurality of cores and a cooling apparatus coupled to the integrated chip that cools the integrated chip in association with electrical operation of the plurality of cores. The cooling apparatus cools a first core of the plurality of cores to a lower temperature than a second core of the plurality of cores. In various embodiments, the cooling apparatus comprises a plurality of channels embedded within the integrated chip and the cooling apparatus cools the integrated chip via flow of liquid coolant through the plurality of channels.
-
Publication No.: US10339015B2
Publication Date: 2019-07-02
Application No.: US15832251
Filing Date: 2017-12-05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Pradip Bose , Alper Buyuktosunoglu , Jingwen Leng , Ramon Bertran Monfort
Abstract: A computer-implemented method is provided that is performed in a computer having a processor and multiple co-processors. The method includes launching a same set of operations in each of an original co-processor and a redundant co-processor, from among the multiple co-processors, to obtain respective execution signatures from the original co-processor and the redundant co-processor. The method further includes detecting an error in an execution of the set of operations by the original co-processor, by comparing the respective execution signatures. The method also includes designating the execution of the set of operations by the original co-processor as error-free and committing a result of the execution, responsive to identifying a match between the respective execution signatures. The method additionally includes performing an error recovery operation that replays the set of operations by the original co-processor and the redundant co-processor, responsive to identifying a mismatch between the respective execution signatures.
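The compare-commit-replay flow in this abstract can be sketched as follows. This is a software illustration under stated assumptions, not the patented implementation: the co-processors are modeled as plain Python callables, the execution signature is assumed to be a SHA-256 hash of the results, and `run_redundantly`, `FlakyOnce`, and `max_replays` are hypothetical names.

```python
import hashlib


def signature(results):
    """Hash a sequence of results into an execution signature (assumed SHA-256)."""
    digest = hashlib.sha256()
    for value in results:
        digest.update(repr(value).encode("utf-8"))
    return digest.hexdigest()


def run_redundantly(operations, original, redundant, max_replays=3):
    """Launch the same operations on both executors.

    On a signature match the original's results are committed; on a
    mismatch both executors replay the operations, up to `max_replays` times.
    Returns the committed results and the number of replays that were needed.
    """
    for attempt in range(1 + max_replays):
        original_results = original(operations)
        redundant_results = redundant(operations)
        if signature(original_results) == signature(redundant_results):
            return original_results, attempt
    raise RuntimeError("signatures still differ after replay")


def reliable(ops):
    """Stand-in for a co-processor that executes every operation correctly."""
    return [op() for op in ops]


class FlakyOnce:
    """Stand-in for a co-processor that corrupts one result on its first run."""

    def __init__(self):
        self.calls = 0

    def __call__(self, ops):
        self.calls += 1
        results = [op() for op in ops]
        if self.calls == 1:
            results[0] += 1  # inject a transient fault
        return results


ops = [lambda: 2 + 2, lambda: 3 * 7]
committed, replays = run_redundantly(ops, FlakyOnce(), reliable)
```

Here the first run produces mismatched signatures, so both executors replay once and the matching results are committed.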
-
Publication No.: US10331529B2
Publication Date: 2019-06-25
Application No.: US15459788
Filing Date: 2017-03-15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Pradip Bose , Alper Buyuktosunoglu , Jingwen Leng , Ramon Bertran Monfort
Abstract: A computer-implemented method is provided that is performed in a computer having a processor and multiple co-processors. The method includes launching a same set of operations in each of an original co-processor and a redundant co-processor, from among the multiple co-processors, to obtain respective execution signatures from the original co-processor and the redundant co-processor. The method further includes detecting an error in an execution of the set of operations by the original co-processor, by comparing the respective execution signatures. The method also includes designating the execution of the set of operations by the original co-processor as error-free and committing a result of the execution, responsive to identifying a match between the respective execution signatures. The method additionally includes performing an error recovery operation that replays the set of operations by the original co-processor and the redundant co-processor, responsive to identifying a mismatch between the respective execution signatures.
-
Publication No.: US20190146568A1
Publication Date: 2019-05-16
Application No.: US15814069
Filing Date: 2017-11-15
Applicant: International Business Machines Corporation
Inventor: Pradip Bose , Alper Buyuktosunoglu , Pierce I-Jen Chuang , Phillip John Restle , Christos Vezyrtzis
Abstract: Techniques facilitating voltage management via on-chip sensors are provided. In one example, a computer-implemented method can comprise measuring, by a first processor core, power supply noise information. The computer-implemented method can also comprise measuring, by the first processor core, a value of an electrical current generated by the first processor core. Further, the computer-implemented method can comprise applying, by the first processor core, a mitigation technique at the first processor core in response to a determination that a combination of the power supply noise information and the value of the electrical current indicates a presence of a voltage noise at the first processor core.
-
Publication No.: US20180267868A1
Publication Date: 2018-09-20
Application No.: US15832251
Filing Date: 2017-12-05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Pradip Bose , Alper Buyuktosunoglu , Jingwen Leng , Ramon Bertran Monfort
IPC: G06F11/16
CPC classification number: G06F11/1658 , G06F11/1629 , G06F11/1641 , G06F2201/805 , G06F2201/82
Abstract: A computer-implemented method is provided that is performed in a computer having a processor and multiple co-processors. The method includes launching a same set of operations in each of an original co-processor and a redundant co-processor, from among the multiple co-processors, to obtain respective execution signatures from the original co-processor and the redundant co-processor. The method further includes detecting an error in an execution of the set of operations by the original co-processor, by comparing the respective execution signatures. The method also includes designating the execution of the set of operations by the original co-processor as error-free and committing a result of the execution, responsive to identifying a match between the respective execution signatures. The method additionally includes performing an error recovery operation that replays the set of operations by the original co-processor and the redundant co-processor, responsive to identifying a mismatch between the respective execution signatures.
-
Publication No.: US20180260225A1
Publication Date: 2018-09-13
Application No.: US15977075
Filing Date: 2018-05-11
Applicant: International Business Machines Corporation
Inventor: Ramon Bertran , Pradip Bose , Alper Buyuktosunoglu , Timothy J. Slegel
CPC classification number: G06F9/30145 , G01R31/28 , G06F9/3005 , G06F9/44 , G06F9/455 , G06F11/00 , G06F11/3024 , G06F11/3414 , G06F11/3428 , G06F11/3457 , G06F13/10 , G06F13/12 , G06F15/7867 , G06F17/5009 , G06F17/5022
Abstract: One aspect is an analysis system that includes a processor operably coupled to a memory and configured to perform a method. The method includes defining a set of workloads for a targeted multi-core computer system based on a plurality of metrics of interest to profile. A plurality of workload-to-core mappings is generated for the workloads on the targeted multi-core computer system. The workloads run on the targeted multi-core computer system based on the workload-to-core mappings to produce a mapping of the workloads to the metrics of interest as experimental data. A statistical analysis is applied on the experimental data to define a plurality of metric profiles for the targeted multi-core computer system.
-