-
Publication No.: US12050998B2
Publication Date: 2024-07-30
Application No.: US17010744
Filing Date: 2020-09-02
Inventors: Sergey Serebryakov, Cong Xu
IPC Classes: G06N3/084, G06F11/34, G06F18/214, G06F18/241, G06N3/045, G06N3/08
CPC Classes: G06N3/084, G06F11/3433, G06F18/214, G06F18/241, G06N3/045, G06N3/08
Abstract: Systems and methods are provided for data shuffling in distributed machine-learning training. Each training node in the network receives a shard of the training data set, which is divided into shards containing data items. Each data item is assigned to a working set such that each working set includes data items from multiple shards. The training nodes perform training using the data items of a first working set that fall within each node's shard. Upon completing training on the first working set, the training nodes perform training using the data items of a second working set that fall within their shards; and while the training nodes train on their respective subsets of the second working set, they randomly shuffle the data items of the first working set to create a shuffled first working set.
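The shard/working-set interleaving described in this abstract can be sketched as follows. This is a minimal illustration, assuming a round-robin assignment of items to working sets; the patent does not specify the exact assignment scheme, and the function names are hypothetical:

```python
import random

def assign_working_sets(shards, num_working_sets):
    """Round-robin items from every shard into working sets, so each
    working set contains data items drawn from all shards."""
    working_sets = [[] for _ in range(num_working_sets)]
    for shard_id, shard in enumerate(shards):
        for i, item in enumerate(shard):
            working_sets[i % num_working_sets].append((shard_id, item))
    return working_sets

def shuffle_working_set(working_set, seed=None):
    """Randomly permute one working set; in the described flow this
    would overlap with training on the next working set."""
    rng = random.Random(seed)
    shuffled = list(working_set)
    rng.shuffle(shuffled)
    return shuffled

# Four nodes, each holding one shard of six data items.
shards = [[f"s{n}_item{i}" for i in range(6)] for n in range(4)]
ws = assign_working_sets(shards, num_working_sets=3)
# Every working set draws items from all four shards.
assert all(len({sid for sid, _ in w}) == 4 for w in ws)
```

The overlap of shuffling one working set with training on the next is what hides the shuffling cost; the sketch above only shows the data layout that makes that overlap possible.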
-
Publication No.: US11881261B2
Publication Date: 2024-01-23
Application No.: US17555260
Filing Date: 2021-12-17
CPC Classes: G11C13/004, G06N7/01, G11C13/0026, G11C13/0028, G11C13/0069, G11C15/04
Abstract: Systems and methods are provided for employing analog content addressable memories (aCAMs) to achieve low-latency sampling from complex distributions. For example, an aCAM core circuit can include an aCAM array. Amplitudes of a probability distribution function are mapped to the width of one or more aCAM cells in each row of the aCAM array. The aCAM core circuit can also include a resistive random access memory (RRAM) storing lookup information, such as information used for processing a model. By randomly selecting columns of the aCAM array to search, the mapped probability distribution function is sampled with low latency. The aCAM core circuit can accelerate the sampling step in methods that rely on sampling from arbitrary probability distributions, such as particle filter techniques. A hardware architecture for an aCAM particle filter that utilizes the aCAM core circuit as a central structure is also described.
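The width-mapped sampling idea can be illustrated in software: each outcome's probability amplitude becomes an integer cell width, and picking a uniformly random column selects the row whose width span covers that column. This is a simplified analogue under assumed parameters (a 1000-column array), not the hardware mechanism itself:

```python
import random
from bisect import bisect_right
from itertools import accumulate

def build_width_map(pdf):
    """Map each outcome's probability to an integer cell width -- a
    software analogue of programming aCAM cell widths."""
    total_cols = 1000  # columns in the hypothetical array
    widths = [max(1, round(p * total_cols)) for p in pdf.values()]
    bounds = list(accumulate(widths))  # cumulative column boundaries
    return list(pdf.keys()), bounds

def sample(outcomes, bounds, rng):
    """Pick a uniformly random column; the row whose width span covers
    that column is the sampled outcome -- a single lookup, hence low
    latency."""
    col = rng.randrange(bounds[-1])
    return outcomes[bisect_right(bounds, col)]

pdf = {"a": 0.5, "b": 0.3, "c": 0.2}
outcomes, bounds = build_width_map(pdf)
rng = random.Random(0)
draws = [sample(outcomes, bounds, rng) for _ in range(10000)]
# Empirical frequencies track the mapped distribution.
assert abs(draws.count("a") / 10000 - 0.5) < 0.05
```

In the hardware version the "lookup" is a parallel aCAM search across all rows at once, which is where the latency advantage over sequential inverse-CDF sampling comes from.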
-
Publication No.: US20230344851A1
Publication Date: 2023-10-26
Application No.: US18343877
Filing Date: 2023-06-29
IPC Classes: H04L9/40, H04L43/0817, H04L43/065, H04L43/062, H04L43/04
CPC Classes: H04L63/1425, H04L63/20, H04L43/0817, H04L43/065, H04L43/062, H04L43/04
Abstract: An example device includes processing circuitry and a memory. The memory includes instructions that cause the device to perform various functions: receiving datastreams from a plurality of sensors of a high-performance computing system, classifying the datastream of each sensor to one of a plurality of datastream models, selecting an anomaly detection algorithm from a plurality of anomaly detection algorithms for each datastream, determining parameters of each anomaly detection algorithm, determining an anomaly threshold for each datastream, and generating an indication that a sensor associated with a datastream is acting anomalously.
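The classify-then-detect flow can be sketched with a deliberately simple classifier and detector. The choices below (standard-deviation-based classification, z-score detection, a 3-sigma threshold) are illustrative assumptions; the patent leaves the model families and parameter-selection method open:

```python
import statistics

def classify_stream(values):
    """Crude datastream-model classifier (hypothetical): constant-like
    streams vs. varying streams."""
    return "constant" if statistics.pstdev(values) < 1e-9 else "varying"

def detect_anomalies(values, k=3.0):
    """Z-score detector with a per-stream anomaly threshold of k
    standard deviations around the stream's own mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]

stream = [10.0] * 50 + [10.2] * 49 + [55.0]  # one spiking sensor reading
assert classify_stream(stream) == "varying"
assert detect_anomalies(stream) == [99]      # the spike is flagged
```

A real system would select among several detector algorithms based on the classified model; the sketch shows only the per-stream thresholding step.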
-
Publication No.: US11644882B2
Publication Date: 2023-05-09
Application No.: US17337107
Filing Date: 2021-06-02
Inventors: Harumi Kuno, Alan Davis, Torsten Wilde, Daniel William Dauwe, Duncan Roweth, Ryan Dean Menhusen, Sergey Serebryakov, John L. Byrne, Vipin Kumar Kukkala, Sai Rahul Chalamalasetti
IPC Classes: G06F1/3206, G06F1/30, H02J3/00, G06F1/18
CPC Classes: G06F1/305, G06F1/188, G06F1/3206, H02J3/003
Abstract: One embodiment provides a system and method for predicting network power usage associated with workloads. During operation, the system configures a simulator to simulate operations of a plurality of network components, which comprises embedding one or more event counters in each simulated network component. A respective event counter is configured to count a number of network-power-related events. The system collects, based on values of the event counters, network-power-related performance data associated with one or more sample workloads applied to the simulator, and trains a machine-learning model with the collected network-power-related performance data and characteristics of the sample workloads as training data, thereby facilitating prediction of network-power-related performance associated with a to-be-evaluated workload.
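The counter-to-prediction pipeline can be illustrated with the simplest possible model. The patent does not specify the model class; the one-feature least-squares fit below is a stand-in, and the counter/power values are fabricated for illustration:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for one feature: power = a*x + b.
    Stands in for the patent's (unspecified) machine-learning model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Simulated event-counter values (e.g., packets forwarded) collected
# from sample workloads, paired with measured network power.
counter = [100, 200, 300, 400]
power_w = [12.0, 14.0, 16.0, 18.0]
a, b = fit_linear(counter, power_w)
predict = lambda x: a * x + b
# Predict power for a to-be-evaluated workload's counter value.
assert abs(predict(500) - 20.0) < 1e-9
```

In practice each simulated component contributes several counters, giving a multi-feature regression rather than this single-feature sketch.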
-
Publication No.: US11544540B2
Publication Date: 2023-01-03
Application No.: US16409729
Filing Date: 2019-05-10
Abstract: Systems and methods are provided for implementing hardware optimization for a hardware accelerator. The hardware accelerator emulates a neural network. Training of the neural network integrates a regularized pruning technique to systematically reduce the number of weights. A crossbar array included in the hardware accelerator can be programmed to calculate node values of the pruned neural network to selectively reduce the number of weight column lines in the crossbar array. During deployment, the hardware accelerator can be programmed to power off periphery circuit elements that correspond to a pruned weight column line, optimizing the hardware accelerator for power. Alternatively, before deployment, the hardware accelerator can be optimized for area by including a finite number of weight column lines. Regularized pruning of the neural network then selectively reduces the number of weights for consistency with the finite number of weight column lines in the hardware accelerator.
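Pruning down to a fixed number of column lines can be sketched as follows. Magnitude-based column pruning is a simplified stand-in for the patent's regularized pruning (which shapes the training loss rather than pruning after the fact):

```python
def prune_to_columns(weights, max_columns):
    """Keep the max_columns columns with the largest L1 norm and zero
    the rest, matching a crossbar with a fixed number of weight column
    lines; zeroed columns' periphery circuits could be powered off."""
    norms = [sum(abs(row[c]) for row in weights)
             for c in range(len(weights[0]))]
    keep = sorted(range(len(norms)), key=lambda c: -norms[c])[:max_columns]
    return [[row[c] if c in keep else 0.0 for c in range(len(row))]
            for row in weights]

w = [[0.9, 0.01, -0.5, 0.02],
     [-0.8, 0.02, 0.6, -0.01]]
pruned = prune_to_columns(w, max_columns=2)
# Columns 0 and 2 survive; the low-magnitude columns are zeroed.
assert pruned == [[0.9, 0.0, -0.5, 0.0], [-0.8, 0.0, 0.6, 0.0]]
```

The area-optimized variant works in the other direction: the crossbar is built with `max_columns` column lines up front, and training is regularized until the weight matrix fits.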
-
Publication No.: US11294763B2
Publication Date: 2022-04-05
Application No.: US16115100
Filing Date: 2018-08-28
Inventors: John Paul Strachan, Catherine Graves, Dejan S. Milojicic, Paolo Faraboschi, Martin Foltin, Sergey Serebryakov
Abstract: A computer system includes multiple memory array components that include respective analog memory arrays, which are sequenced to implement a multi-layer process. An error array data structure is obtained for at least a first memory array component, from which a determination is made as to whether individual nodes (or cells) of the error array data structure are significant. A determination can then be made as to any remedial operations that can be performed to mitigate errors of significance.
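The significance test over the error array can be sketched as a simple threshold scan. The threshold choice is hypothetical; the patent leaves the significance criterion and the remedial operations open:

```python
def significant_cells(error_array, threshold):
    """Flag cells of the error array whose magnitude exceeds a
    significance threshold; flagged cells are candidates for remedial
    operations (e.g., reprogramming or remapping)."""
    return [(r, c)
            for r, row in enumerate(error_array)
            for c, err in enumerate(row)
            if abs(err) > threshold]

# Per-cell error estimates for one analog memory array component.
errors = [[0.01, 0.30],
          [0.02, 0.05]]
assert significant_cells(errors, threshold=0.1) == [(0, 1)]
```

Because the arrays are sequenced into a multi-layer process, a significant error in an early component propagates forward, which is why the check starts at the first memory array component.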
-
Publication No.: US11182134B2
Publication Date: 2021-11-23
Application No.: US16799637
Filing Date: 2020-02-24
Abstract: Systems and methods are provided for optimizing parameters of a system across the entire stack, including the algorithm, toolchain, execution (runtime), and hardware layers. Results from the layer-specific optimization functions of each domain can be consolidated using one or more consolidation optimization functions, capturing the relationship between the different layers of the stack. Continuous monitoring of the programming model during execution may be implemented and can enable the programming model to self-adjust based on real-time performance metrics. In this way, programmers and system administrators are relieved of the need for domain knowledge and are offered a systematic way to optimize continuously (rather than an ad hoc approach).
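One way to picture a consolidation optimization function is as a weighted combination of per-layer scores. The weighting scheme below is an illustrative guess; the patent only requires that some function consolidate the layer-specific results:

```python
def consolidate(layer_results, weights):
    """Weighted consolidation of per-layer optimization scores into a
    single stack-wide objective."""
    return sum(weights[layer] * score
               for layer, score in layer_results.items())

# Normalized scores from each layer's own optimization function.
layer_results = {"algorithm": 0.9, "toolchain": 0.7,
                 "runtime": 0.8, "hardware": 0.6}
# Hypothetical weights emphasizing the algorithm layer.
weights = {"algorithm": 0.4, "toolchain": 0.2,
           "runtime": 0.2, "hardware": 0.2}
score = consolidate(layer_results, weights)
assert abs(score - 0.78) < 1e-9
```

The continuous-monitoring loop would re-evaluate this consolidated score against live metrics and feed adjustments back into the per-layer optimizers.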
-
Publication No.: US20230418792A1
Publication Date: 2023-12-28
Application No.: US17851546
Filing Date: 2022-06-28
Inventors: Annmary Justine KOOMTHANAM, Suparna Bhattacharya, Aalap Tripathy, Sergey Serebryakov, Martin Foltin, Paolo Faraboschi
IPC Classes: G06F16/215, G06F16/25, G06F16/27, G06N20/00, G06K9/62
CPC Classes: G06F16/215, G06F16/254, G06F16/27, G06N20/00, G06K9/6256
Abstract: Systems and methods are provided for automatically constructing data lineage representations for distributed data processing pipelines. These data lineage representations (which are constructed and stored in a central repository shared by the multiple data processing sites) can be used to, among other things, clone the distributed data processing pipeline for quality assurance or debugging purposes. Examples of the presently disclosed technology construct data lineage representations for distributed data processing pipelines by (1) generating a hash content value that universally identifies each data artifact of the distributed data processing pipeline across its multiple processing stages/processing sites, and (2) creating a data processing pipeline abstraction hierarchy that associates each data artifact with input and output events for given executions of given data processing stages (performed by the multiple data processing sites).
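Content-hash identification of artifacts can be sketched directly. SHA-256 is an assumed choice of hash, and the lineage-record shape is hypothetical; the point is that identical bytes get the same identifier at every processing site:

```python
import hashlib

def artifact_id(content: bytes) -> str:
    """Content-addressed identifier: the same bytes hash to the same ID
    at every processing site, so one artifact is identified universally
    across stages of the distributed pipeline."""
    return hashlib.sha256(content).hexdigest()

# Two sites producing byte-identical artifacts agree on the ID ...
assert artifact_id(b"stage1-output") == artifact_id(b"stage1-output")
# ... while any change to the content yields a different ID.
assert artifact_id(b"stage1-output") != artifact_id(b"stage1-output-v2")

# A lineage record associating artifacts with one execution's
# input and output events (illustrative structure).
lineage_event = {"stage": "transform", "run": 7,
                 "inputs": [artifact_id(b"raw")],
                 "outputs": [artifact_id(b"stage1-output")]}
```

Chaining such records through the abstraction hierarchy is what lets the central repository reconstruct, and therefore clone, the pipeline's data flow.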
-
Publication No.: US20220121885A1
Publication Date: 2022-04-21
Application No.: US17074201
Filing Date: 2020-10-19
Abstract: Testing for bias in a machine learning (ML) model in a manner that is independent of the code/weights deployment path is described. If bias is detected, an alert is generated, and optionally, the ML model can be incrementally re-trained to mitigate the detected bias. Re-training the ML model to mitigate the bias may include enforcing a bias cost function to maintain the level of bias in the ML model below a threshold bias level. One or more statistical metrics representing the level of bias present in the ML model may be determined and compared against one or more threshold values. If one or more metrics exceed the corresponding threshold value(s), the level of bias in the ML model may be deemed to exceed the threshold level of bias, and re-training of the ML model to mitigate the bias may be initiated.
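The metric-versus-threshold check can be sketched with one possible bias statistic. Demographic parity difference and the 0.2 threshold are assumptions for illustration; the patent does not fix the metric or the threshold:

```python
def demographic_parity_diff(preds, groups):
    """One possible bias statistic: the absolute difference in
    positive-prediction rates between two groups."""
    rate = lambda g: (sum(p for p, grp in zip(preds, groups) if grp == g)
                      / sum(1 for grp in groups if grp == g))
    return abs(rate("A") - rate("B"))

preds  = [1, 1, 1, 0, 1, 0, 0, 0]       # model's binary decisions
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
bias = demographic_parity_diff(preds, groups)
THRESHOLD = 0.2                          # hypothetical threshold value
assert abs(bias - 0.5) < 1e-9
retrain_needed = bias > THRESHOLD        # would trigger alert/re-training
assert retrain_needed
```

In the re-training step, the same statistic would appear inside a bias cost function added to the training loss, steering the model back below the threshold.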
-
Publication No.: US20210263713A1
Publication Date: 2021-08-26
Application No.: US16799637
Filing Date: 2020-02-24
Abstract: Systems and methods are provided for optimizing parameters of a system across the entire stack, including the algorithm, toolchain, execution (runtime), and hardware layers. Results from the layer-specific optimization functions of each domain can be consolidated using one or more consolidation optimization functions, capturing the relationship between the different layers of the stack. Continuous monitoring of the programming model during execution may be implemented and can enable the programming model to self-adjust based on real-time performance metrics. In this way, programmers and system administrators are relieved of the need for domain knowledge and are offered a systematic way to optimize continuously (rather than an ad hoc approach).