-
Publication number: US12050998B2
Publication date: 2024-07-30
Application number: US17010744
Filing date: 2020-09-02
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Sergey Serebryakov, Cong Xu
IPC: G06N3/084, G06F11/34, G06F18/214, G06F18/241, G06N3/045, G06N3/08
CPC classification number: G06N3/084, G06F11/3433, G06F18/214, G06F18/241, G06N3/045, G06N3/08
Abstract: Systems and methods are provided for data shuffling in distributed machine learning training, in which each training node in the network receives a shard of training data, the training data set having been divided into shards of data items. Each data item is assigned to a working set such that each working set includes data items from multiple shards. The training nodes perform training using the data items of a first working set that are in each node's shard. Upon completion of training on the first working set, the training nodes train using the data items of a second working set that are in their shards; and while the training nodes are training on their respective subsets of the second working set, the nodes randomly shuffle the data items in the first working set to create a shuffled first working set.
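To make the overlap between training and shuffling concrete, here is a minimal single-process Python sketch. The round-robin working-set assignment and all function names are illustrative assumptions, not the implementation claimed in the patent.

```python
import random

def make_working_sets(shards, num_working_sets):
    """Assign every data item to a working set so that each working
    set draws items from all shards (round-robin, an assumed policy)."""
    working_sets = [[] for _ in range(num_working_sets)]
    for shard_id, shard in enumerate(shards):
        for i, item in enumerate(shard):
            working_sets[i % num_working_sets].append((shard_id, item))
    return working_sets

def shuffle_working_set(working_set):
    """Randomly redistribute a working set's items across shards,
    standing in for the shuffle that overlaps with training."""
    items = [item for _, item in working_set]
    random.shuffle(items)
    return [(sid, item) for (sid, _), item in zip(working_set, items)]

# Two nodes, each holding one shard of six items.
shards = [[f"a{i}" for i in range(6)], [f"b{i}" for i in range(6)]]
working_sets = make_working_sets(shards, num_working_sets=2)

for k in range(len(working_sets)):
    for node in range(len(shards)):
        local = [item for sid, item in working_sets[k] if sid == node]
        print(f"node {node} trains on working set {k}: {local}")
    if k > 0:  # in the patented scheme this shuffle overlaps the training above
        working_sets[k - 1] = shuffle_working_set(working_sets[k - 1])
```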
-
Publication number: US20220327376A1
Publication date: 2022-10-13
Application number: US17226917
Filing date: 2021-04-09
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Cong Xu, Suparna Bhattacharya, Paolo Faraboschi
Abstract: Systems and methods are configured to split an epoch associated with a training dataset into a plurality of mini-epochs. A machine learning model can be trained with a mini-epoch of the plurality of mini-epochs. During the training, the mini-epoch can be iterated for a number of times. One or more metrics reflective of at least one of a training loss, training accuracy, or validation accuracy of the machine learning model associated with the mini-epoch can be received. Based on the one or more metrics, it can be determined whether to terminate iterations of the mini-epoch early, before the number of iterations of the mini-epoch reaches that number of times. The number of iterations can be a non-zero number.
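A rough sketch of the mini-epoch loop with metric-driven early termination might look like the following; the patience-based stopping rule and the stand-in trainer are assumptions, not the patent's actual criterion.

```python
import random

def train_mini_epoch(state, mini_epoch):
    """Stand-in for one pass over a mini-epoch; a real trainer would
    update model weights and return measured metrics."""
    state["loss"] *= random.uniform(0.9, 1.0)
    return state["loss"]

def run_epoch(dataset, num_mini_epochs, max_iters, patience=2, min_delta=1e-3):
    """Split the epoch into mini-epochs; iterate each up to max_iters
    times (a non-zero number), stopping early once the loss stops improving."""
    size = len(dataset) // num_mini_epochs
    mini_epochs = [dataset[i * size:(i + 1) * size] for i in range(num_mini_epochs)]
    state = {"loss": 1.0}
    for m, mini in enumerate(mini_epochs):
        best, stale = float("inf"), 0
        for it in range(max_iters):
            loss = train_mini_epoch(state, mini)
            if best - loss > min_delta:
                best, stale = loss, 0
            else:
                stale += 1
            if stale >= patience:  # metric-driven early termination
                print(f"mini-epoch {m}: stopped after {it + 1}/{max_iters} iterations")
                break

run_epoch(list(range(100)), num_mini_epochs=4, max_iters=8)
```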
-
Publication number: US20180285011A1
Publication date: 2018-10-04
Application number: US15476185
Filing date: 2017-03-31
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Kaisheng Ma, Qiong Cai, Cong Xu, Paolo Faraboschi
IPC: G06F3/06
CPC classification number: G06F3/0631, G06F3/061, G06F3/0683, G06F9/4881, G06F9/5044, G06F15/7821
Abstract: Examples described herein include receiving an operation pipeline for a computing system and building a graph that comprises a model for each of a number of potential memory side accelerator thread assignments to carry out the operation pipeline. The computing system may comprise at least two memories and a number of memory side accelerators. Each model may comprise a number of steps, and at least one step in each model may comprise a function performed at one of the memory side accelerators. Examples described herein also include determining a cost of at least one model.
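The following Python sketch illustrates the idea of costing candidate thread assignments; the pipeline, the per-accelerator costs, and the transfer penalty are invented numbers for illustration only.

```python
from itertools import product

# Hypothetical pipeline of operations and per-accelerator compute costs.
pipeline = ["filter", "aggregate", "project"]
accelerators = ["msa0", "msa1"]  # one memory side accelerator per memory
compute_cost = {("filter", "msa0"): 4, ("filter", "msa1"): 5,
                ("aggregate", "msa0"): 7, ("aggregate", "msa1"): 3,
                ("project", "msa0"): 2, ("project", "msa1"): 2}
MOVE_COST = 3  # assumed penalty for moving data between memories

def model_cost(assignment):
    """Cost of one candidate model: each step's compute cost, plus a
    transfer penalty whenever consecutive steps run on different MSAs."""
    cost = sum(compute_cost[(op, msa)] for op, msa in zip(pipeline, assignment))
    cost += sum(MOVE_COST for a, b in zip(assignment, assignment[1:]) if a != b)
    return cost

# Enumerate candidate assignments (the space of models) and cost each one.
models = list(product(accelerators, repeat=len(pipeline)))
best = min(models, key=model_cost)
print("cheapest assignment:", best, "cost:", model_cost(best))
```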
-
Publication number: US20170371561A1
Publication date: 2017-12-28
Application number: US15190276
Filing date: 2016-06-23
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Qiong Cai, Paolo Faraboschi, Cong Xu, Ping Chi, Sai Rahul Chalamalasetti, Andrew C. Walton
IPC: G06F3/06
Abstract: Techniques for reallocating a memory pending queue based on stalls are provided. In one aspect, it may be determined at a memory stop of a memory fabric that at least one class of memory access is stalled. It may also be determined at the memory stop of the memory fabric that there is at least one class of memory access that is not stalled. At least a portion of a memory pending queue may be reallocated from the class of memory access that is not stalled to the class of memory access that is stalled.
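As a toy illustration of the reallocation, the sketch below moves pending-queue slots from a non-stalled access class to a stalled one; the stall test, slot counts, and class names are assumptions, not the patent's mechanism.

```python
from dataclasses import dataclass

@dataclass
class AccessClass:
    """Pending-queue share for one class of memory access."""
    name: str
    slots: int      # queue entries currently allocated to this class
    occupied: int   # entries in use

    def stalled(self) -> bool:
        # Assumed stall test: the class's share is fully occupied.
        return self.occupied >= self.slots

def rebalance(classes, chunk=2):
    """Reallocate part of the pending queue from non-stalled classes
    to stalled ones, as determined at a memory stop of the fabric."""
    stalled = [c for c in classes if c.stalled()]
    relaxed = [c for c in classes if not c.stalled() and c.slots > chunk]
    for donor, receiver in zip(relaxed, stalled):
        donor.slots -= chunk
        receiver.slots += chunk
        print(f"moved {chunk} slots: {donor.name} -> {receiver.name}")

reads = AccessClass("reads", slots=8, occupied=8)    # stalled
writes = AccessClass("writes", slots=8, occupied=3)  # has headroom
rebalance([reads, writes])
```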
-
Publication number: US20200379858A1
Publication date: 2020-12-03
Application number: US16994784
Filing date: 2020-08-17
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Cong Xu, Naveen Muralimanohar, Harumi Kuno
IPC: G06F11/20, G06F11/14, G06F11/07, G06F11/00, G06F11/36, G06F9/48, G06F9/52, G06F9/54, G06F9/455, G06N20/00
Abstract: While scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, a likelihood of failure of the active compute nodes is periodically and independently predicted. Responsive to the likelihood of failure of a given active compute node exceeding a threshold, the given active compute node is proactively migrated to a spare compute node of the cluster at a next scheduled checkpoint. Another spare compute node of the cluster can perform prediction and migration. Prediction can be based on both hardware events and software events regarding the active compute nodes.
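A compact Python sketch of the checkpoint-then-migrate loop follows; the random failure predictor and the threshold value are placeholders for the hardware- and software-event model the abstract describes.

```python
import random

THRESHOLD = 0.8  # assumed failure-likelihood threshold

def predict_failure(node):
    """Placeholder predictor; a real one would combine hardware events
    (e.g., ECC errors) and software events (e.g., log anomalies)."""
    return random.random()

def take_checkpoint(active):
    print(f"checkpoint taken for nodes {sorted(active)}")

# Hypothetical cluster: four active nodes and two spares.
active, spares = {"n0", "n1", "n2", "n3"}, ["s0", "s1"]

for interval in range(3):  # each pass is one scheduled checkpoint
    take_checkpoint(active)
    for node in sorted(active):
        if predict_failure(node) > THRESHOLD and spares:
            spare = spares.pop(0)  # proactive migration at the checkpoint
            active.remove(node)
            active.add(spare)
            print(f"migrated {node} -> {spare} at checkpoint {interval}")
```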
-
Publication number: US10776225B2
Publication date: 2020-09-15
Application number: US16022990
Filing date: 2018-06-29
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Cong Xu, Naveen Muralimanohar, Harumi Kuno
IPC: G06F11/20, G06F11/14, G06F11/07, G06F11/00, G06F11/36, G06F9/48, G06F9/52, G06F9/54, G06F9/455, G06N20/00
Abstract: While scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, a likelihood of failure of the active compute nodes is periodically and independently predicted. Responsive to the likelihood of failure of a given active compute node exceeding a threshold, the given active compute node is proactively migrated to a spare compute node of the cluster at a next scheduled checkpoint. Another spare compute node of the cluster can perform prediction and migration. Prediction can be based on both hardware events and software events regarding the active compute nodes.
-
Publication number: US20190238154A1
Publication date: 2019-08-01
Application number: US15885277
Filing date: 2018-01-31
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Naveen Muralimanohar, Cong Xu, Gregg B. Lesartre
IPC: H03M7/30, H04L12/811, H04L29/06, H04L29/08
CPC classification number: H03M7/30, G06F11/1448, H04L47/38, H04L67/10, H04L67/2828, H04L69/04
Abstract: In some examples, a system performs a dynamic compression adaptation process that includes dynamically adjusting a compression algorithm used for performing data compression, and a location within an arrangement of different types of nodes at which the data compression is performed. Dynamically adjusting the compression algorithm and the location comprises selecting from among a plurality of different compression algorithms and from among locations at different nodes of the different types of nodes based on a state of the arrangement of different types of nodes and a characteristic of a workload for which the data compression is performed.
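To give a flavor of the selection logic, here is a small Python sketch that picks a compression algorithm (both candidates are from the standard library) and a node type based on system state and workload characteristics; the decision rules and node names are illustrative assumptions.

```python
import zlib
import lzma

ALGORITHMS = {"deflate": zlib.compress, "lzma": lzma.compress}
LOCATIONS = ["compute-node", "io-node", "storage-node"]  # assumed node types

def choose(link_utilization, cpu_headroom, compressibility):
    """Select an algorithm/location pair: compression pays off mainly
    when links are busy and data compresses well, and the heavier
    algorithm is only worthwhile with spare CPU at the compute node."""
    if link_utilization < 0.3 or compressibility < 0.2:
        return None, None  # skip compression entirely
    if cpu_headroom > 0.5:
        return "lzma", "compute-node"
    return "deflate", "io-node"

algo, loc = choose(link_utilization=0.9, cpu_headroom=0.7, compressibility=0.6)
data = b"example payload " * 500
print(algo, "at", loc, "->", len(ALGORITHMS[algo](data)), "bytes compressed")
```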
-
Publication number: US11645529B2
Publication date: 2023-05-09
Application number: US15967835
Filing date: 2018-05-01
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Sicheng Li, Cong Xu, Tsung Ching Huang
Abstract: A technique includes modifying a neural network model to sparsify the model. The model includes a plurality of kernel element weights, which are parameterized according to a plurality of dimensions. Modifying the model includes, in a given iteration of a plurality of iterations, training the model based on a structure regularization in which kernel element weights that share a dimension in common are removed as a group to create corresponding zero kernel elements in the model; and compressing the model to exclude the zero kernel element weights so that the model is ready to be trained in another iteration.
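A NumPy sketch of one prune-and-compress iteration is below; zeroing whole output-channel groups by their L2 norm stands in for the structure regularization, and the threshold values are arbitrary.

```python
import numpy as np

def group_norms(kernel):
    """L2 norm of each output-channel group of a conv kernel shaped
    (out_channels, in_channels, kh, kw) -- the shared dimension here."""
    return np.sqrt((kernel ** 2).sum(axis=(1, 2, 3)))

def sparsify_step(kernel, threshold):
    """One iteration: zero whole groups whose norm falls below the
    threshold (a proxy for group-wise structure regularization), then
    compress by dropping the zero groups before the next training pass."""
    keep = group_norms(kernel) >= threshold
    kernel[~keep] = 0.0          # group removal -> zero kernel elements
    return kernel[keep]          # compressed model for the next iteration

rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.5, size=(8, 3, 3, 3))
for threshold in (2.5, 2.7):     # stand-in for successive training iterations
    kernel = sparsify_step(kernel, threshold)
    print("remaining output channels:", kernel.shape[0])
```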
-
Publication number: US20220067577A1
Publication date: 2022-03-03
Application number: US17010744
Filing date: 2020-09-02
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Sergey Serebryakov, Cong Xu
IPC: G06N20/00
Abstract: Systems and methods are provided for data shuffling in distributed machine learning training, in which each training node in the network receives a shard of training data, the training data set having been divided into shards of data items. Each data item is assigned to a working set such that each working set includes data items from multiple shards. The training nodes perform training using the data items of a first working set that are in each node's shard. Upon completion of training on the first working set, the training nodes train using the data items of a second working set that are in their shards; and while the training nodes are training on their respective subsets of the second working set, the nodes randomly shuffle the data items in the first working set to create a shuffled first working set.
-
Publication number: US10810492B2
Publication date: 2020-10-20
Application number: US15417760
Filing date: 2017-01-27
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Abstract: Examples disclosed herein relate to using a memory side accelerator to calculate updated deep learning parameters. A globally addressable memory includes deep learning parameters. The deep learning parameters are partitioned, where each partition is associated with a memory side accelerator. A memory side accelerator is to receive calculated gradient updates associated with its partition and calculate an update to the deep learning parameters associated with the partition.
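The partition-per-accelerator idea can be sketched in a few lines of NumPy; the class name, the plain SGD update rule, and the partition count are assumptions for illustration.

```python
import numpy as np

class MemorySideAccelerator:
    """Owns one partition of the globally addressable parameter store
    and applies updates next to memory, avoiding a host round trip."""
    def __init__(self, partition, lr=0.01):
        self.partition = partition  # NumPy view into the global array
        self.lr = lr

    def apply_gradients(self, grads):
        # Assumed update rule: plain SGD on this partition only.
        self.partition -= self.lr * grads

# Global parameter array, split into one partition per accelerator;
# np.split returns views, so updates land in the shared store.
params = np.zeros(12)
msas = [MemorySideAccelerator(p) for p in np.split(params, 3)]

# Workers send each accelerator the gradient slice for its partition.
for msa in msas:
    msa.apply_gradients(np.ones_like(msa.partition))
print(params)  # every partition updated in place
```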
-