-
Publication number: US12050998B2
Publication date: 2024-07-30
Application number: US17010744
Filing date: 2020-09-02
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Sergey Serebryakov, Cong Xu
IPC: G06N3/084, G06F11/34, G06F18/214, G06F18/241, G06N3/045, G06N3/08
CPC classification number: G06N3/084, G06F11/3433, G06F18/214, G06F18/241, G06N3/045, G06N3/08
Abstract: Systems and methods are provided for data shuffling in distributed machine learning training, in which each training node in the network receives a shard of training data, the training data set having been divided into shards of data items. Each data item is assigned to a working set such that each working set includes data items from multiple shards. The training nodes perform training using the data items of a first working set that are in each node's shard. Upon completion of training on the first working set, the training nodes train using the data items of a second working set that are in their shards; and while the training nodes are training on their respective subsets of the second working set, the nodes randomly shuffle the data items in the first working set to create a shuffled first working set.
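To make the overlap between training and shuffling concrete, here is a minimal single-process Python sketch. The round-robin working-set assignment and all function names are illustrative assumptions, not the implementation claimed in the patent.

```python
import random

def make_working_sets(shards, num_working_sets):
    """Assign every data item to a working set so that each working
    set draws items from all shards (round-robin, an assumed policy)."""
    working_sets = [[] for _ in range(num_working_sets)]
    for shard_id, shard in enumerate(shards):
        for i, item in enumerate(shard):
            working_sets[i % num_working_sets].append((shard_id, item))
    return working_sets

def shuffle_working_set(working_set):
    """Randomly redistribute a working set's items across shards,
    standing in for the shuffle that overlaps with training."""
    items = [item for _, item in working_set]
    random.shuffle(items)
    return [(sid, item) for (sid, _), item in zip(working_set, items)]

# Two nodes, each holding one shard of six items.
shards = [[f"a{i}" for i in range(6)], [f"b{i}" for i in range(6)]]
working_sets = make_working_sets(shards, num_working_sets=2)

for k in range(len(working_sets)):
    for node in range(len(shards)):
        local = [item for sid, item in working_sets[k] if sid == node]
        print(f"node {node} trains on working set {k}: {local}")
    if k > 0:  # in the patented scheme this shuffle overlaps the training above
        working_sets[k - 1] = shuffle_working_set(working_sets[k - 1])
```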
-
Publication number: US20220327376A1
Publication date: 2022-10-13
Application number: US17226917
Filing date: 2021-04-09
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Cong Xu, Suparna Bhattacharya, Paolo Faraboschi
Abstract: Systems and methods are configured to split an epoch associated with a training dataset into a plurality of mini-epochs. A machine learning model can be trained with a mini-epoch of the plurality of mini-epochs. During the training, the mini-epoch can be iterated for a number of times. One or more metrics reflective of at least one of a training loss, training accuracy, or validation accuracy of the machine learning model associated with the mini-epoch can be received. Based on the one or more metrics, it can be determined whether to terminate iterations of the mini-epoch early, before the number of iterations of the mini-epoch reaches that number of times. The number of iterations can be a non-zero number.
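A rough sketch of the mini-epoch loop with metric-driven early termination might look like the following; the patience-based stopping rule and the stand-in trainer are assumptions, not the patent's actual criterion.

```python
import random

def train_mini_epoch(state, mini_epoch):
    """Stand-in for one pass over a mini-epoch; a real trainer would
    update model weights and return measured metrics."""
    state["loss"] *= random.uniform(0.9, 1.0)
    return state["loss"]

def run_epoch(dataset, num_mini_epochs, max_iters, patience=2, min_delta=1e-3):
    """Split the epoch into mini-epochs; iterate each up to max_iters
    times (a non-zero number), stopping early once the loss stops improving."""
    size = len(dataset) // num_mini_epochs
    mini_epochs = [dataset[i * size:(i + 1) * size] for i in range(num_mini_epochs)]
    state = {"loss": 1.0}
    for m, mini in enumerate(mini_epochs):
        best, stale = float("inf"), 0
        for it in range(max_iters):
            loss = train_mini_epoch(state, mini)
            if best - loss > min_delta:
                best, stale = loss, 0
            else:
                stale += 1
            if stale >= patience:  # metric-driven early termination
                print(f"mini-epoch {m}: stopped after {it + 1}/{max_iters} iterations")
                break

run_epoch(list(range(100)), num_mini_epochs=4, max_iters=8)
```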
-
Publication number: US20180285011A1
Publication date: 2018-10-04
Application number: US15476185
Filing date: 2017-03-31
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Kaisheng Ma, Qiong Cai, Cong Xu, Paolo Faraboschi
IPC: G06F3/06
CPC classification number: G06F3/0631, G06F3/061, G06F3/0683, G06F9/4881, G06F9/5044, G06F15/7821
Abstract: Examples described herein include receiving an operation pipeline for a computing system and building a graph that comprises a model for each of a number of potential memory side accelerator thread assignments to carry out the operation pipeline. The computing system may comprise at least two memories and a number of memory side accelerators. Each model may comprise a number of steps, and at least one step in each model may comprise a function performed at one of the memory side accelerators. Examples described herein also include determining a cost of at least one model.
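The following Python sketch illustrates the idea of costing candidate thread assignments; the pipeline, the per-accelerator costs, and the transfer penalty are invented numbers for illustration only.

```python
from itertools import product

# Hypothetical pipeline of operations and per-accelerator compute costs.
pipeline = ["filter", "aggregate", "project"]
accelerators = ["msa0", "msa1"]  # one memory side accelerator per memory
compute_cost = {("filter", "msa0"): 4, ("filter", "msa1"): 5,
                ("aggregate", "msa0"): 7, ("aggregate", "msa1"): 3,
                ("project", "msa0"): 2, ("project", "msa1"): 2}
MOVE_COST = 3  # assumed penalty for moving data between memories

def model_cost(assignment):
    """Cost of one candidate model: each step's compute cost, plus a
    transfer penalty whenever consecutive steps run on different MSAs."""
    cost = sum(compute_cost[(op, msa)] for op, msa in zip(pipeline, assignment))
    cost += sum(MOVE_COST for a, b in zip(assignment, assignment[1:]) if a != b)
    return cost

# Enumerate candidate assignments (the space of models) and cost each one.
models = list(product(accelerators, repeat=len(pipeline)))
best = min(models, key=model_cost)
print("cheapest assignment:", best, "cost:", model_cost(best))
```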
-
Publication number: US20170371561A1
Publication date: 2017-12-28
Application number: US15190276
Filing date: 2016-06-23
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Qiong Cai, Paolo Faraboschi, Cong Xu, Ping Chi, Sai Rahul Chalamalasetti, Andrew C. Walton
IPC: G06F3/06
Abstract: Techniques for reallocating a memory pending queue based on stalls are provided. In one aspect, it may be determined at a memory stop of a memory fabric that at least one class of memory access is stalled. It may also be determined at the memory stop of the memory fabric that there is at least one class of memory access that is not stalled. At least a portion of a memory pending queue may be reallocated from the class of memory access that is not stalled to the class of memory access that is stalled.
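As a toy illustration of the reallocation, the sketch below moves pending-queue slots from a non-stalled access class to a stalled one; the stall test, slot counts, and class names are assumptions, not the patent's mechanism.

```python
from dataclasses import dataclass

@dataclass
class AccessClass:
    """Pending-queue share for one class of memory access."""
    name: str
    slots: int      # queue entries currently allocated to this class
    occupied: int   # entries in use

    def stalled(self) -> bool:
        # Assumed stall test: the class's share is fully occupied.
        return self.occupied >= self.slots

def rebalance(classes, chunk=2):
    """Reallocate part of the pending queue from non-stalled classes
    to stalled ones, as determined at a memory stop of the fabric."""
    stalled = [c for c in classes if c.stalled()]
    relaxed = [c for c in classes if not c.stalled() and c.slots > chunk]
    for donor, receiver in zip(relaxed, stalled):
        donor.slots -= chunk
        receiver.slots += chunk
        print(f"moved {chunk} slots: {donor.name} -> {receiver.name}")

reads = AccessClass("reads", slots=8, occupied=8)    # stalled
writes = AccessClass("writes", slots=8, occupied=3)  # has headroom
rebalance([reads, writes])
```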
-
Publication number: US20200379858A1
Publication date: 2020-12-03
Application number: US16994784
Filing date: 2020-08-17
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Cong Xu, Naveen Muralimanohar, Harumi Kuno
IPC: G06F11/20, G06F11/14, G06F11/07, G06F11/00, G06F11/36, G06F9/48, G06F9/52, G06F9/54, G06F9/455, G06N20/00
Abstract: While scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, a likelihood of failure of the active compute nodes is periodically and independently predicted. Responsive to the likelihood of failure of a given active compute node exceeding a threshold, the given active compute node is proactively migrated to a spare compute node of the cluster at a next scheduled checkpoint. Another spare compute node of the cluster can perform prediction and migration. Prediction can be based on both hardware events and software events regarding the active compute nodes.
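A compact Python sketch of the checkpoint-then-migrate loop follows; the random failure predictor and the threshold value are placeholders for the hardware- and software-event model the abstract describes.

```python
import random

THRESHOLD = 0.8  # assumed failure-likelihood threshold

def predict_failure(node):
    """Placeholder predictor; a real one would combine hardware events
    (e.g., ECC errors) and software events (e.g., log anomalies)."""
    return random.random()

def take_checkpoint(active):
    print(f"checkpoint taken for nodes {sorted(active)}")

# Hypothetical cluster: four active nodes and two spares.
active, spares = {"n0", "n1", "n2", "n3"}, ["s0", "s1"]

for interval in range(3):  # each pass is one scheduled checkpoint
    take_checkpoint(active)
    for node in sorted(active):
        if predict_failure(node) > THRESHOLD and spares:
            spare = spares.pop(0)  # proactive migration at the checkpoint
            active.remove(node)
            active.add(spare)
            print(f"migrated {node} -> {spare} at checkpoint {interval}")
```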
-
Publication number: US10776225B2
Publication date: 2020-09-15
Application number: US16022990
Filing date: 2018-06-29
Applicant: Hewlett Packard Enterprise Development LP
Inventor: Cong Xu, Naveen Muralimanohar, Harumi Kuno
IPC: G06F11/20, G06F11/14, G06F11/07, G06F11/00, G06F11/36, G06F9/48, G06F9/52, G06F9/54, G06F9/455, G06N20/00
Abstract: While scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, a likelihood of failure of the active compute nodes is periodically and independently predicted. Responsive to the likelihood of failure of a given active compute node exceeding a threshold, the given active compute node is proactively migrated to a spare compute node of the cluster at a next scheduled checkpoint. Another spare compute node of the cluster can perform prediction and migration. Prediction can be based on both hardware events and software events regarding the active compute nodes.
-
Publication number: US20190238154A1
Publication date: 2019-08-01
Application number: US15885277
Filing date: 2018-01-31
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Naveen Muralimanohar, Cong Xu, Gregg B. Lesartre
IPC: H03M7/30, H04L12/811, H04L29/06, H04L29/08
CPC classification number: H03M7/30, G06F11/1448, H04L47/38, H04L67/10, H04L67/2828, H04L69/04
Abstract: In some examples, a system performs a dynamic compression adaptation process that includes dynamically adjusting a compression algorithm used for performing data compression, and a location within an arrangement of different types of nodes at which the data compression is performed. Dynamically adjusting the compression algorithm and the location comprises selecting from among a plurality of different compression algorithms and from among locations at different nodes of the different types of nodes based on a state of the arrangement of different types of nodes and a characteristic of a workload for which the data compression is performed.
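To give a flavor of the selection logic, here is a small Python sketch that picks a compression algorithm (both candidates are from the standard library) and a node type based on system state and workload characteristics; the decision rules and node names are illustrative assumptions.

```python
import zlib
import lzma

ALGORITHMS = {"deflate": zlib.compress, "lzma": lzma.compress}
LOCATIONS = ["compute-node", "io-node", "storage-node"]  # assumed node types

def choose(link_utilization, cpu_headroom, compressibility):
    """Select an algorithm/location pair: compression pays off mainly
    when links are busy and data compresses well, and the heavier
    algorithm is only worthwhile with spare CPU at the compute node."""
    if link_utilization < 0.3 or compressibility < 0.2:
        return None, None  # skip compression entirely
    if cpu_headroom > 0.5:
        return "lzma", "compute-node"
    return "deflate", "io-node"

algo, loc = choose(link_utilization=0.9, cpu_headroom=0.7, compressibility=0.6)
data = b"example payload " * 500
print(algo, "at", loc, "->", len(ALGORITHMS[algo](data)), "bytes compressed")
```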
-
Publication number: US11645529B2
Publication date: 2023-05-09
Application number: US15967835
Filing date: 2018-05-01
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Sicheng Li, Cong Xu, Tsung Ching Huang
Abstract: A technique includes modifying a neural network model to sparsify the model. The model includes a plurality of kernel element weights, which are parameterized according to a plurality of dimensions. Modifying the model includes, in a given iteration of a plurality of iterations, training the model based on a structure regularization in which kernel element weights that share a dimension in common are removed as a group to create corresponding zero kernel elements in the model; and compressing the model to exclude the zero kernel element weights so that the model is ready to be trained in another iteration.
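A NumPy sketch of one prune-and-compress iteration is below; zeroing whole output-channel groups by their L2 norm stands in for the structure regularization, and the threshold values are arbitrary.

```python
import numpy as np

def group_norms(kernel):
    """L2 norm of each output-channel group of a conv kernel shaped
    (out_channels, in_channels, kh, kw) -- the shared dimension here."""
    return np.sqrt((kernel ** 2).sum(axis=(1, 2, 3)))

def sparsify_step(kernel, threshold):
    """One iteration: zero whole groups whose norm falls below the
    threshold (a proxy for group-wise structure regularization), then
    compress by dropping the zero groups before the next training pass."""
    keep = group_norms(kernel) >= threshold
    kernel[~keep] = 0.0          # group removal -> zero kernel elements
    return kernel[keep]          # compressed model for the next iteration

rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.5, size=(8, 3, 3, 3))
for threshold in (2.5, 2.7):     # stand-in for successive training iterations
    kernel = sparsify_step(kernel, threshold)
    print("remaining output channels:", kernel.shape[0])
```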
-
Publication number: US20220067577A1
Publication date: 2022-03-03
Application number: US17010744
Filing date: 2020-09-02
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Sergey Serebryakov, Cong Xu
IPC: G06N20/00
Abstract: Systems and methods are provided for data shuffling in distributed machine learning training, in which each training node in the network receives a shard of training data, the training data set having been divided into shards of data items. Each data item is assigned to a working set such that each working set includes data items from multiple shards. The training nodes perform training using the data items of a first working set that are in each node's shard. Upon completion of training on the first working set, the training nodes train using the data items of a second working set that are in their shards; and while the training nodes are training on their respective subsets of the second working set, the nodes randomly shuffle the data items in the first working set to create a shuffled first working set.
-
Publication number: US10810492B2
Publication date: 2020-10-20
Application number: US15417760
Filing date: 2017-01-27
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Abstract: Examples disclosed herein relate to using a memory side accelerator to calculate updated deep learning parameters. A globally addressable memory includes deep learning parameters. The deep learning parameters are partitioned, where each partition is associated with a memory side accelerator. A memory side accelerator is to receive calculated gradient updates associated with its partition and calculate an update to the deep learning parameters associated with the partition.
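The partition-per-accelerator idea can be sketched in a few lines of NumPy; the class name, the plain SGD update rule, and the partition count are assumptions for illustration.

```python
import numpy as np

class MemorySideAccelerator:
    """Owns one partition of the globally addressable parameter store
    and applies updates next to memory, avoiding a host round trip."""
    def __init__(self, partition, lr=0.01):
        self.partition = partition  # NumPy view into the global array
        self.lr = lr

    def apply_gradients(self, grads):
        # Assumed update rule: plain SGD on this partition only.
        self.partition -= self.lr * grads

# Global parameter array, split into one partition per accelerator;
# np.split returns views, so updates land in the shared store.
params = np.zeros(12)
msas = [MemorySideAccelerator(p) for p in np.split(params, 3)]

# Workers send each accelerator the gradient slice for its partition.
for msa in msas:
    msa.apply_gradients(np.ones_like(msa.partition))
print(params)  # every partition updated in place
```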
-