Patent search: ap:("Advanced Micro Devices Inc.") AND inv:"Leonardo Piga"

11.

Granted invention patent
Managing variations among nodes in parallel system frameworks (status: in force)

Publication (announcement) No.: US10355966B2

Publication (announcement) date: 2019-07-16

Application No.: US15081558

Application date: 2016-03-25

Applicant: Advanced Micro Devices, Inc.

Inventors: Samuel Lawrence Wasmundt, Leonardo Piga, Indrani Paul, Wei Huang, Manish Arora

IPC: H04L12/26, H04L29/08

Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
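A minimal sketch of the mapping idea in this abstract, assuming the per-node variability metric is a weighted sum of each node's absolute z-score (distance from the cluster mean) over a few sensor and performance readings. The metric formula, the weights, the field names `temp_c` and `power_w`, and the helpers `variability_metrics` and `map_tasks` are hypothetical illustrations, not the patented method.

```python
from statistics import mean, pstdev


def variability_metrics(readings, weights):
    """Hypothetical per-node variability metric.

    readings: {node: {metric_name: value}}
    weights:  {metric_name: weight}
    Score = sum over metrics of weight * |z-score of the node's value|.
    """
    nodes = list(readings)
    scores = {n: 0.0 for n in nodes}
    for metric, w in weights.items():
        values = [readings[n][metric] for n in nodes]
        mu, sigma = mean(values), pstdev(values)
        for n in nodes:
            z = 0.0 if sigma == 0 else abs(readings[n][metric] - mu) / sigma
            scores[n] += w * z
    return scores


def map_tasks(tasks, scores):
    """Place critical tasks on the least-variable nodes first.

    tasks: list of (task_id, is_critical); assumes len(tasks) <= number of nodes.
    """
    nodes_by_variability = sorted(scores, key=scores.get)
    # Stable sort: critical tasks (is_critical=True) come first.
    ordered_tasks = sorted(tasks, key=lambda t: not t[1])
    return {task_id: node for (task_id, _), node in zip(ordered_tasks, nodes_by_variability)}


readings = {
    "n0": {"temp_c": 60.0, "power_w": 100.0},
    "n1": {"temp_c": 80.0, "power_w": 130.0},
    "n2": {"temp_c": 61.0, "power_w": 101.0},
}
scores = variability_metrics(readings, {"temp_c": 1.0, "power_w": 1.0})
mapping = map_tasks([("reduce", True), ("io", False), ("log", False)], scores)
print(mapping)  # {'reduce': 'n2', 'io': 'n0', 'log': 'n1'}
```

Node `n1` runs hot and draws more power than its peers, so it receives the highest score and is given only non-critical work.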

12.

Granted invention patent
Balancing computation and communication power in power constrained clusters (status: in force)

Publication (announcement) No.: US09983652B2

Publication (announcement) date: 2018-05-29

Application No.: US14959669

Application date: 2015-12-04

Applicant: Advanced Micro Devices, Inc.

Inventors: Leonardo Piga, Indrani Paul, Wei Huang

IPC: G06F1/26, G06F1/32

CPC classification number: G06F1/3203, G06F1/3206, G06F1/3287, Y02D10/171

Abstract: Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agent may be configured to reassign the unused power to the active nodes to expedite workload processing.
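The gate-and-reassign policy described above can be sketched as follows. This assumes a fixed power draw for a gated node and an even split of the freed budget among active nodes. The `Node` type, `rebalance`, and both parameters are hypothetical; the abstract does not specify how the cluster agent divides the reclaimed power.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    power_w: float           # current power cap for this node
    predicted_wait_s: float  # predicted time until the next synchronization point


def rebalance(nodes, wait_threshold_s, gated_power_w):
    """Hypothetical cluster-agent step.

    Nodes predicted to wait longer than the threshold are power-gated
    (dropped to gated_power_w); the power they free is split evenly
    among the nodes that are still computing, so the total is conserved.
    """
    gated = [n for n in nodes if n.predicted_wait_s > wait_threshold_s]
    active = [n for n in nodes if n.predicted_wait_s <= wait_threshold_s]
    freed_w = 0.0
    for n in gated:
        freed_w += n.power_w - gated_power_w
        n.power_w = gated_power_w
    if active:
        share_w = freed_w / len(active)
        for n in active:
            n.power_w += share_w
    return gated, active


nodes = [Node("a", 100.0, 0.0), Node("b", 100.0, 5.0), Node("c", 100.0, 0.2)]
gated, active = rebalance(nodes, wait_threshold_s=1.0, gated_power_w=10.0)
print([(n.name, n.power_w) for n in nodes])
```

Node `b` finished early and expects a 5 s wait, so it is gated; its freed 90 W is split between `a` and `c`.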

13.

Invention patent application
ACHIEVING BALANCED EXECUTION THROUGH RUNTIME DETECTION OF PERFORMANCE VARIATION (status: pending, published)

Publication (announcement) No.: US20170373955A1

Publication (announcement) date: 2017-12-28

Application No.: US15192764

Application date: 2016-06-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Brian J. Kocoloski, Leonardo Piga, Wei Huang, Indrani Paul

IPC: H04L12/26

CPC classification number: G06F11/30, G06F9/4893, G06F2209/5019, Y02D10/24

Abstract: Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and the amount of time spent waiting for synchronization are monitored for a plurality of tasks on each node of the multi-node cluster. These values are used to generate a model that correlates the values of the performance counters with the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.
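A minimal sketch of the train-then-predict flow, assuming a single performance counter and an ordinary least-squares line as the model. The abstract does not name the model type or the counters; `fit_wait_model`, `on_critical_path`, `next_power_cap`, the threshold, and the training numbers are hypothetical.

```python
def fit_wait_model(counter_values, wait_times_s):
    """Least-squares fit: predicted_wait = slope * counter + intercept."""
    n = len(counter_values)
    mx = sum(counter_values) / n
    my = sum(wait_times_s) / n
    sxx = sum((x - mx) ** 2 for x in counter_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(counter_values, wait_times_s))
    slope = sxy / sxx
    return slope, my - slope * mx


def on_critical_path(model, counter_value, wait_threshold_s):
    # A node predicted to wait very little at the synchronization point
    # is the one the other nodes are waiting for.
    slope, intercept = model
    return slope * counter_value + intercept < wait_threshold_s


def next_power_cap(current_w, critical, boost_w=10.0, max_w=250.0):
    # Hypothetical policy: raise the cap of critical nodes by a fixed step.
    return min(current_w + boost_w, max_w) if critical else current_w


# Training phase: counter samples (e.g. memory stalls per instruction)
# paired with the synchronization wait observed for the same tasks.
model = fit_wait_model([0.1, 0.2, 0.3, 0.4], [4.0, 3.0, 2.0, 1.0])

# Runtime: sample the counter at the start of a task and decide.
print(on_critical_path(model, 0.45, wait_threshold_s=1.0))  # True
print(on_critical_path(model, 0.15, wait_threshold_s=1.0))  # False
```

A real deployment would likely use several counters and a richer model; the single-feature line only illustrates the correlate-then-predict structure.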

14.

Granted invention patent
Bandwidth-aware multi-frequency performance estimation mechanism (status: in force)

Publication (announcement) No.: US10048741B1

Publication (announcement) date: 2018-08-14

Application No.: US15416993

Application date: 2017-01-26

Applicant: Advanced Micro Devices, Inc.

Inventors: Md Abdullah Shahneous Bari, Leonardo Piga, Indrani Paul

IPC: G06F12/00, G06F1/32, G06F3/06

Abstract: Systems, apparatuses, and methods for implementing performance estimation mechanisms are disclosed. In one embodiment, a computing system includes at least one processor and a memory subsystem. During a characterization phase, the system utilizes a memory intensive workload to detect when the memory subsystem reaches its saturation point. Then, the system collects performance counter values during a sampling phase of a target application to determine the memory bandwidth. If the memory bandwidth is greater than the saturation point, then the system generates a prediction of the memory time which is based on a ratio of the memory bandwidth over the saturation point. Otherwise, if the memory bandwidth is less than the saturation point, the system assumes memory time is constant versus processor frequency. Then, the system uses the memory time and an estimate of the compute time to estimate a phase time for the target application at different processor frequencies.
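One way to read the estimation rule above is sketched here. It assumes three things: compute time scales inversely with core frequency, the bandwidth a phase demands at a target frequency grows in proportion to that frequency, and a saturated phase's memory time is stretched by demand/saturation. These are interpretive assumptions, not the patent's exact formulas, and `estimate_phase_time` is a hypothetical helper.

```python
def estimate_phase_time(t_compute_s, t_memory_s, f_sample_ghz, f_target_ghz,
                        bw_sample_gbs, bw_saturation_gbs):
    """Hypothetical phase-time estimate at f_target from one sampled run."""
    # Compute time assumed to scale inversely with core frequency.
    t_compute = t_compute_s * f_sample_ghz / f_target_ghz
    # Assumption: bandwidth demand grows in proportion to frequency.
    bw_demand = bw_sample_gbs * f_target_ghz / f_sample_ghz
    if bw_demand > bw_saturation_gbs:
        # Saturated memory subsystem: memory time stretches by demand / saturation.
        t_memory = t_memory_s * bw_demand / bw_saturation_gbs
    else:
        # Below saturation: memory time treated as constant versus frequency.
        t_memory = t_memory_s
    return t_compute + t_memory


# Sampled at 2.0 GHz: 2.0 s compute, 1.0 s memory, 30 GB/s used,
# and a characterized saturation point of 40 GB/s.
for f in (2.0, 3.0, 4.0):
    print(f, estimate_phase_time(2.0, 1.0, 2.0, f, 30.0, 40.0))
```

With these inputs the estimate drops from 3.0 s at 2.0 GHz to about 2.46 s at 3.0 GHz, then rises to 2.5 s at 4.0 GHz, because the extra frequency pushes demand past the saturation point.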

15.

Invention patent application
PAGE MIGRATION ACCELERATION USING A TWO-LEVEL BLOOM FILTER ON HIGH BANDWIDTH MEMORY SYSTEMS (status: pending, published)

Publication (announcement) No.: US20180081585A1

Publication (announcement) date: 2018-03-22

Application No.: US15269289

Application date: 2016-09-19

Applicant: Advanced Micro Devices, Inc.

Inventors: Leonardo Piga, Mauricio Breternitz

IPC: G06F3/06, G06F17/30

CPC classification number: G06F3/0647, G06F3/0614, G06F3/0638, G06F3/0653, G06F3/0683, G06F12/08, G06F17/30949, G06F17/30979, G06F2212/1024

Abstract: Systems, apparatuses, and methods for accelerating page migration using a two-level bloom filter are disclosed. In one embodiment, a system includes a GPU and a CPU and a multi-level memory hierarchy. When a memory request misses in a first memory, the GPU is configured to check a first level of a two-level bloom filter to determine if a page targeted by the memory request is located in a second memory. If the first level of the two-level bloom filter indicates that the page is not in the second memory, then the GPU generates a page fault and sends the memory request to a third memory. If the first level of the two-level bloom filter indicates that the page is in the second memory, then the GPU sends the memory request to the CPU.
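The first-level lookup described above relies on the standard Bloom-filter property: a negative answer is definitive, while a positive answer may be a false positive. A minimal sketch follows. The bit-array size, the SHA-256-based hashing, and the `route_miss` helper are assumptions, and the abstract's second filter level is not modeled.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def route_miss(page, level1):
    """Hypothetical routing of a GPU request that missed in the first memory."""
    if level1.might_contain(page):
        return "cpu"         # page may be in the second memory: forward to the CPU
    return "page_fault"      # page is definitely not there: fault, go to the third memory


level1 = BloomFilter(num_bits=1024, num_hashes=3)
level1.add(0x1000)                      # page 0x1000 resides in the second memory
print(route_miss(0x1000, level1))       # cpu
```

A false positive only sends a request to the CPU unnecessarily. A page that is actually in the second memory is never misreported as absent, which is why the negative path can safely raise a page fault.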

16.

Invention patent application
REAL-TIME PERFORMANCE TRACKING USING DYNAMIC COMPILATION (status: pending, published)

Publication (announcement) No.: US20170371761A1

Publication (announcement) date: 2017-12-28

Application No.: US15192748

Application date: 2016-06-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Leonardo Piga, Brian J. Kocoloski, Wei Huang, Abhinandan Majumdar, Indrani Paul

IPC: G06F11/36, G06F9/45

CPC classification number: G06F11/3604, G06F9/45516

Abstract: Systems, apparatuses, and methods for performing real-time tracking of performance targets using dynamic compilation. A performance target is specified in a service level agreement. A dynamic compiler analyzes a software application executing in real time and determines which high-level application metrics to track. The dynamic compiler then inserts instructions into the code to increment counters associated with the metrics. A power optimization unit then uses the counters to determine whether the system is currently meeting the performance target. If the system is exceeding the performance target, the power optimization unit reduces the power consumption of the system while still meeting the performance target.
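The counter-driven control loop can be sketched as below. The increment that a dynamic compiler would insert is written by hand here. The `counters` dict, `handle_request`, the `PowerOptimizer` class, its discrete power levels, and the 10% headroom margin are all hypothetical.

```python
from dataclasses import dataclass

# Counter that the dynamically inserted instructions would increment.
counters = {"requests_completed": 0}


def handle_request():
    # ... application work ...
    # Stand-in for the increment a dynamic compiler would insert.
    counters["requests_completed"] += 1


@dataclass
class PowerOptimizer:
    target_per_s: float   # performance target from the SLA (e.g. requests/s)
    power_level: int      # index into a hypothetical list of power states
    min_level: int = 0
    max_level: int = 7
    margin: float = 0.10  # headroom kept above the target before cutting power

    def update(self, counter_delta, interval_s):
        rate = counter_delta / interval_s
        if rate > self.target_per_s * (1 + self.margin) and self.power_level > self.min_level:
            self.power_level -= 1  # comfortably exceeding the target: save power
        elif rate < self.target_per_s and self.power_level < self.max_level:
            self.power_level += 1  # missing the target: restore performance
        return rate


for _ in range(1500):
    handle_request()

opt = PowerOptimizer(target_per_s=1000.0, power_level=5)
opt.update(counters["requests_completed"], interval_s=1.0)  # 1500/s: step down
print(opt.power_level)
```

The margin adds hysteresis, so the optimizer does not oscillate when the measured rate sits right at the target.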

17.

Invention patent application
MANAGING CLUSTER-LEVEL PERFORMANCE VARIABILITY WITHOUT A CENTRALIZED CONTROLLER (status: pending, published)

Publication (announcement) No.: US20170366412A1

Publication (announcement) date: 2017-12-21

Application No.: US15183625

Application date: 2016-06-15

Applicant: Advanced Micro Devices, Inc.

Inventor: Leonardo Piga

IPC: H04L12/24, H04L29/08, H04L12/911

CPC classification number: H04L67/10, G06F9/00, H04L67/1008, H04L67/1029, H04L67/28

Abstract: Systems, apparatuses, and methods for managing cluster-level performance variability without a centralized controller are described. Each node of a multi-node cluster tracks the maximum and minimum progress across the plurality of nodes for a workload executed by the cluster. Each node also tracks its local progress on its current task. Each node compares its local progress to the reported maximum and minimum progress across the cluster to identify a critical, or slow, node and to decide whether to increase or reduce the amount of power allocated to the node. The nodes append information about the maximum and minimum progress to messages sent to other nodes, thereby sharing their knowledge of maximum and minimum progress. A node updates its local information if it receives a message from another node with more up-to-date information about the state of progress across the cluster.
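A minimal sketch of piggybacked progress sharing. It swaps in a simple variant: each node carries a small table of the latest known progress for every node, tagged with a version number, and derives the minimum and maximum from that table. The abstract describes tracking only the maximum and minimum, so treat the table, the versioning scheme, and the names `NodeState`, `piggyback`, `on_message`, and `power_action` as hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    name: str
    # node -> (progress, version); piggybacked on every outgoing message.
    table: dict = field(default_factory=dict)

    def record_local(self, progress):
        _, version = self.table.get(self.name, (0.0, 0))
        self.table[self.name] = (progress, version + 1)

    def piggyback(self):
        return dict(self.table)

    def on_message(self, remote_table):
        # Keep whichever report about each node is more up to date.
        for node, (progress, version) in remote_table.items():
            if version > self.table.get(node, (0.0, 0))[1]:
                self.table[node] = (progress, version)

    def extremes(self):
        ranked = sorted(self.table.items(), key=lambda kv: kv[1][0])
        return ranked[0][0], ranked[-1][0]  # (slowest node, fastest node)

    def power_action(self):
        slowest, fastest = self.extremes()
        if self.name == slowest:
            return "increase"  # this node is on the critical path
        if self.name == fastest:
            return "reduce"    # ahead of everyone: can give up power
        return "hold"


a, b, c = NodeState("a"), NodeState("b"), NodeState("c")
a.record_local(0.2); b.record_local(0.8); c.record_local(0.5)
b.on_message(a.piggyback())  # a -> b
c.on_message(b.piggyback())  # b -> c: c now knows a, b, and c
a.on_message(c.piggyback())
b.on_message(c.piggyback())
print(a.power_action(), b.power_action(), c.power_action())  # increase reduce hold
```

The version numbers let a node discard stale reports. Without them, a slow node's old low progress could never be replaced by a newer, higher value through a plain min/max merge.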

18.

Invention patent application
MANAGING VARIATIONS AMONG NODES IN PARALLEL SYSTEM FRAMEWORKS (status: pending, published)

Publication (announcement) No.: US20170279703A1

Publication (announcement) date: 2017-09-28

Application No.: US15081558

Application date: 2016-03-25

Applicant: Advanced Micro Devices, Inc.

Inventors: Samuel Lawrence Wasmundt, Leonardo Piga, Indrani Paul, Wei Huang, Manish Arora

IPC: H04L12/26, H04L29/08

CPC classification number: H04L43/16, H04L43/08, H04L67/10, H04L67/1008

Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.

19.

Invention patent application
STORAGE LOCATION ASSIGNMENT AT A CLUSTER COMPUTE SERVER (status: pending, published)

Publication (announcement) No.: US20160173589A1

Publication (announcement) date: 2016-06-16

Application No.: US14568181

Application date: 2014-12-12

Applicant: Advanced Micro Devices, Inc.

Inventors: Mauricio Breternitz Jr., Leonardo Piga

IPC: H04L29/08, G06F3/06

CPC classification number: G06F3/0683, G06F3/0607, G06F3/0619, G06F3/0635, G06F3/067, G06F8/00, G06F11/00, G06F2009/45579, H04L67/1097

Abstract: A cluster compute server stores different types of data at different storage volumes in order to reduce data duplication at the storage volumes. The storage volumes are categorized into two classes: common storage volumes and dedicated storage volumes, wherein the common storage volumes store data to be accessed and used by multiple compute nodes (or multiple virtual servers) of the cluster compute server. The dedicated storage volumes, in contrast, store data to be accessed only by a corresponding compute node (or virtual server).
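The common/dedicated split can be illustrated with a small placement function. It assumes that an item read by more than one compute node goes to a common volume and that an item used by exactly one node goes to that node's dedicated volume. The `assign_volumes` helper, the access map, and the volume labels are hypothetical.

```python
def assign_volumes(access_map):
    """Hypothetical placement rule.

    access_map: {data_item: set of compute-node ids that access it};
    each set is assumed to be non-empty.
    """
    placement = {}
    for item, nodes in access_map.items():
        if len(nodes) > 1:
            placement[item] = "common"  # stored once, shared by all readers
        else:
            (owner,) = nodes
            placement[item] = f"dedicated:{owner}"
    return placement


def copies_avoided(access_map):
    # Copies a per-node layout would have stored for shared items.
    return sum(len(nodes) - 1 for nodes in access_map.values() if len(nodes) > 1)


access = {
    "os_image": {"n0", "n1", "n2"},
    "app_binaries": {"n0", "n1"},
    "scratch_n1": {"n1"},
}
print(assign_volumes(access))
print(copies_avoided(access))  # 3
```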

20.

Granted invention patent
Thread assignment for power and performance efficiency using multiple power states (status: in force)

Publication (announcement) No.: US09170854B2

Publication (announcement) date: 2015-10-27

Application No.: US13909789

Application date: 2013-06-04

Applicant: Advanced Micro Devices, Inc.

Inventors: Mauricio Breternitz, Leonardo Piga

IPC: G06F9/50, G06F9/48, G06F1/32

CPC classification number: G06F9/5094, G06F1/32, G06F1/329, G06F9/4893, G06F9/5088, Y02D10/22, Y02D10/24, Y02D10/32

Abstract: A method is performed in a computing system that includes a plurality of processing nodes of multiple types configurable to run in multiple performance states. In the method, an application executes on a thread assigned to a first processing node. The power and performance of the application on the first processing node are estimated. The power and performance of the application in multiple performance states on the other processing nodes of the plurality of processing nodes are also estimated. It is determined that the estimated power and performance of the application on a second processing node, in a respective one of the multiple performance states, is preferable to the power and performance of the application on the first processing node. The thread is reassigned to the second processing node, with the second processing node in the respective performance state.
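The reassignment decision can be sketched as a search over (node, performance state) pairs. This sketch assumes that "preferable" means higher estimated performance per watt. The abstract leaves the criterion open, and `best_placement` and the estimate values are hypothetical.

```python
def best_placement(estimates, current):
    """Pick the (node, pstate) with the best estimated perf/W.

    estimates: {(node, pstate): (estimated_perf, estimated_power_w)}
    current:   the thread's present (node, pstate); must appear in estimates.
    The thread moves only if some other placement is strictly better.
    """
    def efficiency(key):
        perf, power_w = estimates[key]
        return perf / power_w

    best = max(estimates, key=efficiency)
    return best if efficiency(best) > efficiency(current) else current


estimates = {
    ("big", "P0"): (10.0, 20.0),    # 0.5 perf/W
    ("big", "P2"): (7.0, 10.0),     # 0.7 perf/W
    ("little", "P0"): (4.0, 5.0),   # 0.8 perf/W
}
print(best_placement(estimates, current=("big", "P0")))  # ('little', 'P0')
```

A scheduler that must also meet a throughput floor could first filter out placements whose estimated performance is too low, then apply the same efficiency comparison.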
