-
Publication No.: US10355966B2
Publication date: 2019-07-16
Application No.: US15081558
Filing date: 2016-03-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Samuel Lawrence Wasmundt, Leonardo Piga, Indrani Paul, Wei Huang, Manish Arora
Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
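The mapping step in this abstract can be illustrated with a minimal sketch. The abstract does not define the variability metric, so the coefficient of variation used here is an assumption, as are the function names, the task names, and the one-task-per-node simplification.

```python
from statistics import pstdev


def variability_metric(samples):
    """Hypothetical metric: coefficient of variation of a node's samples.

    The abstract only says the metric is based on sensor and performance
    data; this particular formula is an assumption for illustration.
    """
    mean = sum(samples) / len(samples)
    return pstdev(samples) / mean if mean else float("inf")


def map_tasks(tasks, node_samples):
    """Assign the most critical tasks to the least variable nodes.

    tasks: list of (task_name, criticality); higher means more critical.
    node_samples: dict of node name -> list of measurements.
    Assumes one task per node and at least as many nodes as tasks.
    """
    nodes = sorted(node_samples, key=lambda n: variability_metric(node_samples[n]))
    ranked = sorted(tasks, key=lambda t: t[1], reverse=True)
    return {name: node for (name, _), node in zip(ranked, nodes)}


mapping = map_tasks(
    [("halo_exchange", 1), ("reduction", 9), ("io_flush", 3)],
    {"node0": [100, 100, 100], "node1": [90, 110, 100], "node2": [60, 140, 100]},
)
print(mapping)
```

In this toy input, node0 shows no variation, so it receives the most critical task.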
-
Publication No.: US20190123648A1
Publication date: 2019-04-25
Application No.: US16130136
Filing date: 2018-09-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Wei Huang, Yasuko Eckert, Xudong An, Muhammad Shoaib Bin Altaf, Jieming Yin
CPC classification number: H02M3/1582, G05F1/468, G05F1/56, G05F1/575, H02M2001/0022
Abstract: The described embodiments include an apparatus that controls voltages for an integrated circuit chip having a set of circuits. The apparatus includes a switching voltage regulator separate from the integrated circuit chip and two or more low dropout (LDO) regulators fabricated on the integrated circuit chip. The switching voltage regulator provides an output voltage that is received as an input voltage by each of the two or more LDO regulators, and each of the two or more LDO regulators provides a local output voltage, each local output voltage received as a local input voltage by a different subset of the circuits in the set of circuits. During operation, a controller sets an operating point for each of the subsets of circuits based on a combined power efficiency for the subsets of the circuits and the LDO regulators, each operating point including a corresponding frequency and voltage.
-
Publication No.: US10210912B2
Publication date: 2019-02-19
Application No.: US15618349
Filing date: 2017-06-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Wei Huang
IPC: G11C7/04, H01L27/108, H01L23/38, H01L35/32, H01L27/16
Abstract: Managing temperature of a semiconductor device having a temperature inverted processor core and stacked memory by operation of an integrated thermoelectric cooler. The thermoelectric cooler is operated to pump heat from a stacked memory device that requires a cool operating temperature to a temperature inverted processor core that maintains a higher operating temperature until threshold operating temperatures are achieved.
-
Publication No.: US09990203B2
Publication date: 2018-06-05
Application No.: US14981310
Filing date: 2015-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Leonardo de Paula Rosa Piga, Abhinandan Majumdar, Indrani Paul, Wei Huang, Manish Arora, Joseph L. Greathouse
IPC: G06F9/30
CPC classification number: G06F9/30192, G06F9/30014, G06F9/30083, G06F9/30145, G06F11/00
Abstract: Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be output to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.
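The bit-analysis idea in this abstract can be sketched in software. The sketch below assumes signed integer operands and a fixed set of candidate widths, and uses a `Counter` as a stand-in for the hardware counter circuit; none of these details come from the abstract.

```python
from collections import Counter

WIDTHS = (8, 16, 32, 64)  # assumed candidate datatype widths, in bits


def min_signed_width(value):
    """Return the smallest width in WIDTHS whose two's-complement range holds value."""
    for width in WIDTHS:
        if -(1 << (width - 1)) <= value < (1 << (width - 1)):
            return width
    raise OverflowError("value does not fit in 64 bits")


# Software stand-in for the hardware counter circuit: one count per width.
counters = Counter()


def record(value):
    """Bump the counter for the narrowest datatype that can represent value."""
    counters[min_signed_width(value)] += 1


for operand in (5, -128, 128, 70000, -(1 << 40)):
    record(operand)

print(dict(counters))
```

A developer or runtime could read such counts to decide whether an instruction might run at a narrower precision.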
-
Publication No.: US09983652B2
Publication date: 2018-05-29
Application No.: US14959669
Filing date: 2015-12-04
Applicant: Advanced Micro Devices, Inc.
Inventor: Leonardo Piga, Indrani Paul, Wei Huang
CPC classification number: G06F1/3203, G06F1/3206, G06F1/3287, Y02D10/171
Abstract: Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agent may be configured to reassign the unused power to the active nodes to expedite workload processing.
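A minimal sketch of the power reassignment described here, under stated assumptions: each node reports a predicted wait (or `None` while still working), gated nodes drop to a fixed residual power, and freed power is split evenly among working nodes. The even split, the data layout, and all names are hypothetical.

```python
def rebalance_power(nodes, wait_threshold_s, gated_power_w=0.0):
    """Return new per-node power budgets after gating long-waiting nodes.

    nodes: dict of name -> {"budget_w": float, "predicted_wait_s": float or None}.
    A predicted wait of None means the node is still working on its task.
    Nodes waiting longer than the threshold are gated; their freed power is
    split evenly among working nodes (an assumed policy). Nodes with a short
    predicted wait keep their current budget.
    """
    gated = {
        name for name, state in nodes.items()
        if state["predicted_wait_s"] is not None
        and state["predicted_wait_s"] > wait_threshold_s
    }
    working = [n for n, s in nodes.items() if s["predicted_wait_s"] is None]
    freed = sum(nodes[n]["budget_w"] - gated_power_w for n in gated)
    share = freed / len(working) if working else 0.0

    budgets = {}
    for name, state in nodes.items():
        if name in gated:
            budgets[name] = gated_power_w
        elif name in working:
            budgets[name] = state["budget_w"] + share
        else:
            budgets[name] = state["budget_w"]
    return budgets


budgets = rebalance_power(
    {
        "a": {"budget_w": 100.0, "predicted_wait_s": None},
        "b": {"budget_w": 100.0, "predicted_wait_s": 5.0},
        "c": {"budget_w": 100.0, "predicted_wait_s": 0.5},
    },
    wait_threshold_s=1.0,
)
print(budgets)
```

The total budget is conserved: the 100 W freed by gating node b moves to node a, the only node still working.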
-
Publication No.: US20170373955A1
Publication date: 2017-12-28
Application No.: US15192764
Filing date: 2016-06-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Brian J. Kocoloski, Leonardo Piga, Wei Huang, Indrani Paul
IPC: H04L12/26
CPC classification number: G06F11/30, G06F9/4893, G06F2209/5019, Y02D10/24
Abstract: Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and an amount of time spent waiting for synchronization are monitored for a plurality of tasks for each node of the multi-node cluster. These values are utilized to generate a model which correlates the values of the performance counters to the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.
-
Publication No.: US20170357509A1
Publication date: 2017-12-14
Application No.: US15181837
Filing date: 2016-06-14
Applicant: Advanced Micro Devices, Inc.
Inventor: Akanksha Jain, Wei Huang, Indrani Paul
CPC classification number: G06F9/30181, G06F1/3228, G06F1/324, G06F1/3243, G06F1/3296, G06F11/30
Abstract: A system-on-a-chip includes a plurality of instruction processors and a hardware block such as a system management unit. The hardware block accesses values of performance counters associated with the plurality of instruction processors and modifies one or more operating points of one or more of the plurality of instruction processors based on comparisons of the instruction arrival rates and the instruction service rates to achieve optimized system metrics.
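One way to picture the rate comparison in this abstract is the sketch below. The abstract does not state the decision rule, so the step-up and step-down policy, the `slack` parameter, and the indexed table of operating points are all assumptions made for illustration.

```python
def adjust_operating_point(level, arrival_rate, service_rate, num_levels, slack=0.8):
    """Return a new operating-point index for one instruction processor.

    level: current index into a hypothetical table of operating points
           (0 is the lowest frequency/voltage pair).
    arrival_rate, service_rate: instruction rates derived from performance
           counters, in the same units.
    slack: assumed hysteresis factor; the processor steps down only when
           arrivals fall below slack * service_rate.
    """
    if arrival_rate > service_rate:
        # Work arrives faster than it is served: step up if possible.
        return min(level + 1, num_levels - 1)
    if arrival_rate < slack * service_rate:
        # The processor is over-provisioned: step down to save power.
        return max(level - 1, 0)
    return level


print(adjust_operating_point(1, arrival_rate=120, service_rate=100, num_levels=4))
```

The hysteresis band keeps a processor from oscillating between two operating points when the two rates are close.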
-
Publication No.: US09658663B2
Publication date: 2017-05-23
Application No.: US14862044
Filing date: 2015-09-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Wei Huang, Manish Arora, Yasuko Eckert, Indrani Paul
CPC classification number: G06F1/206, G06F1/3206, G06F1/3234, G06F1/324, G06F1/3296, G06F11/3024, G06F11/3058
Abstract: A three-dimensional (3-D) processor stack includes a plurality of processor cores implemented in a plurality of layers. A controller is to selectively throttle one or more of a plurality of processor cores in response to detecting a thermal event. The controller selectively throttles the one or more of the plurality of processor cores based on values of thermal couplings between the plurality of layers and based on measures of criticality of threads executing on the plurality of processor cores.
-
Publication No.: US10133574B2
Publication date: 2018-11-20
Application No.: US15181837
Filing date: 2016-06-14
Applicant: Advanced Micro Devices, Inc.
Inventor: Akanksha Jain, Wei Huang, Indrani Paul
Abstract: A system-on-a-chip includes a plurality of instruction processors and a hardware block such as a system management unit. The hardware block accesses values of performance counters associated with the plurality of instruction processors and modifies one or more operating points of one or more of the plurality of instruction processors based on comparisons of the instruction arrival rates and the instruction service rates to achieve optimized system metrics.
-
Publication No.: US09965343B2
Publication date: 2018-05-08
Application No.: US14710879
Filing date: 2015-05-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Rathijit Sen, Indrani Paul, Wei Huang
CPC classification number: G06F9/545, G06F9/44505, Y02D10/43
Abstract: Disclosed is a method of determining concurrency factors for an application running on a parallel processor. Also disclosed is a system for implementing the method. In an embodiment, the method includes running at least a portion of the kernel as sequences of mini-kernels, each mini-kernel including a number of concurrently executing workgroups. The number of concurrently executing workgroups is defined as a concurrency factor of the mini-kernel. A performance measure is determined for each sequence of mini-kernels. From the sequences, a particular sequence is chosen that achieves a desired performance of the kernel, based on the performance measures. The kernel is executed with the particular sequence.
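The selection procedure in this abstract can be sketched as a small search. The sketch assumes that "desired performance" means lowest measured time, and it replaces real kernel launches with a toy cost model; `choose_concurrency`, `toy_measure`, and the cost numbers are hypothetical.

```python
import math


def choose_concurrency(total_workgroups, candidates, measure):
    """Pick the concurrency factor whose mini-kernel sequence measures best.

    For each candidate factor, the kernel's workgroups are split into a
    sequence of mini-kernels of at most `factor` concurrent workgroups.
    measure(sequence) is a caller-supplied function returning a time for
    that sequence (lower is better, an assumption of this sketch).
    """
    results = {}
    for factor in candidates:
        launches = math.ceil(total_workgroups / factor)
        sequence = [
            min(factor, total_workgroups - i * factor) for i in range(launches)
        ]
        results[factor] = measure(sequence)
    best = min(results, key=results.get)
    return best, results


def toy_measure(sequence):
    """Toy cost model: fixed launch overhead plus per-workgroup time that
    doubles once more than 8 workgroups contend for shared resources."""
    return sum(4 + (n if n <= 8 else 2 * n) for n in sequence)


best, results = choose_concurrency(64, [4, 8, 16, 32], toy_measure)
print(best, results)
```

In the toy model, small factors pay too much launch overhead and large factors suffer contention, so an intermediate factor is selected.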