-
Publication No.: US20180357064A1
Publication date: 2018-12-13
Application No.: US15644045
Filing date: 2017-07-07
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Bin He, Mark M. Leather, Michael J. Mantor, Yunxiao Zou
IPC: G06F9/38, G06F9/30, G06F12/0875, G06F12/0891
CPC classification number: G06F9/3867, G06F9/3001, G06F9/30021, G06F9/30036, G06F9/3012, G06F9/30141, G06F9/3802, G06F9/3826, G06F9/383, G06F9/3832, G06F9/3857, G06F12/0804, G06F12/0855, G06F12/0875, G06F12/0891, G06F12/121, G06F2212/1008, G06F2212/1024, G06F2212/452
Abstract: Systems, apparatuses, and methods for implementing a high bandwidth, low power vector register file for use by a parallel processor are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of processing pipelines. The parallel processing unit includes a vector arithmetic logic unit (VALU) and a high bandwidth, low power vector register file. The vector register file includes multi-bank high density random-access memories (RAMs) to satisfy register bandwidth requirements. The parallel processing unit also includes an instruction request queue and an instruction operand buffer to provide enough local bandwidth for VALU instructions and vector I/O instructions. Also, the parallel processing unit is configured to leverage the RAM's output flops as a last level cache to reduce duplicate operand requests between multiple instructions. The parallel processing unit includes a vector destination cache to provide additional R/W bandwidth for the vector register file.
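Illustrative sketch (not from the patent): one way to picture "output flops as a last level cache" is a bank that remembers the address of its most recent read and serves a repeated read of that address without touching the RAM again. The class `RegisterBank`, its fields, and the invalidate-on-write behavior are assumptions made for illustration only; the abstract does not specify them.

```python
class RegisterBank:
    """Toy model of one register-file bank (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = list(rows)   # backing RAM contents
        self.last_addr = None    # address whose data sits in the output flops
        self.flop = None         # data currently held in the output flops
        self.ram_reads = 0       # number of actual RAM accesses

    def read(self, addr):
        # A repeated read of the same address is served from the output flops.
        if addr != self.last_addr:
            self.ram_reads += 1
            self.flop = self.rows[addr]
            self.last_addr = addr
        return self.flop

    def write(self, addr, value):
        self.rows[addr] = value
        # Assumption: a write to the held address invalidates the flops.
        if addr == self.last_addr:
            self.last_addr = None
```

Two back-to-back instructions that read the same operand would cost a single RAM access in this model, which is the duplicate-request saving the abstract refers to.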
-
Publication No.: US10140123B2
Publication date: 2018-11-27
Application No.: US15483745
Filing date: 2017-04-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Brian Emberling
Abstract: A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.
-
Publication No.: US10073783B2
Publication date: 2018-09-11
Application No.: US15360205
Filing date: 2016-11-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Daniel Clifton, Michael J. Mantor, Hans Burton
IPC: G06F12/08, G06F12/0846, G06F9/38
CPC classification number: G06F12/0848, G06F9/3887, G06F9/5077, G06F9/526, G06F2212/282
Abstract: A system and method for efficiently processing access requests for a shared resource are described. Each of many requestors is assigned to a partition of a shared resource. When a controller determines no requestor generates an access request for an unassigned partition, the controller permits simultaneous access to the assigned partitions for active requestors. When the controller determines at least one active requestor generates an access request for an unassigned partition, the controller allows a single active requestor to gain exclusive access to the entire shared resource while stalling access for the other active requestors. The controller alternates exclusive access among the active requestors. In various embodiments, the shared resource is a local data store in a graphics processing unit and each of the multiple requestors is a single instruction multiple data (SIMD) compute unit.
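Illustrative sketch (not from the patent): the access policy in this abstract can be modeled as a per-cycle decision between simultaneous and exclusive grants. The names `PartitionArbiter` and `schedule`, the dictionary-based request format, and the round-robin rotation used to alternate exclusive access are all assumptions; the abstract only says that exclusive access alternates among active requestors.

```python
from dataclasses import dataclass


@dataclass
class PartitionArbiter:
    """Toy model of the controller's policy (hypothetical names)."""

    assigned: dict       # requestor id -> id of its assigned partition
    next_turn: int = 0   # rotation counter for exclusive grants (assumed round-robin)

    def schedule(self, requests: dict) -> list:
        """requests maps each active requestor to the partition it targets.

        Returns the requestors granted access this cycle; the rest stall.
        """
        active = sorted(requests)
        if not active:
            return []
        # Every active requestor stays inside its own partition:
        # all of them may access the shared resource simultaneously.
        if all(requests[r] == self.assigned[r] for r in active):
            return active
        # At least one requestor targets a partition it was not assigned:
        # grant the whole resource to one requestor and rotate the grant.
        winner = active[self.next_turn % len(active)]
        self.next_turn += 1
        return [winner]
```

For example, with two SIMD units assigned partitions 0 and 1, a cycle in which both stay in their own partition grants both, while a cycle in which one reaches into the other's partition grants exactly one of them.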
-
Publication No.: US20180239606A1
Publication date: 2018-08-23
Application No.: US15439540
Filing date: 2017-02-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Brian D. Emberling, Mark Fowler, Mark M. Leather
IPC: G06F9/38, G06F9/30, G06F12/0875
Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.
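Illustrative sketch (not from the patent): the two operating modes differ only in loop order, which the function below makes explicit. The function name `issue_order`, the mode labels "first" and "second", and the tuple representation are assumptions; how the processor selects a mode is not modeled because the abstract does not state the conditions.

```python
def issue_order(instructions, portions, mode):
    """Return the (instruction, portion) issue sequence for a wavefront.

    Hypothetical helper: 'instructions' is the shader's instruction list and
    'portions' the sub-groups of one wavefront.
    """
    if mode == "first":
        # First mode: run one instruction on every portion before the next one.
        return [(ins, p) for ins in instructions for p in portions]
    if mode == "second":
        # Second mode: run the whole instruction set on one portion,
        # then repeat it on the next portion, and so on.
        return [(ins, p) for p in portions for ins in instructions]
    raise ValueError(f"unknown mode: {mode!r}")
```

With instructions `["i0", "i1"]` and portions `[0, 1]`, the first mode interleaves portions per instruction, while the second mode finishes portion 0 entirely before starting portion 1.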
-
Publication No.: US09367891B2
Publication date: 2016-06-14
Application No.: US14808113
Filing date: 2015-07-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
CPC classification number: G06T1/20, G06T1/60, G09G5/363, G09G2360/06
Abstract: Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which performs rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array, and, in response, generates a signal. The redundant shader switch receives the generated signal, and, in response, transfers the data destined for each shader pipe identified as being defective independently to the redundant shader pipe array.
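Illustrative sketch (not from the patent): the redundant shader switch can be pictured as a routing step that diverts data from pipes flagged as defective to slots in the redundant array. The function `route_to_pipes`, the `("pipe", i)` / `("redundant", j)` target labels, and the one-spare-slot-per-defective-pipe assumption are hypothetical and chosen only to make the data flow concrete.

```python
def route_to_pipes(data_per_pipe, defective, redundant_pipes):
    """Route each pipe's data to a working target.

    data_per_pipe:   dict of pipe index -> data destined for that pipe
    defective:       set of pipe indices flagged by the sequencer
    redundant_pipes: number of spare pipes available (assumed one per defect)
    """
    spares = iter(range(redundant_pipes))
    routing = {}
    for pipe in sorted(data_per_pipe):
        if pipe in defective:
            # Divert this pipe's data independently to the next free spare.
            try:
                spare = next(spares)
            except StopIteration:
                raise RuntimeError("more defective pipes than redundant pipes")
            routing[("redundant", spare)] = data_per_pipe[pipe]
        else:
            routing[("pipe", pipe)] = data_per_pipe[pipe]
    return routing
```

Data for healthy pipes is left on its original path, so only the flagged pipes pay the cost of the detour.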
-