-
Publication No.: US20220320042A1
Publication Date: 2022-10-06
Application No.: US17217165
Filing Date: 2021-03-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Michael MANTOR
IPC: H01L25/065 , H01L23/538 , G06T1/20 , G06F13/40
Abstract: A multi-die parallel processor semiconductor package includes a first base IC die including a first plurality of virtual compute dies 3D stacked on top of the first base IC die. A first subset of a parallel processing pipeline logic is positioned at the first plurality of virtual compute dies. Additionally, a second subset of the parallel processing pipeline logic is positioned at the first base IC die. The multi-die parallel processor semiconductor package also includes a second base IC die including a second plurality of virtual compute dies 3D stacked on top of the second base IC die. An active bridge chip communicably couples a first interconnect structure of the first base IC die to a first interconnect structure of the second base IC die.
-
Publication No.: US20210241516A1
Publication Date: 2021-08-05
Application No.: US17091957
Filing Date: 2020-11-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Mark LEATHER , Michael MANTOR
Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.
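The two scheduling modes can be pictured as a routing table from shader engines to front-end circuits. The sketch below is a toy model under stated assumptions, not the patented circuit. It uses a single second FE circuit (the abstract allows one or more), and the names `assign_front_ends`, `FE0`, `FE1` and the index-based split are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    FIRST = 1   # the first FE circuit schedules every shader engine
    SECOND = 2  # a second FE circuit takes over the second subset


def assign_front_ends(num_engines: int, split: int, mode: Mode) -> dict[int, str]:
    """Map each shader-engine index to the FE circuit that schedules its geometry.

    Engines [0, split) are the first subset; engines [split, num_engines) are the
    second subset, which the partition switch connects to either FE0 or FE1.
    """
    if not 0 < split < num_engines:
        raise ValueError("both subsets must be non-empty")
    mapping = {}
    for engine in range(num_engines):
        if engine < split or mode is Mode.FIRST:
            mapping[engine] = "FE0"  # first FE circuit
        else:
            mapping[engine] = "FE1"  # a second FE circuit, selected by the partition switch
    return mapping


# Example: 4 shader engines split 2/2.
print(assign_front_ends(4, 2, Mode.FIRST))   # {0: 'FE0', 1: 'FE0', 2: 'FE0', 3: 'FE0'}
print(assign_front_ends(4, 2, Mode.SECOND))  # {0: 'FE0', 1: 'FE0', 2: 'FE1', 3: 'FE1'}
```

Only the second subset's assignment depends on the mode, which mirrors the partition switch sitting in front of that subset alone.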
-
Publication No.: US20210064366A1
Publication Date: 2021-03-04
Application No.: US16556611
Filing Date: 2019-08-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Randy RAMSEY , William David ISENBERG , Michael MANTOR
Abstract: An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations.
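One way to read "dispatch based on comparisons" is to send each new wave where it least aggravates the resource already in heaviest use. The sketch below assumes that policy (smallest peak after adding the wave's profile); the abstract does not specify the comparison, and `Profile`, `dispatch` and the numeric units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Behavioral characteristics named in the abstract (units are illustrative)."""
    fetch: float    # fetch bandwidth
    alu: float      # ALU usage
    exports: float  # number of export operations

    def __add__(self, other: "Profile") -> "Profile":
        return Profile(self.fetch + other.fetch, self.alu + other.alu,
                       self.exports + other.exports)

    def peak(self) -> float:
        return max(self.fetch, self.alu, self.exports)


def dispatch(accumulators: list[Profile], wave: Profile) -> int:
    """Send `wave` to the processing element where it adds least to the busiest resource.

    Hypothetical policy: combine the wave's profile with each accumulator, pick the
    element whose combined peak is smallest, then update that element's accumulator.
    """
    best = min(range(len(accumulators)),
               key=lambda i: (accumulators[i] + wave).peak())
    accumulators[best] = accumulators[best] + wave
    return best


accs = [Profile(8, 1, 0), Profile(1, 8, 0)]  # PE0 fetch-bound, PE1 ALU-bound
print(dispatch(accs, Profile(5, 1, 0)))  # 1: fetch-heavy wave goes to the ALU-bound PE
print(dispatch(accs, Profile(1, 5, 0)))  # 0: ALU-heavy wave goes to the fetch-bound PE
```

Pairing fetch-heavy waves with ALU-heavy ones is the kind of balancing the accumulators make possible; real hardware would also decay the accumulated values as waves retire.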
-
Publication No.: US20210049729A1
Publication Date: 2021-02-18
Application No.: US16879991
Filing Date: 2020-05-21
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Timour T. PALTASHEV , Michael MANTOR , Rex Eldon MCCRARY
Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
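The bottleneck response in the last two sentences can be sketched as a pass that rewrites each virtual pipeline's stage list. This assumes a utilization threshold as the bottleneck detector, which the abstract does not specify; `VirtualPipeline`, `reroute_on_bottleneck`, the stage names and `threshold=0.9` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VirtualPipeline:
    context_id: int    # each virtual pipeline's state is specified by its own context
    stages: list[str]  # ordered units this pipeline's command flow passes through


def reroute_on_bottleneck(pipelines, utilization, threshold=0.9):
    """Swap any over-utilized fixed-function unit for an emulation on the shader cores.

    `utilization` maps a fixed-function unit name to its busy fraction in [0, 1].
    Returns the set of units that were replaced.
    """
    hot = {unit for unit, busy in utilization.items() if busy > threshold}
    for p in pipelines:
        p.stages = [f"emulated_{s}" if s in hot else s for s in p.stages]
    return hot


pipes = [VirtualPipeline(0, ["vertex", "raster", "pixel"]), VirtualPipeline(1, ["compute"])]
print(reroute_on_bottleneck(pipes, {"raster": 0.97, "vertex": 0.4}))  # {'raster'}
print(pipes[0].stages)  # ['vertex', 'emulated_raster', 'pixel']
```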
-
Publication No.: US20190163527A1
Publication Date: 2019-05-30
Application No.: US15828059
Filing Date: 2017-11-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Anirudh R. ACHARYA , Michael MANTOR
Abstract: A first workload is executed in a first subset of pipelines of a processing unit. A second workload is executed in a second subset of the pipelines of the processing unit. The second workload is dependent upon the first workload. The first and second workloads are suspended and state information for the first and second workloads is stored in a first memory in response to suspending the first and second workloads. In some cases, a third workload executes in a third subset of the pipelines of the processing unit concurrently with executing the first and second workloads. In some cases, a fourth workload is executed in the first and second subsets of the pipelines after suspending the first and second workloads. The first and second workloads are resumed on the basis of the stored state information in response to completion or suspension of the fourth workload.
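The suspend/resume sequence reads naturally as a small state machine over pipeline subsets. The toy model below is an illustration under assumptions: a dict stands in for the "first memory", workload state is a free-form dict, and the class and method names are hypothetical.

```python
class PipelineSet:
    """Toy model of pipeline subsets that can suspend and resume dependent workloads."""

    def __init__(self):
        self.running = {}  # subset name -> workload state
        self.saved = {}    # stands in for the "first memory" holding suspended state

    def launch(self, subset, workload, **state):
        if subset in self.running:
            raise RuntimeError(f"subset {subset} is busy")
        self.running[subset] = {"workload": workload, **state}

    def finish(self, *subsets):
        for s in subsets:
            del self.running[s]

    def suspend(self, *subsets):
        # Dependent workloads are suspended together so neither resumes without the other.
        for s in subsets:
            self.saved[s] = self.running.pop(s)

    def resume(self, *subsets):
        for s in subsets:
            if s in self.running:
                raise RuntimeError(f"subset {s} is still busy")
            self.running[s] = self.saved.pop(s)


p = PipelineSet()
p.launch("A", "first", pc=10)
p.launch("B", "second", pc=3)  # depends on the first workload
p.launch("C", "third")         # independent, keeps running throughout
p.suspend("A", "B")
p.launch("A", "fourth")
p.launch("B", "fourth")
p.finish("A", "B")             # the fourth workload completes
p.resume("A", "B")
print(p.running["A"])          # {'workload': 'first', 'pc': 10}
```

Suspending the producer and consumer as one group is the point of the design: saving only one of them could leave the other waiting on data that will never arrive.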
-
Publication No.: US20220100813A1
Publication Date: 2022-03-31
Application No.: US17032314
Filing Date: 2020-09-25
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh LAGUDU , Allen H. RUSH , Michael MANTOR , Arun Vaidyanathan ANANTHANARAYAN , Prasad NAGABHUSHANAMGARI
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
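The "mutually exclusive subsets of the rows and columns" property can be made concrete with a simple mapping. The sketch assumes an even contiguous split and caps the active rows and columns by the matrix dimensions; the abstract does not give the actual mapping policy, and `split` and `map_interfaces` are hypothetical names.

```python
def split(n: int, k: int) -> list[range]:
    """Split range(n) into k contiguous, non-overlapping, near-equal chunks."""
    base, extra = divmod(n, k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def map_interfaces(num_ifaces: int, pe_rows: int, pe_cols: int, m: int, n: int):
    """Give each memory interface its own rows and columns of the processor-element grid.

    m is the row count of the left matrix and n the column count of the right matrix;
    only as many rows/columns as the matrices can fill are assigned (hypothetical policy).
    """
    rows = min(pe_rows, m)
    cols = min(pe_cols, n)
    k = min(num_ifaces, rows, cols)
    return list(zip(split(rows, k), split(cols, k)))


print(len(map_interfaces(4, 8, 8, 64, 64)))  # 4 interfaces, 2 rows and 2 columns each
print(map_interfaces(4, 8, 8, 64, 1))        # [(range(0, 8), range(0, 1))]
```

In the second call the right-hand matrix is a vector with a single column, so under this policy only one interface is needed.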
-
Publication No.: US20210090205A1
Publication Date: 2021-03-25
Application No.: US16580654
Filing Date: 2019-09-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Michael MANTOR , Alexander Fuad ASHKAR , Randy RAMSEY , Mangesh P. NIJASURE , Brian EMBERLING
Abstract: The address of the draw or dispatch packet responsible for creating an exception is recovered by tying the faulting shader/wavefront back to the draw command from which it originated. In various embodiments, a method of operating a graphics pipeline and exception handling includes receiving, at a command processor of a graphics processing unit (GPU), an exception signal indicating an occurrence of a pipeline exception at a shader stage of a graphics pipeline. The shader stage generates an exception signal in response to a pipeline exception and transmits the exception signal to the command processor. The command processor determines, based on the exception signal, an address of a command packet responsible for the occurrence of the pipeline exception.
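The back-tracing step amounts to remembering, for every wavefront launched, which command packet launched it. The sketch below assumes the exception signal carries a wave ID and that the command processor keeps a lookup table; `ExceptionSignal`, `CommandProcessor` and the packet addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionSignal:
    stage: str    # shader stage that raised the exception
    wave_id: int  # wavefront that hit the exception


class CommandProcessor:
    def __init__(self):
        self._wave_origin: dict[int, int] = {}  # wave id -> command packet address
        self._next_wave = 0

    def launch(self, packet_addr: int, num_waves: int) -> list[int]:
        """Process a draw/dispatch packet: launch waves tagged with the packet's address."""
        ids = list(range(self._next_wave, self._next_wave + num_waves))
        for w in ids:
            self._wave_origin[w] = packet_addr
        self._next_wave += num_waves
        return ids

    def handle_exception(self, sig: ExceptionSignal) -> int:
        """Return the address of the command packet responsible for the exception."""
        return self._wave_origin[sig.wave_id]


cp = CommandProcessor()
cp.launch(0x1000, 2)  # waves 0-1
cp.launch(0x1040, 3)  # waves 2-4
print(hex(cp.handle_exception(ExceptionSignal("pixel", 3))))  # 0x1040
```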
-
Publication No.: US20210089304A1
Publication Date: 2021-03-25
Application No.: US16581252
Filing Date: 2019-09-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Michael MANTOR , Jiasheng CHEN , Jian HUANG
Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
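The round-and-iteration structure is essentially a K-blocked matrix multiply. This plain-Python sketch assumes that one round processes one K-slice (`k_tile` columns of the first matrix and rows of the second) and that each round's accumulated results are added into the output buffer; the function name and tile sizes are hypothetical.

```python
def matmul_in_rounds(a, b, k_tile=2, sub=2):
    """Multiply a (m x k) by b (k x n) one K-slice ("round") at a time."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]  # output buffer
    for k0 in range(0, k, k_tile):
        # Fetch this round's portions of both matrices into "registers".
        a_reg = [row[k0:k0 + k_tile] for row in a]
        b_reg = b[k0:k0 + k_tile]
        acc = [[0] * n for _ in range(m)]  # multiply/accumulate results
        # Iterations: each combination of a row subset and a column subset.
        for i0 in range(0, m, sub):
            for j0 in range(0, n, sub):
                for i in range(i0, min(i0 + sub, m)):
                    for j in range(j0, min(j0 + sub, n)):
                        for t in range(len(b_reg)):
                            acc[i][j] += a_reg[i][t] * b_reg[t][j]
        # All iterations done: write the accumulated results to the output buffer.
        for i in range(m):
            for j in range(n):
                out[i][j] += acc[i][j]
    return out


print(matmul_in_rounds([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
# [[58, 64], [139, 154]]
```

Reusing the fetched portions across every row/column combination before fetching the next slice is what keeps register traffic low.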
-
Publication No.: US20200293329A1
Publication Date: 2020-09-17
Application No.: US16860842
Filing Date: 2020-04-28
Inventor: Jiasheng CHEN , YunXiao ZOU , Bin HE , Angel E. SOCARRAS , QingCheng WANG , Wei YUAN , Michael MANTOR
Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
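The control decision is a per-instruction choice of input path plus a gate on the unused first array. The trace model below is a simplification: the opcode names and set membership are hypothetical, and because the abstract says nothing about gating the second array, the sketch leaves it on.

```python
def gate_trace(instructions, second_set):
    """Return, per instruction, which multiplexer array feeds the processing element
    and whether the first array's clock is enabled for that instruction."""
    trace = []
    for op in instructions:
        if op in second_set:
            trace.append(("second", False))  # first array gated: its outputs are unused
        else:
            trace.append(("first", True))
    return trace


trace = gate_trace(["mul", "mov", "mov", "add"], second_set={"mov"})
print(sum(not enabled for _, enabled in trace))  # 2 instructions with the first array gated
```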
-
Publication No.: US20200293286A1
Publication Date: 2020-09-17
Application No.: US16591031
Filing Date: 2019-10-02
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Michael MANTOR , Jiasheng CHEN
Abstract: A graphics processing unit (GPU) implements operations, with associated op codes, to perform mixed precision mathematical operations. The GPU includes an arithmetic logic unit (ALU) with different execution paths, wherein each execution path executes a different mixed precision operation. By implementing mixed precision operations at the ALU in response to designated op codes that delineate the operations, the GPU efficiently increases the precision of specified mathematical operations while reducing execution overhead.
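A typical mixed precision operation takes half-precision inputs and accumulates in single precision. The sketch emulates one such operation with the standard `struct` module's `e` (fp16) and `f` (fp32) formats. The op code `DOT2_F32_F16`, the dispatch table and the helper names are hypothetical, and rounding the sum through a Python double only approximates a fused hardware datapath.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot2_f32_f16(a, b, c):
    """c + a[0]*b[0] + a[1]*b[1] with fp16 inputs and an fp32 accumulator."""
    acc = to_f32(c)
    for x, y in zip(a, b):
        # The product of two fp16 values is exact; only the running sum is rounded.
        acc = to_f32(acc + to_f16(x) * to_f16(y))
    return acc


# Each designated op code selects its own execution path.
EXECUTION_PATHS = {"DOT2_F32_F16": dot2_f32_f16}


def execute(opcode, *operands):
    return EXECUTION_PATHS[opcode](*operands)


print(execute("DOT2_F32_F16", [1.0, 0.0], [1.0, 0.0], 2048.0))  # 2049.0
```

The example shows why the wider accumulator matters: fp16 has only 11 significant bits, so 2049 is not representable in it, while the fp32 accumulator keeps the result exact.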