-
Publication Number: US20150332427A1
Publication Date: 2015-11-19
Application Number: US14808113
Application Date: 2015-07-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
CPC classification number: G06T1/20, G06T1/60, G09G5/363, G09G2360/06
Abstract: Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which performs rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array and, in response, generates a signal. The redundant shader switch receives the generated signal and, in response, independently transfers the data destined for each shader pipe identified as defective to the redundant shader pipe array.
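As a rough behavioral illustration of the routing this abstract describes, the Python sketch below models a sequencer-flagged set of defective pipes and a switch that re-routes their data to a redundant pipe. All names (`render_calc`, `redundant_switch`) and the placeholder calculation are invented for illustration and are not the patented hardware.

```python
# Minimal behavioral sketch (not the patented hardware): a sequencer marks
# defective shader pipes, and a redundant switch re-routes their input data
# to the redundant shader pipe array. Names here are illustrative only.

def render_calc(data):
    # Placeholder for the per-pipe rendering calculation.
    return [x * 2 for x in data]

def redundant_switch(pipe_inputs, defective_ids):
    """Route each defective pipe's data to the redundant pipe independently."""
    results = {}
    for pipe_id, data in pipe_inputs.items():
        if pipe_id in defective_ids:
            # Sequencer signalled this pipe as defective: use the redundant pipe.
            results[pipe_id] = ("redundant_pipe", render_calc(data))
        else:
            results[pipe_id] = (f"pipe_{pipe_id}", render_calc(data))
    return results

if __name__ == "__main__":
    inputs = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
    defective = {1}                      # identified by the sequencer
    for pipe, (unit, out) in redundant_switch(inputs, defective).items():
        print(f"pipe {pipe}: executed on {unit}, result {out}")
```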
-
Publication Number: US20240329998A1
Publication Date: 2024-10-03
Application Number: US18619392
Application Date: 2024-03-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, Michael J. Mantor, Brian D. Emberling
CPC classification number: G06F9/3802, G06F9/3001, G06F9/30098, G06F9/3867
Abstract: An apparatus and method for efficiently processing multiply and accumulate operations for matrices in applications. In various implementations, a computing system includes a parallel data processing circuit and a memory. The memory stores the instructions (or translated commands) of a parallel data application. The parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file, together with multiple instantiations of a vector processing circuit capable of performing multiple matrix multiplication operations corresponding to multiple different types of instructions. The multiplier circuit and the adder circuit of the vector processing circuit perform both the fused multiply-add (FMA) operation and the dot product (inner product) operation without independent, dedicated execution pipelines, that is, without one execution pipeline for the FMA operation and a separate execution pipeline for the dot product operation.
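To make the shared-datapath idea concrete, here is a minimal Python sketch in which both an FMA instruction and a dot-product instruction are evaluated by the same multiply/add helper, standing in for a single shared execution pipeline. The function names and scalar arithmetic are assumptions, not the patent's circuit.

```python
# Illustrative sketch: both the FMA instruction and the dot-product instruction
# are evaluated with the same multiply/add helper, standing in for a single
# shared execution pipeline rather than two dedicated ones.

def multiply_add(a, b, c):
    # The shared multiplier followed by the shared adder.
    return a * b + c

def fma(a, b, c):
    # Fused multiply-add: one pass through the shared datapath.
    return multiply_add(a, b, c)

def dot_product(xs, ys, acc=0.0):
    # Dot product: repeated passes through the same shared datapath,
    # accumulating into a single running sum.
    for x, y in zip(xs, ys):
        acc = multiply_add(x, y, acc)
    return acc

if __name__ == "__main__":
    print(fma(2.0, 3.0, 1.0))                             # 7.0
    print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```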
-
Publication Number: US20240202003A1
Publication Date: 2024-06-20
Application Number: US18066115
Application Date: 2022-12-14
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Brian Kenneth Bennett
CPC classification number: G06F9/3867, G06F9/4881
Abstract: Systems, apparatuses, and methods for implementing hierarchical scheduling in a fixed-function graphics pipeline are disclosed. In various implementations, a processor includes a pipeline comprising a plurality of fixed-function units and a scheduler. Responsive to a first mode of operation, the scheduler schedules a first operation for execution by one or more fixed-function units of the pipeline by scheduling the first operation with a first unit of the pipeline. Responsive to a second mode of operation, the scheduler schedules a second operation for execution by a selected fixed-function unit of the pipeline by scheduling the second operation directly to the selected fixed-function unit, independent of the sequential arrangement of the fixed-function units in the pipeline.
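A toy model of the two scheduling modes is sketched below, assuming a four-stage pipeline with invented unit names: mode 1 schedules the operation with the first unit so it flows through the pipeline in order, while mode 2 dispatches it directly to one selected unit.

```python
# Toy model of the two scheduling modes: in the first mode an operation enters
# the pipeline at its first fixed-function unit and flows through in order;
# in the second mode it is dispatched directly to one selected unit,
# independent of the pipeline's sequential arrangement.

PIPELINE = ["input_assembler", "vertex_shader", "rasterizer", "pixel_shader"]

def schedule(operation, mode, target_unit=None):
    if mode == 1:
        # Sequential mode: schedule with the first unit; it propagates downstream.
        return [(unit, operation) for unit in PIPELINE]
    elif mode == 2:
        # Direct mode: schedule straight to the selected fixed-function unit.
        if target_unit not in PIPELINE:
            raise ValueError(f"unknown unit: {target_unit}")
        return [(target_unit, operation)]
    raise ValueError("mode must be 1 or 2")

if __name__ == "__main__":
    print(schedule("draw_call_A", mode=1))
    print(schedule("blit_B", mode=2, target_unit="rasterizer"))
```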
-
Publication Number: US11625807B2
Publication Date: 2023-04-11
Application Number: US17181300
Application Date: 2021-02-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR file introduces the ability to co-issue more than one instruction per clock cycle.
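The self-scheduling behavior can be approximated with the small Python model below, which only covers the message-queue portion of the abstract (not the split VGPR files); the class name, message format and task names are invented for illustration.

```python
# Behavioral sketch of the self-scheduling idea: the host enqueues messages,
# and the coprocessor schedules a registered sub-task whenever it detects a
# message in the queue. Message formats and task names are invented here.

from collections import deque

class CoprocessorModel:
    def __init__(self):
        self.queue = deque()
        self.sub_tasks = {}          # message type -> sub-task callable

    def register(self, msg_type, sub_task):
        self.sub_tasks[msg_type] = sub_task

    def host_send(self, msg_type, payload):
        # Host processor posts a message targeting the coprocessor.
        self.queue.append((msg_type, payload))

    def poll(self):
        # On detecting a message, schedule the matching sub-task for execution.
        while self.queue:
            msg_type, payload = self.queue.popleft()
            self.sub_tasks[msg_type](payload)

if __name__ == "__main__":
    cop = CoprocessorModel()
    cop.register("filter", lambda data: print("filter sub-task on", data))
    cop.host_send("filter", [1, 2, 3])
    cop.poll()
```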
-
Publication Number: US10861122B2
Publication Date: 2020-12-08
Application Number: US15156658
Application Date: 2016-05-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
Abstract: Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which performs rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array and, in response, generates a signal. The redundant shader switch receives the generated signal and, in response, independently transfers the data destined for each shader pipe identified as defective to the redundant shader pipe array.
-
Publication Number: US20200042348A1
Publication Date: 2020-02-06
Application Number: US16050948
Application Date: 2018-07-31
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
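A minimal sketch of the ID indirection, assuming an invented map layout: a task-level first ID is mapped to an application-level second ID and an OS-level third ID, and the mapped pair is what indexes the page tables.

```python
# Sketch of the ID indirection: a task-level ID (transparent to the task) is
# mapped to an application-level ID and an OS-level ID, and those mapped IDs
# identify the task to structures such as the page tables.
# The table layout and field names below are assumptions for illustration.

ID_MAP = {
    # first_id: (second_id = application ID, third_id = guest OS ID)
    7: (42, 3),
}

PAGE_TABLES = {
    (42, 3): {"0x1000": "0xABCD000"},   # per (app, OS) translations
}

def translate(first_id, virtual_addr):
    app_id, os_id = ID_MAP[first_id]    # map the first ID to the second/third IDs
    return PAGE_TABLES[(app_id, os_id)][virtual_addr]

if __name__ == "__main__":
    print(translate(7, "0x1000"))
```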
-
Publication Number: US20190171448A1
Publication Date: 2019-06-06
Application Number: US15855637
Application Date: 2017-12-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
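The accumulate-across-phases pattern can be illustrated numerically, as in the sketch below: each phase computes dot products over one slice of the operands and adds them to the results of the previous phase, which stand in for values read back from the second (accumulator) register file. The phase width and matrix sizes are arbitrary.

```python
# Numerical sketch of the accumulation pattern: each phase computes dot
# products for one slice of the operands and adds them to the results of the
# previous phase, which stand in for values read back from the second
# (accumulator) vector register file.

def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def matmul_phased(A, B, phase_width=2):
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]            # accumulator register file
    for start in range(0, k, phase_width):       # one "phase" per operand slice
        for i in range(n):
            for j in range(m):
                a_slice = A[i][start:start + phase_width]
                b_slice = [B[t][j] for t in range(start, min(start + phase_width, k))]
                C[i][j] += dot(a_slice, b_slice)  # accumulate with prior phases
    return C

if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8]]
    B = [[1, 0], [0, 1], [1, 1], [2, 2]]
    print(matmul_phased(A, B))                    # [[12.0, 13.0], [28.0, 29.0]]
```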
-
Publication Number: US20190129718A1
Publication Date: 2019-05-02
Application Number: US15799560
Application Date: 2017-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling
Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, where N is less than M. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
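A small sketch of the operand-half selection follows, assuming 32-bit registers holding two 16-bit halves and treating the halves as plain integers; the bit layout and function names are illustrative, not the actual instruction encoding.

```python
# Sketch of packed-math operand selection: each 32-bit source register holds
# two 16-bit values, and per-operand select bits in the instruction choose the
# high or low half. The bit-field layout here is illustrative only.

def select_half(reg32, use_high):
    """Return the 16-bit high or low half of a 32-bit register value."""
    return (reg32 >> 16) & 0xFFFF if use_high else reg32 & 0xFFFF

def packed_add(src0, src1, opsel_hi0, opsel_hi1):
    # Treat the selected halves as plain integers for illustration.
    a = select_half(src0, opsel_hi0)
    b = select_half(src1, opsel_hi1)
    return (a + b) & 0xFFFF

if __name__ == "__main__":
    r0 = 0x0005_0003          # high half = 5, low half = 3
    r1 = 0x0002_0007          # high half = 2, low half = 7
    print(packed_add(r0, r1, opsel_hi0=True, opsel_hi1=False))   # 5 + 7 = 12
    print(packed_add(r0, r1, opsel_hi0=False, opsel_hi1=True))   # 3 + 2 = 5
```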
-
Publication Number: US20180173649A1
Publication Date: 2018-06-21
Application Number: US15385566
Application Date: 2016-12-20
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
CPC classification number: G06F13/161, G06F13/1673, G06F13/4068
Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends both the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.
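The grouping step can be sketched as below, with an assumed address-modulo partition mapping and an invented request format: the arbiter picks one stored request, collects the other stored requests that hit the same partition, and issues the whole group out of order relative to the rest.

```python
# Sketch of the arbitration step: pick one stored request, find the other
# stored requests that target the same memory partition, and issue the whole
# group together (out of order with respect to the remaining requests).
# The partition function and request format are assumptions.

NUM_PARTITIONS = 4

def partition_of(address):
    return address % NUM_PARTITIONS           # illustrative partition mapping

def arbitrate(pending):
    """pending: list of (request_id, address). Returns (issued, still_pending)."""
    if not pending:
        return [], []
    selected = pending[0]                     # the selected request
    part = partition_of(selected[1])
    issued = [r for r in pending if partition_of(r[1]) == part]
    remaining = [r for r in pending if partition_of(r[1]) != part]
    return issued, remaining

if __name__ == "__main__":
    reqs = [(0, 8), (1, 5), (2, 12), (3, 6)]  # addresses 8 and 12 share partition 0
    issued, remaining = arbitrate(reqs)
    print("issued together:", issued)
    print("still pending:  ", remaining)
```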
-
Publication Number: US20180143907A1
Publication Date: 2018-05-24
Application Number: US15360205
Application Date: 2016-11-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Daniel Clifton, Michael J. Mantor, Hans Burton
IPC: G06F12/0846, G06F9/38
CPC classification number: G06F12/0848, G06F9/3887, G06F9/5077, G06F9/526, G06F2212/282
Abstract: A system and method for efficiently processing access requests for a shared resource are described. Each of many requestors is assigned to a partition of a shared resource. When a controller determines that no requestor has generated an access request for an unassigned partition, the controller permits simultaneous access to the assigned partitions for the active requestors. When the controller determines that at least one active requestor has generated an access request for an unassigned partition, the controller allows a single active requestor to gain exclusive access to the entire shared resource while stalling access for the other active requestors. The controller alternates exclusive access among the active requestors. In various embodiments, the shared resource is a local data store in a graphics processing unit and each of the multiple requestors is a single instruction multiple data (SIMD) compute unit.
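The access-control decision can be modeled with the short sketch below, assuming an invented dictionary interface: if every active requestor targets only its assigned partition, all proceed at once; otherwise one requestor gets exclusive access to the whole resource while the others stall (the real controller alternates that choice among the active requestors over time).

```python
# Sketch of the access-control decision: if every active requestor only touches
# its own assigned partition, all of them may access the resource at once;
# if any requestor targets an unassigned partition, one requestor at a time
# gets exclusive access to the whole resource while the others stall.
# The data layout below is illustrative, not the hardware interface.

def grant_access(requests, assignments):
    """requests: {requestor: target_partition}; assignments: {requestor: own_partition}."""
    crosses = any(assignments[r] != p for r, p in requests.items())
    if not crosses:
        # Simultaneous access to the assigned partitions.
        return {r: "granted (own partition)" for r in requests}
    # Exclusive access: pick one requestor; the rest stall. A real controller
    # would alternate this choice among the active requestors.
    winner = sorted(requests)[0]
    return {r: ("granted (exclusive, whole resource)" if r == winner else "stalled")
            for r in requests}

if __name__ == "__main__":
    assignments = {"simd0": 0, "simd1": 1}
    print(grant_access({"simd0": 0, "simd1": 1}, assignments))  # both proceed
    print(grant_access({"simd0": 1, "simd1": 1}, assignments))  # simd0 crosses -> exclusive
```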