-
Publication No.: US20170212757A1
Publication date: 2017-07-27
Application No.: US15483745
Filing date: 2017-04-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Brian Emberling
CPC classification number: G06F9/3009, G06F9/30087, G06F9/30098, G06F9/30123, G06F9/3834, G06F9/3851, G06F9/3885, G06F9/3887, G06F15/16, G06F15/8007, G06T1/20
Abstract: A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.
-
Publication No.: US20160260192A1
Publication date: 2016-09-08
Application No.: US15156658
Filing date: 2016-05-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
CPC classification number: G06T1/20, G06T1/60, G09G5/363, G09G2360/06
Abstract: Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which performs rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array, and, in response, generates a signal. The redundant shader switch receives the generated signal, and, in response, transfers the data destined for each shader pipe identified as being defective independently to the redundant shader pipe array.
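Illustrative note: the rerouting described in this abstract can be sketched in software as follows. This is only a rough model, not the patented circuit; the function name `route_to_pipes`, the tuple layout, and the rule that defective pipes map to redundant slots in ascending order are all assumptions made for the example.

```python
def route_to_pipes(data_per_pipe, defective, num_redundant):
    """Route each pipe's data to its own pipe or to a redundant pipe.

    data_per_pipe: list where index i holds the data destined for pipe i.
    defective: set of pipe indices flagged as defective (the "signal").
    num_redundant: size of the redundant array (hypothetical parameter).
    """
    defective_sorted = sorted(defective)
    if len(defective_sorted) > num_redundant:
        raise ValueError("more defective pipes than redundant pipes")
    # Assumption: each defective pipe gets its own redundant slot, in order.
    redundant_slot = {pipe: slot for slot, pipe in enumerate(defective_sorted)}
    routes = []
    for pipe, data in enumerate(data_per_pipe):
        if pipe in redundant_slot:
            routes.append(("redundant", redundant_slot[pipe], data))
        else:
            routes.append(("primary", pipe, data))
    return routes
```

Data for healthy pipes is left untouched; only the entries for flagged pipes change destination.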
-
Publication No.: US12067401B2
Publication date: 2024-08-20
Application No.: US15855637
Filing date: 2017-12-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
CPC classification number: G06F9/3867, G06F7/5443, G06F9/3001, G06F9/30036, G06F9/30101, G06F17/16
Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.
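Illustrative note: the accumulation across phases described in this abstract can be modeled in a few lines. This sketch assumes plain Python lists for the matrices and a hypothetical `phase_width` parameter for how many terms each dot product unit handles per phase; it says nothing about the actual hardware datapath or its power behavior.

```python
def matmul_phased(a, b, phase_width):
    """Multiply a (n x k) by b (k x m), one slice of k per phase.

    acc plays the role of the second register file: it holds results
    from earlier phases and feeds them back into the next accumulation.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    for start in range(0, k, phase_width):
        stop = min(start + phase_width, k)
        for i in range(n):
            for j in range(m):
                # Partial dot product for this phase only.
                partial = sum(a[i][p] * b[p][j] for p in range(start, stop))
                # Accumulate with the results of previous phases.
                acc[i][j] += partial
    return acc
```

The result is independent of `phase_width`, since each phase only contributes a disjoint slice of the same sum.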
-
Publication No.: US11948223B2
Publication date: 2024-04-02
Application No.: US17862096
Filing date: 2022-07-11
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
CPC classification number: G06T1/20, G06T1/60, G09G5/363, G09G2360/06
Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.
-
Publication No.: US11880683B2
Publication date: 2024-01-23
Application No.: US15799560
Filing date: 2017-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling
CPC classification number: G06F9/30014, G06F7/483, G06F7/57, G06F9/30036, G06F9/30112, G06F2207/3812, G06F2207/3828
Abstract: Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
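Illustrative note: the high/low operand selection described in this abstract can be sketched as below. The example assumes M = 32 and N = 16, and a hypothetical encoding in which bit 0 of `sel_bits` picks the half for the first source and bit 1 picks the half for the second (1 means high); the abstract does not specify these details.

```python
def select_half(reg32, use_high):
    """Return the high or low 16 bits of a 32-bit register value."""
    reg32 &= 0xFFFFFFFF
    return (reg32 >> 16) & 0xFFFF if use_high else reg32 & 0xFFFF


def packed_operands(src0, src1, sel_bits):
    """Pick one 16-bit operand from each 32-bit source register.

    sel_bits is a hypothetical instruction field: bit 0 selects the
    half of src0, bit 1 selects the half of src1 (1 = high, 0 = low).
    """
    return (
        select_half(src0, bool(sel_bits & 0b01)),
        select_half(src1, bool(sel_bits & 0b10)),
    )
```

For example, `packed_operands(0x12345678, 0xABCDEF01, 0b01)` takes the high half of the first register and the low half of the second.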
-
Publication No.: US11467870B2
Publication date: 2022-10-11
Application No.: US16938381
Filing date: 2020-07-24
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
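Illustrative note: the ID mapping described in this abstract can be pictured as a two-step lookup. In this sketch the class `ContainerTable`, the function `translate`, and the choice to key page tables by an `(os_id, app_id)` pair are all hypothetical; the abstract only states that a first ID is mapped to a second and/or third ID, which then identify the task to a data structure such as a set of page tables.

```python
class ContainerTable:
    """Maps a task's container ID to (application ID, OS ID)."""

    def __init__(self):
        self._map = {}

    def register(self, container_id, app_id, os_id):
        self._map[container_id] = (app_id, os_id)

    def resolve(self, container_id):
        return self._map[container_id]


def translate(container_table, page_tables, container_id, virtual_page):
    """Resolve a memory access for a task known only by its container ID.

    page_tables is assumed to be a dict keyed by (os_id, app_id), each
    value mapping virtual page numbers to physical page numbers.
    """
    app_id, os_id = container_table.resolve(container_id)
    return page_tables[(os_id, app_id)][virtual_page]
```

The task itself never handles `app_id` or `os_id`, which mirrors the abstract's statement that the first ID is transparent to the task.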
-
Publication No.: US11386520B2
Publication date: 2022-07-12
Application No.: US17113827
Filing date: 2020-12-07
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.
-
Publication No.: US20210201439A1
Publication date: 2021-07-01
Application No.: US17181300
Filing date: 2021-02-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
-
Publication No.: US10725822B2
Publication date: 2020-07-28
Application No.: US16050948
Filing date: 2018-07-31
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
-
Publication No.: US10558489B2
Publication date: 2020-02-11
Application No.: US15438466
Filing date: 2017-02-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander Fuad Ashkar, Michael J. Mantor, Randy Wayne Ramsey, Rex Eldon McCrary, Harry J. Wise
Abstract: Systems, apparatuses, and methods for suspending and restoring operations on a processor are disclosed. In one embodiment, a processor includes at least a control unit, multiple execution units, and multiple work creation units. In response to detecting a request to suspend a software application executing on the processor, the control unit sends requests to the plurality of work creation units to stop creating new work. The control unit waits until receiving acknowledgements from the work creation units prior to initiating a suspend operation. Once all work creation units have acknowledged that they have stopped creating new work, the control unit initiates the suspend operation. Also, when a restore operation is initiated, the control unit prevents any work creation units from launching new work-items until all previously in-flight work-items have been restored to the same work creation units and execution units to which they were previously allocated.
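Illustrative note: the stop-then-acknowledge handshake described in this abstract can be sketched as follows. The class `WorkCreationUnit`, the function `try_suspend`, and the returned status strings are hypothetical; real hardware would acknowledge asynchronously, whereas this toy model acknowledges immediately.

```python
class WorkCreationUnit:
    """Hypothetical stand-in for a unit that launches new work."""

    def __init__(self, name):
        self.name = name
        self.creating = True

    def request_stop(self):
        # Stop creating new work and return an acknowledgement.
        self.creating = False
        return True


def try_suspend(units):
    """Ask every unit to stop; begin the suspend only after all acknowledge."""
    acks = {u.name: u.request_stop() for u in units}
    if all(acks.values()):
        return "suspend_started"
    return "waiting_for_acks"
```

The key ordering is that the suspend operation is gated on every acknowledgement, so no new work can appear while state is being saved.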