-
Publication No.: US20190004807A1
Publication Date: 2019-01-03
Application No.: US15657478
Filing Date: 2017-07-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Qingcheng Wang, Yunxiao Zou, Bin He, Jian Yang, Michael J. Mantor, Brian D. Emberling
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.
-
Publication No.: US10152434B2
Publication Date: 2018-12-11
Application No.: US15385566
Filing Date: 2016-12-20
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.
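Illustrative sketch (not part of the patent text): the grouping step described in this abstract can be modeled in a few lines of Python. The partition size, partition count, address-to-partition mapping, and the "oldest request first" selection policy are all assumptions made for the example; the abstract does not specify them.

```python
from collections import deque

PARTITION_SIZE = 0x1000  # hypothetical: bytes covered by one partition
NUM_PARTITIONS = 4       # hypothetical: number of memory partitions


def partition_of(address):
    """Map an address to a partition index (assumed interleaving scheme)."""
    return (address // PARTITION_SIZE) % NUM_PARTITIONS


def arbitrate(pending):
    """Select the oldest stored request, then pull every other stored
    request that targets the same partition into the same batch."""
    if not pending:
        return []
    selected = pending.popleft()
    target = partition_of(selected)
    batch = [selected]
    remaining = deque()
    for address in pending:
        if partition_of(address) == target:
            batch.append(address)      # same partition: send with the selected request
        else:
            remaining.append(address)  # different partition: keep stored
    pending.clear()
    pending.extend(remaining)
    return batch
```

Calling `arbitrate` on a queue of addresses returns the selected request together with its same-partition companions, which may have been stored later than requests left behind; that is the sense in which the batch is serviced out of order.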
-
Publication No.: US20180239635A1
Publication Date: 2018-08-23
Application No.: US15438466
Filing Date: 2017-02-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander Fuad Ashkar, Michael J. Mantor, Randy Wayne Ramsey, Rex Eldon McCrary, Harry J. Wise
IPC: G06F9/48
Abstract: Systems, apparatuses, and methods for suspending and restoring operations on a processor are disclosed. In one embodiment, a processor includes at least a control unit, multiple execution units, and multiple work creation units. In response to detecting a request to suspend a software application executing on the processor, the control unit sends requests to the plurality of work creation units to stop creating new work. The control unit waits until receiving acknowledgements from the work creation units prior to initiating a suspend operation. Once all work creation units have acknowledged that they have stopped creating new work, the control unit initiates the suspend operation. Also, when a restore operation is initiated, the control unit prevents any work creation units from launching new work-items until all previously in-flight work-items have been restored to the same work creation units and execution units to which they were previously allocated.
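Illustrative sketch (not part of the patent text): the suspend handshake in this abstract amounts to "request stop, collect every acknowledgement, then suspend". The class and method names below are hypothetical, and the model covers only the suspend path, not the restore path.

```python
class ControlUnit:
    """Toy model of the suspend handshake; all names are hypothetical."""

    def __init__(self, unit_ids):
        self.unit_ids = set(unit_ids)   # work creation units under control
        self.pending_acks = set()       # units that have not yet acknowledged
        self.suspend_requested = False
        self.suspend_started = False

    def request_suspend(self):
        """Record the request and return the units that must be told to stop."""
        self.suspend_requested = True
        self.pending_acks = set(self.unit_ids)
        return sorted(self.pending_acks)

    def on_ack(self, unit_id):
        """Handle one acknowledgement; start the suspend operation only
        once every work creation unit has acknowledged."""
        if not self.suspend_requested:
            return False
        self.pending_acks.discard(unit_id)
        if not self.pending_acks:
            self.suspend_started = True
        return self.suspend_started
```

The point of the design is that no new work can appear after the suspend operation begins, because the operation is gated on the last acknowledgement.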
-
Publication No.: US20180210657A1
Publication Date: 2018-07-26
Application No.: US15417011
Filing Date: 2017-01-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Rex Eldon McCrary, Michael J. Mantor, Alexander Fuad Ashkar, Harry J. Wise
CPC classification number: G06F3/0607, G06F3/0619, G06F3/0634, G06F3/065, G06F3/067, G06F9/4881, G06F9/50
Abstract: Systems, apparatuses, and methods for implementing software control of state sets are disclosed. In one embodiment, a processor includes at least an execution unit and a plurality of state registers. The processor is configured to detect a command to allocate a first state set for storing a first state, wherein the command is generated by software, and wherein the first state specifies values for the plurality of state registers. The command is executed on the execution unit while the processor is in a second state, wherein the second state is different from the first state. The first state set of the processor is allocated with the first state responsive to executing the command on the execution unit. The processor is configured to allocate the first state set for the first state prior to the processor entering the first state.
-
Publication No.: US20240111575A1
Publication Date: 2024-04-04
Application No.: US17936798
Filing Date: 2022-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Dominik Baumeister, Fabian Robert Sebastian Wildgrube
CPC classification number: G06F9/4881, G06F9/546
Abstract: Systems, apparatuses, and methods for implementing a message passing system to schedule work in a computing system. In various implementations, a processor includes a global scheduler, and a plurality of local schedulers with each of the local schedulers coupled to a plurality of processors. The processor further includes a shared cache that is shared by the plurality of local schedulers. Also, a plurality of mailboxes are implemented to enable communication between the local schedulers and the global scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and store an indication in a mailbox for a first local scheduler of the plurality of local schedulers. Responsive to detecting the message in the mailbox, the first local scheduler identifies a location of the one or more work items in the shared cache and retrieves them for scheduling locally.
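Illustrative sketch (not part of the patent text): the mailbox flow in this abstract can be modeled as a global scheduler that parks work items in a shared cache and posts only their location to a local scheduler's mailbox. All class names, the integer cache key, and the polling model are assumptions made for the example.

```python
from collections import deque


class SharedCache:
    """Stand-in for the shared cache; keys act as work-item locations."""

    def __init__(self):
        self._slots = {}
        self._next_key = 0

    def store(self, items):
        key = self._next_key
        self._next_key += 1
        self._slots[key] = list(items)
        return key

    def take(self, key):
        return self._slots.pop(key)


class LocalScheduler:
    def __init__(self, cache):
        self.cache = cache
        self.mailbox = deque()  # holds locations, not the work items themselves
        self.run_queue = []

    def poll(self):
        """Drain the mailbox and fetch the referenced work items."""
        while self.mailbox:
            key = self.mailbox.popleft()
            self.run_queue.extend(self.cache.take(key))


class GlobalScheduler:
    def __init__(self, cache, local_schedulers):
        self.cache = cache
        self.local_schedulers = local_schedulers

    def dispatch(self, items, target):
        """Store the work items, then post their location to one mailbox."""
        key = self.cache.store(items)
        self.local_schedulers[target].mailbox.append(key)
```

Only the small location message crosses the mailbox; the work items themselves travel through the shared cache, which is the split the abstract describes.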
-
Publication No.: US20240111574A1
Publication Date: 2024-04-04
Application No.: US17936788
Filing Date: 2022-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Dominik Baumeister, Fabian Robert Sebastian Wildgrube
CPC classification number: G06F9/4881, G06F11/3024, G06F11/3055
Abstract: Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache. Subsequent to retrieving the work items, the local scheduler is configured to schedule the retrieved work items for execution by the coupled processors. Each of the plurality of local schedulers is configured to schedule work items for execution independent of scheduling performed by other local schedulers.
-
Publication No.: US20230097097A1
Publication Date: 2023-03-30
Application No.: US17489105
Filing Date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey, Michael J. Mantor, Christopher J. Brennan, Mark M. Leather, Ryan James Cash
Abstract: Systems, apparatuses, and methods for preemptively reserving buffer space for primitives and positions in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with any number of geometry engines coupled to corresponding shader engines. Each geometry engine launches shader wavefronts to execute on a corresponding shader engine. The geometry engine preemptively reserves buffer space for each wavefront prior to the wavefront being launched on the shader engine. When the shader engine executes a wavefront, the shader engine exports primitive and position data to the reserved buffer space. Multiple scan converters will consume the primitive and position data, with each scan converter consuming primitive and position data based on the screen coverage of the scan converter. After consuming the primitive and position data, the scan converters mark the buffer space as freed so that the geometry engine can then allocate the freed buffer space to subsequent shader wavefronts.
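Illustrative sketch (not part of the patent text): the reserve-before-launch discipline in this abstract can be modeled as a simple space counter. The class name, the size units, and the single `release` call per wavefront are assumptions; in particular, the sketch collapses the multiple scan converters into one consumer.

```python
class ExportBuffer:
    """Toy model of buffer space reserved per wavefront before launch."""

    def __init__(self, capacity):
        self.free = capacity
        self.reserved = {}  # wave_id -> reserved size

    def reserve(self, wave_id, size):
        """Reserve space before launch; a wavefront that cannot get
        space is not launched yet."""
        if size > self.free:
            return False
        self.free -= size
        self.reserved[wave_id] = size
        return True

    def release(self, wave_id):
        """Mark the space as freed once the exported data has been consumed."""
        self.free += self.reserved.pop(wave_id)
```

Because space is claimed before launch, a running wavefront never stalls on export; back-pressure is applied at launch time instead.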
-
Publication No.: US20220343456A1
Publication Date: 2022-10-27
Application No.: US17862096
Filing Date: 2022-07-11
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.
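Illustrative sketch (not part of the patent text): the redundant shader switch in this abstract behaves like a per-pipe multiplexer that diverts one pipe's data to the redundant array. The function name and the list-based data model are assumptions made for the example.

```python
def route(pipe_inputs, defective=None):
    """Return (work per regular pipe, work for the redundant array).

    pipe_inputs: one data item per shader pipe.
    defective:   index of the pipe flagged as defective, or None.
    """
    pipe_work = list(pipe_inputs)
    redundant_work = None
    if defective is not None:
        redundant_work = pipe_work[defective]  # divert data to the redundant array
        pipe_work[defective] = None            # defective pipe receives nothing
    return pipe_work, redundant_work
```

With no defective pipe flagged, the data passes through unchanged and the redundant array stays idle.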
-
Publication No.: US10929944B2
Publication Date: 2021-02-23
Application No.: US15360057
Filing Date: 2016-11-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
-
Publication No.: US20180144435A1
Publication Date: 2018-05-24
Application No.: US15360057
Filing Date: 2016-11-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
CPC classification number: G06T1/20, G06F9/3887, G06F9/542, G06F2009/3883, G06F2209/548, G06T1/60
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.