Patent search ap:("ADVANCED MICRO DEVICES INC.") AND inv:"Michael J. Mantor"

11.

Invention application
STREAM PROCESSOR WITH OVERLAPPING EXECUTION (Under examination, published)

Publication (announcement) number: US20190004807A1

Publication (announcement) date: 2019-01-03

Application number: US15657478

Application date: 2017-07-24

Applicant: Advanced Micro Devices, Inc.

Inventor: Jiasheng Chen , Qingcheng Wang , Yunxiao Zou , Bin He , Jian Yang , Michael J. Mantor , Brian D. Emberling

IPC: G06F9/38

Abstract: Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.

12.

Granted invention patent
Efficient arbitration for memory accesses (In force)

Publication (announcement) number: US10152434B2

Publication (announcement) date: 2018-12-11

Application number: US15385566

Application date: 2016-12-20

Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC

Inventor: Rostyslav Kyrychynskyi , Anthony Asaro , Kostantinos Danny Christidis , Mark Fowler , Michael J. Mantor , Robert Scott Hartog

IPC: G06F13/14 , G06F13/16 , G06F13/36 , G06F13/40

Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.
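The partition-based selection in this abstract can be illustrated with a minimal sketch. It is not the patented design: the partition size, the oldest-first selection policy, and the names `PARTITION_SIZE`, `partition_of`, and `arbitrate` are all assumptions made for illustration.

```python
PARTITION_SIZE = 1024  # assumed bytes per memory partition (hypothetical value)


def partition_of(address):
    """Map an address to the index of the partition that holds it."""
    return address // PARTITION_SIZE


def arbitrate(pending):
    """Select one stored request, then batch every other stored request
    that targets the same partition. Picking the oldest request first is
    an assumption; the abstract only says "a given one" is selected.
    """
    if not pending:
        return []
    target = partition_of(pending[0])
    batch = [addr for addr in pending if partition_of(addr) == target]
    remaining = [addr for addr in pending if partition_of(addr) != target]
    pending[:] = remaining  # requests to other partitions stay queued
    return batch


pending = [0, 2048, 512, 4096, 100]
batch = arbitrate(pending)
print(batch, pending)
```

The batch is sent together even though addresses 512 and 100 arrived after 2048, which is the out-of-order servicing the abstract mentions.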

13.

Invention application
SUSPEND AND RESTORE PROCESSOR OPERATIONS (Under examination, published)

Publication (announcement) number: US20180239635A1

Publication (announcement) date: 2018-08-23

Application number: US15438466

Application date: 2017-02-21

Applicant: Advanced Micro Devices, Inc.

Inventor: Alexander Fuad Ashkar , Michael J. Mantor , Randy Wayne Ramsey , Rex Eldon McCrary , Harry J. Wise

IPC: G06F9/48

Abstract: Systems, apparatuses, and methods for suspending and restoring operations on a processor are disclosed. In one embodiment, a processor includes at least a control unit, multiple execution units, and multiple work creation units. In response to detecting a request to suspend a software application executing on the processor, the control unit sends requests to the plurality of work creation units to stop creating new work. The control unit waits until receiving acknowledgements from the work creation units prior to initiating a suspend operation. Once all work creation units have acknowledged that they have stopped creating new work, the control unit initiates the suspend operation. Also, when a restore operation is initiated, the control unit prevents any work creation units from launching new work-items until all previously in-flight work-items have been restored to the same work creation units and execution units to which they were previously allocated.
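The stop-and-acknowledge handshake described in this abstract might be sketched as follows. The class names `WorkCreationUnit` and `ControlUnit` and the synchronous acknowledgement are hypothetical simplifications; real hardware would acknowledge asynchronously.

```python
class WorkCreationUnit:
    """Hypothetical unit that creates work until told to stop."""

    def __init__(self, name):
        self.name = name
        self.creating = True

    def stop(self):
        # Stop creating new work and return an acknowledgement.
        self.creating = False
        return True


class ControlUnit:
    """Hypothetical control unit that gates the suspend on all acks."""

    def __init__(self, units):
        self.units = units
        self.suspended = False

    def request_suspend(self):
        # Ask every work creation unit to stop, collecting acknowledgements.
        acks = [unit.stop() for unit in self.units]
        # Initiate the suspend only once every unit has acknowledged.
        if all(acks):
            self.suspended = True
        return self.suspended


units = [WorkCreationUnit("a"), WorkCreationUnit("b")]
control = ControlUnit(units)
print(control.request_suspend())
```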

14.

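Invention application
SOFTWARE CONTROL OF STATE SETS (Under examination, published)

Publication (announcement) number: US20180210657A1

Publication (announcement) date: 2018-07-26

Application number: US15417011

Application date: 2017-01-26

Applicant: Advanced Micro Devices, Inc.

Inventor: Rex Eldon McCrary , Michael J. Mantor , Alexander Fuad Ashkar , Harry J. Wise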

IPC: G06F3/06 , G06F9/48

CPC classification number: G06F3/0607 , G06F3/0619 , G06F3/0634 , G06F3/065 , G06F3/067 , G06F9/4881 , G06F9/50

Abstract: Systems, apparatuses, and methods for implementing software control of state sets are disclosed. In one embodiment, a processor includes at least an execution unit and a plurality of state registers. The processor is configured to detect a command to allocate a first state set for storing a first state, wherein the command is generated by software, and wherein the first state specifies values for the plurality of state registers. The command is executed on the execution unit while the processor is in a second state, wherein the second state is different from the first state. The first state set of the processor is allocated with the first state responsive to executing the command on the execution unit. The processor is configured to allocate the first state set for the first state prior to the processor entering the first state.

15.

Invention publication
Synchronization Method for Low Latency Communication for Efficient Scheduling (Under examination, published)

Publication (announcement) number: US20240111575A1

Publication (announcement) date: 2024-04-04

Application number: US17936798

Application date: 2022-09-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Matthäus G. Chajdas , Michael J. Mantor , Rex Eldon McCrary , Christopher J. Brennan , Robert Martin , Dominik Baumeister , Fabian Robert Sebastian Wildgrube

IPC: G06F9/48 , G06F9/54

CPC classification number: G06F9/4881 , G06F9/546

Abstract: Systems, apparatuses, and methods for implementing a message passing system to schedule work in a computing system. In various implementations, a processor includes a global scheduler, and a plurality of local schedulers with each of the local schedulers coupled to a plurality of processors. The processor further includes a shared cache that is shared by the plurality of local schedulers. Also, a plurality of mailboxes are implemented to enable communication between the local schedulers and the global scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and store an indication in a mailbox for a first local scheduler of the plurality of local schedulers. Responsive to detecting the message in the mailbox, the first local scheduler identifies a location of the one or more work items in the shared cache and retrieves them for scheduling locally.
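The mailbox flow in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the classes `SharedCache`, `LocalScheduler`, and `GlobalScheduler`, the integer cache locations, and the polling loop are all hypothetical and stand in for hardware structures the abstract does not detail.

```python
from collections import deque


class SharedCache:
    """Hypothetical shared cache addressed by integer locations."""

    def __init__(self):
        self.slots = {}
        self.next_location = 0

    def store(self, items):
        location = self.next_location
        self.slots[location] = list(items)
        self.next_location += 1
        return location

    def retrieve(self, location):
        return self.slots.pop(location)


class LocalScheduler:
    def __init__(self, cache):
        self.cache = cache
        self.mailbox = deque()  # indications left by the global scheduler
        self.run_queue = []     # work items scheduled locally

    def poll(self):
        # On detecting a message, look up the location and fetch the items.
        while self.mailbox:
            location = self.mailbox.popleft()
            self.run_queue.extend(self.cache.retrieve(location))


class GlobalScheduler:
    def __init__(self, cache, local_schedulers):
        self.cache = cache
        self.local_schedulers = local_schedulers

    def dispatch(self, items, target):
        # Store the work items, then leave an indication in the mailbox.
        location = self.cache.store(items)
        self.local_schedulers[target].mailbox.append(location)


cache = SharedCache()
local_schedulers = [LocalScheduler(cache), LocalScheduler(cache)]
scheduler = GlobalScheduler(cache, local_schedulers)
scheduler.dispatch(["item-a", "item-b"], target=1)
for local in local_schedulers:
    local.poll()
print(local_schedulers[1].run_queue)
```

Only the mailbox message crosses between schedulers; the work items themselves travel through the shared cache.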

16.

Invention publication
Work Graph Scheduler Implementation (Under examination, published)

Publication (announcement) number: US20240111574A1

Publication (announcement) date: 2024-04-04

Application number: US17936788

Application date: 2022-09-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Matthäus G. Chajdas , Michael J. Mantor , Rex Eldon McCrary , Christopher J. Brennan , Robert Martin , Dominik Baumeister , Fabian Robert Sebastian Wildgrube

IPC: G06F9/48 , G06F11/30

CPC classification number: G06F9/4881 , G06F11/3024 , G06F11/3055

Abstract: Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache. Subsequent to retrieving the work items, the local scheduler is configured to schedule the retrieved work items for execution by the coupled processors. Each of the plurality of local schedulers is configured to schedule work items for execution independent of scheduling performed by other local schedulers.

17.

Invention application
GRAPHICS PRIMITIVES AND POSITIONS THROUGH MEMORY BUFFERS (In force)

Publication (announcement) number: US20230097097A1

Publication (announcement) date: 2023-03-30

Application number: US17489105

Application date: 2021-09-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey , Michael J. Mantor , Christopher J. Brennan , Mark M. Leather , Ryan James Cash

IPC: G06T15/80 , G06T1/20 , G06T1/60 , G06T15/00

Abstract: Systems, apparatuses, and methods for preemptively reserving buffer space for primitives and positions in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with any number of geometry engines coupled to corresponding shader engines. Each geometry engine launches shader wavefronts to execute on a corresponding shader engine. The geometry engine preemptively reserves buffer space for each wavefront prior to the wavefront being launched on the shader engine. When the shader engine executes a wavefront, the shader engine exports primitive and position data to the reserved buffer space. Multiple scan converters will consume the primitive and position data, with each scan converter consuming primitive and position data based on the screen coverage of the scan converter. After consuming the primitive and position data, the scan converters mark the buffer space as freed so that the geometry engine can then allocate the freed buffer space to subsequent shader wavefronts.
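The reserve-before-launch pattern in this abstract can be sketched as below. The `PrimitiveBuffer` class, the `try_launch` helper, and the abstract size units are hypothetical; the sketch only shows that a wavefront launches after its buffer space is reserved and that freed space becomes available to later wavefronts.

```python
class PrimitiveBuffer:
    """Hypothetical buffer for exported primitive and position data."""

    def __init__(self, capacity):
        self.free = capacity
        self.reserved = {}  # wavefront id -> reserved size

    def reserve(self, wave_id, size):
        # Reservation happens before launch, so export space is guaranteed.
        if size > self.free:
            return False
        self.free -= size
        self.reserved[wave_id] = size
        return True

    def release(self, wave_id):
        # Called once the consumers (scan converters) are done with the data.
        self.free += self.reserved.pop(wave_id)


def try_launch(buffer, wave_id, size, launched):
    """Launch a wavefront only if its buffer space was reserved first."""
    if buffer.reserve(wave_id, size):
        launched.append(wave_id)
        return True
    return False


buffer = PrimitiveBuffer(capacity=8)
launched = []
try_launch(buffer, 0, 5, launched)   # fits, so wavefront 0 launches
try_launch(buffer, 1, 5, launched)   # does not fit yet, so it must wait
buffer.release(0)                    # consumers mark the space as freed
try_launch(buffer, 1, 5, launched)   # now wavefront 1 can launch
print(launched)
```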

18.

Invention application
REDUNDANCY METHOD AND APPARATUS FOR SHADER COLUMN REPAIR (In force)

Publication (announcement) number: US20220343456A1

Publication (announcement) date: 2022-10-27

Application number: US17862096

Application date: 2022-07-11

Applicant: Advanced Micro Devices, Inc.

Inventor: Michael J. Mantor , Jeffrey T. Brady , Angel E. Socarras

IPC: G06T1/20 , G06T1/60 , G09G5/36

Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.
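The redirect performed by the redundant shader switch can be modelled with a small routing function. This is a software analogy only: the `route` function, the list-of-inputs representation, and the use of `None` for "no defect" are assumptions, not details from the patent.

```python
def route(data_per_pipe, defective=None):
    """Return (inputs for healthy pipes, input for the redundant pipe).

    data_per_pipe holds one input per shader pipe; defective is the index
    of the pipe flagged as defective, or None if every pipe is healthy.
    """
    active = {}
    redundant_input = None
    for pipe, data in enumerate(data_per_pipe):
        if pipe == defective:
            # The switch transfers this pipe's data to the redundant pipe.
            redundant_input = data
        else:
            active[pipe] = data
    return active, redundant_input


active, redundant_input = route(["a", "b", "c", "d"], defective=2)
print(active, redundant_input)
```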

19.

Granted invention patent
Low power and low latency GPU coprocessor for persistent computing (In force)

Publication (announcement) number: US10929944B2

Publication (announcement) date: 2021-02-23

Application number: US15360057

Application date: 2016-11-23

Applicant: Advanced Micro Devices, Inc.

Inventor: Jiasheng Chen , Timour Paltashev , Alexander Lyashevsky , Carl Kittredge Wakeland , Michael J. Mantor

IPC: G06T1/20 , G06F9/54 , G06F9/38 , G06T1/60

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

20.

Invention application
LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING (Under examination, published)

Publication (announcement) number: US20180144435A1

Publication (announcement) date: 2018-05-24

Application number: US15360057

Application date: 2016-11-23

Applicant: Advanced Micro Devices, Inc.

Inventor: Jiasheng Chen , Timour Paltashev , Alexander Lyashevsky , Carl Kittredge Wakeland , Michael J. Mantor

IPC: G06T1/20 , G06T1/60

CPC classification number: G06T1/20 , G06F9/3887 , G06F9/542 , G06F2009/3883 , G06F2209/548 , G06T1/60

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
