    Graphics primitives and positions through memory buffers

    Publication Number: US12169896B2

    Publication Date: 2024-12-17

    Application Number: US17489105

    Application Date: 2021-09-29

    Abstract: Systems, apparatuses, and methods for preemptively reserving buffer space for primitives and positions in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with any number of geometry engines coupled to corresponding shader engines. Each geometry engine launches shader wavefronts to execute on a corresponding shader engine. The geometry engine preemptively reserves buffer space for each wavefront prior to the wavefront being launched on the shader engine. When the shader engine executes a wavefront, the shader engine exports primitive and position data to the reserved buffer space. Multiple scan converters will consume the primitive and position data, with each scan converter consuming primitive and position data based on the screen coverage of the scan converter. After consuming the primitive and position data, the scan converters mark the buffer space as freed so that the geometry engine can then allocate the freed buffer space to subsequent shader wavefronts.
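
    The abstract above describes a reserve/export/consume/free cycle for primitive and position data. The Python sketch below models that flow at a very high level; the class and method names are invented for illustration and are not taken from the patent.

        # Illustrative model of preemptive buffer reservation for shader wavefronts.
        # All names are hypothetical; this is not the patented implementation.

        class PrimitivePositionBuffer:
            """Toy model of the reserve -> export -> consume -> free cycle."""

            def __init__(self, capacity):
                self.free_space = capacity
                self.reservations = {}  # wavefront id -> {"size": ..., "data": ...}

            def reserve(self, wave_id, size):
                # Geometry engine reserves space *before* launching the wavefront.
                if size > self.free_space:
                    return False  # launch must wait until scan converters free space
                self.free_space -= size
                self.reservations[wave_id] = {"size": size, "data": None}
                return True

            def export(self, wave_id, prim_pos_data):
                # Shader engine exports primitive/position data into its reservation.
                self.reservations[wave_id]["data"] = prim_pos_data

            def consume_and_free(self, wave_id):
                # Scan converters consume the data, then mark the space as freed.
                entry = self.reservations.pop(wave_id)
                self.free_space += entry["size"]
                return entry["data"]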

    VMID as a GPU task container for virtualization

    Publication Number: US12153958B2

    Publication Date: 2024-11-26

    Application Number: US18045128

    Application Date: 2022-10-07

    Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
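
    As a rough illustration of the ID abstraction described above, the Python sketch below maps a task's container ID to an application ID and an OS ID before performing a page-table lookup. The table layout and all names are assumptions made for the sketch, not the patented design.

        # Hypothetical sketch of resolving a container ID to per-task IDs for a memory access.

        class PageTables:
            """Stand-in for per-(OS, application) page tables."""
            def __init__(self):
                self.entries = {}  # (os_id, app_id, virtual page) -> physical page

            def lookup(self, os_id, app_id, vaddr):
                return self.entries.get((os_id, app_id, vaddr >> 12))

        class VmidContainer:
            def __init__(self):
                # container ID -> (application ID, guest OS ID); transparent to the task itself
                self.id_map = {}

            def register_task(self, container_id, app_id, os_id):
                self.id_map[container_id] = (app_id, os_id)

            def translate(self, container_id, virtual_addr, page_tables):
                # Resolve the container ID to the IDs that index the page tables.
                app_id, os_id = self.id_map[container_id]
                return page_tables.lookup(os_id, app_id, virtual_addr)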

    Stream processor with decoupled crossbar for cross lane operations

    Publication Number: US10970081B2

    Publication Date: 2021-04-06

    Application Number: US15637629

    Application Date: 2017-06-29

    Abstract: Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands.
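
    The two operand paths described above (through the crossbar when a cross-lane permutation is needed, bypassing it otherwise) can be pictured with the short Python sketch below; the function and its arguments are illustrative only and do not model the hardware datapath.

        def route_operands(operands, lane_permutation=None):
            """Return operands in the order the execution lanes expect.

            If the instruction needs a cross-lane permutation, take the
            'crossbar' path; otherwise bypass it and pass operands straight through.
            """
            if lane_permutation is None:
                return operands                                      # bypass path, no crossbar
            return [operands[src] for src in lane_permutation]       # crossbar path

        # Example: a 4-lane rotate-by-one permutation
        lanes = route_operands([10, 11, 12, 13], lane_permutation=[1, 2, 3, 0])
        assert lanes == [11, 12, 13, 10]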

    REDUNDANCY METHOD AND APPARATUS FOR SHADER COLUMN REPAIR

    Publication Number: US20210090208A1

    Publication Date: 2021-03-25

    Application Number: US17113827

    Application Date: 2020-12-07

    Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.
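
    A minimal Python sketch of the redundancy idea above: work addressed to a pipe that has been flagged as defective is steered to the redundant pipe, while all other pipes receive their data unchanged. The class and its signal handling are invented for illustration.

        class RedundantShaderSwitch:
            def __init__(self, num_pipes, defective_pipe=None):
                self.num_pipes = num_pipes
                self.defective_pipe = defective_pipe  # set when the test circuit flags a pipe

            def target_pipe(self, pipe_index):
                # Data destined for the defective pipe is steered to the redundant pipe.
                if pipe_index == self.defective_pipe:
                    return "redundant"
                return pipe_index

        switch = RedundantShaderSwitch(num_pipes=8, defective_pipe=3)
        assert switch.target_pipe(3) == "redundant"
        assert switch.target_pipe(5) == 5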

    Processor support for bypassing vector source operands

    Publication Number: US10817302B2

    Publication Date: 2020-10-27

    Application Number: US15644045

    Application Date: 2017-07-07

    Abstract: Systems, apparatuses, and methods for implementing a high bandwidth, low power vector register file for use by a parallel processor are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of processing pipelines. The parallel processing unit includes a vector arithmetic logic unit (VALU) and a high bandwidth, low power, vector register file. The vector register file includes multi-bank high density random-access memories (RAMs) to satisfy register bandwidth requirements. The parallel processing unit also includes an instruction request queue and an instruction operand buffer to provide enough local bandwidth for VALU instructions and vector I/O instructions. Also, the parallel processing unit is configured to leverage the RAMs' output flops as a last level cache to reduce duplicate operand requests between multiple instructions. The parallel processing unit includes a vector destination cache to provide additional R/W bandwidth for the vector register file.
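
    The Python sketch below loosely models one point from the abstract above: treating a RAM bank's output flop as a one-entry cache so that a repeated read of the same register skips a second bank access. Bank count, register mapping, and naming are assumptions for the sketch.

        class VectorRegisterFile:
            def __init__(self, num_banks=4, regs_per_bank=64):
                self.banks = [[0] * regs_per_bank for _ in range(num_banks)]
                self.output_flop = [None] * num_banks   # (register index, value) per bank

            def read(self, reg):
                bank, index = reg % len(self.banks), reg // len(self.banks)
                cached = self.output_flop[bank]
                if cached is not None and cached[0] == reg:
                    return cached[1]                    # hit in the output flop, no RAM access
                value = self.banks[bank][index]
                self.output_flop[bank] = (reg, value)   # latch for potential reuse
                return value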

    Software control of state sets
    Invention Grant

    Publication Number: US10180789B2

    Publication Date: 2019-01-15

    Application Number: US15417011

    Application Date: 2017-01-26

    Abstract: Systems, apparatuses, and methods for implementing software control of state sets are disclosed. In one embodiment, a processor includes at least an execution unit and a plurality of state registers. The processor is configured to detect a command to allocate a first state set for storing a first state, wherein the command is generated by software, and wherein the first state specifies values for the plurality of state registers. The command is executed on the execution unit while the processor is in a second state, wherein the second state is different from the first state. The first state set of the processor is allocated with the first state responsive to executing the command on the execution unit. The processor is configured to allocate the first state set for the first state prior to the processor entering the first state.
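
    As a rough picture of the mechanism above, the Python sketch below stages a state set under software control while a different state is active, so that a later state switch only selects the pre-loaded set. Register names and the command interface are invented for illustration.

        class StateSets:
            def __init__(self, num_sets):
                self.sets = [None] * num_sets   # each entry: dict of state-register values
                self.current = None

            def allocate(self, set_index, register_values):
                # Executed in response to a software command while some *other*
                # state is active; the new state is staged but not yet applied.
                self.sets[set_index] = dict(register_values)

            def switch_to(self, set_index):
                # Later, switching states is just selecting a pre-loaded set.
                self.current = set_index
                return self.sets[set_index]

        sets = StateSets(num_sets=2)
        sets.allocate(1, {"rasterizer_mode": "wireframe", "blend_enable": 0})
        assert sets.switch_to(1)["rasterizer_mode"] == "wireframe"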

    STREAM PROCESSOR WITH DECOUPLED CROSSBAR FOR CROSS LANE OPERATIONS

    Publication Number: US20190004814A1

    Publication Date: 2019-01-03

    Application Number: US15637629

    Application Date: 2017-06-29

    Abstract: Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands.

    VMID AS A GPU TASK CONTAINER FOR VIRTUALIZATION

    Publication Number: US20230055695A1

    Publication Date: 2023-02-23

    Application Number: US18045128

    Application Date: 2022-10-07

    Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

    VMID AS A GPU TASK CONTAINER FOR VIRTUALIZATION

    Publication Number: US20210011760A1

    Publication Date: 2021-01-14

    Application Number: US16938381

    Application Date: 2020-07-24

    Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

    Indicating instruction scheduling mode for processing wavefront portions

    Publication Number: US10474468B2

    Publication Date: 2019-11-12

    Application Number: US15439540

    Application Date: 2017-02-22

    Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.
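
    The two scheduling modes described above can be summarized with the Python sketch below; the instruction and portion representations are placeholders, and the execute() callback stands in for the hardware lanes.

        def run_mode_a(program, portions, execute):
            # Mode 1: run each instruction across every portion before moving on.
            for instruction in program:
                for portion in portions:
                    execute(instruction, portion)

        def run_mode_b(program, portions, execute):
            # Mode 2: run the whole instruction set on one portion, then the next.
            for portion in portions:
                for instruction in program:
                    execute(instruction, portion)

        # Example: mode 2 finishes both instructions on half0 before touching half1.
        trace = []
        run_mode_b(["v_add", "v_mul"], ["half0", "half1"],
                   lambda ins, por: trace.append((ins, por)))
        assert trace[:2] == [("v_add", "half0"), ("v_mul", "half0")]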
