-
Publication number: US10943389B2
Publication date: 2021-03-09
Application number: US15374752
Filing date: 2016-12-09
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Laurent Lefebvre , Michael Mantor , Mark Fowler , Mikko Alho , Mika Tuomi , Kiia Kallio , Patrick Klas Rudolf Buss , Jari Antero Komppa , Kaj Tuomi , Christopher J. Brennan
Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, the fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.
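The buffer behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not AMD's implementation: the `Fragment` and `OverlapFifo` names, the fixed capacity, and the overflow policy are assumptions, and it assumes opaque geometry, so that a later fragment at the same pixel that survived z-culling will overwrite the earlier one.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int              # screen position
    y: int
    prim_id: int        # hypothetical: which primitive produced the fragment
    overlapped: bool = False


class OverlapFifo:
    """FIFO of post-z-cull fragments that marks older fragments overwritten by newer ones."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = deque()

    def push(self, frag: Fragment) -> list:
        """Add a fragment; return fragments that overflow toward the pixel shader."""
        for old in self.entries:
            if (old.x, old.y) == (frag.x, frag.y):
                # Marking variant; the removal variant would drop `old` instead.
                old.overlapped = True
        self.entries.append(frag)
        flushed = []
        while len(self.entries) > self.capacity:
            flushed.append(self.entries.popleft())
        return flushed

    def drain(self) -> list:
        """Send everything still buffered to the pixel shader."""
        out = list(self.entries)
        self.entries.clear()
        return out
```

A pixel shader front end could then skip fragments whose `overlapped` flag is set. The linear scan stands in for what would be a parallel position compare in hardware.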
-
Publication number: US10656951B2
Publication date: 2020-05-19
Application number: US15789318
Filing date: 2017-10-20
Inventor: Jiasheng Chen , YunXiao Zou , Bin He , Angel E. Socarras , QingCheng Wang , Wei Yuan , Michael Mantor
Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
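As a rough behavioural model of the gating decision, the sketch below routes each opcode through one of two multiplexer arrays and counts the cycles in which the first array could be clock- or power-gated. The opcode names and their split into the two sets are hypothetical; the abstract does not say which instructions belong to which set.

```python
from dataclasses import dataclass

# Hypothetical split of opcodes into the two instruction sets named in the abstract.
FIRST_SET = {"fma", "mul", "add"}
SECOND_SET = {"mov", "select"}


@dataclass
class OperandRouting:
    first_array_active: int = 0  # cycles the first mux array was clocked
    first_array_gated: int = 0   # cycles its clock/power was gated

    def route(self, opcode: str, operands: tuple) -> tuple:
        """Return (mux array that fed the processing element, operands)."""
        if opcode in FIRST_SET:
            self.first_array_active += 1
            return ("first_array", operands)
        if opcode in SECOND_SET:
            # Control unit gates the idle first array while the second array routes operands.
            self.first_array_gated += 1
            return ("second_array", operands)
        raise ValueError(f"opcode {opcode!r} is in neither set")
```

The ratio of gated to active cycles over an instruction trace gives a crude indication of how much switching activity the scheme could avoid.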
-
Publication number: US10453243B2
Publication date: 2019-10-22
Application number: US16238727
Filing date: 2019-01-03
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Anirudh R. Acharya , Swapnil Sakharshete , Michael Mantor , Mangesh P. Nijasure , Todd Martin , Vineet Goel
Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate the at least one fixed function hardware block. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.
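The preempt-and-resume behaviour of the first pipeline can be sketched as a loop that checks for pending real-time work only between primitives and keeps its progress in a saved-state object. The class names and the `realtime_pending` callback are illustrative assumptions; the second (shader-emulated) pipeline is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SavedState:
    """Models the first memory element: state kept across a preemption."""
    next_primitive: int = 0


@dataclass
class NonRealTimePipeline:
    primitives: list
    state: SavedState = field(default_factory=SavedState)
    processed: list = field(default_factory=list)

    def run(self, realtime_pending: Callable[[], bool]) -> bool:
        """Process primitives in order; return False if preempted, True when finished."""
        while self.state.next_primitive < len(self.primitives):
            # Preemption is checked only between primitives, i.e. at primitive boundaries.
            if realtime_pending():
                return False  # state stays in self.state for the later resume
            self.processed.append(self.primitives[self.state.next_primitive])
            self.state.next_primitive += 1
        return True
```

Calling `run` again after the real-time work completes resumes at the next unprocessed primitive, because nothing in `state` is discarded on preemption.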
-
Publication number: US20180321946A1
Publication date: 2018-11-08
Application number: US16040224
Filing date: 2018-07-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Robert Scott Hartog , Mark Leather , Michael Mantor , Rex McCrary , Sebastien Nussbaum , Philip J. Rogers , Ralph Clay Taylor , Thomas Woller
CPC classification number: G06F9/3851 , G06F9/3879 , G06F9/4881 , G06T1/20
Abstract: A method for use in a processor for arbitrating between multiple processes to select wavefronts for execution on a shader core is provided. The processor includes a compute pipeline configured to issue wavefronts to the shader core for execution, a hardware queue descriptor associated with the compute pipeline, and the shader core. The shader core is configured to execute work for the compute pipeline that corresponds to a first memory queue descriptor, using data for the first memory queue descriptor that is loaded into a first hardware queue descriptor. The processor is configured to detect a context switch condition and, in response to the context switch condition, perform a context switch operation that includes loading data for a second memory queue descriptor into the first hardware queue descriptor. The shader core is then configured to execute work corresponding to the second memory queue descriptor that is loaded into the first hardware queue descriptor.
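A minimal sketch of the context switch: one hardware queue descriptor (HQD) holds a working copy of whichever memory queue descriptor (MQD) is active, and a switch saves progress back to the outgoing MQD before loading the incoming one. Modelling a descriptor as just a process ID plus read/write pointers is a simplifying assumption; real descriptors carry far more state.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class QueueDescriptor:
    process_id: int
    read_ptr: int   # next work item to issue
    write_ptr: int  # work items in [read_ptr, write_ptr) are pending


class ComputePipeline:
    def __init__(self, mqd: QueueDescriptor) -> None:
        self.saved = mqd          # memory queue descriptor backing the HQD contents
        self.hqd = replace(mqd)   # hardware queue descriptor: a working copy

    def context_switch(self, incoming: QueueDescriptor) -> None:
        # Save progress from the HQD back to its memory queue descriptor...
        self.saved.read_ptr = self.hqd.read_ptr
        # ...then load the incoming memory queue descriptor into the same HQD.
        self.saved = incoming
        self.hqd = replace(incoming)

    def issue_wavefront(self) -> Optional[tuple]:
        """Issue one wavefront from the loaded queue, or None if it is empty."""
        if self.hqd.read_ptr == self.hqd.write_ptr:
            return None
        wave = (self.hqd.process_id, self.hqd.read_ptr)
        self.hqd.read_ptr += 1
        return wave
```

Switching back to an earlier queue resumes from its saved read pointer, which is what lets several processes share one hardware slot.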
-
Publication number: US20180211435A1
Publication date: 2018-07-26
Application number: US15417063
Filing date: 2017-01-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Mangesh P. Nijasure , Todd Martin , Michael Mantor
CPC classification number: G06T15/005 , G06F9/44 , G06T11/40
Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.
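The screen-space distribution step can be sketched as follows: the render surface is divided into square tiles, each tile is owned by one screen-space pipeline, and a triangle is sent to every pipeline whose tiles it may overlap. The checkerboard tile ownership and the conservative bounding-box test are assumptions for illustration; the abstract does not specify how the render surface is partitioned.

```python
import math


def tile_owner(tx: int, ty: int, num_pipes: int) -> int:
    """Hypothetical checkerboard-style assignment of screen tiles to screen-space pipelines."""
    return (tx + ty) % num_pipes


def route_triangle(tri, tile_size: int, num_pipes: int) -> set:
    """Return the screen-space pipelines whose tiles the triangle's bounding box overlaps."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    tx0, tx1 = math.floor(min(xs)) // tile_size, math.floor(max(xs)) // tile_size
    ty0, ty1 = math.floor(min(ys)) // tile_size, math.floor(max(ys)) // tile_size
    return {
        tile_owner(tx, ty, num_pipes)
        for ty in range(ty0, ty1 + 1)
        for tx in range(tx0, tx1 + 1)
    }
```

A bounding-box test may send a thin diagonal triangle to a pipeline it does not actually cover; an exact edge-versus-tile test would avoid that at extra cost.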
-
Publication number: US20180211434A1
Publication date: 2018-07-26
Application number: US15415813
Filing date: 2017-01-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Mangesh P. Nijasure , Michael Mantor , Jeffrey M. Smith
CPC classification number: G06T15/005 , G06T15/10 , G06T15/20 , G06T15/30 , G06T15/80 , H04N13/275
Abstract: Techniques for generating a stereo image from a single set of input geometry in a three-dimensional rendering pipeline are disclosed. Vertices are processed through the end of the world-space pipeline. In the primitive assembler, at the end of the world-space pipeline, before perspective division, each clip-space vertex is duplicated. The primitive assembler generates this duplicated clip-space vertex using the y, z, and w coordinates of the original vertex and an x coordinate that is offset in the x-direction in clip space relative to the x coordinate of the original vertex. Both the original clip-space vertex and the modified clip-space vertex are then sent through the rest of the pipeline for processing, including perspective division, viewport transform, rasterization, pixel shading, and other operations. The result is that a single set of input vertices is rendered into a stereo image.
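The vertex duplication amounts to copying (x, y, z, w) and adding an offset to x before the divide by w. The sketch below shows that arithmetic; the function names and the choice of a single constant offset are illustrative assumptions.

```python
def duplicate_for_stereo(vertex, x_offset):
    """Return (original, copy) clip-space vertices; only x differs in the copy."""
    x, y, z, w = vertex
    return vertex, (x + x_offset, y, z, w)


def to_ndc(vertex):
    """Perspective division: clip space -> normalized device coordinates."""
    x, y, z, w = vertex
    return (x / w, y / w, z / w)
```

Because the offset is applied before perspective division, the horizontal disparity in normalized device coordinates is `x_offset / w`, so it shrinks for distant vertices. For a standard perspective projection this matches translating the eye sideways, since such a translation changes clip-space x by a constant.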
-
Publication number: US20180113709A1
Publication date: 2018-04-26
Application number: US15342809
Filing date: 2016-11-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He , YunXiao Zou , Jiasheng Chen , Michael Mantor
CPC classification number: G06F9/3887 , G06F9/30014 , G06F9/30036 , G06F9/3893
Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.
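One plausible reading of "one-level staggering" is that the second channel of a pair runs one stage behind the first so that it can consume the first channel's carry. The sketch below uses that reading, which is an assumption rather than something the abstract states, to build a 64-bit add from two 32-bit channels.

```python
MASK32 = 0xFFFFFFFF


def paired_add64(a: int, b: int) -> int:
    """Add two 64-bit unsigned values with two 32-bit 'channels' of a block pair.

    Stage 0: the first channel adds the low halves and produces a carry.
    Stage 1: the second channel, staggered one stage later, adds the high
    halves plus that carry. The result wraps modulo 2**64.
    """
    lo_sum = (a & MASK32) + (b & MASK32)      # stage 0, first channel
    carry = lo_sum >> 32
    hi_sum = (a >> 32) + (b >> 32) + carry    # stage 1, second channel
    return ((hi_sum & MASK32) << 32) | (lo_sum & MASK32)
```

The same split extends to wider formats by chaining further staggered channels, which is why pairing trades latency for precision.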
-
Publication number: US20180082399A1
Publication date: 2018-03-22
Application number: US15415823
Filing date: 2017-01-25
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Todd Martin , Mangesh P. Nijasure , Randy W. Ramsey , Michael Mantor , Laurent Lefebvre
CPC classification number: G06T1/20 , G06T15/005 , G06T15/40 , G06T2200/04
Abstract: Improvements in the graphics processing pipeline are disclosed. More specifically, a new primitive shader stage performs tasks of the vertex shader stage or a domain shader stage if tessellation is enabled, a geometry shader if enabled, and a fixed function primitive assembler. The primitive shader stage is compiled by a driver from user-provided vertex or domain shader code, geometry shader code, and from code that performs functions of the primitive assembler. Moving tasks of the fixed function primitive assembler to a primitive shader that executes in programmable hardware provides many benefits, such as removal of a fixed function crossbar, removal of dedicated parameter and position buffers that are unusable in general compute mode, and other benefits.
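The idea of a driver-built primitive shader can be illustrated by composing user shader functions with assembler code into one callable. This is a conceptual sketch only: the function names, the triangle-list topology, and the omission of clipping and culling are simplifying assumptions, and a real driver would emit GPU machine code rather than compose Python functions.

```python
def assemble_triangles(verts, indices):
    """Primitive-assembler work: group an index list into triangles (triangle-list topology)."""
    return [
        tuple(verts[i] for i in indices[k:k + 3])
        for k in range(0, len(indices) - 2, 3)
    ]


def build_primitive_shader(vertex_shader, geometry_shader=None):
    """'Compile' a primitive shader by chaining user stages with assembler code."""
    def primitive_shader(vertices, indices):
        shaded = [vertex_shader(v) for v in vertices]
        prims = assemble_triangles(shaded, indices)
        if geometry_shader is not None:
            # Each input primitive may yield zero or more output primitives.
            prims = [out for p in prims for out in geometry_shader(p)]
        return prims
    return primitive_shader
```

Since everything runs as one programmable stage, intermediate vertices never need the dedicated fixed-function buffers the abstract mentions.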
-
Publication number: US20170371654A1
Publication date: 2017-12-28
Application number: US15191339
Filing date: 2016-06-23
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Ljubisa Bajic , Michael Mantor , Syed Zohaib M. Gilani , Rajabali M. Koduri
IPC: G06F9/30 , G06F12/0891
CPC classification number: G06F9/3012 , G06F9/30123 , G06F9/384 , G06F9/3851 , G06F9/3887
Abstract: Described is a system and method for using virtual vector register files. In particular, a graphics processor includes a logic unit, a virtual vector register file coupled to the logic unit, a vector register backing store coupled to the virtual vector register file, and a virtual vector register file controller coupled to the virtual vector register file. The virtual vector register file includes an N-deep vector register file and an M-deep vector register file, where N is less than M. The virtual vector register file controller performs eviction and allocation between the N-deep vector register file, the M-deep vector register file, and the vector register backing store based at least on access requests for particular vector registers.
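A minimal model of the three-level arrangement: accessed registers are allocated in the small N-deep file, victims spill to the M-deep file, and victims from there spill to the backing store. Least-recently-used replacement and zero-initialized registers are assumptions for illustration; the abstract only says eviction and allocation depend on access requests.

```python
from collections import OrderedDict


class VirtualVectorRegisterFile:
    """N-deep file, M-deep file, and backing store, with LRU eviction between them."""

    def __init__(self, n: int, m: int) -> None:
        assert n < m
        self.n, self.m = n, m
        self.small = OrderedDict()  # N-deep vector register file: reg -> value
        self.large = OrderedDict()  # M-deep vector register file
        self.backing = {}           # vector register backing store (memory)

    def _insert_large(self, reg, value):
        self.large[reg] = value
        self.large.move_to_end(reg)
        if len(self.large) > self.m:
            victim, v = self.large.popitem(last=False)  # evict LRU to backing store
            self.backing[victim] = v

    def _insert_small(self, reg, value):
        self.small[reg] = value
        self.small.move_to_end(reg)
        if len(self.small) > self.n:
            victim, v = self.small.popitem(last=False)  # evict LRU to the M-deep file
            self._insert_large(victim, v)

    def read(self, reg):
        if reg in self.small:
            self.small.move_to_end(reg)
            return self.small[reg]
        if reg in self.large:
            value = self.large.pop(reg)
        else:
            value = self.backing.pop(reg, 0)  # unwritten registers read as 0 (assumption)
        self._insert_small(reg, value)        # allocate in the N-deep file on access
        return value

    def write(self, reg, value):
        self.large.pop(reg, None)   # drop stale copies held at other levels
        self.backing.pop(reg, None)
        self._insert_small(reg, value)
```

Each register lives at exactly one level at a time, so frequently accessed registers stay in the small, fast file while cold ones migrate outward.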
-
Publication number: US20170371393A1
Publication date: 2017-12-28
Application number: US15189054
Filing date: 2016-06-22
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Syed Zohaib M. Gilani , Jiasheng Chen , QingCheng Wang , YunXiao Zou , Michael Mantor , Bin He , Timour T. Paltashev
CPC classification number: G06F15/8007 , G06F1/3234 , G06F1/3243 , G06T15/005 , Y02D10/152
Abstract: Described is a method and processing apparatus that improve power efficiency by gating the processing of redundant threads. In particular, the method for gating redundant threads in a graphics processor includes determining whether data for a thread and data for at least one other thread are within a predetermined similarity threshold, gating execution of the at least one other thread if the data for the two threads are within the predetermined similarity threshold, and using output data from the thread as output data for the at least one other thread.
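The gating rule can be sketched as: before running a thread, compare its input with the inputs of threads that already ran, and if one is within the threshold, reuse that thread's output instead of executing. Scalar inputs compared by absolute difference, and a sequential scan, are assumptions for illustration; hardware would compare lanes of a wavefront in parallel.

```python
def run_with_gating(inputs, kernel, threshold):
    """Run `kernel` per thread, gating threads whose input is within `threshold`
    of an already-executed thread's input and reusing that thread's output.

    Returns (outputs, number_of_gated_threads).
    """
    outputs = []
    executed = []  # (input, output) pairs of threads that actually ran
    gated = 0
    for x in inputs:
        for ref_x, ref_y in executed:
            if abs(ref_x - x) <= threshold:
                outputs.append(ref_y)  # gated: reuse the earlier thread's output
                gated += 1
                break
        else:
            y = kernel(x)              # not redundant: execute the thread
            executed.append((x, y))
            outputs.append(y)
    return outputs, gated
```

A threshold of 0 gates only exact duplicates and is therefore lossless; a nonzero threshold trades output accuracy for skipped work.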