-
公开(公告)号:US10460513B2
公开(公告)日:2019-10-29
申请号:US15389481
申请日:2016-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Mangesh P. Nijasure , Randy W. Ramsey , Todd Martin
Abstract: Improvements to graphics processing pipelines are disclosed. More specifically, the vertex shader stage, which performs vertex transformations, and the hull or geometry shader stages, are combined. If tessellation is disabled and geometry shading is enabled, then the graphics processing pipeline includes a combined vertex and graphics shader stage. If tessellation is enabled, then the graphics processing pipeline includes a combined vertex and hull shader stage. If tessellation and geometry shading are both disabled, then the graphics processing pipeline does not use a combined shader stage. The combined shader stages improve efficiency by reducing the number of executing instances of shader programs and associated resources reserved.
-
公开(公告)号:US10453243B2
公开(公告)日:2019-10-22
申请号:US16238727
申请日:2019-01-03
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Anirudh R. Acharya , Swapnil Sakharshete , Michael Mantor , Mangesh P. Nijasure , Todd Martin , Vineet Goel
Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate the at least one fixed function hardware block. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.
-
公开(公告)号:US10395424B2
公开(公告)日:2019-08-27
申请号:US15389204
申请日:2016-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Guohua Jin , Todd Martin
Abstract: A method and apparatus of copying data from a first memory location to a second memory location includes performing a copy operation selected out of one or more copy operations. The copy operations include performing interleaved data copying, performing a full wavefront copy operation, copying all data to a local data store (LDS) prior to copying to the second memory location, or pipelining the data for copying. The copy operation is applied to copy the data from the first location to the second memory location.
-
公开(公告)号:US20190172173A1
公开(公告)日:2019-06-06
申请号:US15832131
申请日:2017-12-05
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Usame Ceylan , Young In Yeo , Todd Martin , Vineet Goel
Abstract: A compute unit accesses a chunk of bits that represent indices of vertices of a graphics primitive. The compute unit sets values of a first bit to indicate whether the chunk is monotonic or ordinary, second bits to define an offset that is determined based on values of indices in the chunk, and sets of third bits that determine values of the indices in the chunk based on the offset defined by the second bits. The compute unit writes a compressed chunk represented by the first bit, the second bits, and the sets of third bits to a memory. The compressed chunk is decompressed and the decompressed indices are written to an index buffer. In some embodiments, the indices are decompressed based on metadata that includes offsets that are determined based on values of the indices and bitfields that indicate characteristics of the indices.
-
公开(公告)号:US20180211435A1
公开(公告)日:2018-07-26
申请号:US15417063
申请日:2017-01-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Mangesh P. Nijasure , Todd Martin , Michael Mantor
CPC classification number: G06T15/005 , G06F9/44 , G06T11/40
Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.
-
公开(公告)号:US20180137676A1
公开(公告)日:2018-05-17
申请号:US15354513
申请日:2016-11-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Saad Arrabi , Mangesh P. Nijasure , Todd Martin
CPC classification number: G06T17/20 , G06T1/20 , G06T15/005 , G06T15/80 , G06T2210/52
Abstract: Techniques for removing reset indices from, and identifying primitives in, an index stream that defines a set of primitives to be rendered, are disclosed. The index stream may be specified by an application program executing on the central processing unit. The technique involves classifying the primitive topology for the index stream as either requiring an offset-based technique or requiring a non-offset-based technique. This classification is done by determining whether, according to the primitive topology, each subsequent index can form a primitive with prior indices (e.g., line strip, triangle strip). If each subsequent index can form a primitive with prior indices, then the technique used is the non-offset-based technique. If each subsequent index does not form a primitive with prior indices, but instead at least two indices are required to form a new primitive (e.g., line list, triangle list), then the technique used is the offset-based technique.
-
公开(公告)号:US20180082399A1
公开(公告)日:2018-03-22
申请号:US15415823
申请日:2017-01-25
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Todd Martin , Mangesh P. Nijasure , Randy W. Ramsey , Michael Mantor , Laurent Lefebvre
CPC classification number: G06T1/20 , G06T15/005 , G06T15/40 , G06T2200/04
Abstract: Improvements in the graphics processing pipeline are disclosed. More specifically, a new primitive shader stage performs tasks of the vertex shader stage or a domain shader stage if tessellation is enabled, a geometry shader if enabled, and a fixed function primitive assembler. The primitive shader stage is compiled by a driver from user-provided vertex or domain shader code, geometry shader code, and from code that performs functions of the primitive assembler. Moving tasks of the fixed function primitive assembler to a primitive shader that executes in programmable hardware provides many benefits, such as removal of a fixed function crossbar, removal of dedicated parameter and position buffers that are unusable in general compute mode, and other benefits.
-
公开(公告)号:US12062126B2
公开(公告)日:2024-08-13
申请号:US17489008
申请日:2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey
IPC: G06T15/00
CPC classification number: G06T15/005
Abstract: Systems, apparatuses, and methods for loading multiple primitives per thread in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with a geometry engine, shader processor input (SPI), and a plurality of compute units. The geometry engine generates primitives which are accumulated by the SPI into primitive groups. While accumulating primitives, the SPI tracks the number of vertices and primitives per group. The SPI determines wavefront boundaries based on mapping a single vertex to each thread of the wavefront while allowing more than one primitive per thread. The SPI launches wavefronts with one vertex per thread and potentially multiple primitives per thread. The compute units execute a vertex phase and a multi-cycle primitive phase for wavefronts with multiple primitives per thread.
-
公开(公告)号:US11755336B2
公开(公告)日:2023-09-12
申请号:US17489059
申请日:2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey
IPC: G06T1/60 , G06F9/4401 , G06F9/30 , G06F9/54
CPC classification number: G06F9/4411 , G06F9/3009 , G06F9/544
Abstract: Systems, apparatuses, and methods for performing geometry work in parallel on multiple chiplets are disclosed. A system includes a chiplet processor with multiple chiplets for performing graphics work in parallel. Instead of having a central distributor to distribute work to the individual chiplets, each chiplet determines on its own the work to be performed. For example, during a draw call, each chiplet calculates which portions to fetch and process of one or more index buffer(s) corresponding to one or more graphics object(s) of the draw call. Once the portions are calculated, each chiplet fetches the corresponding indices and processes the indices. The chiplets perform these tasks in parallel and independently of each other. When the index buffer(s) are processed, one or more subsequent step(s) in the graphics rendering process are performed in parallel by the chiplets.
-
公开(公告)号:US20230095365A1
公开(公告)日:2023-03-30
申请号:US17489059
申请日:2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey
IPC: G06F9/4401 , G06F9/30 , G06F9/54
Abstract: Systems, apparatuses, and methods for performing geometry work in parallel on multiple chiplets are disclosed. A system includes a chiplet processor with multiple chiplets for performing graphics work in parallel. Instead of having a central distributor to distribute work to the individual chiplets, each chiplet determines on its own the work to be performed. For example, during a draw call, each chiplet calculates which portions to fetch and process of one or more index buffer(s) corresponding to one or more graphics object(s) of the draw call. Once the portions are calculated, each chiplet fetches the corresponding indices and processes the indices. The chiplets perform these tasks in parallel and independently of each other. When the index buffer(s) are processed, one or more subsequent step(s) in the graphics rendering process are performed in parallel by the chiplets.
-
-
-
-
-
-
-
-
-