-
Publication number: US12169896B2
Publication date: 2024-12-17
Application number: US17489105
Application date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey, Michael J. Mantor, Christopher J. Brennan, Mark M. Leather, Ryan James Cash
Abstract: Systems, apparatuses, and methods for preemptively reserving buffer space for primitives and positions in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with any number of geometry engines coupled to corresponding shader engines. Each geometry engine launches shader wavefronts to execute on a corresponding shader engine. The geometry engine preemptively reserves buffer space for each wavefront prior to the wavefront being launched on the shader engine. When the shader engine executes a wavefront, the shader engine exports primitive and position data to the reserved buffer space. Multiple scan converters will consume the primitive and position data, with each scan converter consuming primitive and position data based on the screen coverage of the scan converter. After consuming the primitive and position data, the scan converters mark the buffer space as freed so that the geometry engine can then allocate the freed buffer space to subsequent shader wavefronts.
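The reserve-before-launch and free-after-consume cycle described in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ReservationBuffer`, its method names, and the rule that a reservation is reclaimed once every scan converter has released it are hypothetical and are not taken from the patent.

```python
class ReservationBuffer:
    """Toy model: space is reserved per wavefront before it is launched."""

    def __init__(self, capacity, num_consumers):
        self.capacity = capacity            # total buffer entries (hypothetical unit)
        self.num_consumers = num_consumers  # number of scan converters
        self.used = 0
        self.pending = {}                   # wave id -> [size, consumers remaining]

    def reserve(self, wave_id, size):
        """Return True if space was reserved, so the wavefront may launch."""
        if self.used + size > self.capacity:
            return False                    # not enough free space: hold the launch
        self.used += size
        self.pending[wave_id] = [size, self.num_consumers]
        return True

    def consumer_done(self, wave_id):
        """A scan converter reports that it has consumed this wavefront's data."""
        entry = self.pending[wave_id]
        entry[1] -= 1
        if entry[1] == 0:                   # assumption: free only when all are done
            self.used -= entry[0]
            del self.pending[wave_id]
```

In this model a second wavefront that does not fit stays unlaunched until the last scan converter releases the earlier reservation.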
-
Publication number: US12062126B2
Publication date: 2024-08-13
Application number: US17489008
Application date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey
IPC: G06T15/00
CPC classification number: G06T15/005
Abstract: Systems, apparatuses, and methods for loading multiple primitives per thread in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with a geometry engine, shader processor input (SPI), and a plurality of compute units. The geometry engine generates primitives which are accumulated by the SPI into primitive groups. While accumulating primitives, the SPI tracks the number of vertices and primitives per group. The SPI determines wavefront boundaries based on mapping a single vertex to each thread of the wavefront while allowing more than one primitive per thread. The SPI launches wavefronts with one vertex per thread and potentially multiple primitives per thread. The compute units execute a vertex phase and a multi-cycle primitive phase for wavefronts with multiple primitives per thread.
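A rough sketch of how wavefront boundaries could follow from the vertex count rather than the primitive count is given below. The function names, the greedy grouping policy, and the ceiling formula for the number of primitive-phase passes are assumptions made for illustration; the abstract does not specify them.

```python
import math


def group_primitives(prims, wave_size):
    """Greedily group primitives so each group has at most wave_size unique vertices.

    prims is a list of vertex-index tuples. Returns a list of
    (primitives_in_group, unique_vertex_count) pairs.
    """
    groups = []
    verts, current = set(), []
    for prim in prims:
        merged = verts | set(prim)
        if len(merged) > wave_size and current:
            # One vertex per thread: the group is full, so close the wavefront.
            groups.append((current, len(verts)))
            verts, current = set(), []
            merged = set(prim)
        verts = merged
        current.append(prim)
    if current:
        groups.append((current, len(verts)))
    return groups


def primitive_passes(num_prims, wave_size):
    """Passes of the primitive phase if each thread handles one primitive per pass."""
    return math.ceil(num_prims / wave_size)
```

With heavy vertex reuse, a group can hold more primitives than threads, which is the case where the primitive phase takes more than one pass.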
-
Publication number: US11755336B2
Publication date: 2023-09-12
Application number: US17489059
Application date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey
IPC: G06T1/60, G06F9/4401, G06F9/30, G06F9/54
CPC classification number: G06F9/4411, G06F9/3009, G06F9/544
Abstract: Systems, apparatuses, and methods for performing geometry work in parallel on multiple chiplets are disclosed. A system includes a chiplet processor with multiple chiplets for performing graphics work in parallel. Instead of having a central distributor to distribute work to the individual chiplets, each chiplet determines on its own the work to be performed. For example, during a draw call, each chiplet calculates which portions to fetch and process of one or more index buffer(s) corresponding to one or more graphics object(s) of the draw call. Once the portions are calculated, each chiplet fetches the corresponding indices and processes the indices. The chiplets perform these tasks in parallel and independently of each other. When the index buffer(s) are processed, one or more subsequent step(s) in the graphics rendering process are performed in parallel by the chiplets.
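The idea that each chiplet works out its own share of an index buffer, with no central distributor, might look like the sketch below. The function `chiplet_index_range`, the contiguous even split, and the alignment to whole primitives are assumptions for illustration; the abstract does not say how the portions are calculated.

```python
def chiplet_index_range(chiplet_id, num_chiplets, num_indices, indices_per_prim=3):
    """Return the half-open [start, end) index range this chiplet should fetch.

    Every chiplet evaluates the same pure function with its own chiplet_id,
    so no chiplet needs to communicate with another to find its portion.
    """
    num_prims = num_indices // indices_per_prim
    base, extra = divmod(num_prims, num_chiplets)
    # The first `extra` chiplets each take one additional primitive.
    first_prim = chiplet_id * base + min(chiplet_id, extra)
    prim_count = base + (1 if chiplet_id < extra else 0)
    start = first_prim * indices_per_prim
    return start, start + prim_count * indices_per_prim
```

For example, with 30 indices (10 triangles) and 4 chiplets, the ranges are (0, 9), (9, 18), (18, 24), and (24, 30), which cover the buffer with no overlap.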
-
Publication number: US12236529B2
Publication date: 2025-02-25
Application number: US17562653
Application date: 2021-12-27
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Christopher J. Brennan, Randy Wayne Ramsey, Nishank Pathak, Ricky Wai Yeung Iu, Jimshed Mirza, Anthony Chan
Abstract: Systems, apparatuses, and methods for implementing a discard engine in a graphics pipeline are disclosed. A system includes a graphics pipeline with a geometry engine launching shaders that generate attribute data for vertices of each primitive of a set of primitives. The attribute data is consumed by pixel shaders, with each pixel shader generating a deallocation message when the pixel shader no longer needs the attribute data. A discard engine gathers deallocations from multiple pixel shaders and determines when the attribute data is no longer needed. Once a block of attributes has been consumed by all potential pixel shader consumers, the discard engine deallocates the given block of attributes. The discard engine sends a discard command to the caches so that the attribute data can be invalidated and not written back to memory.
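The bookkeeping described here, gathering deallocation messages until every potential consumer has finished with a block, can be sketched as follows. The class `DiscardEngine`, its method names, and the use of a per-block set of outstanding pixel shaders are hypothetical; the list `discarded` merely stands in for the discard command sent to the caches.

```python
class DiscardEngine:
    """Toy model: tracks which pixel shaders may still read each attribute block."""

    def __init__(self):
        self.pending = {}    # block id -> set of pixel shader ids still reading
        self.discarded = []  # block ids for which a discard command was issued

    def allocate(self, block_id, consumers):
        """Register a new attribute block and all of its potential consumers."""
        self.pending[block_id] = set(consumers)

    def dealloc_message(self, block_id, shader_id):
        """Handle one deallocation message; return True if the block was discarded."""
        waiting = self.pending[block_id]
        waiting.discard(shader_id)
        if waiting:
            return False     # other pixel shaders may still need the attributes
        del self.pending[block_id]
        self.discarded.append(block_id)  # stand-in for the cache discard command
        return True
```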
-
Publication number: US20230095365A1
Publication date: 2023-03-30
Application number: US17489059
Application date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey
IPC: G06F9/4401, G06F9/30, G06F9/54
Abstract: Systems, apparatuses, and methods for performing geometry work in parallel on multiple chiplets are disclosed. A system includes a chiplet processor with multiple chiplets for performing graphics work in parallel. Instead of having a central distributor to distribute work to the individual chiplets, each chiplet determines on its own the work to be performed. For example, during a draw call, each chiplet calculates which portions to fetch and process of one or more index buffer(s) corresponding to one or more graphics object(s) of the draw call. Once the portions are calculated, each chiplet fetches the corresponding indices and processes the indices. The chiplets perform these tasks in parallel and independently of each other. When the index buffer(s) are processed, one or more subsequent step(s) in the graphics rendering process are performed in parallel by the chiplets.
-
Publication number: US20180239635A1
Publication date: 2018-08-23
Application number: US15438466
Application date: 2017-02-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander Fuad Ashkar, Michael J. Mantor, Randy Wayne Ramsey, Rex Eldon McCrary, Harry J. Wise
IPC: G06F9/48
Abstract: Systems, apparatuses, and methods for suspending and restoring operations on a processor are disclosed. In one embodiment, a processor includes at least a control unit, multiple execution units, and multiple work creation units. In response to detecting a request to suspend a software application executing on the processor, the control unit sends requests to the plurality of work creation units to stop creating new work. The control unit waits until receiving acknowledgements from the work creation units prior to initiating a suspend operation. Once all work creation units have acknowledged that they have stopped creating new work, the control unit initiates the suspend operation. Also, when a restore operation is initiated, the control unit prevents any work creation units from launching new work-items until all previously in-flight work-items have been restored to the same work creation units and execution units to which they were previously allocated.
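The stop-then-acknowledge handshake that precedes the suspend operation can be modelled in a few lines. The classes `WorkCreationUnit` and `ControlUnit` and their methods are hypothetical names chosen for this sketch, and the synchronous acknowledgement is a simplification of what would be an asynchronous exchange in hardware.

```python
class WorkCreationUnit:
    """Toy work creation unit that can be told to stop creating new work."""

    def __init__(self, name):
        self.name = name
        self.creating = True

    def request_stop(self):
        self.creating = False
        return True          # acknowledgement returned to the control unit


class ControlUnit:
    """Initiates the suspend only after every work creation unit has acknowledged."""

    def __init__(self, units):
        self.units = units
        self.suspended = False

    def suspend(self):
        acks = [unit.request_stop() for unit in self.units]
        if all(acks):        # wait for all acknowledgements before suspending
            self.suspended = True
        return self.suspended
```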
-
Publication number: US10558489B2
Publication date: 2020-02-11
Application number: US15438466
Application date: 2017-02-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander Fuad Ashkar, Michael J. Mantor, Randy Wayne Ramsey, Rex Eldon McCrary, Harry J. Wise
Abstract: Systems, apparatuses, and methods for suspending and restoring operations on a processor are disclosed. In one embodiment, a processor includes at least a control unit, multiple execution units, and multiple work creation units. In response to detecting a request to suspend a software application executing on the processor, the control unit sends requests to the plurality of work creation units to stop creating new work. The control unit waits until receiving acknowledgements from the work creation units prior to initiating a suspend operation. Once all work creation units have acknowledged that they have stopped creating new work, the control unit initiates the suspend operation. Also, when a restore operation is initiated, the control unit prevents any work creation units from launching new work-items until all previously in-flight work-items have been restored to the same work creation units and execution units to which they were previously allocated.
-
Publication number: US10311626B2
Publication date: 2019-06-04
Application number: US15297611
Application date: 2016-10-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Rashad Oreifej, Angel E. Socarras, Mark Russell Anderson, Randy Wayne Ramsey
Abstract: A GPU filters graphics workloads to identify candidates for profiling. In response to receiving a graphics workload for the first time, the GPU determines if the graphics workload would require the GPU shaders to use fewer resources than would be spent profiling and determining a resource allocation for subsequent receipts of the same or a similar graphics workload. The GPU can further determine if the shaders are processing more than one graphics workload at the same time, such that the performance characteristics of each individual graphics workload cannot be effectively isolated. The GPU then profiles and stores resource allocations for a plurality of shaders for processing the filtered graphics workloads, and applies those stored resource allocations when the same or a similar graphics workload is received subsequently by the GPU.
-
Publication number: US20180108166A1
Publication date: 2018-04-19
Application number: US15297611
Application date: 2016-10-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Rashad Oreifej, Angel E. Socarras, Mark Russell Anderson, Randy Wayne Ramsey
CPC classification number: G06T15/005, G06F9/38
Abstract: A GPU filters graphics workloads to identify candidates for profiling. In response to receiving a graphics workload for the first time, the GPU determines if the graphics workload would require the GPU shaders to use fewer resources than would be spent profiling and determining a resource allocation for subsequent receipts of the same or a similar graphics workload. The GPU can further determine if the shaders are processing more than one graphics workload at the same time, such that the performance characteristics of each individual graphics workload cannot be effectively isolated. The GPU then profiles and stores resource allocations for a plurality of shaders for processing the filtered graphics workloads, and applies those stored resource allocations when the same or a similar graphics workload is received subsequently by the GPU.
-
Publication number: US20230376318A1
Publication date: 2023-11-23
Application number: US18363333
Application date: 2023-08-01
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey
IPC: G06F9/4401, G06F9/30, G06F9/54
CPC classification number: G06F9/4411, G06F9/3009, G06F9/544
Abstract: Systems, apparatuses, and methods for performing geometry work in parallel on multiple chiplets are disclosed. A system includes a chiplet processor with multiple chiplets for performing graphics work in parallel. Instead of having a central distributor to distribute work to the individual chiplets, each chiplet determines on its own the work to be performed. For example, during a draw call, each chiplet calculates which portions to fetch and process of one or more index buffer(s) corresponding to one or more graphics object(s) of the draw call. Once the portions are calculated, each chiplet fetches the corresponding indices and processes the indices. The chiplets perform these tasks in parallel and independently of each other. When the index buffer(s) are processed, one or more subsequent step(s) in the graphics rendering process are performed in parallel by the chiplets.