-
Publication number: US10089709B2
Publication date: 2018-10-02
Application number: US15208459
Application date: 2016-07-12
Applicant: ARM Limited
Inventor: Andreas Due Engh-Halstvedt , David James Bermingham , Amir Kleen , Jørn Nystad , Kenneth Edvard Østby
Abstract: A graphics processing unit 3 includes a rasterizer 25, a thread spawner 40, a programmable execution unit 41, a varying interpolator 42, a texture mapper 43, and a blender 29. The programmable execution unit 41 is able to communicate with the varying interpolator 42, the texture mapper 43 and the blender 29 to request processing operations by those graphics-specific accelerators. In addition to this, these graphics-specific accelerators are also able to communicate directly with each other and with the thread spawner 40, independently of the programmable execution unit 41. This allows for certain graphics processing operations to be performed using direct communication between the graphics-specific accelerators of the graphics processing unit, instead of executing instructions in the programmable execution unit to trigger the performance of those operations by the graphics-specific accelerators.
-
Publication number: US10761885B2
Publication date: 2020-09-01
Application number: US16044747
Application date: 2018-07-25
Applicant: ARM Limited
Inventor: Isidoros Sideris , Eugenia Cordero-Crespo , Amir Kleen
Abstract: An apparatus and method are provided for executing thread groups. The apparatus comprises scheduling circuitry for selecting for execution a first thread group from a plurality of thread groups, and thread processing circuitry that is responsive to the scheduling circuitry to execute active threads of the first thread group in dependence on a common program counter shared between the active threads. In response to an exit event occurring for the first thread group, the thread processing circuitry determines whether a program counter check condition is present, and this can be used to trigger program counter checking circuitry to perform a program counter check operation to update the common program counter and an active thread indication for the first thread group. The thread processing circuitry is provided with register storage in which program counter information for each thread of the first thread group can be stored, and the program counter checking circuitry is arranged to have access to that register storage when performing the program counter check operation. Further, the scheduling circuitry is arranged to select, for execution by the thread processing circuitry, a different thread group whilst awaiting performance of the program counter check operation by the program counter checking circuitry for the first thread group. This provides an area efficient mechanism for handling divergence and re-convergence of threads within thread groups, in a manner that avoids impacting performance.
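The program counter check described in this abstract can be illustrated with a minimal sketch. It is not taken from the patent: the names `ThreadGroup`, `pc_check` and `pick_next_group` are hypothetical, and the policy of reconverging on the lowest stored program counter is an assumption made only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ThreadGroup:
    # Per-thread program counters held in register storage.
    # None marks a thread that has finished (an assumption of this sketch).
    pcs: list
    common_pc: int = 0
    active: list = field(default_factory=list)

def pc_check(group):
    """Update the common PC and the active-thread indication for one group.

    Assumed policy: resume at the lowest stored PC, and mark as active
    exactly the threads whose stored PC equals it.
    Returns False when no thread in the group is left to run.
    """
    live = [pc for pc in group.pcs if pc is not None]
    if not live:
        group.active = [False] * len(group.pcs)
        return False
    group.common_pc = min(live)
    group.active = [pc == group.common_pc for pc in group.pcs]
    return True

def pick_next_group(groups, awaiting_check):
    """Return the index of the first group not waiting on a PC check.

    awaiting_check is a set of group indices whose check is still pending,
    so the scheduler can keep the execution hardware busy meanwhile.
    Returns None when every group is waiting.
    """
    for index in range(len(groups)):
        if index not in awaiting_check:
            return index
    return None
```

In this sketch a group with stored counters `[12, 8, None, 8]` would resume at counter 8 with the second and fourth threads active, while the scheduler runs another group until the check completes.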
-
Publication number: US10332258B2
Publication date: 2019-06-25
Application number: US15479280
Application date: 2017-04-05
Applicant: ARM Limited
Inventor: Amir Kleen , Peter William Harris , David James Bermingham
Abstract: A graphics processing system sorts graphics primitives for rendering into lists corresponding to different sub-regions of a render output to be generated, each list indicating primitives to be processed for the render output. A primitive list building unit divides a render target into various sub-regions, determines which sub-regions a primitive falls within and adds the primitive to the primitive lists corresponding to those sub-regions. The primitive list building unit also records the positions of the primitives in a pair of histograms which show the distribution of primitives across the render output. Once all primitives for the render output have been sorted into lists, the histograms are outputted to a predictor processor. The predictor processor then determines a set of sub-region sizes to be used when sorting primitives for the next render output to be generated into lists, based on the histograms.
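The sorting and histogram steps in this abstract can be sketched roughly as follows. This is an illustrative sketch, not the patented method: primitives are reduced to inclusive pixel bounding boxes, the pair of histograms is assumed to be one per axis, and `predict_tile_size` is a hypothetical heuristic that returns a single size rather than the set of sub-region sizes the abstract mentions.

```python
from collections import defaultdict

def bin_primitives(primitives, width, height, tile_w, tile_h):
    """Sort bounding boxes into per-tile lists and build two histograms.

    primitives: list of (x_min, y_min, x_max, y_max) in pixels, inclusive.
    Returns (lists, hist_x, hist_y) where lists maps (col, row) to the
    indices of the primitives that touch that tile.
    """
    cols = -(-width // tile_w)    # ceiling division
    rows = -(-height // tile_h)
    lists = defaultdict(list)
    hist_x = [0] * cols           # primitives touching each tile column
    hist_y = [0] * rows           # primitives touching each tile row
    for index, (x0, y0, x1, y1) in enumerate(primitives):
        # Clamp the covered tile range to the render target.
        c0, c1 = max(0, x0 // tile_w), min(cols - 1, x1 // tile_w)
        r0, r1 = max(0, y0 // tile_h), min(rows - 1, y1 // tile_h)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                lists[(c, r)].append(index)
        for c in range(c0, c1 + 1):
            hist_x[c] += 1
        for r in range(r0, r1 + 1):
            hist_y[r] += 1
    return dict(lists), hist_x, hist_y

def predict_tile_size(hist_x, hist_y, current_size, budget,
                      min_size=8, max_size=64):
    """Hypothetical predictor: halve the tile size when the busiest
    column or row exceeds the budget, double it when there is ample
    headroom, otherwise keep it for the next render output."""
    peak = max(max(hist_x), max(hist_y))
    if peak > budget:
        return max(min_size, current_size // 2)
    if peak * 2 <= budget:
        return min(max_size, current_size * 2)
    return current_size
```

The design choice worth noting is that the predictor only sees the histograms, not the lists themselves, which mirrors the abstract's description of the histograms being passed on once sorting is finished.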
-
Publication number: US20170309027A1
Publication date: 2017-10-26
Application number: US15479280
Application date: 2017-04-05
Applicant: ARM Limited
Inventor: Amir Kleen , Peter William Harris , David James Bermingham
CPC classification number: G06T7/11 , G06T1/20 , G06T1/60 , G06T11/40 , G06T15/005 , G06T15/40 , G06T17/20 , G06T2207/20021 , G06T2210/12 , G06T2210/36
Abstract: A graphics processing system sorts graphics primitives for rendering into lists corresponding to different sub-regions of a render output to be generated, each list indicating primitives to be processed for the render output. A primitive list building unit divides a render target into various sub-regions, determines which sub-regions a primitive falls within and adds the primitive to the primitive lists corresponding to those sub-regions. The primitive list building unit also records the positions of the primitives in a pair of histograms which show the distribution of primitives across the render output. Once all primitives for the render output have been sorted into lists, the histograms are outputted to a predictor processor. The predictor processor then determines a set of sub-region sizes to be used when sorting primitives for the next render output to be generated into lists, based on the histograms.
-
Publication number: US20170024847A1
Publication date: 2017-01-26
Application number: US15208459
Application date: 2016-07-12
Applicant: ARM Limited
Inventor: Andreas Due Engh-Halstvedt , David James Bermingham , Amir Kleen , Jørn Nystad , Kenneth Edvard Østby
CPC classification number: G06T1/20 , G06T15/005
Abstract: A graphics processing unit 3 includes a rasteriser 25, a thread spawner 40, a programmable execution unit 41, a varying interpolator 42, a texture mapper 43, and a blender 29. The programmable execution unit 41 is able to communicate with the varying interpolator 42, the texture mapper 43 and the blender 29 to request processing operations by those graphics-specific accelerators. In addition to this, these graphics-specific accelerators are also able to communicate directly with each other and with the thread spawner 40, independently of the programmable execution unit 41. This allows for certain graphics processing operations to be performed using direct communication between the graphics-specific accelerators of the graphics processing unit, instead of executing instructions in the programmable execution unit to trigger the performance of those operations by the graphics-specific accelerators.