-
公开(公告)号:US20210271606A1
公开(公告)日:2021-09-02
申请号:US16804128
申请日:2020-02-28
Applicant: Apple Inc.
Inventor: Justin A. Hensley , Karl D. Mann , Yoong Chert Foo , Terence M. Potter , Frank W. Liljeros , Ralph C. Taylor
IPC: G06F12/1018 , G06F12/084
Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already setup. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
-
公开(公告)号:US20210134052A1
公开(公告)日:2021-05-06
申请号:US16673883
申请日:2019-11-04
Applicant: Apple Inc.
Inventor: Anthony P. DeLaurier , Karl D. Mann , Tyson J. Bergland , Winnie W. Yeung
Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, programmable shader circuitry is configured to execute program instructions of compute kernels that write pixel data. In some embodiments, a first cache is configured to store pixel write data from the programmable shader circuitry and first compression circuitry is configured to compress a first block of pixel write data in response to full accumulation of the first block in the first cache circuitry. In some embodiments, second cache circuitry is configured to store pixel write data from the programmable shader circuitry at a higher level in a storage hierarchy than the first cache circuitry and second compression circuitry is configured to compress a second block of pixel write data in response to full accumulation of the second block in the second cache circuitry. In some embodiments, write circuitry is configured to write the first and second compressed blocks of pixel data in a combined write to a higher level in the storage hierarchy.
-
公开(公告)号:US20200301753A1
公开(公告)日:2020-09-24
申请号:US16361910
申请日:2019-03-22
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Jason D. Carroll , Karl D. Mann
IPC: G06F9/52
Abstract: Techniques are disclosed relating to processing a control stream such as a compute control stream. In some embodiments, the control stream includes kernels and commands for multiple substreams. In some embodiments, multiple substream processors are each configured to: fetch and parse portions of the control stream corresponding to an assigned substream and, in response to a neighbor barrier command in the assigned substream that identifies another substream, communicate the identified other substream to a barrier clearing circuitry. In some embodiments, the barrier clearing circuitry is configured to determine whether to allow the assigned substream to proceed past the neighbor barrier command based on communication of a most-recently-completed command from a substream processor to which the other substream is assigned (e.g., based on whether the most-recently-completed command meets a command identifier communicated in the neighbor barrier command). The disclosed techniques may facilitate parallel control stream parsing and substream synchronization.
-
公开(公告)号:US10074210B1
公开(公告)日:2018-09-11
申请号:US15659188
申请日:2017-07-25
Applicant: Apple Inc.
Inventor: Christopher L. Spencer , Karl D. Mann , Ralph C. Taylor , Dinesh D. Kuwar
CPC classification number: G06T15/40 , G06T11/40 , G06T15/005 , G06T15/08 , G06T15/80 , G06T17/20 , G09G5/00 , G09G5/363
Abstract: Techniques are disclosed relating to rendering graphics objects that require shader operations to determine visibility. In some embodiments, a graphics unit is configured to process feedback objects, which may require shading to determine whether they are visible relative to previously-processed objects, out of draw order. For example, in embodiments where a buffer is used to store fragment data for deferred rendering, the graphics unit may bypass the buffer and shade feedback objects ahead of earlier non-feedback objects whose fragment data is stored in the buffer. This may allow a determination of whether to remove occluded non-feedback fragment data from the buffer, which may reduce graphics overdraw. In disclosed two-pass techniques, data for feedback objects is first allowed to bypass the buffer for visibility shading, but is then stored in the buffer for a second pass to perform fragment shading to actually determine pixel attributes, which may further reduce overdraw.
-
公开(公告)号:US20240345892A1
公开(公告)日:2024-10-17
申请号:US18673959
申请日:2024-05-24
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Ajay Simha Modugala , Karl D. Mann
Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.
-
公开(公告)号:US12020075B2
公开(公告)日:2024-06-25
申请号:US17018913
申请日:2020-09-11
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Ajay Simha Modugala , Karl D. Mann
Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.
-
公开(公告)号:US20240045808A1
公开(公告)日:2024-02-08
申请号:US18490588
申请日:2023-10-19
Applicant: Apple Inc.
Inventor: Justin A. Hensley , Karl D. Mann , Yoong Chert Foo , Terence M. Potter , Frank W. Liljeros , Ralph C. Taylor
IPC: G06F12/1018 , G06F12/084
CPC classification number: G06F12/1018 , G06F12/084 , G06F30/392
Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already setup. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
-
公开(公告)号:US11062507B2
公开(公告)日:2021-07-13
申请号:US16673883
申请日:2019-11-04
Applicant: Apple Inc.
Inventor: Anthony P. DeLaurier , Karl D. Mann , Tyson J. Bergland , Winnie W. Yeung
Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, programmable shader circuitry is configured to execute program instructions of compute kernels that write pixel data. In some embodiments, a first cache is configured to store pixel write data from the programmable shader circuitry and first compression circuitry is configured to compress a first block of pixel write data in response to full accumulation of the first block in the first cache circuitry. In some embodiments, second cache circuitry is configured to store pixel write data from the programmable shader circuitry at a higher level in a storage hierarchy than the first cache circuitry and second compression circuitry is configured to compress a second block of pixel write data in response to full accumulation of the second block in the second cache circuitry. In some embodiments, write circuitry is configured to write the first and second compressed blocks of pixel data in a combined write to a higher level in the storage hierarchy.
-
-
-
-
-
-
-