Compression Techniques and Hierarchical Caching

    Publication No.: US20210295593A1

    Publication date: 2021-09-23

    Application No.: US17338846

    Filing date: 2021-06-04

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, a memory system implements a storage hierarchy that includes first cache circuitry and second cache circuitry at different levels of the hierarchy. Processor circuitry generates write data to be written to the memory system. In some embodiments, first compression circuitry is configured to compress a first block of write data in response to full accumulation of the first block in the first cache circuitry and second compression circuitry is configured to compress a second block of write data in response to full accumulation of the second block in the second cache circuitry. Write circuitry may write the first and second compressed blocks of data in a single combined write to a higher level in the storage hierarchy.
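
The accumulate-then-compress behaviour described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the patent describes hardware circuitry, and the names `AccumulatingCache`, `combined_write`, and `BLOCK_SIZE`, the 64-byte block size, and the use of `zlib` as the compressor are all hypothetical choices made for illustration.

```python
import zlib

BLOCK_SIZE = 64  # hypothetical block size in bytes


class AccumulatingCache:
    """Toy model of one cache level that compresses a block once it is full."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.pending = bytearray()      # write data still accumulating
        self.compressed_blocks = []     # blocks compressed at this level

    def write(self, data: bytes) -> None:
        self.pending.extend(data)
        # Compress only in response to full accumulation of a block.
        while len(self.pending) >= self.block_size:
            block = bytes(self.pending[:self.block_size])
            del self.pending[:self.block_size]
            self.compressed_blocks.append(zlib.compress(block))

    def drain(self):
        """Hand over the compressed blocks and clear the local list."""
        blocks, self.compressed_blocks = self.compressed_blocks, []
        return blocks


def combined_write(backing_store: list, *caches) -> None:
    """Write compressed blocks from several cache levels as one combined write."""
    payload = [blk for cache in caches for blk in cache.drain()]
    if payload:
        # One appended entry models one write transaction to the next level.
        backing_store.append(payload)
```

In this model, two cache objects stand in for the first and second cache circuitry, and a partially filled block stays in `pending` until it is complete.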

    Page Management and Forward Progress for Ray Tracing

    Publication No.: US20250095273A1

    Publication date: 2025-03-20

    Application No.: US18509902

    Filing date: 2023-11-15

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to memory page allocation for a graphics processor. In some embodiments, a shader program includes a primary thread associated with ray tracing (that includes an instruction that indicates for the apparatus to launch one or more secondary threads). Memory resource allocator circuitry may receive a request to allocate a memory page in a page pool to a thread of the shader program, where the page pool includes a set of protected pages and a set of public pages. The allocator may allocate a page of the page pool to the requesting thread according to an allocation restriction, such that protected pages are allocable only to secondary threads that are launched based on a primary thread and public pages are allocable to both primary and secondary threads.
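
The allocation restriction in this abstract can be sketched as a small software model. The class name `PagePool` and its methods are hypothetical, and the order in which a secondary thread tries the two sets (public pages first, protected pages as a reserve) is an assumption that the abstract does not specify.

```python
class PagePool:
    """Toy page pool split into public and protected pages."""

    def __init__(self, num_public: int, num_protected: int):
        self.free_public = list(range(num_public))
        self.free_protected = list(range(num_public, num_public + num_protected))

    def allocate(self, is_secondary: bool):
        """Return a page id, or None if no page the thread may use is free."""
        # Public pages are allocable to both primary and secondary threads.
        if self.free_public:
            return self.free_public.pop()
        # Protected pages are allocable only to secondary threads.
        # Assumption: they serve as a reserve once public pages run out.
        if is_secondary and self.free_protected:
            return self.free_protected.pop()
        return None
```

Because primary threads can never consume protected pages in this model, already-launched secondary threads keep a source of pages even when primary threads have exhausted the public set.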

    Compute Kernel Parsing with Limits in one or more Dimensions

    Publication No.: US20220083377A1

    Publication date: 2022-03-17

    Application No.: US17018913

    Filing date: 2020-09-11

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.
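
The iteration order described in this abstract can be illustrated in a few lines. This is a sketch under assumptions: the function name `iterate_workgroups` and the parameter `x_limit` are hypothetical, only two dimensions are modelled, and the limit is applied to the first (x) dimension only.

```python
def iterate_workgroups(num_x: int, num_y: int, x_limit: int):
    """Yield (x, y) workgroup coordinates, one sub-kernel at a time.

    Each sub-kernel covers at most `x_limit` workgroups in the x dimension
    and is fully traversed (in both x and y) before the next one starts.
    """
    for x_start in range(0, num_x, x_limit):
        x_end = min(x_start + x_limit, num_x)
        for y in range(num_y):
            for x in range(x_start, x_end):
                yield (x, y)
```

For a 4 x 2 kernel with `x_limit=2`, the generator emits the left 2 x 2 tile before any workgroup of the right tile, which gives compact, roughly square batches instead of long rows.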

    Multi-space rendering with configurable transformation parameters

    Publication No.: US10755383B2

    Publication date: 2020-08-25

    Application No.: US16130265

    Filing date: 2018-09-13

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to rendering graphics objects. In some embodiments, a graphics unit is configured to transform graphics objects from a virtual space into a second space according to different transformation parameters for different portions of the second space. This may result in sampling different portions of the virtual space at different sample rates, which may reduce the number of samples required in various stages of the rendering process. In the disclosed techniques, transformation may occur prior to rasterization and shading, which may further reduce computation and power consumption in a graphics unit, improve image quality as displayed to a user, and/or reduce bandwidth usage or latency of video content on a network. In some embodiments, a transformed image may be viewed through a distortion-compensating lens or resampled prior to display.
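
A one-dimensional sketch may help show how different transformation parameters lead to different sample rates. Everything named here (`Region`, `build_regions`, `transform`) is hypothetical, and the piecewise-linear scale-and-offset mapping is an assumed, simplified form of the transformation. Each region corresponds to one portion of the second space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    in_start: float   # start of the region in virtual space
    in_end: float     # end of the region in virtual space
    scale: float      # transformation parameter for this region
    out_start: float  # where the region begins in the second space


def build_regions(spans):
    """Build regions from sorted, contiguous (in_start, in_end, scale) spans."""
    regions, out = [], 0.0
    for in_start, in_end, scale in spans:
        regions.append(Region(in_start, in_end, scale, out))
        out += (in_end - in_start) * scale  # keep the mapping continuous
    return regions


def transform(x: float, regions) -> float:
    """Map a virtual-space coordinate into the second space."""
    for r in regions:
        if r.in_start <= x <= r.in_end:
            return r.out_start + (x - r.in_start) * r.scale
    raise ValueError("coordinate outside the virtual space")
```

With spans such as `(0, 1, 0.5)`, `(1, 3, 1.0)`, `(3, 4, 0.5)`, the two outer regions are compressed to half their width, so a uniform sample grid in the second space samples them at half the rate of the central region.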

    On-demand memory allocation

    Publication No.: US11829298B2

    Publication date: 2023-11-28

    Application No.: US16804128

    Filing date: 2020-02-28

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
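
The private-to-virtual translation with on-demand mapping can be modelled in software as follows. This is a sketch, not the disclosed circuitry: the class `PrivateMemoryAllocator`, the `owner_id` key, the 4096-byte page size, and the virtual base address are all hypothetical, and the later virtual-to-physical step performed by the MMU is not modelled.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes


class PrivateMemoryAllocator:
    """Toy model: map private pages to virtual pages on first access."""

    def __init__(self, virtual_base: int = 0x1000_0000):
        # (owner id, private page number) -> virtual page base address
        self.page_table = {}
        self.next_virtual_page = virtual_base

    def translate(self, owner_id: int, private_addr: int) -> int:
        """Translate a private address to a virtual address."""
        page, offset = divmod(private_addr, PAGE_SIZE)
        key = (owner_id, page)
        if key not in self.page_table:
            # Page table information is not set up yet: map a page now.
            self.page_table[key] = self.next_virtual_page
            self.next_virtual_page += PAGE_SIZE
        return self.page_table[key] + offset
```

A requester that never touches its private memory consumes no pages in this model, which is the point of allocating on demand.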

    Compression techniques and hierarchical caching

    Publication No.: US11488350B2

    Publication date: 2022-11-01

    Application No.: US17338846

    Filing date: 2021-06-04

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to compression of data stored at different cache levels. In some embodiments, a memory system implements a storage hierarchy that includes first cache circuitry and second cache circuitry at different levels of the hierarchy. Processor circuitry generates write data to be written to the memory system. In some embodiments, first compression circuitry is configured to compress a first block of write data in response to full accumulation of the first block in the first cache circuitry and second compression circuitry is configured to compress a second block of write data in response to full accumulation of the second block in the second cache circuitry. Write circuitry may write the first and second compressed blocks of data in a single combined write to a higher level in the storage hierarchy.

    Multi-space rendering with configurable transformation parameters

    Publication No.: US11113788B2

    Publication date: 2021-09-07

    Application No.: US17001007

    Filing date: 2020-08-24

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to rendering graphics objects. In some embodiments, a graphics unit is configured to transform graphics objects from a virtual space into a second space according to different transformation parameters for different portions of the second space. This may result in sampling different portions of the virtual space at different sample rates, which may reduce the number of samples required in various stages of the rendering process. In the disclosed techniques, transformation may occur prior to rasterization and shading, which may further reduce computation and power consumption in a graphics unit, improve image quality as displayed to a user, and/or reduce bandwidth usage or latency of video content on a network. In some embodiments, a transformed image may be viewed through a distortion-compensating lens or resampled prior to display.

    Dependency scheduling for control stream in parallel processor

    Publication No.: US11080101B2

    Publication date: 2021-08-03

    Application No.: US16361910

    Filing date: 2019-03-22

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to processing a control stream such as a compute control stream. In some embodiments, the control stream includes kernels and commands for multiple substreams. In some embodiments, multiple substream processors are each configured to: fetch and parse portions of the control stream corresponding to an assigned substream and, in response to a neighbor barrier command in the assigned substream that identifies another substream, communicate the identified other substream to a barrier clearing circuitry. In some embodiments, the barrier clearing circuitry is configured to determine whether to allow the assigned substream to proceed past the neighbor barrier command based on communication of a most-recently-completed command from a substream processor to which the other substream is assigned (e.g., based on whether the most-recently-completed command meets a command identifier communicated in the neighbor barrier command). The disclosed techniques may facilitate parallel control stream parsing and substream synchronization.
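
The neighbor-barrier check can be sketched as a small bookkeeping structure. The class `BarrierClearing` and its methods are hypothetical names. The sketch assumes that command identifiers increase monotonically within a substream and that "meets" means "is greater than or equal to"; the abstract does not fix either detail.

```python
class BarrierClearing:
    """Toy model of barrier clearing across substreams."""

    def __init__(self, num_substreams: int):
        # Most-recently-completed command id per substream (-1: none yet).
        self.last_completed = [-1] * num_substreams

    def report_completion(self, substream: int, command_id: int) -> None:
        """Called by a substream processor when a command completes."""
        self.last_completed[substream] = max(
            self.last_completed[substream], command_id
        )

    def may_proceed(self, other_substream: int, required_command_id: int) -> bool:
        """Decide whether a neighbor barrier naming `other_substream` clears."""
        return self.last_completed[other_substream] >= required_command_id
```

A substream that hits a barrier naming substream 1 and command 5 would poll `may_proceed(1, 5)` and continue once substream 1 has reported command 5 or a later one.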

    Multi-Space Rendering with Configurable Transformation Parameters

    Publication No.: US20190102865A1

    Publication date: 2019-04-04

    Application No.: US16130265

    Filing date: 2018-09-13

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to rendering graphics objects. In some embodiments, a graphics unit is configured to transform graphics objects from a virtual space into a second space according to different transformation parameters for different portions of the second space. This may result in sampling different portions of the virtual space at different sample rates, which may reduce the number of samples required in various stages of the rendering process. In the disclosed techniques, transformation may occur prior to rasterization and shading, which may further reduce computation and power consumption in a graphics unit, improve image quality as displayed to a user, and/or reduce bandwidth usage or latency of video content on a network. In some embodiments, a transformed image may be viewed through a distortion-compensating lens or resampled prior to display.
