Graphics memory space for shader core

    公开(公告)号:US11521343B2

    公开(公告)日:2022-12-06

    申请号:US17103462

    申请日:2020-11-24

    Applicant: Apple Inc.

    Abstract: Disclosed techniques relate to memory space management for graphics processing. In some embodiments, first and second graphics cores are configured to execute instructions for multiple threadgroups. In some embodiments, the threads groups include a first threadgroup with multiple single-instruction multiple-data (SIMD) groups configured to execute a first shader program and a second threadgroup with multiple SIMD groups configured to execute a second, different shader program. Control circuitry may be configured to provide access to data stored in memory circuitry according to a shader memory space. The shader memory space may be accessible to threadgroups executed by the first graphics shader core, including the first and second threadgroups, but is not accessible to threadgroups executed by the second graphics shader core. Disclosed techniques may reduce latency, increase bandwidth available to the shader, reduce coherency cost, or any combination thereof.

    Ray Intersect Circuitry with Parallel Ray Testing

    公开(公告)号:US20220036639A1

    公开(公告)日:2022-02-03

    申请号:US17103433

    申请日:2020-11-24

    Applicant: Apple Inc.

    Abstract: Disclosed techniques relate to ray intersection processing for ray tracing. In some embodiments, ray intersection circuitry traverses a spatially organized acceleration data structure and includes bounding region circuitry configured to test, in parallel, whether a ray intersects multiple different bounding regions indicated by a node of the data structure. Shader circuitry may execute a ray intersect instruction to invoke traversal by the ray intersect circuitry and the traversal may generate intersection results. The shader circuitry may shade intersected primitives based on the intersection results. Disclosed techniques that share processing between intersection circuitry and shader processors may improve performance, reduce power consumption, or both, relative to traditional techniques.

    On-demand Memory Allocation
    7.
    发明公开

    公开(公告)号:US20240045808A1

    公开(公告)日:2024-02-08

    申请号:US18490588

    申请日:2023-10-19

    Applicant: Apple Inc.

    CPC classification number: G06F12/1018 G06F12/084 G06F30/392

    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already setup. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.

    Lossy Compression Techniques
    8.
    发明公开

    公开(公告)号:US20230253979A1

    公开(公告)日:2023-08-10

    申请号:US18302513

    申请日:2023-04-18

    Applicant: Apple Inc.

    CPC classification number: H03M7/3059 H04N19/182 H04N19/176

    Abstract: Techniques are disclosed relating to compression of pixel data using different quantization for different regions of a block of pixels being compressed. In some embodiments, compression circuitry is configured to determine, for multiple components included in pixels of the block of pixels being compressed, respective smallest and greatest component values in respective regions of the block of pixels. The compression circuitry may determine, based on the determined smallest and greatest component values, to use a first number of bits to represent delta values relative to a base value for a first component in a first region and a second, different number of bits to represent delta values relative to a base value for a second component in the first region. The compression circuitry may then quantize delta values for the first and second components of pixels in the first region of the block of pixels using the determined first and second numbers of bits. In some embodiments, the compression circuitry determines whether to provide cross-component bit sharing within a region.

    Temporal split techniques for motion blur and ray intersection

    公开(公告)号:US11676327B2

    公开(公告)日:2023-06-13

    申请号:US17205680

    申请日:2021-03-18

    Applicant: Apple Inc.

    CPC classification number: G06T15/06 G06F16/9027 G06T15/005

    Abstract: Techniques are disclosed relating to ray intersection in the context of motion blur. In some embodiments, a graphics processor includes time-oblivious ray intersect circuitry configured to receive coordinates for a ray and traverse a bounding volume hierarchy (BVH) data structure based on the coordinates to determine whether the ray intersects with one or more bounding regions of a graphics space. In some embodiments, in response to reaching a temporal branch element of the BVH data structure, the ray intersect circuitry initiates a shader program that determines a sub-tree of the BVH data structure for further traversal by the ray intersection circuitry, where the sub-tree corresponds to a portion of a motion-blur interval in which the ray falls. This may provide accurate ray tracing for motion blur while reducing area and power consumption of intersect circuitry, relative to time-aware implementations.

    Lossy compression techniques
    10.
    发明授权

    公开(公告)号:US11664816B2

    公开(公告)日:2023-05-30

    申请号:US16855540

    申请日:2020-04-22

    Applicant: Apple Inc.

    CPC classification number: H03M7/3059 H04N19/176 H04N19/182

    Abstract: Techniques are disclosed relating to compression of pixel data using different quantization for different regions of a block of pixels being compressed. In some embodiments, compression circuitry is configured to determine, for multiple components included in pixels of the block of pixels being compressed, respective smallest and greatest component values in respective regions of the block of pixels. The compression circuitry may determine, based on the determined smallest and greatest component values, to use a first number of bits to represent delta values relative to a base value for a first component in a first region and a second, different number of bits to represent delta values relative to a base value for a second component in the first region. The compression circuitry may then quantize delta values for the first and second components of pixels in the first region of the block of pixels using the determined first and second numbers of bits. In some embodiments, the compression circuitry determines whether to provide cross-component bit sharing within a region.

Patent Agency Ranking