CACHE STREAMING APPARATUS AND METHOD FOR DEEP LEARNING OPERATIONS

    公开(公告)号:US20230297513A1

    公开(公告)日:2023-09-21

    申请号:US17699062

    申请日:2022-03-18

    CPC classification number: G06F12/0897 G06N20/00 G06F2212/60

    Abstract: A cache streaming apparatus and method for machine learning. For example, one embodiment of an apparatus comprises: a plurality of compute units to perform machine learning operations; a cache subsystem comprising a hierarchy of cache levels, at least some of the cache levels shared by two or more of the plurality of compute units; and data streaming hardware logic to stream machine learning data in and out of the cache subsystem based on the machine learning operations, the data streaming hardware logic to load data into the cache subsystem from memory before the data is needed by a first portion of the machine learning operations and to ensure that results produced by the first portion of machine learning operations are maintained in the cache subsystem until used by a second portion of the machine learning operations.

    CLOUD-BASED REALTIME RAYTRACING
    3.
    发明申请

    公开(公告)号:US20200211265A1

    公开(公告)日:2020-07-02

    申请号:US16236218

    申请日:2018-12-28

    Abstract: Cloud-based real time rendering. For example, one embodiment of a system comprises: a first graphics processing node to perform a first set of graphics processing operations to render a graphics scene, the first set of graphics processing operations comprising ray-tracing independent operations; an interconnect or network interface coupling the first graphics processing node to a second graphics processing node; the second graphics processing node to receive an indication of a current view of a user of the first graphics processing node and to receive or construct a view-independent surface generated by view-independent ray traversal and intersection operations; the second graphics processing node to responsively perform a view-dependent translation of the view-independent surface based on the current view of the user to generate a view-dependent surface and to provide the view-dependent surface to the first graphics processing node; and the first graphics processing node to perform a second set of graphics processing operations to complete rendering of the graphics scene using the view-dependent surface.

    APPARATUS AND METHOD FOR RAY TRACING INSTRUCTION PROCESSING AND EXECUTION

    公开(公告)号:US20230137438A1

    公开(公告)日:2023-05-04

    申请号:US18090810

    申请日:2022-12-29

    Abstract: An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source register and a destination operand to identify a destination register in which to store a plurality of packed dequantized data values, wherein the execution circuitry is to convert each packed quantized data value in the source register to a floating point value, to multiply the floating point value by a first value to generate a first product and to add the first product to a second value to generate a dequantized data value, and to store the dequantized data value in a packed data element location in the destination register.

    APPARATUS AND METHOD FOR ACCELERATION DATA STRUCTURE REFIT

    公开(公告)号:US20210012553A1

    公开(公告)日:2021-01-14

    申请号:US17032964

    申请日:2020-09-25

    Abstract: Apparatus and method for acceleration data structure refit. For example, one embodiment of an apparatus comprises: a ray generator to generate a plurality of rays in a first graphics scene; a hierarchical acceleration data structure generator to construct an acceleration data structure comprising a plurality of hierarchically arranged nodes including inner nodes and leaf nodes stored in a memory in a depth-first search (DFS) order; traversal hardware logic to traverse one or more of the rays through the acceleration data structure; intersection hardware logic to determine intersections between the one or more rays and one or more primitives within the hierarchical acceleration data structure; a node refit unit comprising circuitry and/or logic to read consecutively through at least the inner nodes in the memory in reverse DFS order to perform a bottom-up refit operation on the hierarchical acceleration data structure.

    APPARATUS AND METHOD FOR ACCELERATION DATA STRUCTURE REFIT

    公开(公告)号:US20230162428A1

    公开(公告)日:2023-05-25

    申请号:US17982766

    申请日:2022-11-08

    CPC classification number: G06T15/06 G06F16/9027 G06F7/14 G06F9/3877 G06N3/02

    Abstract: Apparatus and method for acceleration data structure refit. For example, one embodiment of an apparatus comprises: a ray generator to generate a plurality of rays in a first graphics scene; a hierarchical acceleration data structure generator to construct an acceleration data structure comprising a plurality of hierarchically arranged nodes including inner nodes and leaf nodes stored in a memory in a depth-first search (DFS) order; traversal hardware logic to traverse one or more of the rays through the acceleration data structure; intersection hardware logic to determine intersections between the one or more rays and one or more primitives within the hierarchical acceleration data structure; a node refit unit comprising circuitry and/or logic to read consecutively through at least the inner nodes in the memory in reverse DFS order to perform a bottom-up refit operation on the hierarchical acceleration data structure.

    APPARATUS AND METHOD FOR RAY TRACING INSTRUCTION PROCESSING AND EXECUTION

    公开(公告)号:US20210035349A1

    公开(公告)日:2021-02-04

    申请号:US16996208

    申请日:2020-08-18

    Abstract: An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source register and a destination operand to identify a destination register in which to store a plurality of packed dequantized data values, wherein the execution circuitry is to convert each packed quantized data value in the source register to a floating point value, to multiply the floating point value by a first value to generate a first product and to add the first product to a second value to generate a dequantized data value, and to store the dequantized data value in a packed data element location in the destination register.

    CONTEXT-AWARE COMPRESSION WITH QUANTIZATION OF HIERARCHICAL TRANSFORM MATRICES

    公开(公告)号:US20220343554A1

    公开(公告)日:2022-10-27

    申请号:US17740754

    申请日:2022-05-10

    Abstract: Apparatus and method for context-aware compression. For example, one embodiment of an apparatus comprises: ray traversal/intersection circuitry to traverse rays through a hierarchical acceleration data structure to identify intersections between rays and primitives of a graphics scene; matrix compression circuitry/logic to compress hierarchical transformation matrices to generate compressed hierarchical transformation matrices by quantizing N-bit floating point data elements associated with child transforms of the hierarchical transformation matrices to variable-bit floating point numbers or integers comprising offsets from a parent transform of the child transform; and an instance processor to generate a plurality of instances of one or more base geometric objects in accordance with the compressed hierarchical transformation matrices.

    CONTEXT-AWARE COMPRESSION WITH QUANTIZATION OF HIERARCHICAL TRANSFORM MATRICES

    公开(公告)号:US20210082154A1

    公开(公告)日:2021-03-18

    申请号:US17003040

    申请日:2020-08-26

    Abstract: Apparatus and method for context-aware compression. For example, one embodiment of an apparatus comprises: ray traversal/intersection circuitry to traverse rays through a hierarchical acceleration data structure to identify intersections between rays and primitives of a graphics scene; matrix compression circuitry/logic to compress hierarchical transformation matrices to generate compressed hierarchical transformation matrices by quantizing N-bit floating point data elements associated with child transforms of the hierarchical transformation matrices to variable-bit floating point numbers or integers comprising offsets from a parent transform of the child transform; and an instance processor to generate a plurality of instances of one or more base geometric objects in accordance with the compressed hierarchical transformation matrices.

Patent Agency Ranking