APPARATUS AND METHOD FOR SCHEDULING INFERENCE TASKS

    Publication number: US20230297415A1

    Publication date: 2023-09-21

    Application number: US17699058

    Filing date: 2022-03-18

    CPC classification number: G06F9/4843 G06T1/20 G06F9/30189

    Abstract: Apparatus and method for scheduling inference tasks. For example, one embodiment of an apparatus comprises: a plurality of compute units (CUs) to execute inferencing routines, an inferencing routine comprising a plurality of phases, at least one CU comprising execution circuitry configurable to operate in a single instruction multiple data (SIMD) mode or a single instruction multiple thread (SIMT) mode; and dispatching hardware logic to determine whether a current phase of an inferencing routine is to be executed in the SIMD mode or the SIMT mode, and to dispatch instructions of the current phase for execution by the execution circuitry of a CU in accordance with the SIMD mode or the SIMT mode, respectively.
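The dispatch decision in the abstract can be sketched as follows. This is a minimal illustrative model, not the patented hardware: the names (`Phase`, `select_mode`, `dispatch_routine`) and the heuristic that divergent control flow favors SIMT while uniform work favors SIMD are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    """One phase of an inferencing routine (hypothetical model)."""
    name: str
    divergent: bool  # divergent control flow favors SIMT; uniform favors SIMD

def select_mode(phase: Phase) -> str:
    """Determine whether a phase is to be executed in SIMD or SIMT mode."""
    return "SIMT" if phase.divergent else "SIMD"

def dispatch_routine(phases):
    """Dispatch each phase in its selected mode, as the dispatching
    hardware logic in the abstract would route instructions to a CU."""
    return [(p.name, select_mode(p)) for p in phases]

schedule = dispatch_routine([
    Phase("dense_matmul", divergent=False),   # uniform work -> SIMD
    Phase("sparse_gather", divergent=True),   # irregular work -> SIMT
])
```

In this sketch the mode is chosen per phase rather than per routine, mirroring the abstract's claim that the dispatcher examines the *current* phase before dispatching its instructions.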

    CLUSTER OF SCALAR ENGINES TO ACCELERATE INTERSECTION IN LEAF NODE

    Publication number: US20200211252A1

    Publication date: 2020-07-02

    Application number: US16235893

    Filing date: 2018-12-28

    Abstract: Cluster of acceleration engines to accelerate intersections. For example, one embodiment of an apparatus comprises: a set of graphics cores to execute a first set of instructions of a primary graphics thread; a scalar cluster comprising a plurality of scalar execution engines; and a communication fabric interconnecting the set of graphics cores and the scalar cluster; the set of graphics cores to offload execution of a second set of instructions associated with ray traversal and/or intersection operations to the scalar cluster; the scalar cluster comprising a plurality of local memories, each local memory associated with one of the scalar execution engines, wherein each local memory is to store a portion of a hierarchical acceleration data structure required by an associated scalar execution engine to execute one or more of the second set of instructions; the plurality of scalar execution engines to store results of the execution of the second set of instructions in a memory accessible by the set of graphics cores; wherein the set of graphics cores are to process the results within the primary graphics thread.
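The offload flow described above can be sketched in a few lines. This is a hedged software analogy, not the claimed hardware: a flat list of interval nodes stands in for the hierarchical acceleration structure, and all names (`partition`, `scalar_engine`, `offload_intersections`) are invented for illustration.

```python
def partition(nodes, n_engines):
    """Split the acceleration structure so each scalar engine's local
    memory holds only the portion it needs."""
    return [nodes[i::n_engines] for i in range(n_engines)]

def scalar_engine(local_nodes, ray):
    """Test a ray (modeled as a scalar coordinate) against the nodes in
    this engine's local memory; return the intersected nodes."""
    return [n for n in local_nodes if n["lo"] <= ray <= n["hi"]]

def offload_intersections(nodes, rays, n_engines=4):
    """Graphics cores offload intersection work to the scalar cluster;
    engines write results to memory the cores can read."""
    shared_results = []                    # memory accessible to the cores
    local_mems = partition(nodes, n_engines)
    for ray in rays:
        hits = []
        for mem in local_mems:             # each engine scans its own slice
            hits.extend(scalar_engine(mem, ray))
        shared_results.append((ray, hits))
    return shared_results                  # processed by the primary thread
```

The point of the partitioning step is the one the abstract makes: no engine needs the whole acceleration structure in its local memory, only the portion required by the instructions it executes.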

    POWER SAVINGS FOR NEURAL NETWORK ARCHITECTURE WITH ZERO ACTIVATIONS DURING INFERENCE

    Publication number: US20190041961A1

    Publication date: 2019-02-07

    Application number: US16144538

    Filing date: 2018-09-27

    Abstract: Embodiments are generally directed to providing power savings for a neural network architecture with zero activations during inference. An embodiment of an apparatus includes one or more processors including one or more processor cores; and a memory to store data for processing including neural network processing, wherein the apparatus is to perform a fast clear operation to initialize activation buffers for a neural network by updating metadata to indicate zero values, the neural network including a plurality of layers, wherein the apparatus is to compare outputs for the neural network to the metadata values and to write an output to memory only if the output is non-zero.
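The zero-skipping scheme in the abstract can be illustrated with a small model: a fast clear marks every buffer entry as zero through metadata instead of writing zeros to memory, and a layer output is written only when it is non-zero. The class and method names here are hypothetical, and the write counter is added purely to make the power-saving effect observable.

```python
class ActivationBuffer:
    """Illustrative model of an activation buffer with zero metadata."""

    def __init__(self, size):
        self.data = [None] * size        # backing memory, never bulk-cleared
        self.is_zero = [True] * size     # metadata set by the fast clear
        self.writes = 0                  # count of actual memory writes

    def fast_clear(self):
        """Initialize the buffer by updating metadata only, with no
        writes to the backing memory."""
        self.is_zero = [True] * len(self.data)

    def write_output(self, i, value):
        """Compare the output against zero and write to memory only if
        it is non-zero; zero outputs leave the metadata flag set."""
        if value != 0:
            self.data[i] = value
            self.is_zero[i] = False
            self.writes += 1             # memory traffic occurs only here

    def read(self, i):
        """Reads consult the metadata first, so cleared entries return
        zero without touching the backing memory."""
        return 0 if self.is_zero[i] else self.data[i]
```

With a sparse activation pattern such as `[0, 3, 0, 7]`, only the two non-zero outputs reach memory, which is the source of the claimed power savings.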
