INSTRUCTION PREFETCH BASED ON THREAD DISPATCH COMMANDS

    Publication (Announcement) Number: US20220083339A1

    Publication (Announcement) Date: 2022-03-17

    Application Number: US17509726

    Application Date: 2021-10-25

    Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use the thread dispatch command that dispatches threads to execute a kernel as a trigger to prefetch the instructions, parameters, and/or constants that will be used during execution of that kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
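
    As a rough illustration of the abstract above, the Python sketch below models the idea with invented names (DispatchCommand, Cache, and process_dispatch are not from the patent): the command that dispatches a kernel's threads also carries the addresses the prefetcher should warm, so the fills happen as a side effect of dispatch rather than as demand misses.

        from dataclasses import dataclass, field

        @dataclass
        class DispatchCommand:
            kernel_addr: int        # base address of the kernel's instructions
            param_addrs: list[int]  # parameter/constant addresses the kernel will read
            thread_count: int

        @dataclass
        class Cache:
            lines: set = field(default_factory=set)

            def prefetch(self, addr: int) -> None:
                self.lines.add(addr)  # model a fill that happens without a demand miss

            def hit(self, addr: int) -> bool:
                return addr in self.lines

        def process_dispatch(cmd: DispatchCommand, cache: Cache) -> list[int]:
            # The dispatch command itself drives the prefetch, so the fills can
            # proceed concurrently with thread dispatch instead of waiting for
            # the first demand miss during kernel execution.
            for addr in (cmd.kernel_addr, *cmd.param_addrs):
                cache.prefetch(addr)
            return list(range(cmd.thread_count))  # stand-in for dispatched thread IDs

        cache = Cache()
        threads = process_dispatch(DispatchCommand(0x1000, [0x2000, 0x2040], 64), cache)
        assert cache.hit(0x1000) and len(threads) == 64

    The ordering is the point of the sketch: the cache is populated by the time any dispatched thread issues its first instruction fetch.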

    POWER SAVINGS FOR NEURAL NETWORK ARCHITECTURE WITH ZERO ACTIVATIONS DURING INFERENCE

    Publication (Announcement) Number: US20190041961A1

    Publication (Announcement) Date: 2019-02-07

    Application Number: US16144538

    Application Date: 2018-09-27

    Abstract: Embodiments are generally directed to providing power savings for a neural network architecture with zero activations during inference. An embodiment of an apparatus includes one or more processors including one or more processor cores, and a memory to store data for processing, including neural network processing. The apparatus is to perform a fast clear operation to initialize activation buffers for a neural network by updating metadata to indicate zero values, the neural network including a plurality of layers. The apparatus is further to compare outputs of the neural network to the metadata values and to write an output to memory only if the output is non-zero.
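
    The following is a minimal Python sketch of the zero-skipping write path described above, assuming a per-entry metadata bitmap; ActivationBuffer and its methods are hypothetical names, not the claimed hardware interface.

        class ActivationBuffer:
            def __init__(self, size: int):
                self.data = [0.0] * size
                self.zero_meta = [True] * size  # metadata marks every entry "zero"

            def fast_clear(self) -> None:
                # Fast clear touches only the metadata, not the data itself.
                self.zero_meta = [True] * len(self.data)

            def write(self, idx: int, value: float) -> bool:
                # Issue a memory write only for non-zero outputs; zeros are
                # already represented by the metadata.
                if value == 0.0:
                    return False
                self.data[idx] = value
                self.zero_meta[idx] = False
                return True

            def read(self, idx: int) -> float:
                return 0.0 if self.zero_meta[idx] else self.data[idx]

        buf = ActivationBuffer(8)
        buf.fast_clear()
        outputs = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
        writes = sum(buf.write(i, v) for i, v in enumerate(outputs))
        assert writes == 2 and buf.read(1) == 1.5 and buf.read(0) == 0.0

    In this model the eight-element layer output costs only two memory writes; the fast clear and the zero outputs are handled entirely in metadata.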

    INSTRUCTION PREFETCH BASED ON THREAD DISPATCH COMMANDS

    Publication (Announcement) Number: US20250077232A1

    Publication (Announcement) Date: 2025-03-06

    Application Number: US18882364

    Application Date: 2024-09-11

    Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for a read from a first cache, broadcast an event within the graphics processing device to identify the instruction or data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before a second thread attempts to read that instruction or data.
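
    A sketch of the miss-broadcast pattern follows, again in Python with hypothetical names, and under the assumption that the event reaches every peer compute unit on the same fabric: one unit's cache miss becomes a prefetch hint for all of the others.

        class ComputeUnit:
            def __init__(self, cu_id: int, fabric: list["ComputeUnit"]):
                self.cu_id = cu_id
                self.local_cache: set[int] = set()
                self.fabric = fabric  # all compute units reachable by the broadcast

            def read(self, addr: int) -> str:
                if addr in self.local_cache:
                    return "hit"
                self.local_cache.add(addr)  # demand fill on the miss
                self.broadcast_miss(addr)   # tell the peers what just missed
                return "miss"

            def broadcast_miss(self, addr: int) -> None:
                # Peers prefetch the identified line into their local caches
                # before their own threads attempt the same read.
                for cu in self.fabric:
                    if cu is not self:
                        cu.local_cache.add(addr)

        fabric: list[ComputeUnit] = []
        fabric.extend(ComputeUnit(i, fabric) for i in range(4))
        assert fabric[0].read(0xBEEF) == "miss"  # the first thread misses...
        assert fabric[1].read(0xBEEF) == "hit"   # ...its peers now hit the same line

    Whether every peer should honor the hint, or only those about to run the same kernel, is a filtering decision the sketch leaves out.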
