DISTRIBUTED REGISTER FILE CACHE TO REDUCE L1 BANDWIDTH REQUIREMENTS

    Publication number: US20250068473A1

    Publication date: 2025-02-27

    Application number: US18453867

    Application date: 2023-08-22

    Abstract: Described herein is a graphics processor comprising a graphics processing cluster coupled with a memory interface, the graphics processing cluster including a plurality of processing resources. A processing resource of the plurality of processing resources includes: a register file including a first plurality of registers associated with a first hardware thread of a plurality of hardware threads of the processing resource and a second plurality of registers associated with a second hardware thread of the plurality of hardware threads of the processing resource; and first circuitry configured to facilitate access to memory on behalf of the plurality of hardware threads and to store metadata for memory access requests from the plurality of hardware threads.

    ENHANCEMENTS FOR ACCUMULATOR USAGE AND INSTRUCTION FORWARDING IN MATRIX MULTIPLY PIPELINE IN GRAPHICS ENVIRONMENT

    Publication number: US20240169021A1

    Publication date: 2024-05-23

    Application number: US18056930

    Application date: 2022-11-18

    CPC classification number: G06F17/16 G06F7/5443

    Abstract: An apparatus to facilitate enhancements for accumulator usage and instruction forwarding in a matrix multiply pipeline in a graphics environment is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective data processing units comprise: multiply-accumulate hardware to generate intermediate results of a matrix multiplication operation; intermediate accumulation hardware to store the intermediate results of the matrix multiplication operation and accumulate them with other intermediate results generated by the multiply-accumulate hardware; a bypass data structure to cause a source operand to bypass the multiply-accumulate hardware; and an adder circuit to add an output from the multiply-accumulate hardware with at least one of the source operand or an output of the intermediate accumulation hardware to generate a final output.
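
    Illustrative sketch (not from the patent): the data path named in the abstract can be modeled in software as below. The function name `dpu_dot`, the `chunk` parameter, and the choice to add both the bypassed source operand and the accumulator in the final adder are assumptions made only for illustration; the abstract requires "at least one of" the two.

```python
def dpu_dot(a_row, b_col, src0, chunk=2):
    """Hypothetical software model of one data processing unit (DPU).

    a_row, b_col: operand vectors for one output element.
    src0: source operand that bypasses the multiply-accumulate stage.
    chunk: assumed number of products the MAC stage handles per pass.
    """
    acc = 0  # models the intermediate accumulation storage
    n = len(a_row)
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        # Multiply-accumulate stage: one intermediate result per pass.
        mac_out = sum(a_row[i] * b_col[i] for i in range(start, end))
        if end == n:
            # Final adder: MAC output + accumulator output + bypassed operand.
            return mac_out + acc + src0
        # Not the last pass: fold the intermediate result into the accumulator.
        acc += mac_out
    # Empty input: only the bypassed source operand reaches the output.
    return src0
```

    For example, `dpu_dot([1, 2, 3, 4], [5, 6, 7, 8], 10)` accumulates 17 on the first pass and adds 53, 17, and the bypassed 10 on the last pass.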

    SUPPORTING AND LOAD BALANCING MULTIPLE DOUBLE PRECISION PIPELINES IN A GRAPHICS ENVIRONMENT

    Publication number: US20240168764A1

    Publication date: 2024-05-23

    Application number: US18056820

    Application date: 2022-11-18

    CPC classification number: G06F9/30014 G06F9/3867

    Abstract: An apparatus to facilitate supporting and load balancing multiple double precision pipelines in a graphics environment is disclosed. The apparatus includes a processing core having at least one processing resource comprising: a first double precision (DP) pipeline to support double float operations, the first DP pipeline comprising a first set of floating point units (FPUs) configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete; and a second DP pipeline to support the double float operations, wherein the second DP pipeline comprises a second set of FPUs configured in a pipelined configuration to enable new instructions to be issued to the second DP pipeline before previous instructions are complete.
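
    Illustrative sketch (not from the patent): the abstract does not state a balancing policy, so the model below assumes simple round-robin dispatch across the pipelines, one issue slot per pipeline per cycle, and a fixed latency. The name `schedule` and the parameters `num_pipes` and `latency` are hypothetical.

```python
def schedule(instrs, num_pipes=2, latency=4):
    """Hypothetical issue model for several pipelined DP pipelines.

    Returns a list of (instr, pipe, issue_cycle, done_cycle) tuples.
    Assumptions: round-robin dispatch, each pipeline accepts one new
    instruction per cycle, and every instruction takes `latency` cycles.
    """
    out = []
    for i, instr in enumerate(instrs):
        pipe = i % num_pipes           # assumed round-robin load balancing
        issue_cycle = i // num_pipes   # pipelining: a new issue every cycle
        out.append((instr, pipe, issue_cycle, issue_cycle + latency))
    return out
```

    With four instructions and the defaults, the third instruction is issued to pipeline 0 in cycle 1, before the first instruction on that pipeline completes in cycle 4.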

    MATRIX TRANSPOSITION IN MATRIX MULTIPLICATION ARRAY CIRCUITRY

    Publication number: US20240168723A1

    Publication date: 2024-05-23

    Application number: US18056822

    Application date: 2022-11-18

    CPC classification number: G06F7/78 G06F17/16

    Abstract: An apparatus to facilitate matrix transposition in matrix multiplication array circuitry is disclosed. The apparatus includes a processor comprising matrix acceleration hardware comprising storage buffers and an array of data processing units (DPUs), wherein the matrix acceleration hardware is to: load data for a source matrix to the storage buffers; generate a transposed matrix comprising transposed elements of the source matrix; and input the transposed matrix to the array of DPUs for a matrix multiplication operation.
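
    Illustrative sketch (not from the patent): the three steps in the abstract (load, transpose, multiply) can be mirrored in plain Python as below. The function names and the list-of-lists matrix representation are assumptions for illustration and say nothing about how the hardware performs the transposition.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def matmul_with_transposed_source(src, other):
    """Hypothetical model: compute transpose(src) @ other."""
    # Step 1: load the source matrix into (modeled) storage buffers.
    buffers = [row[:] for row in src]
    # Step 2: generate the transposed matrix from the buffered data.
    transposed = transpose(buffers)
    # Step 3: feed the transposed matrix to the multiplication stage.
    return matmul(transposed, other)
```

    For a 1x3 source `[[1, 2, 3]]` and a 1x2 matrix `[[4, 5]]`, the transposed source is 3x1 and the product is a 3x2 matrix.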
