FUSED INSTRUCTION TO ACCELERATE PERFORMANCE OF SECURE HASH ALGORITHM 2 (SHA-2) WORKLOADS IN A GRAPHICS ENVIRONMENT

    Publication No.: US20220416999A1

    Publication Date: 2022-12-29

    Application No.: US17358897

    Filing Date: 2021-06-25

    Applicant: Intel Corporation

    IPC Classification: H04L9/06 G06F9/38 G06T15/00

    Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the functional control to be scheduled to an integer pipeline of the execution circuitry; and execute the sub-function of the fused SHA instruction in the integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
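
The merged rotate, shift, and xor operations correspond to the message-schedule functions that SHA-2 defines in FIPS 180-4. The sketch below is a software illustration only, not the patented hardware or its instruction encoding. The name `fused_sha`, the `SUBFUNCTIONS` table, and the string values used for the functional control are hypothetical; the rotate and shift amounts are the standard SHA-256 (32-bit) and SHA-512 (64-bit) constants.

```python
def rotr(x, n, bits):
    """Rotate a `bits`-wide word right by n positions (0 < n < bits)."""
    mask = (1 << bits) - 1
    x &= mask
    return ((x >> n) | (x << (bits - n))) & mask

# Hypothetical lookup: (length in bits, functional control) ->
# (first rotate amount, second rotate amount, shift amount).
# The amounts are the small-sigma constants from FIPS 180-4.
SUBFUNCTIONS = {
    (32, "sigma0"): (7, 18, 3),    # SHA-256
    (32, "sigma1"): (17, 19, 10),  # SHA-256
    (64, "sigma0"): (1, 8, 7),     # SHA-512
    (64, "sigma1"): (19, 61, 6),   # SHA-512
}

def fused_sha(length, fctrl, src):
    """Apply rotate, rotate, shift, and xor to `src` as one merged step."""
    r1, r2, s = SUBFUNCTIONS[(length, fctrl)]
    mask = (1 << length) - 1
    return rotr(src, r1, length) ^ rotr(src, r2, length) ^ ((src & mask) >> s)
```

Without a fused form, each call would take two rotates, one shift, and two xors issued as separate integer instructions; the lookup on `(length, fctrl)` stands in for the decode step that selects the sub-function.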

    SINGLE PRECISION SUPPORT FOR SYSTOLIC PIPELINE IN A GRAPHICS ENVIRONMENT

    Publication No.: US20240111825A1

    Publication Date: 2024-04-04

    Application No.: US17937229

    Filing Date: 2022-09-30

    Applicant: Intel Corporation

    IPC Classification: G06F17/16 G06F7/483

    CPC Classification: G06F17/16 G06F7/483

    Abstract: An apparatus to facilitate single precision support for a systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.
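
The split-value idea can be illustrated in software. The sketch below assumes the first format is float32 and the second is bfloat16, which the abstract does not state, and it assumes one particular division of terms between the two passes. All function names are hypothetical, and accumulation uses Python floats rather than modeling the hardware accumulator.

```python
import struct

def to_bf16(x):
    """Truncate x to a bfloat16-representable value (top 16 bits of float32)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def split(x):
    """Split x into a high part and a residual, both bfloat16-representable."""
    hi = to_bf16(x)
    lo = to_bf16(x - hi)
    return hi, lo

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def emulated_matmul(a, b):
    """Approximate a @ b using only low-precision operands, in two passes."""
    a_hi = [[split(v)[0] for v in row] for row in a]
    a_lo = [[split(v)[1] for v in row] for row in a]
    b_hi = [[split(v)[0] for v in row] for row in b]
    b_lo = [[split(v)[1] for v in row] for row in b]
    # Pass 1 (assumed): product of the high parts.
    acc = matmul(a_hi, b_hi)
    # Pass 2 (assumed): cross terms, reduced and fed back onto pass 1.
    # The lo * lo term is dropped as negligible.
    cross1 = matmul(a_hi, b_lo)
    cross2 = matmul(a_lo, b_hi)
    return [[acc[i][j] + cross1[i][j] + cross2[i][j]
             for j in range(len(acc[0]))] for i in range(len(acc))]
```

Compared with multiplying the truncated high parts alone, adding the cross terms recovers most of the precision lost in the conversion, which is the effect the emulated result relies on.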

    EMULATION OF FLOATING POINT CALCULATION

    Publication No.: US20230086275A1

    Publication Date: 2023-03-23

    Application No.: US17482166

    Filing Date: 2021-09-22

    Applicant: Intel Corporation

    Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.

    64-BIT TWO-DIMENSIONAL BLOCK LOAD WITH TRANSPOSE

    Publication No.: US20220413854A1

    Publication Date: 2022-12-29

    Application No.: US17358859

    Filing Date: 2021-06-25

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F9/38 G06T15/00

    Abstract: An apparatus to facilitate 64-bit two-dimensional (2D) block load with transpose is disclosed. The apparatus includes a processor comprising processing resources; and load store pipeline hardware circuitry coupled to the processing resources, the load store pipeline hardware circuitry to receive a 64-bit two-dimensional (2D) block load message with transpose from the processing resources. The load store pipeline hardware circuitry comprises a load store pipeline sequencer to map rows of a block of memory corresponding to the 64-bit 2D block load message with transpose to 64-bit standard load messages; and load store pipeline return circuitry to: sequentially number general register files (GRFs) used for returning elements of the block of memory accessed by the 64-bit standard load messages to the processing resources; and return, to the processing resources, the sequentially numbered GRFs in response to the 64-bit 2D block load message with transpose.
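
The row-to-load mapping and the sequential register numbering can be modeled in a few lines. This is a minimal software sketch, not the hardware message format. It assumes memory is a flat list of 64-bit elements with a fixed row pitch and that each returned register holds one column of the block; the abstract specifies neither. All names and parameters are hypothetical.

```python
def sequence_row_loads(pitch, x, y, width, height, base=0):
    """Map each block row to one standard load, given as (start index, count)."""
    return [(base + (y + r) * pitch + x, width) for r in range(height)]

def standard_load(memory, start, count):
    """Model a standard load: return `count` consecutive 64-bit elements."""
    return memory[start:start + count]

def block_load_transpose(memory, pitch, x, y, width, height, first_grf=0):
    """Load a width x height block and return it transposed.

    The result maps sequentially numbered registers to block columns
    (an assumed layout), so register i holds element i of every row.
    """
    rows = [standard_load(memory, start, count)
            for start, count in sequence_row_loads(pitch, x, y, width, height)]
    return {first_grf + c: [row[c] for row in rows] for c in range(width)}
```

For example, with a 4x4 surface stored as `list(range(16))` and a pitch of 4, a 2-wide, 3-high block at `x=1, y=1` is read as three row loads, and the two returned registers hold the columns `[5, 9, 13]` and `[6, 10, 14]`.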