Apparatuses, methods, and systems for hashing instructions

    公开(公告)号:US11188335B2

    公开(公告)日:2021-11-30

    申请号:US17087536

    申请日:2020-11-02

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

    Low-latency link compression schemes

    公开(公告)号:US10924591B2

    公开(公告)日:2021-02-16

    申请号:US16014690

    申请日:2018-06-21

    Abstract: Methods and apparatus for low-latency link compression schemes. Under the schemes, selected packets or messages are dynamically selected for compression in view of current transmit queue levels. The latency incurred during compression and decompression is not added to the data-path, but sits on the side of the transmit queue. The system monitors the queue depth and, accordingly, initiates compression jobs based on the depth. Different compression levels may be dynamically selected and used based on queue depth. Under various schemes, either packets or messages are enqueued in the transmit queue or pointers to such packets and messages are enqueued. Additionally, packets/message may be compressed prior to being enqueued, or after being enqueued, wherein an original uncompressed packet is replaced with a compressed packet. Compressed and uncompressed packets may be stored in queues or buffers and transmitted using a different numbers of transmit cycles based on their compression ratios. The schemes may be implemented to improve the effective bandwidth of various types of links, including serial links, bus-type links, and socket-to-socket links in multi-socket systems.

    SMS4 acceleration hardware
    23.
    发明授权
    SMS4 acceleration hardware 有权
    SMS4加速硬件

    公开(公告)号:US09503256B2

    公开(公告)日:2016-11-22

    申请号:US14582707

    申请日:2014-12-24

    CPC classification number: H04L9/0822 G09C1/00 H04L9/0631 H04L2209/122

    Abstract: Embodiments of an invention for SMS4 acceleration hardware are disclosed. In an embodiment, an apparatus includes SMS4 hardware and key transformation hardware. The SMS4 hardware is to execute a round of encryption and a round of key expansion. The key transformation hardware is to transform a key to provide for the SMS4 hardware to execute a round of decryption.

    Abstract translation: 公开了用于SMS4加速硬件的发明的实施例。 在一个实施例中,一种装置包括SMS4硬件和密钥变换硬件。 SMS4硬件是执行一轮加密和一轮密钥扩展。 密钥转换硬件是转换密钥以提供SMS4硬件来执行一轮解密。

    ACCELERATING KECCAK ALGORITHMS
    26.
    发明公开

    公开(公告)号:US20240211253A1

    公开(公告)日:2024-06-27

    申请号:US18145744

    申请日:2022-12-22

    CPC classification number: G06F9/30029 G06F9/3016 G06F9/3802

    Abstract: A method comprises fetching, by fetch circuitry, an encoded parity instruction comprising at least one opcode, a first source identifier for a first source, a second source identifier for a second source, a third source identifier for a third source, and a destination identifier for a destination, decoding, by decode circuitry, the encoded parity instruction to generate a decoded parity instruction; and executing, by execution circuitry, the decoded parity instruction to retrieve operands representing a first register from the first source, a second register from the second source, a third register from the third source, and an index from the third source, perform an XOR operation of four words of data from the first register and single word of data from the second register in a position represented by the index to generate a parity value, and store the parity value in a the first register in a position represented by the index.

    Apparatuses, methods, and systems for hashing instructions

    公开(公告)号:US11681530B2

    公开(公告)日:2023-06-20

    申请号:US17688728

    申请日:2022-03-07

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

    Apparatuses, methods, and systems for hashing instructions

    公开(公告)号:US11567772B2

    公开(公告)日:2023-01-31

    申请号:US17537373

    申请日:2021-11-29

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

    FUSED INSTRUCTION TO ACCELERATE PERFORMANCE OF SECURE HASH ALGORITHM 2 (SHA-2) WORKLOADS IN A GRAPHICS ENVIRONMENT

    公开(公告)号:US20220416999A1

    公开(公告)日:2022-12-29

    申请号:US17358897

    申请日:2021-06-25

    Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.

    Delayed link compression scheme
    30.
    发明授权

    公开(公告)号:US11494320B2

    公开(公告)日:2022-11-08

    申请号:US16140472

    申请日:2018-09-24

    Abstract: Apparatus, systems and methods for implementing delayed decompression schemes. As a burst of packets comprising compressed packets and uncompressed packets are received over an interconnect link, they are buffered in a receive buffer without decompression. Subsequently, the packets are forwarded from the receive buffer to a consumer such as processor core, with the compressed packets being decompressed prior to reaching the processor core. Under a first delayed decompression approach, packets are decompressed when they are read from the receive buffer in conjunction with forwarding the uncompressed packet (or uncompressed data contained therein) to the consumer. Under a second delayed decompression scheme, the packets are read from the receive buffer and forwarded to a decompressor using a first datapath width matching the width of the packets, decompressed, and then forwarded to the consumer using a second datapath width matching the width of the uncompressed data.

Patent Agency Ranking