Methods and apparatus to parallelize data decompression

    公开(公告)号:US11595055B2

    公开(公告)日:2023-02-28

    申请号:US17586598

    申请日:2022-01-27

    Abstract: Methods and apparatus to parallelize data decompression are disclosed. An example method selecting initial starting positions in a compressed data bitstream; adjusting a first one of the initial starting positions to determine a first adjusted starting position by decoding the bitstream starting at a training position in the bitstream, the decoding including traversing the bitstream from the training position as though first data located at the training position is a valid token; outputting first decoded data generated by decoding a first segment of the bitstream starting from the first adjusted starting position; and merging the first decoded data with second decoded data generated by decoding a second segment of the bitstream, the decoding of the second segment starting from a second position in the bitstream and being performed in parallel with the decoding of the first segment, and the second segment preceding the first segment in the bitstream.

    Apparatuses, methods, and systems for hashing instructions

    公开(公告)号:US11567772B2

    公开(公告)日:2023-01-31

    申请号:US17537373

    申请日:2021-11-29

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

    Architecture and instruction set for implementing advanced encryption standard (AES)

    公开(公告)号:US11563556B2

    公开(公告)日:2023-01-24

    申请号:US16841241

    申请日:2020-04-06

    Abstract: A processor of an aspect is to perform a Single Instruction Multiple Data (SIMD) instruction. The SIMD instruction is to indicate a source register storing input data to be processed by a round of AES and is to indicate a source of a round key to be used for the round of AES. The processor is to perform the SIMD instruction to perform the round of AES on the input data using the round key and store a result of the round of AES in a destination. In one aspect, the SIMD instruction is to provide a parameter to specify whether or not a round of AES to be performed is a last round. Other instructions, processors, methods, and systems are described.

    Delayed link compression scheme
    305.
    发明授权

    公开(公告)号:US11494320B2

    公开(公告)日:2022-11-08

    申请号:US16140472

    申请日:2018-09-24

    Abstract: Apparatus, systems and methods for implementing delayed decompression schemes. As a burst of packets comprising compressed packets and uncompressed packets are received over an interconnect link, they are buffered in a receive buffer without decompression. Subsequently, the packets are forwarded from the receive buffer to a consumer such as processor core, with the compressed packets being decompressed prior to reaching the processor core. Under a first delayed decompression approach, packets are decompressed when they are read from the receive buffer in conjunction with forwarding the uncompressed packet (or uncompressed data contained therein) to the consumer. Under a second delayed decompression scheme, the packets are read from the receive buffer and forwarded to a decompressor using a first datapath width matching the width of the packets, decompressed, and then forwarded to the consumer using a second datapath width matching the width of the uncompressed data.

    Ultra-secure accelerators
    306.
    发明授权

    公开(公告)号:US11455257B2

    公开(公告)日:2022-09-27

    申请号:US16377230

    申请日:2019-04-07

    Abstract: Methods and apparatus for ultra-secure accelerators. New ISA enqueue (ENQ) instructions with a wrapping key (WK) are provided to facilitate secure access to on-chip and off-chip accelerators in computer platforms and systems. The ISA ENQ with WK instructions include a dest operand having an address of an accelerator portal and a scr operand having the address of a request descriptor in system memory defining a job to be performed by an accelerator and including a wrapped key. Execution of the instruction writes a record including the src and a WK to the portal, and the record is enqueued in an accelerator queue if a slot is available. The accelerator reads the enqueued request descriptor and uses the WK to unwrap the wrapped key, which is then used to decrypt encrypted data read from one or more buffers in memory. The accelerator then performs one or more functions on the decrypted data as defined by the job and writes the output of the processing back to memory with optional encryption.

    COMPRESSED CACHE MEMORY WITH PARALLEL DECOMPRESS ON FAULT

    公开(公告)号:US20220197816A1

    公开(公告)日:2022-06-23

    申请号:US17130638

    申请日:2020-12-22

    Abstract: An embodiment of an integrated circuit may comprise, coupled to a core, hardware decompression accelerators, a compressed cache, a processor communicatively coupled to the hardware decompression accelerators and the compressed cache, and memory communicatively coupled to the processor, wherein the memory stores microcode instructions that when executed by the processor causes the processor to load a page table entry in response to an indication of a page fault, determine if the page table entry indicates that the page is to be decompressed on fault, and, if so determined, modify a first decompression work descriptor at a first address and a second decompression work descriptor at a second address based on information from the page table entry, and generate a first enqueue transaction to the hardware decompression accelerators with the first address of the first decompression work descriptor and a second enqueue transaction to the hardware decompression accelerators with the second address of the second decompression work descriptor. Other embodiments are disclosed and claimed.

    APPLICATION PROGRAMMING INTERFACE FOR FINE GRAINED LOW LATENCY DECOMPRESSION WITHIN PROCESSOR CORE

    公开(公告)号:US20220197659A1

    公开(公告)日:2022-06-23

    申请号:US17133622

    申请日:2020-12-23

    Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression Application Programming Interface (API) receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data. The DE circuitry decompress the compressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which decompressed data by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.

    Instructions and logic to provide SIMD SM4 cryptographic block cipher functionality

    公开(公告)号:US11303438B2

    公开(公告)日:2022-04-12

    申请号:US16928558

    申请日:2020-07-14

    Abstract: Instructions and logic provide for a Single Instruction Multiple Data (SIMD) SM4 round slice operation. Embodiments of an instruction specify a first and a second source data operand set, and substitution function indicators, e.g. in an immediate operand. Embodiments of a processor may include encryption units, responsive to the first instruction, to: perform a slice of SM4-round exchanges on a portion of the first source data operand set with a corresponding keys from the second source data operand set in response to a substitution function indicator that indicates a first substitution function, perform a slice of SM4 key generations using another portion of the first source data operand set with corresponding constants from the second source data operand set in response to a substitution function indicator that indicates a second substitution function, and store a set of result elements of the first instruction in a SIMD destination register.

Patent Agency Ranking