-
Publication No.: US11249761B2
Publication Date: 2022-02-15
Application No.: US16934003
Application Date: 2020-07-20
Applicant: Intel Corporation
Inventor: Dan Baum , Michael Espig , James Guilford , Wajdi K. Feghali , Raanan Sade , Christopher J. Hughes , Robert Valentine , Bret Toll , Elmoustapha Ould-Ahmed-Vall , Mark J. Charney , Vinodh Gopal , Ronen Zohar , Alexander F. Heinecke
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
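For illustration only, a minimal C sketch of the non-zero-packing variant described in the abstract: non-zero elements are packed into the destination and their positions recorded in a header. The element type, header layout, and function names are assumptions for this sketch, not the patented encoding.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Illustrative header: a bitmap marking which matrix positions held
     * non-zero elements (layout assumed for this sketch only). */
    typedef struct {
        uint64_t bitmap[4];   /* supports up to 256 elements */
        size_t   count;       /* number of packed non-zero elements */
    } compress_header;

    /* Pack non-zero int8 elements of a row-major matrix into dst and
     * record each element's position in the header. Returns packed count. */
    static size_t compress_pack_nonzero(const int8_t *src, size_t elems,
                                        int8_t *dst, compress_header *hdr)
    {
        size_t n = 0;
        memset(hdr, 0, sizeof *hdr);
        for (size_t i = 0; i < elems && i < 256; i++) {
            if (src[i] != 0) {
                hdr->bitmap[i / 64] |= 1ULL << (i % 64);
                dst[n++] = src[i];
            }
        }
        hdr->count = n;
        return n;
    }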
-
Publication No.: US20220027154A1
Publication Date: 2022-01-27
Application No.: US17496632
Application Date: 2021-10-07
Applicant: Intel Corporation
Inventor: Vinodh Gopal , James D. Guilford , Gilbert M. Wolrich , Wajdi K. Feghali , Erdinc Ozturk , Martin G. Dixon , Sean P. Mirkes , Matthew C. Merten , Tong Li , Bret T. Toll, I
Abstract: A number of addition instructions are provided that have no data dependency on one another. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
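Flag-isolated additions of this kind let two big-number carry chains be interleaved without serializing on a single carry flag. A minimal C sketch of that idea, keeping each chain's carry in its own variable; this models the effect in software rather than the instructions themselves, and the names are illustrative.

    #include <stddef.h>
    #include <stdint.h>

    /* Add two independent multi-word operand pairs in one interleaved loop.
     * Each chain keeps its carry in its own variable, mirroring the idea of
     * two architecturally separate carry flags. */
    static void dual_chain_add(uint64_t *sum1, const uint64_t *a1, const uint64_t *b1,
                               uint64_t *sum2, const uint64_t *a2, const uint64_t *b2,
                               size_t words)
    {
        uint64_t carry1 = 0, carry2 = 0;
        for (size_t i = 0; i < words; i++) {
            uint64_t s1 = a1[i] + carry1;          /* chain 1 */
            carry1 = (s1 < carry1);
            s1 += b1[i];
            carry1 += (s1 < b1[i]);
            sum1[i] = s1;

            uint64_t s2 = a2[i] + carry2;          /* chain 2, independent of chain 1 */
            carry2 = (s2 < carry2);
            s2 += b2[i];
            carry2 += (s2 < b2[i]);
            sum2[i] = s2;
        }
    }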
-
Publication No.: US11200054B2
Publication Date: 2021-12-14
Application No.: US16019302
Application Date: 2018-06-26
Applicant: Intel Corporation
Inventor: Vinodh Gopal
Abstract: Apparatus and associated methods for implementing atomic instructions for copy-XOR of data. An atomic-copy-xor instruction is defined having a first operand comprising an address of a first cacheline and a second operand comprising an address of a second cacheline. The atomic-copy-xor instruction, which may be included in an instruction set architecture (ISA) of a processor, performs a bitwise XOR operation on copies of data retrieved from the first cacheline and second cacheline to generate an XOR result, and replaces the data in the first cacheline with a copy of data from the second cacheline when the XOR result is non-zero. In addition to implementation using a processor core, the atomic-copy-xor instruction may be implemented using various offloading schemes under which the processor core executing the atomic-copy-xor instruction offloads operations to other components in the processor or system in which the processor is implemented, including offloading operations to a last level cache (LLC) engine, a memory controller, or a DIMM controller.
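A non-atomic C sketch of the described semantics on 64-byte lines; the atomicity, cacheline handling, and offload options are what the hardware adds, and the names here are illustrative only.

    #include <stdint.h>
    #include <string.h>

    #define CACHELINE_BYTES 64

    /* Software model: XOR copies of the two lines; if any bit differs
     * (non-zero XOR result), overwrite the first line with the second. */
    static void copy_xor_line(uint8_t *line1, const uint8_t *line2)
    {
        uint8_t xor_result[CACHELINE_BYTES];
        int nonzero = 0;

        for (int i = 0; i < CACHELINE_BYTES; i++) {
            xor_result[i] = line1[i] ^ line2[i];
            nonzero |= xor_result[i];
        }
        if (nonzero)
            memcpy(line1, line2, CACHELINE_BYTES);
    }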
-
Publication No.: US11188335B2
Publication Date: 2021-11-30
Application No.: US17087536
Application Date: 2020-11-02
Applicant: Intel Corporation
Inventor: Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati
Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.
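A minimal C sketch of just the pre-rotation step spelled out in the abstract (C and D rotated left by 9 bits, G and H by 19 bits); the two SM3 rounds themselves are omitted, and the struct and field names are assumptions for the sketch.

    #include <stdint.h>

    /* Eight 32-bit SM3 state words A..H (field names are illustrative). */
    typedef struct { uint32_t a, b, c, d, e, f, g, h; } sm3_state;

    static uint32_t rotl32(uint32_t x, unsigned n)
    {
        return (x << n) | (x >> (32 - n));
    }

    /* Pre-rotate the state as described, before the two SM3 rounds are
     * applied to the partly rotated state. */
    static void sm3_prerotate(sm3_state *s)
    {
        s->c = rotl32(s->c, 9);
        s->d = rotl32(s->d, 9);
        s->g = rotl32(s->g, 19);
        s->h = rotl32(s->h, 19);
    }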
-
Publication No.: US11126663B2
Publication Date: 2021-09-21
Application No.: US15604793
Application Date: 2017-05-25
Applicant: Intel Corporation
Inventor: Sudhir K. Satpathy , Vikram B. Suresh , Sanu K. Mathew , Vinodh Gopal
IPC: G06F16/903 , G06F16/901 , G06F12/02
Abstract: In one embodiment, an apparatus comprises a decompression engine to determine a plurality of tokens used to encode a block of data; populate a lookup table with at least two of the tokens in order of increasing token length; disable a first portion of the lookup table and enable a second portion of the lookup table based on a value of a payload of the block of data; and search for a match between a token and the payload in the second portion of the lookup table.
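A rough C sketch of the length-ordered lookup idea: tokens sit in the table in order of increasing code length, and only the slice whose lengths are compatible with the current payload bits is searched. The table layout, slice selection, and names are assumptions, not the hardware design.

    #include <stddef.h>
    #include <stdint.h>

    /* One table entry: a variable-length code (1..32 bits), its length,
     * and its decoded symbol. Entries are stored in order of increasing
     * code length. */
    typedef struct {
        uint32_t code;     /* code bits, right-aligned */
        uint8_t  length;   /* code length in bits, assumed >= 1 */
        uint16_t symbol;   /* decoded symbol */
    } token_entry;

    /* Search only the enabled slice [start, end) of the table for a token
     * that matches the top bits of `payload` (a 32-bit window of input).
     * Which slice is enabled would be derived from the payload value; here
     * the caller passes it in. Returns the matching index or -1. */
    static int lut_match(const token_entry *table, size_t start, size_t end,
                         uint32_t payload)
    {
        for (size_t i = start; i < end; i++) {
            uint32_t prefix = payload >> (32 - table[i].length);
            if (prefix == table[i].code)
                return (int)i;
        }
        return -1;
    }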
-
Publication No.: US20210211139A1
Publication Date: 2021-07-08
Application No.: US16996012
Application Date: 2020-08-18
Applicant: Intel Corporation
Inventor: Vinodh Gopal , James D. Guilford , Sudhir K. Satpathy , Sanu K. Mathew
Abstract: Methods and apparatus to parallelize data decompression are disclosed. An example method includes selecting initial starting positions in a compressed data bitstream; adjusting a first one of the initial starting positions to determine a first adjusted starting position by decoding the bitstream starting at a training position in the bitstream, the decoding including traversing the bitstream from the training position as though first data located at the training position is a valid token; outputting first decoded data generated by decoding a first segment of the bitstream starting from the first adjusted starting position; and merging the first decoded data with second decoded data generated by decoding a second segment of the bitstream, the decoding of the second segment starting from a second position in the bitstream and being performed in parallel with the decoding of the first segment, and the second segment preceding the first segment in the bitstream.
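A conceptual C sketch of the starting-position adjustment: decoding begins at an earlier training position and walks the stream token by token as though every token were valid, and the first token boundary at or past the tentative split becomes the adjusted start. The callback, parameter names, and training-offset choice are illustrative, not the patented method.

    #include <stddef.h>

    /* Adjust a tentative segment start. `token_length_at` stands in for the
     * real decoder's "bit length of the token at this position" step. */
    static size_t adjust_start(const unsigned char *stream,
                               size_t training_bitpos, size_t tentative_bitpos,
                               size_t (*token_length_at)(const unsigned char *, size_t))
    {
        size_t pos = training_bitpos;
        while (pos < tentative_bitpos)
            pos += token_length_at(stream, pos);
        return pos;   /* first token boundary at or after the tentative start */
    }

Each adjusted segment can then be decoded in parallel with the preceding segment and the outputs merged in stream order, as the abstract describes.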
-
Publication No.: US10924591B2
Publication Date: 2021-02-16
Application No.: US16014690
Application Date: 2018-06-21
Applicant: Intel Corporation
Inventor: Wajdi Feghali , Vinodh Gopal , Kirk Yap , Sean Gulley , Simon Peffers
IPC: H04L29/06 , H04L12/863
Abstract: Methods and apparatus for low-latency link compression schemes. Under the schemes, packets or messages are dynamically selected for compression in view of current transmit queue levels. The latency incurred during compression and decompression is not added to the data-path, but sits on the side of the transmit queue. The system monitors the queue depth and, accordingly, initiates compression jobs based on the depth. Different compression levels may be dynamically selected and used based on queue depth. Under various schemes, either packets or messages are enqueued in the transmit queue or pointers to such packets and messages are enqueued. Additionally, packets/messages may be compressed prior to being enqueued, or after being enqueued, wherein an original uncompressed packet is replaced with a compressed packet. Compressed and uncompressed packets may be stored in queues or buffers and transmitted using different numbers of transmit cycles based on their compression ratios. The schemes may be implemented to improve the effective bandwidth of various types of links, including serial links, bus-type links, and socket-to-socket links in multi-socket systems.
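A small C sketch of the queue-depth policy described above: compression is skipped when the transmit queue is shallow, and a heavier level is chosen as the queue backs up. The thresholds, levels, and names are placeholders, not values from the patent.

    #include <stddef.h>

    /* Map transmit-queue depth to a compression level.
     * 0 means "send uncompressed"; higher numbers trade latency for ratio. */
    static int select_compression_level(size_t queue_depth,
                                        size_t low_mark, size_t high_mark)
    {
        if (queue_depth < low_mark)
            return 0;               /* link is keeping up: no compression */
        if (queue_depth < high_mark)
            return 1;               /* moderate backlog: fast compression */
        return 3;                   /* deep backlog: heavier compression */
    }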
-
Publication No.: US10686591B2
Publication Date: 2020-06-16
Application No.: US16208542
Application Date: 2018-12-03
Applicant: Intel Corporation
Inventor: Gilbert M. Wolrich , Vinodh Gopal , Kirk S. Yap
Abstract: Instructions and logic provide SIMD secure hashing round slice functionality. Some embodiments include a processor comprising: a decode stage to decode an instruction for a SIMD secure hashing algorithm round slice, the instruction specifying a source data operand set, a message-plus-constant operand set, a round-slice portion of the secure hashing algorithm round, and a rotator set portion of rotate settings. Processor execution units, responsive to the decoded instruction, perform a secure hashing round-slice set of round iterations upon the source data operand set, applying the message-plus-constant operand set and the rotator set, and store a result of the instruction in a SIMD destination register. One embodiment of the instruction specifies a hash round type as one of four MD5 round types. Other embodiments may specify a hash round type by an immediate operand as one of three SHA-1 round types or as a SHA-2 round type.
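For context, a C sketch of selecting one of the four standard MD5 round mixing functions by an immediate-style selector, as a software stand-in for specifying the hash round type in an instruction encoding; the wrapper and its name are assumptions.

    #include <stdint.h>

    /* The four standard MD5 round mixing functions, chosen by a 2-bit
     * immediate-style value (0..3). */
    static uint32_t md5_round_fn(unsigned imm, uint32_t x, uint32_t y, uint32_t z)
    {
        switch (imm & 3) {
        case 0:  return (x & y) | (~x & z);   /* F: rounds  0..15 */
        case 1:  return (x & z) | (y & ~z);   /* G: rounds 16..31 */
        case 2:  return x ^ y ^ z;            /* H: rounds 32..47 */
        default: return y ^ (x | ~z);         /* I: rounds 48..63 */
        }
    }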
-
Publication No.: US10684855B2
Publication Date: 2020-06-16
Application No.: US15686889
Application Date: 2017-08-25
Applicant: Intel Corporation
Inventor: Vinodh Gopal , James D. Guilford , Erdinc Ozturk , Wajdi K. Feghali , Gilbert M. Wolrich , Martin G. Dixon
Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
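The operation is simple enough to model in a line of C; a sketch, with the operand width, shift direction, and parameterized shift amount chosen only for illustration.

    #include <stdint.h>

    /* Model of a fused shift-and-XOR: shift `a` left by `n` bits and XOR
     * the result with `b`. The shift amount is masked to stay in range. */
    static uint64_t shift_xor(uint64_t a, unsigned n, uint64_t b)
    {
        return (a << (n & 63)) ^ b;
    }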
-
Publication No.: US10581590B2
Publication Date: 2020-03-03
Application No.: US14984637
Application Date: 2015-12-30
Applicant: Intel Corporation
Inventor: Shay Gueron , Wajdi K. Feghali , Vinodh Gopal , Raghunandan Makaram , Martin G. Dixon , Srinivas Chennupaty , Michael E. Kounavis
IPC: H04L9/28 , G06F21/72 , H04L9/06 , G06F9/30 , G06F9/38 , H04L9/08 , G06F12/14 , G06F21/60 , G06F12/0875 , G06F12/0862 , G11C7/10 , G06F3/06
Abstract: A flexible AES instruction set for a general-purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
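This style of single-round and key-generation-assist instruction surfaces to C code today through the AES-NI intrinsics; a minimal sketch of that usage model, shown as one possible realization rather than the patent's required interface (the round-constant choice is illustrative).

    #include <wmmintrin.h>   /* AES-NI intrinsics; compile with -maes */

    /* One AES encryption round on a 128-bit state with a 128-bit round key. */
    static __m128i aes_one_round(__m128i state, __m128i round_key)
    {
        return _mm_aesenc_si128(state, round_key);
    }

    /* Key-generation assist: the 8-bit immediate carries the round constant. */
    static __m128i aes_keygen_step(__m128i key)
    {
        return _mm_aeskeygenassist_si128(key, 0x01);   /* rcon for round 1 */
    }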
-