-
公开(公告)号:US11188335B2
公开(公告)日:2021-11-30
申请号:US17087536
申请日:2020-11-02
Applicant: Intel Corporation
Inventor: Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati
Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.
-
公开(公告)号:US10924591B2
公开(公告)日:2021-02-16
申请号:US16014690
申请日:2018-06-21
Applicant: INTEL CORPORATION
Inventor: Wajdi Feghali , Vinodh Gopal , Kirk Yap , Sean Gulley , Simon Peffers
IPC: H04L29/06 , H04L12/863
Abstract: Methods and apparatus for low-latency link compression schemes. Under the schemes, selected packets or messages are dynamically selected for compression in view of current transmit queue levels. The latency incurred during compression and decompression is not added to the data-path, but sits on the side of the transmit queue. The system monitors the queue depth and, accordingly, initiates compression jobs based on the depth. Different compression levels may be dynamically selected and used based on queue depth. Under various schemes, either packets or messages are enqueued in the transmit queue or pointers to such packets and messages are enqueued. Additionally, packets/message may be compressed prior to being enqueued, or after being enqueued, wherein an original uncompressed packet is replaced with a compressed packet. Compressed and uncompressed packets may be stored in queues or buffers and transmitted using a different numbers of transmit cycles based on their compression ratios. The schemes may be implemented to improve the effective bandwidth of various types of links, including serial links, bus-type links, and socket-to-socket links in multi-socket systems.
-
公开(公告)号:US09503256B2
公开(公告)日:2016-11-22
申请号:US14582707
申请日:2014-12-24
Applicant: Intel Corporation
Inventor: Kirk Yap , Gilbert Wolrich , Sudhir Satpathy , Sean Gulley , Vinodh Gopal , Sanu Mathew , Wajdi Feghali
CPC classification number: H04L9/0822 , G09C1/00 , H04L9/0631 , H04L2209/122
Abstract: Embodiments of an invention for SMS4 acceleration hardware are disclosed. In an embodiment, an apparatus includes SMS4 hardware and key transformation hardware. The SMS4 hardware is to execute a round of encryption and a round of key expansion. The key transformation hardware is to transform a key to provide for the SMS4 hardware to execute a round of decryption.
Abstract translation: 公开了用于SMS4加速硬件的发明的实施例。 在一个实施例中,一种装置包括SMS4硬件和密钥变换硬件。 SMS4硬件是执行一轮加密和一轮密钥扩展。 密钥转换硬件是转换密钥以提供SMS4硬件来执行一轮解密。
-
公开(公告)号:US12182018B2
公开(公告)日:2024-12-31
申请号:US17133615
申请日:2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
IPC: G06F12/0811 , G06F9/38 , G06F12/0862 , G06F12/0895
Abstract: Methods and apparatus relating to an instruction and/or micro-architecture support for decompression on core are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into one or more cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the one or more cachelines of the cache of the processor core in response to the second micro operation. Other embodiments are also disclosed and claimed.
-
25.
公开(公告)号:US12028094B2
公开(公告)日:2024-07-02
申请号:US17133622
申请日:2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
CPC classification number: H03M7/6029 , G06F9/3877 , G06F9/541
Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression Application Programming Interface (API) receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data. The DE circuitry decompress the compressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which decompressed data by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20240211253A1
公开(公告)日:2024-06-27
申请号:US18145744
申请日:2022-12-22
Applicant: Intel Corporation
Inventor: Santosh Ghosh , Christoph Dobraunig , Manoj Sastry , Andrew H. Reinders , Regev Shemy , Qian Wang , Rotem Ohana Peretz , Wing Shek Wong , Wajdi Feghali
CPC classification number: G06F9/30029 , G06F9/3016 , G06F9/3802
Abstract: A method comprises fetching, by fetch circuitry, an encoded parity instruction comprising at least one opcode, a first source identifier for a first source, a second source identifier for a second source, a third source identifier for a third source, and a destination identifier for a destination, decoding, by decode circuitry, the encoded parity instruction to generate a decoded parity instruction; and executing, by execution circuitry, the decoded parity instruction to retrieve operands representing a first register from the first source, a second register from the second source, a third register from the third source, and an index from the third source, perform an XOR operation of four words of data from the first register and single word of data from the second register in a position represented by the index to generate a parity value, and store the parity value in a the first register in a position represented by the index.
-
公开(公告)号:US11681530B2
公开(公告)日:2023-06-20
申请号:US17688728
申请日:2022-03-07
Applicant: Intel Corporation
Inventor: Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati
CPC classification number: G06F9/30145 , G06F9/30043 , G06F9/30196 , G06F9/3887 , H04L9/0643
Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.
-
公开(公告)号:US11567772B2
公开(公告)日:2023-01-31
申请号:US17537373
申请日:2021-11-29
Applicant: Intel Corporation
Inventor: Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati
Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.
-
公开(公告)号:US20220416999A1
公开(公告)日:2022-12-29
申请号:US17358897
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Wajdi Feghali , Changwon Rhee , Wei-Yu Chen , Timothy R. Bauer , Alexander Lyashevsky
Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
-
公开(公告)号:US11494320B2
公开(公告)日:2022-11-08
申请号:US16140472
申请日:2018-09-24
Applicant: Intel Corporation
Inventor: Simon N Peffers , Kirk S Yap , Sean Gulley , Vinodh Gopal , Wajdi Feghali
Abstract: Apparatus, systems and methods for implementing delayed decompression schemes. As a burst of packets comprising compressed packets and uncompressed packets are received over an interconnect link, they are buffered in a receive buffer without decompression. Subsequently, the packets are forwarded from the receive buffer to a consumer such as processor core, with the compressed packets being decompressed prior to reaching the processor core. Under a first delayed decompression approach, packets are decompressed when they are read from the receive buffer in conjunction with forwarding the uncompressed packet (or uncompressed data contained therein) to the consumer. Under a second delayed decompression scheme, the packets are read from the receive buffer and forwarded to a decompressor using a first datapath width matching the width of the packets, decompressed, and then forwarded to the consumer using a second datapath width matching the width of the uncompressed data.
-
-
-
-
-
-
-
-
-