-
1.
公开(公告)号:US20250004772A1
公开(公告)日:2025-01-02
申请号:US18217499
申请日:2023-06-30
Applicant: Intel Corporation
Inventor: Kamlesh PILLAI , Vinodh GOPAL , Gurpreet Singh KALSI , Sreenivas SUBRAMONEY , Wajdi K. FEGHALI
IPC: G06F9/30
Abstract: Apparatus and method for a decompression hardware copy engine with efficient sequence overlapping copy. For example, one embodiment of an apparatus comprises: a plurality of processing cores, one or more of the plurality of processing cores to execute program code to produce a plurality of literals and sequences from a compressed data stream; and decompression acceleration circuitry to generate a decompressed data stream based on the plurality of literals and sequences, the decompression acceleration circuitry comprising: a sequence pre-processor circuit to process batches of sequences of the plurality of sequences and generate a plurality of copy instructions, the sequence pre-processor circuit to merge multiple copy operations corresponding to multiple sequences into a merged copy instruction; and a copy engine circuit to execute the copy instructions to produce the decompressed data stream.
-
公开(公告)号:US20210351790A1
公开(公告)日:2021-11-11
申请号:US16872144
申请日:2020-05-11
Applicant: Intel Corporation
Inventor: James GUILFORD , Vinodh GOPAL , Dan CUTTER , Kirk YAP , Wajdi FEGHALI , George POWLEY
Abstract: A lossless data compressor of an aspect includes a first lossless data compressor circuitry coupled to receive input data. The first lossless data compressor circuitry is to apply a first lossless data compression approach to compress the input data to generate intermediate compressed data. The apparatus also includes a second lossless data compressor circuitry coupled with the first lossless data compressor circuitry to receive the intermediate compressed data. The second lossless data compressor circuitry is to apply a second lossless data compression approach to compress at least some of the intermediate compressed data to generate compressed data. The second lossless data compression approach different than the first lossless data compression approach. Lossless data decompressors are also disclosed, as are methods of lossless data compression and decompression.
-
公开(公告)号:US20200348937A1
公开(公告)日:2020-11-05
申请号:US16934003
申请日:2020-07-20
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
-
公开(公告)号:US20190196824A1
公开(公告)日:2019-06-27
申请号:US16311231
申请日:2016-12-31
Applicant: INTEL CORPORATION
Inventor: Xiaodong LIU , Qihua DAI , Weigang LI , Vinodh GOPAL
CPC classification number: H04Q11/0005 , B25J15/0014 , B65G1/0492 , G02B6/3882 , G02B6/3893 , G02B6/3897 , G02B6/4292 , G02B6/4452 , G05D23/1921 , G05D23/2039 , G06F1/183 , G06F3/061 , G06F3/0611 , G06F3/0613 , G06F3/0616 , G06F3/0619 , G06F3/0625 , G06F3/0631 , G06F3/0638 , G06F3/064 , G06F3/0647 , G06F3/0653 , G06F3/0655 , G06F3/0658 , G06F3/0659 , G06F3/0664 , G06F3/0665 , G06F3/067 , G06F3/0673 , G06F3/0679 , G06F3/0683 , G06F3/0688 , G06F3/0689 , G06F8/65 , G06F9/30036 , G06F9/3887 , G06F9/4401 , G06F9/5016 , G06F9/5044 , G06F9/505 , G06F9/5072 , G06F9/5077 , G06F9/544 , G06F11/141 , G06F11/3414 , G06F12/0862 , G06F12/0893 , G06F12/10 , G06F12/109 , G06F12/1408 , G06F13/161 , G06F13/1668 , G06F13/1694 , G06F13/4022 , G06F13/4068 , G06F13/409 , G06F13/42 , G06F13/4282 , G06F15/8061 , G06F16/9014 , G06F2209/5019 , G06F2209/5022 , G06F2212/1008 , G06F2212/1024 , G06F2212/1041 , G06F2212/1044 , G06F2212/152 , G06F2212/202 , G06F2212/401 , G06F2212/402 , G06F2212/7207 , G06Q10/06 , G06Q10/06314 , G06Q10/087 , G06Q10/20 , G06Q50/04 , G07C5/008 , G08C17/02 , G08C2200/00 , G11C5/02 , G11C5/06 , G11C7/1072 , G11C11/56 , G11C14/0009 , H03M7/30 , H03M7/3084 , H03M7/3086 , H03M7/40 , H03M7/4031 , H03M7/4056 , H03M7/4081 , H03M7/6005 , H03M7/6023 , H04B10/25 , H04B10/2504 , H04L9/0643 , H04L9/14 , H04L9/3247 , H04L9/3263 , H04L12/2809 , H04L29/12009 , H04L41/024 , H04L41/046 , H04L41/0813 , H04L41/082 , H04L41/0896 , H04L41/12 , H04L41/145 , H04L41/147 , H04L41/5019 , H04L43/065 , H04L43/08 , H04L43/0817 , H04L43/0876 , H04L43/0894 , H04L43/16 , H04L45/02 , H04L45/52 , H04L47/24 , H04L47/38 , H04L47/765 , H04L47/782 , H04L47/805 , H04L47/82 , H04L47/823 , H04L49/00 , H04L49/15 , H04L49/25 , H04L49/357 , H04L49/45 , H04L49/555 , H04L67/02 , H04L67/10 , H04L67/1004 , H04L67/1008 , H04L67/1012 , H04L67/1014 , H04L67/1029 , H04L67/1034 , H04L67/1097 , H04L67/12 , H04L67/16 , H04L67/306 , H04L67/34 , H04L69/04 , H04L69/329 , H04Q1/04 , H04Q11/00 , H04Q11/0003 , H04Q11/0062 , H04Q11/0071 , H04Q2011/0037 , H04Q2011/0041 , H04Q2011/0052 , H04Q2011/0073 , H04Q2011/0079 , H04Q2011/0086 , H04Q2213/13523 , H04Q2213/13527 , H04W4/023 , H04W4/80 , H05K1/0203 , H05K1/181 , H05K5/0204 , H05K7/1418 , H05K7/1421 , H05K7/1422 , H05K7/1447 , H05K7/1461 , H05K7/1485 , H05K7/1487 , H05K7/1489 , H05K7/1491 , H05K7/1492 , H05K7/1498 , H05K7/2039 , H05K7/20709 , H05K7/20727 , H05K7/20736 , H05K7/20745 , H05K7/20836 , H05K13/0486 , H05K2201/066 , H05K2201/10121 , H05K2201/10159 , H05K2201/10189 , Y02D10/14 , Y02D10/151 , Y02P90/30 , Y10S901/01
Abstract: Technologies for adaptive processing of multiple buffers is disclosed. A compute device may establish a buffer queue to which applications can submit buffers to be processed, such as by hashing the submitted buffers. The compute device monitors the buffer queue and determines an efficient way of processing the buffer queue based on the number of buffers present. The compute device may process the buffers serially with a single processor core of the compute device or may process the buffers in parallel with single-instruction, multiple data (SIMD) instructions. The compute device may determine which method to use based on a comparison of the throughput of serially processing the buffers as compared to parallel processing the buffers, which may depend on the number of buffers in the buffer queue.
-
公开(公告)号:US20220232423A1
公开(公告)日:2022-07-21
申请号:US17704658
申请日:2022-03-25
Applicant: Intel Corporation
Inventor: Akhilesh THYAGATURU , Mohit GARG , Vinodh GOPAL , Ned M. SMITH
Abstract: The present disclosure describes edge computing over disaggregated radio access network (RAN) infrastructure through dynamic edge data extraction. Edge data is extracted at intermediate stages of RAN processing, provided to edge compute functions, and inserted back into the RAN processing pipeline. These mechanisms allow for the processing of edge data traffic much closer to the data source than existing approaches, which decreases the overall latency and delay. Additionally, these mechanisms do not require changes to already existing network protocols, allowing for non-complex adoption and implementation.
-
公开(公告)号:US20220171627A1
公开(公告)日:2022-06-02
申请号:US17672253
申请日:2022-02-15
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
-
公开(公告)号:US20220075655A1
公开(公告)日:2022-03-10
申请号:US17529149
申请日:2021-11-17
Applicant: Intel Corporation
Inventor: Akhilesh S. THYAGATURU , Mohit Kumar GARG , Vinodh GOPAL
IPC: G06F9/50
Abstract: Methods, apparatus, and software for efficient accelerator offload in multi-accelerator frameworks. One multi-accelerator framework employs a compute platform including a plurality of processor cores and a plurality of accelerator devices. An application is executed on a first core and a portion of the application workload is offloaded to a first accelerator device. In connection with moving execution of the application to a second core, a second accelerator devices to be used for the offloaded workload is selected based on core-to-accelerator cost information for the second core. This core-to-accelerator cost information includes core-accelerator cost information for combinations of core-accelerator pairs, which are based, at least on part, on latencies projected for interconnect paths between cores and accelerators. Both single-socket and multi-socket platform are supported. The solutions include mechanisms for moving offloaded workloads for multiple accelerator devices, as well as synchronizing accelerator operations and workflows.
-
公开(公告)号:US20170147340A1
公开(公告)日:2017-05-25
申请号:US15396568
申请日:2016-12-31
Applicant: Intel Corporation
Inventor: Kirk S. YAP , Gilbert M. WOLRICH , James D. GUILFORD , Vinodh GOPAL , Erdinc OZTURK , Sean M. GULLEY , Wajdi K. FEGHALI , Martin G. DIXON
IPC: G06F9/30 , G06F9/38 , H04L9/06 , G06F12/0875 , G06F12/1027 , G06F15/80 , G06F12/0897
CPC classification number: G06F9/3016 , G06F9/30007 , G06F9/30036 , G06F9/30058 , G06F9/30098 , G06F9/30145 , G06F9/3802 , G06F9/384 , G06F12/0875 , G06F12/0897 , G06F12/1027 , G06F15/8007 , G06F21/602 , G06F2212/452 , G06F2212/68 , H04L9/0643 , H04L9/3239 , H04L2209/125
Abstract: A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
-
9.
公开(公告)号:US20190310848A1
公开(公告)日:2019-10-10
申请号:US16452390
申请日:2019-06-25
Applicant: INTEL CORPORATION
Inventor: Vinodh GOPAL , Wajdi FEGHALI , Gilbert WOLRICH , Kirk YAP
IPC: G06F9/30
Abstract: An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three operand instructions. For example, in one embodiment, a processor comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register.
-
公开(公告)号:US20170147342A1
公开(公告)日:2017-05-25
申请号:US15396576
申请日:2016-12-31
Applicant: Intel Corporation
Inventor: Kirk S. YAP , Gilbert M. WOLRICH , James D. GUILFORD , Vinodh GOPAL , Erdinc OZTURK , Sean M. GULLEY , Wajdi K. FEGHALI , Martin G. DIXON
IPC: G06F9/30 , G06F9/38 , H04L9/06 , G06F12/0897 , G06F12/0875 , G06F12/1027 , G06F15/80 , G06F13/28
CPC classification number: G06F9/3016 , G06F9/30007 , G06F9/30036 , G06F9/30058 , G06F9/30098 , G06F9/30156 , G06F9/3802 , G06F9/384 , G06F9/3855 , G06F12/0875 , G06F12/0897 , G06F12/1027 , G06F13/28 , G06F13/4068 , G06F13/4282 , G06F15/8007 , G06F21/602 , G06F2212/452 , G06F2212/68 , G06F2213/0026 , G09C1/00 , H04L9/0643 , H04L9/3239 , H04L2209/122
Abstract: A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
-
-
-
-
-
-
-
-
-