Abstract:
Methods and apparatus relating to instruction and/or micro-architecture support for on-core decompression are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into one or more cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the one or more cachelines of the cache of the processor core in response to the second micro operation. Other embodiments are also disclosed and claimed.
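A minimal, purely illustrative software model of the two-micro-operation split described above (structure and function names are assumptions of the sketch, not taken from the disclosure): the first step stages compressed bytes into cacheline-sized buffers, and the second hands them to a decompression routine standing in for the DE circuitry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 64

typedef struct {
    uint8_t lines[4][CACHELINE];   /* staged "cachelines" of compressed data */
    size_t  nbytes;
} staged_data_t;

/* First micro operation (modeled): load compressed bytes into the staging
 * buffers; assumes n <= sizeof(s->lines). */
static void uop_load(staged_data_t *s, const uint8_t *src, size_t n) {
    s->nbytes = n;
    memcpy(s->lines, src, n);
}

/* Second micro operation (modeled): pass the staged data to a decompression
 * routine 'de' that stands in for the Decompression Engine circuitry. */
static size_t uop_decompress(const staged_data_t *s, uint8_t *dst,
                             size_t (*de)(const uint8_t *, size_t, uint8_t *)) {
    return de((const uint8_t *)s->lines, s->nbytes, dst);
}
```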
Abstract:
In general, in one aspect, the disclosure describes a processing unit that includes an input buffer to store data received by the processing unit, a memory, an arithmetic logic unit coupled to the input buffer and to the memory, an output buffer, and control logic having access to a control store of program instructions. The control logic processes instructions including an instruction to transfer data from the input buffer to the memory and an instruction to cause the arithmetic logic unit to perform an operation on operands provided by at least one of the memory and the input buffer and to output results of the operation to at least one of the memory and the output buffer.
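A toy C model of the unit as described, with illustrative names and sizes (none taken from the disclosure): one instruction transfers data from the input buffer to the memory, another has the ALU operate on operands from the memory and the input buffer and write the result to the output buffer.

```c
#include <stdint.h>

typedef struct {
    uint32_t input[16];    /* input buffer for data received by the unit */
    uint32_t mem[64];      /* local memory                               */
    uint32_t output[16];   /* output buffer                              */
} unit_t;

/* Instruction: transfer data from the input buffer to the memory. */
static void xfer_in_to_mem(unit_t *u, int in_idx, int mem_idx) {
    u->mem[mem_idx] = u->input[in_idx];
}

/* Instruction: ALU operation (addition shown) on operands taken from the
 * memory and the input buffer, result written to the output buffer. */
static void alu_add(unit_t *u, int mem_idx, int in_idx, int out_idx) {
    u->output[out_idx] = u->mem[mem_idx] + u->input[in_idx];
}
```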
Abstract:
An arrangement is provided for performing modular exponentiations. A modular exponentiation may be performed by using multiple Montgomery multiplications. A Montgomery multiplication comprises a plurality of iterations of basic operations (e.g., carry-save additions), and is performed by a Montgomery multiplication engine (MME). Multiple MMEs of smaller sizes may be chained together to perform modular exponentiations of larger sizes. Additionally, a single MME of a smaller size may be scheduled to perform modular exponentiations of larger sizes. Moreover, the process of performing a Montgomery multiplication may be pipelined both horizontally and vertically. Furthermore, processes of performing two Montgomery multiplications may be interleaved and performed by the same MME or chained MMEs.
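For reference, a bit-serial Montgomery multiplication and a square-and-multiply modular exponentiation built on it, as a minimal C sketch. The hardware described above iterates comparable basic operations with carry-save adders, which this scalar code does not model; the 64-bit word size, the bound n < 2^62, and the caller-supplied constant rr = 2^(2k) mod n are assumptions of the sketch.

```c
#include <stdint.h>

/* Bit-serial Montgomery multiplication: returns a*b*2^(-k) mod n.
 * Assumes n is odd, a,b < n, and n < 2^62 so intermediates fit in 64 bits. */
uint64_t montmul(uint64_t a, uint64_t b, uint64_t n, int k) {
    uint64_t s = 0;
    for (int i = 0; i < k; i++) {
        if ((a >> i) & 1) s += b;   /* add a_i * b             */
        if (s & 1)        s += n;   /* make s even (n is odd)  */
        s >>= 1;                    /* exact division by 2     */
    }
    return (s >= n) ? s - n : s;    /* s < 2n, so one conditional subtraction */
}

/* base^exp mod n via repeated Montgomery multiplications.
 * rr = 2^(2k) mod n is precomputed by the caller; assumes exp < 2^k. */
uint64_t modexp(uint64_t base, uint64_t exp, uint64_t n, int k, uint64_t rr) {
    uint64_t x = montmul(base, rr, n, k);   /* base in Montgomery form */
    uint64_t r = montmul(1, rr, n, k);      /* 1 in Montgomery form    */
    for (int i = k - 1; i >= 0; i--) {
        r = montmul(r, r, n, k);                      /* square   */
        if ((exp >> i) & 1) r = montmul(r, x, n, k);  /* multiply */
    }
    return montmul(r, 1, n, k);             /* convert out of Montgomery form */
}
```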
Abstract:
Methods and apparatus to perform string matching for network packet inspection are disclosed. In some embodiments there is a set of string matching slice circuits, each slice circuit of the set being configured to perform string matching steps in parallel with other slice circuits. Each slice circuit may include an input window storing some number of bytes of data from an input data stream. The input window of data may be padded if necessary, and then multiplied by a polynomial modulo an irreducible Galois-field polynomial to generate a hash index. A storage location of a memory corresponding to the hash index may be accessed to generate a slice-hit signal of a set of H slice-hit signals. The slice-hit signal may be provided to an AND-OR logic array where the set of H slice-hit signals is logically combined into a match result.
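A minimal sketch of the hashing step as described: carry-less multiplication of the input window by a key polynomial, reduction modulo a degree-32 polynomial, and the low bits of the remainder used as the table index. The 4-byte window size and the key and modulus constants are illustrative choices, not values from the disclosure.

```c
#include <stdint.h>

/* Carry-less (GF(2)) multiplication of two 32-bit polynomials. */
static uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            r ^= (uint64_t)a << i;
    return r;
}

/* Reduce a 64-bit polynomial modulo a degree-32 polynomial whose x^32 term
 * is implicit and whose low 32 coefficients are in 'poly'. */
static uint32_t gf_reduce(uint64_t x, uint32_t poly) {
    for (int i = 63; i >= 32; i--)
        if ((x >> i) & 1)
            x ^= ((uint64_t)poly << (i - 32)) | ((uint64_t)1 << i);
    return (uint32_t)x;
}

/* Hash a 4-byte input window into a 2^table_bits-entry table index
 * (table_bits < 32). KEY and POLY are example constants only. */
static uint32_t window_hash(const uint8_t w[4], unsigned table_bits) {
    const uint32_t KEY  = 0x9E3779B1u;  /* arbitrary nonzero key polynomial */
    const uint32_t POLY = 0x04C11DB7u;  /* CRC-32 generator, as an example  */
    uint32_t win = (uint32_t)w[0] | ((uint32_t)w[1] << 8)
                 | ((uint32_t)w[2] << 16) | ((uint32_t)w[3] << 24);
    return gf_reduce(clmul32(win, KEY), POLY) & ((1u << table_bits) - 1u);
}
```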
Abstract:
In one embodiment, the present disclosure provides a method capable of processing a variety of different operations. A method according to one embodiment may include loading configuration data from a shared memory unit into a hardware configuration register, the hardware configuration register located within circuitry included within a hardware accelerator unit. The method may also include issuing a command set from a microengine to the hardware accelerator unit having the circuitry. The method may additionally include receiving the command set at the circuitry from the microengine, the command set configured to allow for the processing of a variety of different operations. The method may further include processing an appropriate operation based upon the configuration data loaded into the hardware configuration register. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.
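A toy model of that flow, with structure and field names that are assumptions of the sketch: configuration data is copied from shared memory into a configuration register, and the operation performed for a given command is then selected by that configuration, so one command set can drive a variety of different operations.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t op_mode; } hw_config_t;                        /* configuration register */
typedef struct { const uint8_t *src; uint8_t *dst; uint32_t len; } command_t;
typedef struct { hw_config_t cfg_reg; } accel_t;

/* Load configuration data from shared memory into the configuration register. */
static void load_config(accel_t *a, const void *shared_mem) {
    memcpy(&a->cfg_reg, shared_mem, sizeof a->cfg_reg);
}

/* Process a command issued by the microengine; the operation performed is
 * selected by the previously loaded configuration. */
static int process_command(const accel_t *a, const command_t *c) {
    switch (a->cfg_reg.op_mode) {
    case 0:                                      /* e.g. plain copy mode      */
        memcpy(c->dst, c->src, c->len);
        return 0;
    case 1: {                                    /* e.g. byte-checksum mode   */
        uint8_t sum = 0;
        for (uint32_t i = 0; i < c->len; i++) sum ^= c->src[i];
        c->dst[0] = sum;
        return 0;
    }
    default:
        return -1;                               /* unsupported configuration */
    }
}
```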
Abstract:
An acceleration unit offloads computationally intensive tasks from a processor. The acceleration unit includes two data processing paths, each having an Arithmetic Logic Unit and sharing a single multiplier unit. Each data processing path may perform configurable operations in parallel on the same data. Special multiplexer paths and instructions are provided to allow P and Q type syndromes to be computed on a stripe in a single pass of the data through the acceleration unit.
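The P and Q syndromes referred to above are the standard RAID-6 quantities: P is the XOR of the data blocks, and Q is the sum of g^d * D_d over GF(2^8) with generator g = 2. A scalar single-pass C sketch follows; the dual-ALU, shared-multiplier data paths of the acceleration unit are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by the GF(2^8) generator g = 2 (RAID-6 polynomial 0x11D). */
static uint8_t gf_mul2(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1D : 0));
}

/* Compute P and Q for a stripe of 'ndisks' data blocks of 'len' bytes each,
 * in a single pass over the data.  Q is evaluated by Horner's rule: walking
 * the disks from highest index to lowest yields Q = sum of g^d * D_d. */
void pq_syndrome(size_t ndisks, size_t len,
                 uint8_t *const *data, uint8_t *p, uint8_t *q) {
    for (size_t i = 0; i < len; i++) {
        uint8_t pp = 0, qq = 0;
        for (size_t d = ndisks; d-- > 0; ) {
            qq = (uint8_t)(gf_mul2(qq) ^ data[d][i]);
            pp ^= data[d][i];
        }
        p[i] = pp;
        q[i] = qq;
    }
}
```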
Abstract:
An apparatus to facilitate a fused instruction to accelerate performance of Secure Hash Algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to: receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a function control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution circuitry; and execute the sub-function of the fused SHA instruction in the integer pipeline, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an XOR operation.
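The merged rotate/shift/XOR pattern mentioned above is the shape of the SHA-256 sigma sub-functions; for reference, a plain C version of that underlying math (this shows only the arithmetic, not the fused-instruction encoding or integer-pipeline scheduling described in the abstract).

```c
#include <stdint.h>

static inline uint32_t rotr32(uint32_t x, unsigned r) {
    return (x >> r) | (x << (32 - r));      /* r is always 1..31 here */
}

/* SHA-256 message-schedule sigmas: two rotates, one shift, two XORs each. */
static inline uint32_t sigma0(uint32_t x) { return rotr32(x, 7)  ^ rotr32(x, 18) ^ (x >> 3);  }
static inline uint32_t sigma1(uint32_t x) { return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10); }

/* SHA-256 compression-round Sigmas: three rotates, two XORs each. */
static inline uint32_t Sigma0(uint32_t x) { return rotr32(x, 2)  ^ rotr32(x, 13) ^ rotr32(x, 22); }
static inline uint32_t Sigma1(uint32_t x) { return rotr32(x, 6)  ^ rotr32(x, 11) ^ rotr32(x, 25); }
```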
Abstract:
A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a 'one round' pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
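This matches the usage model of the AES-NI instructions, where each AESENC/AESENCLAST invocation performs one round; below is a minimal sketch using the corresponding C intrinsics, assuming the eleven AES-128 round keys have already been expanded (e.g., with AESKEYGENASSIST). Compile with AES support enabled (e.g., -maes).

```c
#include <wmmintrin.h>   /* AES-NI intrinsics */

/* Encrypt one 128-bit block with pre-expanded AES-128 round keys rk[0..10]. */
__m128i aes128_encrypt_block(__m128i block, const __m128i rk[11]) {
    block = _mm_xor_si128(block, rk[0]);         /* initial AddRoundKey        */
    for (int i = 1; i < 10; i++)
        block = _mm_aesenc_si128(block, rk[i]);  /* one full round per call    */
    return _mm_aesenclast_si128(block, rk[10]);  /* last round (no MixColumns) */
}
```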