摘要:
Processor instructions for determining two minimum and two maximum values and associated apparatus and methods. The instructions include various 2MIN instructions for determining the two smallest values among three or four input values and 2MAX instructions for determining the two largest values among three or four input values. The 2MIN instructions employ two operands, with the first operand in some of the variations storing concatenated min1 and min2 values in a first register and a scr2 comparison value or two src2 concatenated src2 values in a second register. Comparators are used to implement hardware logic for determining whether the scr2 value(s) is/are less than each of min1 and min2. Based on the hardware logic, the smallest two values among min1, min2, and src2 (or both src2 values) are stored as concatenated values in the first register. The 2MAX instructions are implemented in a similar manner, except the comparisons are whether the scr2 value(s) is/are greater than each of max1 and max2 values. 128-bit 2MIN and 2MAX SIMD instructions are also provided for processing two 64-bit data-paths in parallel.
摘要:
Methods and apparatuses relating to stateful compression and decompression operations are described. In one embodiment, hardware processor includes a core to execute a thread and offload at least one of a compression and decompression thread, and a hardware compression and decompression accelerator to execute the at least one of the compression and decompression thread to consume input and generate output data, wherein the hardware compression and decompression accelerator is coupled to a plurality of input buffers to store the input data, a plurality of output buffers to store the output data, an input buffer descriptor array with an entry for each respective input buffer, an input buffer response descriptor array with a corresponding response entry for each respective input buffer, an output buffer descriptor array with an entry for each respective output buffer, and an output buffer response descriptor array with a corresponding response entry for each respective output buffer.
摘要:
Methods and apparatuses relating to data decompression are described. In one embodiment, a hardware processor includes a core to execute a thread and offload a decompression thread for an encoded, compressed data stream comprising a literal code, a length code, and a distance code, and a hardware decompression accelerator to execute the decompression thread to selectively provide the encoded, compressed data stream to a first circuit to serially decode the literal code to a literal symbol, serially decode the length code to a length symbol, and serially decode the distance code to a distance symbol, and selectively provide the encoded, compressed data stream to a second circuit to look up the literal symbol for the literal code from a table, look up the length symbol for the length code from the table, and look up the distance symbol for the distance code from the table.
摘要:
According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.
摘要:
A processor includes a memory hierarchy, buffer, and a compression module. The compression module includes logic to evaluate a stream of data to be compressed according to a compression scheme, selectively modify a format of the compression scheme based upon a number of literals received, compress a sequence of the data to produce the output data sequence, and send the output data sequence to the memory hierarchy.
摘要:
According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.
摘要:
A method of managing a free list and ring data structure, which may be used to store journaling information, by storing and modifying information describing a structure of the free list or ring data structure in a cache memory that may also be used to store information describing a structure of a queue of buffers.
摘要:
An assembler, which can be provided as part of a debugger and/or development system, avoids register allocation errors, such as register bank conflicts and/or insufficient physical registers, automatically.
摘要:
A method of debugging code that executes in a multithreaded processor having a microengines includes receiving a program instruction and an identification representing a selected one of the microengines from a remote user interface connected to the processor pausing program execution in the threads executing in the selected microengine, inserting a breakpoint after a program instruction in the selected microengine that matches the program instruction received from the remote user interface, resuming program execution in the selected microengine, executing a breakpoint routine if program execution in the selected microengine encounters the breakpoint and resuming program execution in the microengine.
摘要:
A method and mechanism for executing an application by a processor in a multi-processor configuration of processors, each having an associated instruction memory is presented. The application receives object code that includes an image for at least one other processor in the multi-processor configuration of processors. The application binds an import variable in the image to a parameter value and stores the image for the at least one other processor into the associated instruction memory.