摘要:
An apparatus and method are described for real time instruction tracing. For example, a method according to one embodiment comprises: recording user specified address ranges for which tracing is required; monitoring a next linear instruction pointer (NLIP) values and/or branch linear instruction pointer (BLIP) values to determine if address range has been entered; when the range is entered, compressing the NLIP and/or BLIP values and constructing fixed length packets containing the tracing data; and transferring the fixed length packets to a memory execution cluster.
摘要:
In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing optional logging of debug activities in a real time instruction tracing log. For example, in one embodiment, such means may include an integrated circuit having means for initiating instruction tracing for instructions of a traced application, mode, or code region, as the instructions are executed by the integrated circuit; means for generating a plurality of packets to a debug log describing the instruction tracing; means for initiating an alternative mode of execution within the integrated circuit; and means for suppressing indication of entering the alternative mode of execution. Additional and alternative means may be implemented for selectively causing an integrated circuit to operate in accordance with an invisible trace mode or a visible trace mode upon transition to the alternative mode of execution.
摘要:
A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving an horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.
摘要:
Method, apparatus, and program means for performing a packed multiply high with round and shift operation. The method of one embodiment comprises receiving a first operand having a first set of L data elements. A second operand having a second set of L data elements is received. L pairs of data elements are multiplied together to generate a set of L products. Each of the L pairs includes a first data element from the first set of L data element and a second data element from a corresponding data element position of the second set of L data elements. Each of the L products are rounded to generate L rounded values. Each of said L rounded values are scaled to generate L scaled values. Each of the L scaled values are truncated for storage at a destination. Each truncated value is to be stored at a data element position corresponding to its pair of data elements.
摘要:
A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving an horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.
摘要:
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed byte data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed byte data and a second packed byte data. The processor performs operations on data elements in said first packed byte data and said second packed byte data to generate a third packed data in response to receiving an instruction. A plurality of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed byte data.
摘要:
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
摘要:
A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving an horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.
摘要:
A computer system and processor for elimination of move operations include circuits that obtain a computer instruction and bypass execution units in response to determining that the instruction includes a move operation that involves a transfer of data from a logical source register to a logical destination register. Instead of executing the move operation, the transfer of the data is performed by tracking changes in data dependencies of the source and the destination registers, and assigning a physical register associated with the source register to the destination register based on the dependencies.
摘要:
A computer system and processor for elimination of move operations include circuits that obtain a computer instruction and bypass execution units in response to determining that the instruction includes a move operation that involves a transfer of data from a logical source register to a logical destination register. Instead of executing the move operation, the transfer of the data is performed by tracking changes in data dependencies of the source and the destination registers, and assigning a physical register associated with the source register to the destination register based on the dependencies.