摘要:
Systems and methods of performing a fused multiply add (FMA) operations are provided. In one embodiment, the length of the adder used by the FMA operation is less than 3*N, where N is the number of bits in the mantissa term of a floating point number. A mask may be used to perform the addition portion of the FMA operation using the adder. A second mask may be used to denormalize the result of the addition portion of the FMA operation if an underflow occurs.
摘要:
The Advanced Encryption Standard (AES) is a symmetric block cipher that can encrypt and decrypt information. Encryption (cipher) performs a series of transformations (Shift Rows, Substitute Bytes, Mix Columns) using the secret key (cipher key) to transforms intelligible data referred to as “plaintext” into an unintelligible form referred to as “cipher text”. The transformations (Inverse Shift Rows, Inverse Substitute Bytes, Inverse Mix Columns) in the inverse cipher (decryption) are the inverse of the transformations in the cipher. Encryption and decryption is performed efficiently through the use of instructions that perform the series of transformations. Combinations of these instructions allow the isolation of the transformations (Shift Rows, Substitute Bytes, Mix Columns, Inverse Shift Rows, Inverse Substitute Bytes, Inverse Mix Columns) to be obtained.
摘要:
A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.
摘要:
A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.
摘要:
In one embodiment, the present invention includes a retirement unit to receive and retire executed instructions. The retirement unit may include a first array to receive information at allocation and a second array to receive information after execution. The retirement unit may further include logic to calculate an event associated with an executed instruction if information associated with the executed instruction is stored in an on-demand portion of at least one of arrays. Other embodiments are described and claimed.
摘要:
Method, apparatus and system embodiments provide a register to track the oldest exception event or sticky event in a processor. The processor may be an out-of-order processor. Dispatched instructions (or micro-ops) may be maintained in a queue, such as a reorder buffer (ROB), for in-order retirement. For at least one embodiment, event information is maintained only in the register and is not maintained in a ROB. For at least one other embodiment, event information is maintained in a ROB entry for some events and in the register for others. For such latter embodiment, a retire engine takes the contents of both the ROB entry and the register into account when determining whether to take an exception or otherwise initiate a handling sequence during in-order instruction retirement. Other embodiments are also described and claimed.
摘要:
The throughput of an encryption/decryption operation is increased in a system having a pipelined execution unit. Different independent encryptions (decryptions) of different data blocks may be performed in parallel by dispatching an AES round instruction in every cycle.
摘要:
A system and method of optimizing system memory bus bandwidth in a computer system. The system prepares to receive first data from system memory in accordance with at least one read request by evicting previously stored second data to a write back buffer. The at least one read request is then issued consecutively to system memory via the system memory bus. After issuance of the at least one read request, at least one write request is issued consecutively to send the second data in the write back buffer to the system memory via the system memory bus. The consecutive issuance of read and write requests avoids read-to-write and write-to-read bubbles that occur when alternating read and write requests are issued to system memory.
摘要:
An out-of-order subsystem of a processor includes a register alias table and allocation (RAT/ALLOC) unit, a reservation station (RS) and a reorder buffer (ROB). Destination identifiers of one or more execution results that are not yet stored in any register file of the ROB may be compared to source identifiers of operands of micro-operations that are being issued to the RS. Each execution result corresponding to a destination identifier that matches one of the source identifiers is retrieved from a data path external to the ROB and routed to an appropriate port of the RS for an operand corresponding to the source identifier so that the RAT/ALLOC unit does not need to allocate a read port of the ROB for the RS to read the execution result.
摘要:
A memory system is provided for storing multiple data types. The memory system includes a main memory, a local cache, and a translation unit. The local cache has multiple entries, each of which includes a data field to store data and a status field to indicate a storage state for the stored data. The translation unit includes a translation lookaside buffer (TLB) and a status-cache (STC). The TLB stores address translations for data in the main memory, and the STC stores storage states for data indicated by the address translations.