Abstract:
A request hint is issued prior to or while identifying whether requested data and/or one or more instructions are in a first memory. A second memory is accessed to fetch data and/or one or more instructions in response to the request hint. The data and/or instruction(s) accessed from the second memory are stored in a buffer. If the requested data and/or instruction(s) are not in the first memory, the data and/or instruction(s) are returned from the buffer.
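The flow described above can be sketched in a few lines. This is a minimal, sequential model and not the patented circuit: the names `read_with_hint`, `first_memory`, `second_memory`, and `buffer` are hypothetical, the memories are modeled as plain dictionaries, and the "prior to or while" overlap is approximated by simply issuing the hint before the first-memory lookup.

```python
def read_with_hint(addr, first_memory, second_memory, buffer):
    """Return (value, source) for addr, using a request hint to the second memory.

    Assumption: memories are dicts keyed by address; real hardware would
    overlap the two accesses instead of running them one after another.
    """
    # Request hint: access the second memory before the hit/miss in the
    # first memory is known, and park the fetched value in the buffer.
    buffer[addr] = second_memory[addr]

    # Identify whether the requested item is in the first memory.
    if addr in first_memory:
        return first_memory[addr], "first memory"

    # Miss in the first memory: the value is already waiting in the buffer.
    return buffer[addr], "buffer"
```

The point of the design is that a miss in the first memory does not have to wait for a fresh access to the second memory, because the hint already started it.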
Abstract:
A floating point arithmetic apparatus for converting numbers between an integer format and a floating point format, wherein a conversion operation requires a greater data path width than an arithmetic operation. The apparatus comprises right shift circuitry that receives a number in the floating point format, wherein the right shift circuitry includes additional register positions to accommodate a shift beyond the data path width required by an arithmetic operation.
Abstract:
A matrix of execution blocks forms a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks processes a single block of instructions specifying parallel and dependent instructions.
Abstract:
A method and apparatus are provided for executing scalar packed data instructions. According to one aspect of the invention, a processor includes a plurality of registers, a register renaming unit coupled to the plurality of registers, and a decoder coupled to the register renaming unit. The register renaming unit provides an architectural register file to store packed data operands, each of which includes a plurality of data elements. The decoder is configured to decode a first and a second set of instructions (e.g., a set of full-width packed data instructions and a set of partial-width packed data instructions) that each specify one or more registers in the architectural register file. Each of the instructions in the first set specifies operations to be performed on all of the data elements stored in the one or more specified registers. In contrast, each of the instructions in the second set specifies operations to be performed on only a subset of the data elements stored in the one or more specified registers.
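The difference between the two instruction sets can be illustrated with a small model. This is a sketch under stated assumptions: registers are modeled as Python lists of data elements, the function names `packed_add` and `scalar_add` are hypothetical, and the partial-width form is assumed to operate on element 0 only while copying the remaining elements from the first operand (the abstract itself only says "a subset").

```python
def packed_add(a, b):
    """Full-width form: the operation is applied to every data element."""
    if len(a) != len(b):
        raise ValueError("operands must hold the same number of elements")
    return [x + y for x, y in zip(a, b)]


def scalar_add(a, b):
    """Partial-width form: only a subset of the elements is operated on.

    Assumption: the subset is element 0, and the untouched elements are
    passed through from the first operand.
    """
    if len(a) != len(b):
        raise ValueError("operands must hold the same number of elements")
    return [a[0] + b[0]] + list(a[1:])
```

Both forms name the same architectural registers; only the number of elements touched differs.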
Abstract:
A method and apparatus are disclosed that compute multiple absolute differences from packed data and sum those absolute differences to produce a result. According to one embodiment, a processor includes a decode unit to decode a packed sum of absolute differences (PSAD) instruction having an opcode format to identify a set of packed data operands. The decode unit initiates a sequence of operations on the set of packed data operands in response to decoding the PSAD instruction. An execution unit performs a first operation of the sequence of operations initiated by the decode unit, and a bus provides the execution unit with the set of packed data operands as identified in accordance with the opcode format.
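The arithmetic result of a PSAD operation can be shown with a short functional model. This sketch captures only what the instruction computes, not the decode or execution sequence described above; the function name `psad` is hypothetical, and the operands are assumed to be equal-length sequences of non-negative integers standing in for packed data elements.

```python
def psad(a, b):
    """Sum of absolute differences over corresponding packed elements.

    Assumption: a and b model packed operands as equal-length sequences
    of unsigned integer elements.
    """
    if len(a) != len(b):
        raise ValueError("operands must hold the same number of elements")
    # One absolute difference per element pair, then a single sum.
    return sum(abs(x - y) for x, y in zip(a, b))
```

For example, the operands `[1, 5, 9]` and `[4, 5, 2]` give absolute differences 3, 0, and 7, which sum to 10.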
Abstract:
An efficient way is provided to determine which objects in a 3D image are to be displayed and which are not because they are obscured by other displayed objects. Displayable elements are assigned depth values defining their relative perceived nearness to the viewer of the image. A comparison of depth values determines which elements are to be displayed and which are not to be displayed because they are obscured by displayed elements. Rather than comparing the depth value of every pixel in a displayable object to determine whether it is to be displayed, the invention compares groups of pixels defined by spans. Minimum and maximum depth values are determined for each span so that depth variations within a span can be accommodated. Masks are used when only partial spans are to be considered because some pixels in a span are outside the pixel boundaries being considered in a particular comparison.
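The span comparison can be sketched as a three-way classification. This is an illustrative model, not the patented method: the function name `classify_span` is hypothetical, a smaller depth value is assumed to mean nearer to the viewer, ties are assumed to favor the already-displayed pixels, and the mask is modeled as a list of 0/1 flags, one per pixel in the span.

```python
def classify_span(new_depths, old_depths, mask):
    """Classify the masked pixels of an incoming span against stored depths.

    Returns "visible", "hidden", "mixed", or "empty".
    Assumption: smaller depth = nearer; mask[i] == 1 means pixel i is
    inside the boundaries being considered.
    """
    new = [d for d, m in zip(new_depths, mask) if m]
    old = [d for d, m in zip(old_depths, mask) if m]
    if not new:
        return "empty"  # the mask excluded every pixel in the span

    # Minimum and maximum depth per span stand in for per-pixel compares.
    if max(new) < min(old):
        return "visible"  # farthest new pixel is nearer than nearest old pixel
    if min(new) >= max(old):
        return "hidden"   # nearest new pixel is no nearer than farthest old pixel
    return "mixed"        # ranges overlap; a finer comparison would be needed
```

Only the "mixed" case would require falling back to a per-pixel comparison, which is where the saving over testing every pixel comes from.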
Abstract:
A method and apparatus add together the multiple elements of packed data to produce a result. According to one such method and apparatus, each of a first set of portions of partial products is produced using a first set of partial product selectors in a multiplier, each of the first set of portions of the partial products being zero. Each of the multiple elements is inserted into one of a second set of portions of the partial products using a second set of partial product selectors, each of the second set of portions of the partial products being aligned. The multiple elements are added together to produce the result, which includes a field holding the sum of the multiple elements.
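The idea of reusing a multiplier's partial-product adder for a horizontal add can be modeled loosely in software. This is a rough sketch and not the circuit: the function name `horizontal_add`, the 16-bit element width, the four-element count, and the choice of the top field as the alignment target are all assumptions, and the sum is assumed to wrap modulo the element width.

```python
def horizontal_add(packed, elem_bits=16, count=4):
    """Sum the elements of a packed integer by aligning them as partial products.

    Assumption: `packed` holds `count` unsigned elements of `elem_bits` bits,
    element 0 in the least significant position.
    """
    mask = (1 << elem_bits) - 1
    elems = [(packed >> (i * elem_bits)) & mask for i in range(count)]

    # Every row is zero except for one field, into which one element is
    # inserted; all elements land in the same (aligned) field.
    field_shift = (count - 1) * elem_bits
    rows = [e << field_shift for e in elems]

    # The multiplier's adder array sums the rows like ordinary partial products.
    total = sum(rows)

    # The result includes a field holding the sum of the elements.
    return (total >> field_shift) & mask
```

Because every element sits in the same field of its row, the ordinary partial-product summation yields the element sum in that field without a dedicated adder.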
Abstract:
An instruction associated with a condition is executed. In executing the instruction, a first operation designated by the instruction is performed to produce a first result, and a second operation is performed to produce a second result. Both the first result and the second result are associated with the condition.
Abstract:
An instruction associated with a condition is executed when the condition is resolved. In executing the instruction, a first operation designated by the instruction is performed to produce a first result, and a second operation is performed to produce a second result. The first result or the second result is output based on how the condition is resolved.
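The select-on-condition behavior can be sketched as follows. This is a minimal model under assumptions: the function name `execute_conditional` is hypothetical, both operations are modeled as Python callables over the same operands, and the first result is assumed to be the one output when the condition resolves true.

```python
def execute_conditional(condition, first_op, second_op, *operands):
    """Perform both operations, then output one result based on the condition.

    Assumption: a true condition selects the first result and a false
    condition selects the second.
    """
    first_result = first_op(*operands)    # operation designated by the instruction
    second_result = second_op(*operands)  # alternative operation
    return first_result if condition else second_result
```

A typical use would pair an add with a pass-through of the old destination value, for instance `execute_conditional(cond, lambda d, s: d + s, lambda d, s: d, 5, 3)`, so that a false condition leaves the destination unchanged.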
Abstract:
A prefetcher to prefetch data for an instruction based on the distance between cache misses caused by the instruction. In an embodiment, the prefetcher includes a memory to store a prefetch table that contains one or more entries that include the distance between cache misses caused by an instruction. In a further embodiment, the addresses of data elements prefetched are determined based on the distance between cache misses recorded in the prefetch table for the instruction.
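A prefetch table keyed by instruction can be sketched as a small class. This is an illustrative model, not the claimed design: the class name `MissDistancePrefetcher`, the use of a dictionary as the table, and the policy of prefetching one distance ahead as soon as a single distance has been recorded are all assumptions.

```python
class MissDistancePrefetcher:
    """Predict prefetch addresses from the distance between an instruction's misses."""

    def __init__(self):
        # Prefetch table: instruction address -> (last miss address, distance).
        self.table = {}

    def on_miss(self, pc, addr):
        """Record a cache miss by instruction `pc` at `addr`.

        Returns the address to prefetch, or None if no distance is known yet.
        """
        entry = self.table.get(pc)
        if entry is None:
            # First miss for this instruction: nothing to measure against.
            self.table[pc] = (addr, None)
            return None

        last_addr, _ = entry
        distance = addr - last_addr
        self.table[pc] = (addr, distance)
        if distance == 0:
            return None  # repeated miss on the same address; nothing new to fetch
        # Assumption: prefetch exactly one recorded distance beyond this miss.
        return addr + distance
```

An instruction that misses at addresses 1000 and then 1064 would, under these assumptions, trigger a prefetch of address 1128.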