摘要:
Each instruction thread in a SMT processor is associated with a software assigned base input processing priority. Unless some predefined event or circumstance occurs with an instruction being processed or to be processed, the base input processing priorities of the respective threads are used to determine the interleave frequency between the threads according to some instruction interleave rule. However, upon the occurrence of some predefined event or circumstance in the processor related to a particular instruction thread, the base input processing priority of one or more instruction threads is adjusted to produce one more adjusted priority values. The instruction interleave rule is then enforced according to the adjusted priority value or values together with any base input processing priority values that have not been subject to adjustment.
摘要:
A method and system for prefetching in computer system are provided. The method in one aspect includes using a prefetch engine to perform prefetch instructions and to translate unmapped data. Misses to address translations during the prefetch are handled and resolved. The method also includes storing the resolved translations in a respective cache translation table. A system for prefetching in one aspect includes a prefetch engine operable to receive instructions to prefetch data from the main memory. The prefetch engine is also operable to search cache address translation for prefetch data and perform address mapping translation, if the prefetch data is unmapped. The prefetch engine is further operable to prefetch the data and store the address mapping in one or more cache memory, if the data is unmapped.
摘要:
A unified register rename mechanism for targets of different instruction types is provided in a microprocessor. The universal rename mechanism renames destinations of different instruction types using a single rename structure. Thus, an instruction that is updating a floating point register (FPR) can be renamed along with an instruction that is updating a general purpose register (GPR) or vector multimedia extensions (VMX) instructions register (VR) using the same rename structure because the number of architected states for GPR is the same as the number of architected states for FPR and VR. Each destination tag (DTAG) is assigned to one destination. A floating point instruction may be assigned to a DTAG, and then a fixed point instruction may be assigned to the next DTAG and so forth. With a universal rename mechanism, significant silicon and power can be saved by having only one rename structure for all instruction types.
摘要:
A processor includes at least one execution unit that executes instructions, at least one register file, coupled to the at least one execution unit, that buffers operands for access by the at least one execution unit, an instruction sequencing unit that fetches instructions for execution by the at least one execution unit, and an address generation accelerator. The address generation accelerator, responsive to an initiation signal received from the instruction sequencing unit, computes and outputs first and second effective addresses of operands of an operation.
摘要:
A processor has an associated memory hierarchy including a cache memory. The processor includes an instruction sequencing unit that fetches instructions for processing, an operand data structure including a plurality of entries corresponding to operands of operations to be performed by the processor, and a computation engine. A first entry among the plurality of entries in the operand data structure specifies a first caching policy for a first operand, and a second entry specifies a second caching policy for a second operand. The computation engine computes and stores operands in the memory hierarchy in accordance with the cache policies indicated within the operand data structure.
摘要:
Methods and apparatus are provided for sharing storage and execution resources between architectural units in a microprocessor using a polymorphic function unit. A method for executing instructions in a processor having a polymorphic execution unit includes the steps of reloading a state associated with a first instruction class and reconfiguring the polymorphic execution unit to operate in accordance with the first instruction class, when an instruction of the first instruction class is encountered and the polymorphic execution unit is configured to operate in accordance with a second instruction class. The method also includes the steps of reloading a state associated with a second instruction class and reconfiguring the polymorphic execution unit to operate in accordance with the second instruction class, when an instruction of the second instruction class is encountered and the polymorphic execution unit is configured to operate in accordance with the first instruction class.
摘要:
A technique for performing cache injection includes monitoring addresses on a bus in response to a cache injection instruction. Ownership of input/output data on the bus is acquired by a cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block associated with the cache injection instruction.
摘要:
A method is disclosed in a data processing system for ensuring processing fairness in simultaneous multi-threading (SMT) microprocessors that concurrently execute multiple threads during each clock cycle. A clock cycle priority is assigned to a first thread and to a second thread during a standard selection state that lasts for an expected number of clock cycles. The clock cycle priority is assigned according to a standard selection definition during the standard selection state by selecting the first thread to be a primary thread and the second thread to be a secondary thread during the standard selection state. If a condition exists that requires overriding the standard selection definition, an override state is executed during which the standard selection definition is overridden by selecting the second thread to be the primary thread and the first thread to be the secondary thread.
摘要:
A distributed data processing system includes: (1) a first node with a processor, a first memory, and asynchronous memory mover logic; and connection mechanism that connects (2) a second node having a second memory. The processor includes processing logic for completing a cross-node asynchronous memory move (AMM) operation, wherein the processor performs a move of data in virtual address space from a first effective address to a second effective address, and the asynchronous memory mover logic completes a physical move of the data from a first memory location in the first memory having a first real address to a second memory location in the second memory having a second real address. The data is transmitted via the connection mechanism connecting the two nodes independent of the processor.
摘要:
A data processing system has a processor and a memory coupled to the processor and an asynchronous memory mover coupled to the processor. The asynchronous memory mover has registers for receiving a set of parameters from the processor, which parameters are associated with an asynchronous memory move (AMM) operation initiated by the processor in virtual address space, utilizing a source effective address and a destination effective address. The asynchronous memory mover performs the AMM operation to move the data from a first physical memory location having a source real address corresponding to the source effective address to a second physical memory location having a destination real address corresponding to the destination effective address. The asynchronous memory mover has an associated off-chip translation mechanism. The AMM operation thus occurs independent of the processor, and the processor continues processing other operations independent of the AMM operation.