Abstract:
According to one embodiment, a system is disclosed. The system includes a central processing unit (CPU); a first cache memory coupled to the CPU to store only data for vital loads that are to be immediately processed at the CPU; a second cache memory coupled to the CPU to store data for semi-vital loads to be processed at the CPU; and a third cache memory, coupled to the CPU, the first cache memory, and the second cache memory, to store data for non-vital loads to be processed at the CPU.
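The abstract does not give concrete structures, so the following C sketch only models the routing policy it describes: the names vitality_t and route_load are illustrative assumptions, and the hardware routing decision is represented here as a simple software selection.

    #include <stdio.h>

    /* Hypothetical load-criticality classes; names are illustrative only. */
    typedef enum { VITAL, SEMI_VITAL, NON_VITAL } vitality_t;

    /* Route a load to one of three caches based on how soon the CPU needs
     * its data.  In the described system this is done in hardware; this
     * sketch only models the selection policy. */
    static const char *route_load(vitality_t v) {
        switch (v) {
        case VITAL:      return "first cache (vital loads only)";
        case SEMI_VITAL: return "second cache (semi-vital loads)";
        default:         return "third cache (non-vital loads)";
        }
    }

    int main(void) {
        printf("%s\n", route_load(VITAL));      /* needed immediately     */
        printf("%s\n", route_load(SEMI_VITAL)); /* needed soon            */
        printf("%s\n", route_load(NON_VITAL));  /* tolerant of latency    */
        return 0;
    }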
Abstract:
In one embodiment, a method for speculatively reusing regions of code includes identifying a reuse region and a data input to the reuse region, determining whether a data output of the reuse region is contained within reuse region instance information pertaining to a plurality of instances of the reuse region, and when the data output is not contained within the reuse region instance information, predicting the data output of the reuse region based on the reuse region instance information.
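The lookup-or-predict flow can be illustrated with a small C sketch. The instance table, the stride-based prediction, and the names instance_t and reuse_or_predict are assumptions chosen for illustration, not the patent's mechanism; a predicted output would later be verified by actually executing the region.

    #include <stdio.h>

    /* Hypothetical record of one prior instance of the reuse region:
     * its data input and the data output it produced. */
    typedef struct { int input; int output; } instance_t;

    /* Reuse region instance information for a few observed instances. */
    static instance_t table[] = { {1, 10}, {2, 20}, {3, 30} };
    static const int n = sizeof(table) / sizeof(table[0]);

    /* Return the region's output for `input`.  If no stored instance
     * matches, predict the output from the stored instances (here a
     * simple stride prediction) and flag it as speculative. */
    static int reuse_or_predict(int input, int *predicted) {
        for (int i = 0; i < n; i++)
            if (table[i].input == input) { *predicted = 0; return table[i].output; }
        int stride = table[n - 1].output - table[n - 2].output;
        *predicted = 1;
        return table[n - 1].output + stride * (input - table[n - 1].input);
    }

    int main(void) {
        int p;
        printf("input 2 -> %d (predicted=%d)\n", reuse_or_predict(2, &p), p);
        printf("input 5 -> %d (predicted=%d)\n", reuse_or_predict(5, &p), p);
        return 0;
    }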
Abstract:
A method for executing software pipelined executable code generated by compiling a set of unexecutable instructions having an inner loop and an outer loop is disclosed. Instructions are executed that perform the operations specified in the outer loop using a first storage area. A second storage area is allocated for use when performing the operations specified in the inner loop. Instructions are then executed that perform the operations specified in the inner loop using the second storage area, wherein at least certain storage locations in the first storage area are not alterable while the operations specified in the inner loop are being performed.
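A minimal C sketch of the storage discipline described above, under the assumption that plain arrays stand in for the two storage areas: the outer loop's live values sit in outer_state, and a separately allocated inner_scratch buffer keeps the inner loop's stores from altering them while the inner loop runs.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 4
    #define M 3

    int main(void) {
        /* First storage area: values that stay live across the inner loop. */
        int outer_state[N];

        for (int i = 0; i < N; i++) {
            outer_state[i] = i * 100;          /* outer-loop work */

            /* Second storage area, allocated for the inner loop so its
             * stores cannot alter outer_state while it executes. */
            int *inner_scratch = malloc(M * sizeof *inner_scratch);

            for (int j = 0; j < M; j++)
                inner_scratch[j] = outer_state[i] + j;   /* inner-loop work */

            for (int j = 0; j < M; j++)
                printf("%d ", inner_scratch[j]);
            printf("\n");

            free(inner_scratch);
        }
        return 0;
    }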
Abstract:
In one implementation of the invention, a computer implemented method used in compiling a program includes identifying a covering load, which may be one of a set of covering loads, and a redundant load. The covering load and the redundant load have a first and second load type, respectively. The first and the second load type each may be one of a group of load types including a regular load and at least one speculative-type load. In one implementation, the group of load types includes at least one check-type load. One implementation of the invention is in a machine readable medium.
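A hedged C sketch of the covering/redundant-load test: the load kinds and the coverage rule used here (a regular or check-type load covers any later load to the same address, while a speculative load covers only another speculative load) are simplifying assumptions for illustration, not the claimed compiler analysis.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical load kinds; names are illustrative, not the patent's terms. */
    typedef enum { LOAD_REGULAR, LOAD_SPECULATIVE, LOAD_CHECK } load_kind_t;

    typedef struct { int address; load_kind_t kind; } load_t;

    /* A later load is redundant if an earlier load (the covering load) reads
     * the same address and its kind makes its result safe to reuse. */
    static bool covers(const load_t *earlier, const load_t *later) {
        if (earlier->address != later->address) return false;
        if (earlier->kind == LOAD_SPECULATIVE)
            return later->kind == LOAD_SPECULATIVE;
        return true;
    }

    int main(void) {
        load_t a = { 0x1000, LOAD_REGULAR };
        load_t b = { 0x1000, LOAD_SPECULATIVE };
        printf("a covers b: %s\n", covers(&a, &b) ? "yes (b is redundant)" : "no");
        return 0;
    }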
Abstract:
Logic and an instruction to monitor loop trip count are disclosed. Loop trip count information for a loop may be stored in a dedicated hardware buffer, and the average loop trip count of the loop may be calculated from the stored information. Based on the average trip count, loop optimizations may be removed from the loop. The stored loop trip count information may include an identifier identifying the loop, a total loop trip count of the loop, and an exit count of the loop.
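The buffer entry and the average computation can be modeled directly in C. The hardware buffer is represented as a plain struct, and the field names below are illustrative assumptions; the average is the total trip count divided by the exit count.

    #include <stdint.h>
    #include <stdio.h>

    /* One entry of the loop trip count buffer, modeled in software. */
    typedef struct {
        uint32_t loop_id;      /* identifier of the loop            */
        uint64_t total_trips;  /* sum of iterations over all visits */
        uint64_t exit_count;   /* number of times the loop exited   */
    } trip_entry_t;

    /* Average trip count = total iterations / number of loop exits. */
    static double average_trip_count(const trip_entry_t *e) {
        return e->exit_count ? (double)e->total_trips / (double)e->exit_count : 0.0;
    }

    int main(void) {
        trip_entry_t e = { .loop_id = 42, .total_trips = 12, .exit_count = 6 };
        printf("loop %u: average trip count %.1f\n",
               (unsigned)e.loop_id, average_trip_count(&e));
        /* A low average (here 2.0) could justify removing aggressive loop
         * optimizations such as unrolling from this loop. */
        return 0;
    }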
Abstract:
In one embodiment, the present invention includes a processor having a core to execute instructions. This core can include various structures and logic that enable instructions of different atomic regions to be executed in an overlapping manner. To this end, the core can include a register file having registers to store data for use in execution of the instructions, and multiple shadow register files each to store a register checkpoint on initiation of a given atomic region. In this way, overlapping execution of atomic regions identified by a programmer or compiler can occur. Other embodiments are described and claimed.
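A software model of the shadow-register-file checkpointing, assuming a fixed number of in-flight atomic regions; the real mechanism lives in hardware, so region_begin and region_abort are illustrative stand-ins for checkpoint capture and rollback.

    #include <string.h>
    #include <stdio.h>

    #define NUM_REGS    8
    #define NUM_SHADOWS 2   /* one shadow register file per in-flight region */

    static int regs[NUM_REGS];                 /* architectural register file */
    static int shadow[NUM_SHADOWS][NUM_REGS];  /* shadow register files       */

    /* Take a register checkpoint when atomic region `r` begins. */
    static void region_begin(int r) { memcpy(shadow[r], regs, sizeof regs); }

    /* Roll back to the checkpoint if region `r` aborts. */
    static void region_abort(int r) { memcpy(regs, shadow[r], sizeof regs); }

    int main(void) {
        regs[0] = 1;
        region_begin(0);          /* region 0 starts                      */
        regs[0] = 2;
        region_begin(1);          /* region 1 starts while 0 is in flight */
        regs[0] = 3;
        region_abort(1);          /* region 1 aborts: regs[0] back to 2   */
        printf("regs[0] = %d\n", regs[0]);
        return 0;
    }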
Abstract:
A processor includes a decoder to decode an instruction, a scheduler to schedule the instruction, and an execution unit to execute the instruction. The instruction is to load a memory operation applicable to a quantity of addresses into an execution vector. The execution vector includes a plurality of vector positions for the respective addresses. The instruction is further to evaluate, for a given address in the execution vector at a vector position, whether a cache indicates that a previous memory operation was performed at a higher vector position than the vector position of the given address. The instruction is also to determine, based on that evaluation, whether the memory operation will cause a memory error.
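A scalar C model of the per-lane evaluation the instruction performs. The cache is represented as a small table recording the highest vector position that has already performed a memory operation on each address, which is an assumption about its contents made only for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    #define VLEN 4

    /* Cache entry: the highest vector position that has already performed
     * a memory operation on this address. */
    typedef struct { int address; int highest_position; } cache_entry_t;

    /* A memory error is flagged when the lane at `pos` finds that a higher
     * vector position already operated on its address. */
    static bool causes_memory_error(const int addrs[VLEN], int pos,
                                    const cache_entry_t *cache, int cache_len) {
        for (int i = 0; i < cache_len; i++)
            if (cache[i].address == addrs[pos] && cache[i].highest_position > pos)
                return true;
        return false;
    }

    int main(void) {
        int addrs[VLEN] = { 0x10, 0x20, 0x10, 0x30 };
        cache_entry_t cache[] = { { 0x10, 2 } };   /* lane 2 already touched 0x10 */
        printf("lane 0 error: %s\n",
               causes_memory_error(addrs, 0, cache, 1) ? "yes" : "no");
        return 0;
    }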
Abstract:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
Abstract:
Example methods and apparatus to manage partial commit-checkpoints are disclosed. A disclosed example method includes identifying a commit instruction associated with a region of instructions executed by a processor, identifying candidate instructions from the region of instructions, and generating a processor partial commit-checkpoint to save a current state of the processor, the checkpoint being based on calculated register values associated with live instructions and including instruction reference addresses that link the candidate instructions.
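One plausible in-memory layout for such a checkpoint, sketched in C. The field names and sizes are assumptions, chosen only to show the two kinds of saved state the abstract mentions: calculated values of live registers and instruction reference addresses for the candidate instructions.

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_LIVE 4
    #define MAX_CAND 4

    /* Hypothetical layout of a partial commit-checkpoint. */
    typedef struct {
        int      live_reg[MAX_LIVE];   /* calculated values of live registers */
        int      num_live;
        uint64_t cand_addr[MAX_CAND];  /* instruction reference addresses     */
        int      num_cand;
    } partial_checkpoint_t;

    int main(void) {
        partial_checkpoint_t cp = {
            .live_reg  = { 7, 13 },              .num_live = 2,
            .cand_addr = { 0x401000, 0x401008 }, .num_cand = 2,
        };
        printf("checkpoint: %d live registers, %d candidate instructions\n",
               cp.num_live, cp.num_cand);
        return 0;
    }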
Abstract:
Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
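The control structure the compiler emits can be sketched in plain C. Hardware transactional begin/commit/abort is modeled here as an if/else, and the loop a[i] = a[i-k] + 1 with an assumed vector width VW is an example chosen for illustration: when the dynamic check finds an actual cross-iteration dependence, execution takes the explicit abort path into the non-transactional scalar fallback.

    #include <stdbool.h>
    #include <stdio.h>

    #define N  8
    #define VW 4   /* assumed vector width */

    /* Dynamic check for an actual loop-carried dependence in a[i] = a[i-k] + 1:
     * lanes in one vector group conflict when 0 < k < VW.  Stands in for the
     * check the compiler would emit ahead of the vectorized body. */
    static bool has_dependence(int k) { return k > 0 && k < VW; }

    /* Non-transactional scalar fallback: the original loop, one iteration
     * at a time. */
    static void scalar_fallback(int *a, int k) {
        for (int i = k; i < N; i++) a[i] = a[i - k] + 1;
    }

    /* Stand-in for the vectorized body that would run inside a transactional
     * region; it simply processes VW iterations per group. */
    static void vectorized_body(int *a, int k) {
        for (int i = k; i < N; i += VW)
            for (int j = i; j < i + VW && j < N; j++) a[j] = a[j - k] + 1;
    }

    int main(void) {
        int a[N] = { 0, 1, 2, 3, 4, 5, 6, 7 };
        int k = 2;

        if (has_dependence(k))
            scalar_fallback(a, k);   /* models the explicit abort path */
        else
            vectorized_body(a, k);   /* models the transactional path  */

        for (int i = 0; i < N; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }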