摘要:
A method and apparatus for executing multithreaded algorithm to provide static timing analysis of a chip design includes analyzing a chip design to identify various components and nodes associated with the components. A node tree is built with a plurality of nodes. The node tree identifies groups of nodes that are available in different levels. A size of node grouping for a current level is determined by looking up the node tree. Testing data for parallel processing of different size of node groupings using varied thread counts is compiled. An optimum thread count for the current level based on the size of node grouping in the node tree is identified from compiled testing data. Dynamic parallel processing of nodes in the current level is performed using the number of threads identified by the optimum thread count. An acceptable design of the chip is determined by the dynamic parallel processing.
摘要:
A method is provided for obtaining data to be used in evaluating performance of a computer processor. More specifically, the method provides for efficiently obtaining traces from an application program for use in a simulation of a computer processor. The method uses both an original code defining the application program and an instrumented version of the original code (“instrumented code”). The method includes apportioning a total time of execution of the application program between the original code and the instrumented code. Transition of execution between the original and instrumented codes is conducted through either modification of function calls or through consultation with a mapping of instruction address correspondences between the original and instrumented codes.
摘要:
A processor, method, and medium for using vector operations to compress selected elements of a vector. An input vector is compared to a criteria vector, and then a subset of the plurality of elements of the input vector are selected based on the comparison. A permutation vector is generated based on the locations of the selected elements and then the permutation vector is used to permute the selected elements of the input vector to an output vector. The selected elements of the input vector are stored in contiguous locations in the leftmost elements of the output vector. Then, the output vector is stored to memory and a pointer to the memory location is incremented by the number of selected elements.
摘要:
A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.
摘要:
A processor configured to synchronize threads in multithreaded applications. The processor includes first and second registers. The processor stores a first bitmask in the first register and a second bitmask in the second register. For each bitmask, each bit corresponds with one of multiple threads. A given bit in the first bitmask indicates the corresponding thread has been assigned to execute a portion of a unit of work. A corresponding bit in the second bitmask indicates the corresponding thread has completed execution of its assigned portion of the unit of work. The processor receives updates to the second bitmask in the second register and provides an indication that the unit of work has been completed in response to detecting that for each bit in the first bitmask that corresponds to a thread that is assigned work, a corresponding bit in the second bitmask indicates its corresponding thread has completed its assigned work.
摘要:
Systems and methods for maximizing a number of available states for a version number used for memory corruption detection. A physical memory may be a DRAM comprising a plurality of regions. Version numbers associated with data structures allocated in the physical memory may be generated so that version numbers of adjacent data structures in a virtual address space are different. A reserved set and an available set of version numbers are associated with each one of the plurality of regions. A version number in a reserved set of a given region may be in an available set of another region. The processor detects no memory corruption error in response to at least determining a version number stored in a memory location in a first region identified by a memory access operation is also in a reserved set associated with the first region.
摘要:
A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.
摘要:
One embodiment of the present invention provides a system for using register readiness to facilitate value prediction. The system starts by loading a previously computed result for a function to a destination register for the function from a lookup table. The system then checks the destination register for the function by using a Branch-Register-Not-Ready (BRNR) instruction to check the readiness of the destination register. If the destination register is ready, the system uses the previously computed result in the destination register as the result of the function. Loading the value from the lookup table in this way avoids unnecessarily calculating the result of the function when that result has previously been computed.
摘要:
Predicting prefetch data sources for runahead execution triggering read operations eliminates the latency penalties of missing read operations that typically are not addressed by runahead execution mechanisms. Read operations that most likely trigger runahead execution are identified. The code unit that includes those triggering read operations is modified so that the code unit branches to a prefetch predictor. The prefetch predictor observes sequence patterns of data sources of triggering read operations and develops prefetch predictions based on the observed data source sequence patterns. After a prefetch prediction gains reliability, the prefetch predictor supplies a predicted data source to a prefetcher coincident with triggering of runahead execution.
摘要:
Systems and methods for providing additional instructions for supporting efficient memory corruption detection in a processor. A physical memory may be a DRAM with a spare bank of memory reserved for a hardware failover mechanism. Version numbers associated with data structures allocated in the memory may be generated so that version numbers of adjacent data structures are different. A processor determines that a fetched instruction is a memory access instruction corresponding to a first data structure within the memory. For instructions that are not a version update instruction, the processor compares the first version number and second version number stored in a location in the memory indicated by the generated address and flags an error if there is a mismatch. For version update instructions, the processor performs a memory access operation on the second version number with no comparison check.