摘要:
A test system or simulator includes an enhanced IC test application sampling software program that executes test application software on a semiconductor die IC design model. The enhanced test application sampling software may include trace, simulation point, CPI error, clustering, instruction budgeting, and other programs. The enhanced test application sampling software generates basic block vectors (BBVs) and fly-by vectors (FBVs) from instruction trace analysis of test application software workloads. The enhanced test application sampling software utilizes the microarchitecture dependent information to generate the FBVs to select representative instruction intervals from the test application software. The enhanced test application sampling software generates a reduced representative test application software program from the BBV and FBV data utilizing a global instruction budgeting analysis method. Designers use the test system with enhanced test application sampling software to evaluate IC design models by using the representative test application software program.
摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
Techniques for preserving memory affinity in a computer system is disclosed. In response to a request for memory access to a page within a memory affinity domain, a determination is made if the request is initiated by a processor associated with the memory affinity domain. If the request is not initiated by a processor associated with the memory affinity domain, a determination is made if there is a page ID match with an entry within a page migration tracking module associated with the memory affinity domain. If there is no page ID match, an entry is selected within the page migration tracking module to be updated with a new page ID and a new memory affinity ID. If there is a page ID match, then another determination is made whether or not there is a memory affinity ID match with the entry with the page ID field match. If there is no memory affinity ID match, the entry is updated with a new memory affinity ID; and if there is a memory affinity ID match, an access counter of the entry is incremented.
摘要:
A performance projection system includes a test IHS and a currently existing IHS. The performance projection system includes surrogate programs and user application software. The test IHS employs a memory that includes a virtual future IHS, currently existing IHS, surrogate programs, and user application software for determination of runtime and HW counter performance data. The user application software and surrogate programs execute on the currently existing MS to provide designers with runtime data and HW counter or microarchitecture dependent data. Designers execute surrogate programs on the future IHS to provide runtime and HW counter data. Designers normalize and weight the runtime and HW counter data to provide a representative surrogate program for comparison to user application software performance on the future IHS. Using a scaling factor, designers may generate a projection of runtime performance for the user application software executing on the future IHS.
摘要:
A cache within a computer system receives a partial write request and identifies a cache hit of a cache line. The cache line corresponds to the partial write request and includes existing data. In turn, the cache receives partial write data and merges the partial write data with the existing data into the cache line. In one embodiment, the existing data is “modified” or “dirty.” In another embodiment, the existing data is “shared.” In this embodiment, the cache changes the state of the cache line to indicate the storing of the partial write data into the cache line.
摘要:
A method for minimizing cache conflict misses is disclosed. A translation table capable of facilitating the translation of a virtual address to a real address during a cache access is provided. The translation table includes multiple entries, and each entry of the translation table includes a page number field and a hash value field. A hash value is generated from a first group of bits within a virtual address, and the hash value is stored in the hash value field of an entry within the translation table. In response to a match on the entry within the translation table during a cache access, the hash value of the matched entry is retrieved from the translation table, and the hash value is concatenated with a second group of bits within the virtual address to form a set of indexing bits to index into a cache set.
摘要:
A processor resource manager assigns a branch history resource to a first execution mode. The branch history resource is utilized for predicting a branch direction of a branch instruction. Next, the resource manager logs a number of branch mispredictions that occur while the processor executes a second execution mode. The resource manager, in turn, reassigns the branch history resource to the second execution mode based upon the number of branch mispredictions.
摘要:
A job scheduler can select a processor core operating frequency for a node in a cluster to perform a job based on energy usage and performance data. After a job request is received, an energy aware job scheduler accesses data that specifies energy usage and job performance metrics that correspond to the requested job and a plurality of processor core operating frequencies. A first of the plurality of processor core operating frequencies is selected that satisfies an energy usage criterion for performing the job based, at least in part, on the data that specifies energy usage and job performance metrics that correspond to the job. The job is assigned to be performed by a node in the cluster at the selected first of the plurality of processor core operating frequencies.