Abstract:
A method, an apparatus, and a computer program are provided for a power consumption modeling algorithm with improved accuracy. The improved model allows for variable operation of Local Clock Buffers (LCBs), which consume significant amounts of power. The amount of power consumed by each LCB can also be varied. These variations in the operation of LCBs and in the power consumed by LCBs allow for a more realistic model of the operation of a given circuit or macro, instead of the traditional “worst-case scenario” of power consumption.
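A minimal sketch of the idea in C: each LCB carries its own activity factor instead of a single worst-case assumption, and total power is the sum of leakage plus activity-scaled switching power. The structure, field names, and numbers below are illustrative assumptions, not the patented model.

    #include <stdio.h>

    /* Hypothetical per-LCB model: each Local Clock Buffer has its own
     * switching power and an activity factor between 0 (fully gated) and 1
     * (toggling every cycle), instead of assuming worst-case activity. */
    typedef struct {
        double switching_power_mw;  /* power when toggling every cycle      */
        double leakage_power_mw;    /* static power, always present         */
        double activity_factor;     /* fraction of cycles the LCB toggles   */
    } lcb_model_t;

    /* Total power is the sum of leakage plus activity-scaled switching
     * power over all LCBs in the circuit or macro being modeled. */
    double macro_power_mw(const lcb_model_t *lcbs, int n)
    {
        double total = 0.0;
        for (int i = 0; i < n; i++)
            total += lcbs[i].leakage_power_mw
                   + lcbs[i].activity_factor * lcbs[i].switching_power_mw;
        return total;
    }

    int main(void)
    {
        lcb_model_t lcbs[] = {
            { 2.0, 0.1, 1.00 },   /* fully active LCB             */
            { 2.0, 0.1, 0.25 },   /* clock-gated most of the time */
            { 1.5, 0.1, 0.00 },   /* fully gated                  */
        };
        printf("modeled power: %.2f mW (worst case would be %.2f mW)\n",
               macro_power_mw(lcbs, 3), 2.1 + 2.1 + 1.6);
        return 0;
    }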
Abstract:
Mechanisms are provided for evicting cache lines from an instruction cache of a data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions, having associated branch stubs, that directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are restored to their original state based on information in the associated branch stubs.
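The bookkeeping described above might look roughly like the following C sketch: each cache line keeps a linked list of call sites, each call site keeps a branch stub holding the original branch, and on eviction the list is walked so that every rewritten branch is restored. All type and field names here are assumptions for illustration.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical bookkeeping: each call site that was rewritten to point
     * into this cache line keeps its original state in a branch stub, and
     * the cache line keeps a linked list of those call sites. */
    typedef struct branch_stub {
        uint32_t  original_instruction; /* branch as it appeared in the binary */
        uintptr_t original_target;      /* address the branch originally took  */
    } branch_stub_t;

    typedef struct call_site {
        uint32_t         *branch_location; /* where the rewritten branch lives   */
        branch_stub_t    *stub;            /* saved original state               */
        struct call_site *next;            /* next call site targeting this line */
    } call_site_t;

    typedef struct cache_line {
        call_site_t *call_sites;           /* head of the linked list of call sites */
        /* ... line data, tags, etc. ... */
    } cache_line_t;

    /* On eviction, walk the list and restore every rewritten branch that
     * directly or indirectly targets code in this line, so later executions
     * trap back into the icache runtime instead of jumping to stale code. */
    void evict_cache_line(cache_line_t *line)
    {
        for (call_site_t *cs = line->call_sites; cs != NULL; cs = cs->next)
            *cs->branch_location = cs->stub->original_instruction;
        line->call_sites = NULL;
    }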
Abstract:
Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache.
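A rough, self-contained C sketch of the decision the abstract describes: if the branch target can be found resident in a software-managed instruction cache, branch straight to it; otherwise hand control to the instruction cache runtime system. The directory structure and helper names are assumptions, not the patented design.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical software-managed icache directory mapping original target
     * addresses to resident copies; names are illustrative only. */
    #define ICACHE_SLOTS 64
    typedef struct { uintptr_t original_addr; void *resident_copy; } icache_entry_t;
    static icache_entry_t icache_dir[ICACHE_SLOTS];

    /* Return the resident copy of the target, or NULL when the target cannot
     * be determined to be present in the instruction cache. */
    static void *icache_lookup(uintptr_t target)
    {
        for (size_t i = 0; i < ICACHE_SLOTS; i++)
            if (icache_dir[i].resident_copy && icache_dir[i].original_addr == target)
                return icache_dir[i].resident_copy;
        return NULL;
    }

    /* Stand-in for the instruction cache runtime system: it would load the
     * target line, update the directory, and possibly rewrite the branch so
     * later executions take the direct path. */
    static void *icache_runtime_handle_miss(uintptr_t target)
    {
        (void)target;   /* loading and rewriting omitted in this sketch */
        return NULL;
    }

    /* Resolve a taken branch: go straight to the cached copy when the target
     * is known to be resident, otherwise redirect to the runtime system. */
    void *resolve_branch(uintptr_t original_target)
    {
        void *resident = icache_lookup(original_target);
        return resident ? resident : icache_runtime_handle_miss(original_target);
    }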
Abstract:
Mechanisms are provided for arranging binary code to reduce instruction cache conflict misses. These mechanisms generate a call graph of a portion of code. Nodes and edges in the call graph are weighted to generate a weighted call graph. The weighted call graph is then partitioned according to the weights, the affinities between nodes of the call graph, and the size of cache lines in an instruction cache of the data processing system, so that binary code associated with one or more subsets of nodes in the call graph is combined into individual cache lines based on the partitioning. The binary code corresponding to the partitioned call graph is then output for execution by a computing device.
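The partitioning step can be pictured with a small greedy sketch in C: functions are placed into cache lines so that high-affinity callers and callees share a line, subject to the cache line size. The greedy policy, function sizes, and affinity values below are illustrative assumptions rather than the patent's actual algorithm.

    #include <stdio.h>

    #define LINE_SIZE 128
    #define NFUNCS    5

    static const int func_size[NFUNCS]        = { 48, 40, 64, 32, 96 };
    static const int affinity[NFUNCS][NFUNCS] = {   /* weighted call graph edges */
        { 0, 9, 1, 0, 0 },
        { 9, 0, 0, 7, 0 },
        { 1, 0, 0, 0, 2 },
        { 0, 7, 0, 0, 0 },
        { 0, 0, 2, 0, 0 },
    };

    int main(void)
    {
        int line_of[NFUNCS];
        int line_used[NFUNCS] = { 0 };
        int lines = 0;

        for (int f = 0; f < NFUNCS; f++) {
            int best = -1, best_aff = 0;
            /* Prefer a line already holding a high-affinity neighbor, if it fits. */
            for (int g = 0; g < f; g++)
                if (affinity[f][g] > best_aff &&
                    line_used[line_of[g]] + func_size[f] <= LINE_SIZE) {
                    best = line_of[g];
                    best_aff = affinity[f][g];
                }
            if (best < 0)
                best = lines++;           /* start a new cache line */
            line_of[f] = best;
            line_used[best] += func_size[f];
            printf("func %d -> cache line %d\n", f, best);
        }
        return 0;
    }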
Abstract:
Mechanisms are provided for rewriting branch instructions in a portion of code. The mechanisms receive a portion of source code having an original branch instruction. The mechanisms generate a branch stub for the original branch instruction. The branch stub stores information about the original branch instruction including an original target address of the original branch instruction. Moreover, the mechanisms rewrite the original branch instruction so that a target of the rewritten branch instruction references the branch stub. In addition, the mechanisms output compiled code including the rewritten branch instruction and the branch stub for execution by a computing device. The branch stub is utilized by the computing device at runtime to determine if execution of the rewritten branch instruction can be redirected directly to a target instruction corresponding to the original target address in an instruction cache of the computing device without intervention by an instruction cache runtime system.
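A compile-time view of this transformation, sketched in C: for each original branch a stub is emitted that records the original instruction and target address, and the branch is re-pointed at the stub. Structure names and the placeholder re-encoding are assumptions, not the patented encoding.

    #include <stdint.h>
    #include <stdio.h>

    /* Branch stub: records the original branch so the runtime can later
     * decide whether to redirect it straight into the instruction cache. */
    typedef struct {
        uint32_t  original_instruction;  /* the branch as originally compiled */
        uintptr_t original_target;       /* address it originally branched to */
    } branch_stub_t;

    typedef struct {
        uint32_t             encoding;   /* rewritten branch now targets a stub */
        const branch_stub_t *stub;
    } rewritten_branch_t;

    /* Rewrite one branch: save its original state into a stub, then redirect
     * the branch at the stub. The re-encoding is a placeholder here. */
    rewritten_branch_t rewrite_branch(uint32_t insn, uintptr_t target,
                                      branch_stub_t *stub)
    {
        stub->original_instruction = insn;
        stub->original_target      = target;
        rewritten_branch_t out = { .encoding = insn /* re-encoded to hit stub */,
                                   .stub = stub };
        return out;
    }

    int main(void)
    {
        branch_stub_t stub;
        rewritten_branch_t rb = rewrite_branch(0x48000010u, 0x10002040u, &stub);
        printf("stub records original target 0x%lx\n",
               (unsigned long)rb.stub->original_target);
        return 0;
    }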
Abstract:
A cache manager receives a request for data, which includes a requested effective address. The cache manager determines whether the requested effective address matches a most recently used effective address stored in a mapped tag vector. When the most recently used effective address matches the requested effective address, the cache manager identifies a corresponding cache location and retrieves the data from the identified cache location. However, when the most recently used effective address fails to match the requested effective address, the cache manager determines whether the requested effective address matches a subsequent effective address stored in the mapped tag vector. When the cache manager determines a match to a subsequent effective address, the cache manager identifies a different cache location corresponding to the subsequent effective address and retrieves the data from the different cache location.
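A plausible shape for the mapped tag vector lookup, sketched in C: the most recently used effective address is checked first, and only on a mismatch are the subsequent entries examined. The entry count, layout, and names are assumptions based on the abstract.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Illustrative mapped tag vector: effective addresses ordered from most
     * recently used to least recently used, each mapped to a cache location. */
    #define TAG_VECTOR_ENTRIES 8

    typedef struct {
        uintptr_t effective_addr[TAG_VECTOR_ENTRIES];  /* [0] is most recently used */
        size_t    cache_location[TAG_VECTOR_ENTRIES];
    } mapped_tag_vector_t;

    /* Return true and the cache location on a hit. The MRU entry is checked
     * first; only on a mismatch are the subsequent entries examined. */
    bool lookup(const mapped_tag_vector_t *mtv, uintptr_t requested_ea,
                size_t *location_out)
    {
        if (mtv->effective_addr[0] == requested_ea) {      /* fast MRU path */
            *location_out = mtv->cache_location[0];
            return true;
        }
        for (size_t i = 1; i < TAG_VECTOR_ENTRIES; i++) {  /* subsequent entries */
            if (mtv->effective_addr[i] == requested_ea) {
                *location_out = mtv->cache_location[i];
                return true;
            }
        }
        return false;  /* miss: data must be fetched and the vector updated */
    }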
Abstract:
An apparatus, a method, and a processor are provided for recovering the correct state of instructions in a processor. The apparatus contains a pipeline of latches, a register file, and a replay loop. The replay loop repairs incorrect results and inserts the repaired results back into the pipeline. A state machine detects incorrect results within the pipeline and sends the incorrect results to the replay loop. A correction module on the replay loop repairs the incorrect results and transmits the repaired results back into the pipeline. When an incorrect result enters the replay loop, a flush operation: ceases other operations within the pipeline; flushes the remaining data results in the pipeline to the replay loop; opens the pipeline so that the repaired results can be inserted; and eliminates any operations within the processor that would utilize the incorrect results.
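A small behavioral sketch in C (software, not RTL) of the flush-and-replay sequence: flagged results are flushed from the pipeline latches into the replay loop, repaired by a correction module, and re-inserted. The stage count and the placeholder repair rule are purely illustrative.

    #include <stdbool.h>
    #include <stdio.h>

    #define STAGES 4

    typedef struct { int value; bool valid; bool incorrect; } latch_t;

    static latch_t pipeline[STAGES];
    static latch_t replay_loop[STAGES];

    /* Flush: cease pipeline operations and move all results to the replay loop. */
    static void flush_to_replay(void)
    {
        for (int s = 0; s < STAGES; s++) {
            replay_loop[s]    = pipeline[s];
            pipeline[s].valid = false;
        }
    }

    /* Correction module: repair flagged results, then re-insert everything
     * into the now-open pipeline. */
    static void repair_and_reinsert(void)
    {
        for (int s = 0; s < STAGES; s++) {
            if (!replay_loop[s].valid)
                continue;
            if (replay_loop[s].incorrect) {
                replay_loop[s].value     = -replay_loop[s].value; /* placeholder fix */
                replay_loop[s].incorrect = false;
            }
            pipeline[s]          = replay_loop[s];
            replay_loop[s].valid = false;
        }
    }

    int main(void)
    {
        pipeline[1] = (latch_t){ .value = 42, .valid = true, .incorrect = true };
        flush_to_replay();
        repair_and_reinsert();
        printf("stage 1 repaired value: %d\n", pipeline[1].value);
        return 0;
    }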
Abstract:
A system and method are provided in which a relatively small Instruction Load Buffer (ILB) is maintained for scheduling instructions. Instructions are sent from Local Store (LS) to the ILB using either an inline prefetcher or a branch table buffer loader. In one embodiment, the prefetcher is a hardware-based prefetcher that fetches, in address order, the next instructions likely to be scheduled. In one embodiment, the predicted branch instructions are loaded as a result of a software program, such as a dispatcher, issuing a “load branch table buffer (loadbtb)” instruction. Predicted branch instructions are loaded into one area of the ILB and inline instructions are loaded into another area of the ILB. In one embodiment, the loadbtb instruction loads the instruction line that includes the predicted branch target address as well as the instruction line that immediately follows it.
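One way to picture the ILB layout, sketched in C: an inline area filled in address order by the prefetcher and a branch area filled by loadbtb with the line containing the predicted target plus the line immediately after it. Sizes, names, and the copy details are assumptions.

    #include <stdint.h>
    #include <string.h>

    #define LINE_BYTES   128
    #define INLINE_LINES 4
    #define BRANCH_LINES 2

    /* Illustrative ILB split into two areas, as described in the abstract. */
    typedef struct {
        uint8_t inline_area[INLINE_LINES][LINE_BYTES];  /* next sequential lines  */
        uint8_t branch_area[BRANCH_LINES][LINE_BYTES];  /* predicted branch lines */
    } ilb_t;

    /* Inline prefetch: copy the next instruction line, in address order,
     * from Local Store into the inline area. */
    void ilb_prefetch_inline(ilb_t *ilb, int slot, const uint8_t *ls_line)
    {
        memcpy(ilb->inline_area[slot], ls_line, LINE_BYTES);
    }

    /* loadbtb: load the line containing the predicted branch target and the
     * line immediately after it, so execution can continue past the target. */
    void ilb_loadbtb(ilb_t *ilb, const uint8_t *ls, uintptr_t target_addr)
    {
        uintptr_t line_base = target_addr & ~(uintptr_t)(LINE_BYTES - 1);
        memcpy(ilb->branch_area[0], ls + line_base,              LINE_BYTES);
        memcpy(ilb->branch_area[1], ls + line_base + LINE_BYTES, LINE_BYTES);
    }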
Abstract:
A processor system executes multiple applet programs within a software application program in an information handling system. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In particular, the operating system software manages partitioning of a register file in the processor system to achieve a cooperative relationship among multiple applet programs within respective partitions of the register file. In one embodiment, the operating system software manages unique applet IDs to modify register file partition sizes and locations during execution of applet program instruction text. In one embodiment, applet ID masking hardware provides sharing of register file space among multiple copies of applet program code.
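A toy C sketch of the partitioning and masking idea: the operating system records a (base, size) partition per applet ID, and register numbers are masked into that partition, with two applets sharing register file space when they are given the same partition. The table layout and the modulo masking rule are assumptions, not the patented hardware.

    #include <stdint.h>
    #include <stdio.h>

    #define REGFILE_SIZE 128
    #define MAX_APPLETS  8

    /* Per-applet partition of the shared register file, managed by the OS. */
    typedef struct { uint8_t base; uint8_t size; } partition_t;
    static partition_t partition_of[MAX_APPLETS];

    /* Map an applet-visible register number to a physical register, wrapping
     * within the applet's partition so applets cannot touch each other's
     * state; applets with the same partition entry share register file space. */
    static uint8_t map_register(uint8_t applet_id, uint8_t reg)
    {
        partition_t p = partition_of[applet_id];
        return (uint8_t)(p.base + (reg % p.size));
    }

    int main(void)
    {
        partition_of[0] = (partition_t){ .base = 0,  .size = 64 };
        partition_of[1] = (partition_t){ .base = 64, .size = 32 };
        partition_of[2] = (partition_t){ .base = 64, .size = 32 };  /* shares with 1 */

        printf("applet 0, r10 -> p%u\n", map_register(0, 10));
        printf("applet 1, r10 -> p%u\n", map_register(1, 10));
        printf("applet 2, r40 -> p%u\n", map_register(2, 40));
        return 0;
    }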
Abstract:
Mechanisms are provided for performing decoding of context-adaptive binary arithmetic coding (CABAC) encoded data. The mechanisms receive, in a first single instruction multiple data (SIMD) vector register of the data processing system, CABAC encoded data of a bit stream. The CABAC encoded data includes a value to be decoded and bit stream state information. The mechanisms receive, in a second SIMD vector register of the data processing system, CABAC decoder context information. The mechanisms process the value, the bit stream state information, and the CABAC decoder context information in a non-recursive manner to generate a decoded value, updated bit stream state information, and updated CABAC decoder context information. The mechanisms store, in a third SIMD vector register, a result vector that combines the decoded value, the updated bit stream state information, and the updated CABAC decoder context information. The mechanisms use the decoded value to generate a video output on the data processing system.
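A heavily simplified, scalar C sketch of one non-recursive decode step: the bit stream state and decoder context go in, and a decoded bin plus updated state and context come out as one combined result, standing in for the three SIMD vector registers of the abstract. The range-split arithmetic is illustrative and not the exact CABAC tables.

    #include <stdint.h>

    typedef struct { uint32_t range, offset; } bitstream_state_t;
    typedef struct { uint8_t state, mps; } cabac_context_t;

    /* Combined "result vector": decoded value plus updated state and context. */
    typedef struct {
        unsigned          decoded_bin;
        bitstream_state_t bs;
        cabac_context_t   ctx;
    } decode_result_t;

    decode_result_t cabac_decode_bin(bitstream_state_t bs, cabac_context_t ctx)
    {
        decode_result_t r;
        /* Split the range; a real decoder looks this up from state tables. */
        uint32_t r_lps = ((bs.range >> 2) * (ctx.state + 1u)) >> 4;
        uint32_t r_mps = bs.range - r_lps;

        if (bs.offset < r_mps) {                 /* most probable symbol path  */
            r.decoded_bin = ctx.mps;
            bs.range = r_mps;
            if (ctx.state < 62) ctx.state++;
        } else {                                 /* least probable symbol path */
            r.decoded_bin = ctx.mps ^ 1u;
            bs.offset -= r_mps;
            bs.range = r_lps;
            if (ctx.state == 0) ctx.mps ^= 1u;
            else ctx.state--;
        }
        /* Renormalize (refilling from the bit stream is omitted in this sketch). */
        while (bs.range < (1u << 8)) { bs.range <<= 1; bs.offset <<= 1; }

        r.bs  = bs;
        r.ctx = ctx;
        return r;
    }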