Abstract:
Methods, systems, and media are contemplated for reducing the memory latency seen by processors by giving software applications a measure of control, implicit and/or explicit, over on-chip memory (OCM) management via an operating system. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API) and part to be managed by hardware. The software applications can thus provide guidance regarding address ranges to keep close to the processor, reducing the unnecessary latencies typically encountered when depending on cache controller policies. Several embodiments use a memory internal to the processor or on a processor node, so the memory block used for this technique is referred to as OCM.
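The split between software-managed and hardware-managed OCM can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `OnChipMemory`, the call `pin_range`, the block granularity, and the LRU policy for the hardware-managed part are all hypothetical and are not taken from the abstract.

```python
from collections import OrderedDict


class OnChipMemory:
    """Toy OCM model: one part pinned by software, one part hardware-managed.

    Assumptions (not from the abstract): block-granular addresses, an LRU
    policy for the hardware part, and pinned blocks counted as resident
    as soon as they are pinned.
    """

    def __init__(self, pinned_capacity, lru_capacity):
        self.pinned_capacity = pinned_capacity
        self.lru_capacity = lru_capacity
        self.pinned = set()        # blocks kept on chip at software's request
        self.lru = OrderedDict()   # hardware-managed blocks, oldest first

    def pin_range(self, start, end):
        """Hypothetical API call: keep blocks [start, end) in the OCM."""
        blocks = set(range(start, end))
        if len(self.pinned | blocks) > self.pinned_capacity:
            return False           # request exceeds the software-managed part
        self.pinned |= blocks
        for block in blocks:
            self.lru.pop(block, None)
        return True

    def access(self, block):
        """Return True on an OCM hit, False on a miss."""
        if block in self.pinned:
            return True
        if block in self.lru:
            self.lru.move_to_end(block)
            return True
        self.lru[block] = None
        if len(self.lru) > self.lru_capacity:
            self.lru.popitem(last=False)   # evict the least recently used block
        return False
```

In this model a pinned block always hits, while blocks outside the pinned ranges compete for the hardware-managed part and can be evicted.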
Abstract:
A method of assigning virtual memory to physical memory in a data processing system allocates a set of contiguous physical memory pages for a new page mapping, instructs the memory controller to move the virtual memory pages according to the new page mapping, and then allows access to the virtual memory pages using the new page mapping while the memory controller is still copying the virtual memory pages to the set of physical memory pages. The memory controller can use a mapping table which temporarily stores entries of the old and new page addresses, and releases the entries as copying for each entry is completed. The translation lookaside buffer (TLB) entries in the processor cores are updated for the new page addresses prior to completion of copying of the memory pages by the memory controller. The invention can be extended to non-uniform memory access (NUMA) systems. For systems with cache memory, any cache entry which is affected by the page move can be updated by modifying its address tag according to the new page mapping. This tag modification may be limited to cache entries in a dirty coherency state. The cache can further relocate a cache entry based on a changed congruence class for any modified address tag.
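The role of the mapping table during an in-flight page move can be sketched as follows. This is a minimal simulation, not the patented design: the class and method names are hypothetical, and the choice to redirect both reads and writes of a not-yet-copied page to its old location is an assumption made for illustration.

```python
class MemoryController:
    """Toy memory controller that serves accesses while pages are being moved."""

    def __init__(self, memory):
        self.memory = memory   # physical page number -> page contents
        self.pending = {}      # mapping table: new page -> old page (not yet copied)

    def start_move(self, mapping):
        """Register old -> new page moves; the copying itself happens later."""
        for old, new in mapping.items():
            self.pending[new] = old

    def _resolve(self, page):
        # Assumption: an access to a page whose copy is pending is redirected
        # to the old page, so the new mapping is usable immediately.
        return self.pending.get(page, page)

    def read(self, page):
        return self.memory[self._resolve(page)]

    def write(self, page, data):
        self.memory[self._resolve(page)] = data

    def copy_one(self):
        """Copy one pending page and release its mapping-table entry."""
        if not self.pending:
            return False
        new, old = next(iter(self.pending.items()))
        self.memory[new] = self.memory[old]
        del self.pending[new]
        return True


mc = MemoryController({0: "A", 1: "B", 8: None, 9: None})
mc.start_move({0: 8, 1: 9})
early_read = mc.read(9)      # served before any copying has taken place
mc.write(9, "B2")            # lands on the old page, carried over by the copy
while mc.copy_one():
    pass
```

Callers use only the new page numbers (8 and 9) from the moment `start_move` returns, which mirrors the idea of updating TLB entries before copying completes.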
Abstract:
A method and system are disclosed for a compression scheme used with program executables that run on a reduced instruction set computer (RISC) architecture such as the PowerPC. Initially, a RISC instruction set is expanded to produce code that facilitates the removal of redundant fields. The program is then rewritten using this new expanded instruction set. Next, a filter is applied to remove redundant fields from the expanded instructions. The expanded instructions are then clustered into groups, such that instructions belonging to the same cluster show similar bit patterns. Within each cluster, scopes are created such that register usage patterns within each scope are similar. Within each cluster, further scopes are created such that literals within each instruction scope are drawn from the same range of integers. A conventional compression technique such as Huffman encoding is then applied to each instruction scope within each cluster. Dynamic programming techniques are then used to produce the best combination of encodings among all scopes within all the different clusters. Where applicable, instruction scopes that use the same encoding scheme are combined to reduce the size of the resulting dictionary. Similarly, instruction clusters that use the same encoding scheme are combined to reduce the size of the resulting dictionary.
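The per-scope Huffman step can be sketched as below. It covers only that one step: the expansion, filtering, clustering, and dynamic-programming stages are not modeled, and the scope names and symbol values in the usage lines are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bitstring} built from the symbol frequencies of one scope."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(scopes):
    """Total payload bits when every scope gets its own Huffman table."""
    total = 0
    for symbols in scopes.values():
        code = huffman_code(symbols)
        total += sum(len(code[s]) for s in symbols)
    return total


# Hypothetical scopes: one holding register names, one holding literals.
scopes = {"registers": ["r1", "r1", "r1", "r2"], "literals": [0, 0, 4, 8]}
total_bits = encoded_bits(scopes)
```

The count excludes the size of the code tables themselves, which is the cost that combining scopes with the same encoding is meant to reduce.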
Abstract:
A method and system are disclosed for compressing memory address traces by detecting and reducing the loops that exist in a trace. The method consists of two steps. In the first step, the trace is analyzed and loops are detected by determining the control flow among the program basic blocks. In the second step, each loop is analyzed to eliminate constant address references and to apply compiler-like strength reduction to addresses that differ only by a fixed offset between consecutive loop iterations. Addresses that cannot be eliminated using the method and system of the present invention are kept in the trace.
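The second step can be illustrated for a single memory reference inside an already-detected loop. This is a sketch, assuming the addresses issued by that reference across iterations have been collected into a list; loop detection itself is not shown, and the record layout is hypothetical.

```python
def compress_reference(addresses):
    """Compress the addresses one loop reference issues across iterations.

    Returns ("stride", base, stride, count) when consecutive addresses differ
    by a fixed offset (a stride of 0 is a constant address), otherwise
    ("raw", addresses) so the addresses stay in the trace unchanged.
    """
    if len(addresses) < 2:
        return ("raw", list(addresses))
    stride = addresses[1] - addresses[0]
    if all(b - a == stride for a, b in zip(addresses, addresses[1:])):
        return ("stride", addresses[0], stride, len(addresses))
    return ("raw", list(addresses))


def expand_reference(record):
    """Rebuild the original address list from a compressed record."""
    if record[0] == "stride":
        _, base, stride, count = record
        return [base + i * stride for i in range(count)]
    return list(record[1])
```

For example, `compress_reference([0x1000, 0x1004, 0x1008])` yields `("stride", 0x1000, 4, 3)`, and an irregular sequence falls back to the raw form.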
Abstract:
A compiler for incorporating error detection into executable code generates conventional assembler language object code from a source code file. The compiler identifies an error detection segment (EDS) in the assembler code, where the EDS includes a subset of basic blocks in the assembler code. The compiler also identifies register and memory references in the EDS and inserts a set of instructions into the EDS. The inserted instructions record an entry state and an exit state of the referenced registers and memory locations. The state information is stored in a checkpoint portion of system memory. The compiler may generate shadow EDS code including instructions mirroring the instructions in the main EDS and verifying instructions that compare results produced by the mirroring instructions with results produced by the main EDS. The shadow EDS initiates an error recovery process if results produced by the shadow EDS and the main EDS differ.
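The runtime behaviour of a main EDS and its shadow can be sketched at a high level. This is an illustration only: the abstract describes compiler-inserted assembler instructions, whereas the sketch below models registers as a dictionary, and the recovery action (restoring the recorded entry state) is an assumption, since the abstract does not say what the recovery process does.

```python
import copy


def run_with_shadow(state, main_eds, shadow_eds):
    """Run main_eds on state and verify the result against shadow_eds.

    state maps register or memory names to values. Returns True when both
    executions agree; otherwise restores the entry state and returns False.
    """
    checkpoint = copy.deepcopy(state)     # recorded entry state
    shadow_state = copy.deepcopy(state)
    main_eds(state)                       # main segment mutates the live state
    shadow_eds(shadow_state)              # mirrored segment runs on a copy
    if state != shadow_state:
        # Assumed recovery: roll back to the checkpointed entry state.
        state.clear()
        state.update(checkpoint)
        return False
    return True


def add_regs(s):
    s["r3"] = s["r1"] + s["r2"]


def faulty_add_regs(s):
    s["r3"] = s["r1"] + s["r2"] + 1       # simulated transient error


regs = {"r1": 1, "r2": 2, "r3": 0}
ok = run_with_shadow(regs, add_regs, add_regs)
```

Passing `faulty_add_regs` as the main segment makes the comparison fail, and the state is returned to its entry values.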
Abstract:
A method and apparatus for providing remote access redirect in a host channel adapter of a system area network are provided. The apparatus and method provide a mechanism by which a host channel adapter, in response to receiving a marker message, places selected channel(s) of the host channel adapter in a remote access redirect (RAR) mode of operation. During the RAR mode of operation, memory access messages received by the host channel adapter that are destined for portions of an application memory space marked as being protected are converted to RAR receive messages and redirected to a queue pair associated with an operating system rather than the queue pair for the application. The operating system is responsible for serializing access to application memory pages outside of the host channel adapter. The mechanisms of the present invention may be used to perform a checkpoint data integrity operation.
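The redirect rule can be sketched as a small routing model. This is an assumption-laden illustration: the message fields (`type`, `page`), the class name, and the use of a single channel are hypothetical, and real host channel adapters operate on queue pairs in hardware rather than on Python deques.

```python
from collections import deque


class HostChannelAdapter:
    """Toy model of remote access redirect (RAR) for a single channel."""

    def __init__(self, protected_pages):
        self.protected_pages = set(protected_pages)
        self.rar_mode = False
        self.app_queue = deque()   # queue pair of the application
        self.os_queue = deque()    # queue pair of the operating system

    def receive(self, message):
        if message["type"] == "marker":
            self.rar_mode = True   # marker message switches the channel to RAR mode
        elif self.rar_mode and message["page"] in self.protected_pages:
            # Convert to an RAR receive message and hand it to the OS queue.
            self.os_queue.append({**message, "type": "rar_receive"})
        else:
            self.app_queue.append(message)


hca = HostChannelAdapter(protected_pages={7})
hca.receive({"type": "write", "page": 7})   # before the marker: normal delivery
hca.receive({"type": "marker"})
hca.receive({"type": "write", "page": 7})   # protected page: redirected to the OS
hca.receive({"type": "write", "page": 3})   # unprotected page: normal delivery
```

After these calls the application queue holds the first and last writes, and the operating system queue holds the redirected one.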
Abstract:
In accordance with a method and system of the present invention, a compression scheme for program executables is disclosed. First, instruction clustering starts by placing each instruction in a cluster by itself. The method and system then iteratively compute the distance between clusters and merge the nearest clusters to form larger clusters. Instructions are thereby clustered into groups such that instructions belonging to the same cluster show similar bit patterns. This process stops when the number of clusters reaches a pre-specified goal. This goal is defined empirically and may be adjusted if better compression can result. After all clusters have been defined, a suitable compressor is applied to each cluster to produce the compressed executable.
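The merge loop described above can be sketched as follows. The abstract does not define the distance between clusters, so this sketch assumes Hamming distance between instruction words and single linkage (the distance between the closest pair of members); both choices are assumptions, as are the function names.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two instruction words."""
    return bin(a ^ b).count("1")


def cluster_distance(c1, c2):
    # Assumption: single linkage, i.e. the closest pair of members.
    return min(hamming(a, b) for a in c1 for b in c2)


def cluster_instructions(instructions, target):
    """Merge nearest clusters until only `target` clusters remain."""
    target = max(target, 1)
    clusters = [[word] for word in instructions]   # one cluster per instruction
    while len(clusters) > target:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])            # i < j, so deleting j is safe
        del clusters[j]
    return clusters


groups = cluster_instructions([0b0000, 0b0001, 0b1111, 0b1110], target=2)
```

With these four 4-bit words the two resulting groups are the words near all-zeros and the words near all-ones. The quadratic search per merge is adequate for a sketch but would need a smarter structure for real executables.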
Abstract:
A compression scheme for program executables that run in a reduced instruction set computer (RISC) architecture such as the PowerPC is disclosed. The method and system utilize scope-based compression for increasing the effectiveness of conventional compression with respect to register and literal encoding. First, discernible patterns are determined by exploiting instruction semantics and conventions that compilers adopt in register and literal usage. Additional conventions may also be set for register usage to facilitate compression. Using this information, separate scopes are created such that in each scope there is a more prevalent usage of a limited set of registers or literal value ranges, or there is an easily discernible pattern of register or literal usage. Each scope then is compressed separately by a conventional compressor. The resulting code is more compact because the small number of registers and literals in each scope makes the encoding sparser than when the compressor operates on the global scope that includes all instructions in a program. Additionally, scope-based compression reveals more frequent patterns within each scope than when considering the entire instruction stream as an opaque stream of bits.
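One loose reading of compressing scopes separately is to split each instruction into streams by field type and compress each stream on its own. The sketch below does that with `zlib` as a stand-in conventional compressor. This is a simplification: the abstract forms scopes from register and literal usage patterns, not simply by field, and the one-byte field values used here are hypothetical.

```python
import zlib


def split_scopes(instructions):
    """Separate (opcode, register, literal) triples into three byte streams.

    Assumption: every field value fits in one byte (0..255).
    """
    return {
        "opcodes": bytes(op for op, _, _ in instructions),
        "registers": bytes(reg for _, reg, _ in instructions),
        "literals": bytes(lit for _, _, lit in instructions),
    }


def compress_scopes(instructions):
    """Compress each stream independently of the others."""
    return {name: zlib.compress(stream)
            for name, stream in split_scopes(instructions).items()}


def decompress_scopes(compressed):
    """Rebuild the original instruction triples from the compressed streams."""
    streams = {name: zlib.decompress(blob) for name, blob in compressed.items()}
    return list(zip(streams["opcodes"], streams["registers"], streams["literals"]))
```

Whether the separated streams end up smaller than one combined stream depends on the program, so the sketch makes no claim about compression ratios.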
Abstract:
A compression scheme is disclosed for program executables that run on Reduced Instruction Set Computer (RISC) processors, such as the PowerPC architecture. The RISC instruction set is expanded by adding opcodes to produce code that facilitates the removal of redundant fields. To compress a program, a compressor engine rewrites the executable using the new expanded instruction set. Next, a filter is applied to remove the redundant fields from the expanded instructions. A conventional compression technique such as Huffman encoding is then applied on the resulting code.
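The expand-then-filter idea can be illustrated with one hypothetical redundancy: an instruction whose destination and source registers are identical. The `.same` opcode suffix, the tuple layout, and the helper names are all invented for this sketch and do not correspond to real PowerPC encodings.

```python
SUFFIX = ".same"   # hypothetical marker for an expanded opcode


def expand(instr):
    """Rewrite (op, dst, src, imm) with an expanded opcode when dst == src."""
    op, dst, src, imm = instr
    if dst == src:
        return (op + SUFFIX, dst, src, imm)
    return instr


def filter_redundant(instr):
    """Drop the source field when the expanded opcode already implies it."""
    if instr[0].endswith(SUFFIX):
        op, dst, _, imm = instr
        return (op, dst, imm)
    return instr


def restore(instr):
    """Inverse of expand plus filter: rebuild the original instruction."""
    if instr[0].endswith(SUFFIX):
        op, dst, imm = instr
        return (op[: -len(SUFFIX)], dst, dst, imm)
    return instr


program = [("addi", "r1", "r1", 4), ("addi", "r2", "r1", 8)]
compact = [filter_redundant(expand(i)) for i in program]
```

The first instruction loses its redundant source field, the second is left unchanged, and `restore` recovers the original program; a conventional coder such as Huffman would then run on `compact`.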