Abstract:
Transaction code written by a programmer may be translated, replaced, or transformed into code that is configured to implement transactions according to any of various techniques. A compiler may replace programmer-written transaction code with code that allows multiple compatible transaction implementation techniques to be used in the same program at the same time. A programmer may write transaction code once using familiar coding styles and still have the transaction effected according to one of a number of compatible alternative implementation techniques. The compiler may enable the implementation of multiple, alternative transactional memory schemes. The particular technique implemented for each transaction may not be decided until runtime. At runtime, any of the various implemented techniques may be used to effect the transaction, and if a first technique fails or is inappropriate for a particular transaction, one or more other techniques may be attempted.
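The runtime fallback described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`try_hardware`, `try_with_lock`, `run_transaction`) are hypothetical, the hardware path is simulated as always failing, and a single global lock stands in for a software technique.

```python
import threading

_global_lock = threading.Lock()  # assumed fallback mechanism, not from the source


def try_hardware(body):
    """Stand-in for a best-effort hardware technique; simulated as unavailable."""
    return False, None


def try_with_lock(body):
    """Software fallback: run the transaction body under one global lock."""
    with _global_lock:
        return True, body()


def run_transaction(body, techniques=(try_hardware, try_with_lock)):
    """Try each technique in order until one reports that it committed."""
    for technique in techniques:
        committed, result = technique(body)
        if committed:
            return technique.__name__, result
    raise RuntimeError("no technique could commit the transaction")
```

The programmer supplies only `body`; which technique actually commits it is decided when `run_transaction` executes.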
Abstract:
The present invention discloses a method and device for ordering memory operation instructions in an optimizing compiler for a processor that can potentially enter a stall state if a memory queue is full. The method uses a dependency graph coupled with one or more memory queues. The dependency graph is used to show the dependency relationships between instructions in a program being compiled. After creating the dependency graph, the ready nodes are identified. Dependency graph nodes that correspond to memory operations may have the effect of adding an element to the memory queue or removing one or more elements from the memory queue. The ideal situation is to keep the memory queue as full as possible without exceeding the maximum desirable number of elements, by scheduling memory operations to maximize the parallelism of memory operations while avoiding stalls on the target processor.
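A rough sketch of the idea, assuming a simple greedy list scheduler: among the ready nodes, it picks the one that leaves the modeled memory queue fullest without exceeding the limit. The function name `schedule`, the `queue_delta` encoding (+1 adds an element, negative values remove elements), and the tie-breaking rule are assumptions made for illustration, not details from the abstract.

```python
def schedule(nodes, preds, queue_delta, max_queue):
    """Order nodes so dependencies are respected and the queue is not overfilled.

    nodes: list of unique node names (list order breaks ties)
    preds: dict mapping a node to the nodes it depends on
    queue_delta: dict mapping a node to its effect on queue occupancy
    max_queue: maximum desirable number of queue elements
    """
    remaining = {n: set(preds.get(n, ())) for n in nodes}
    order, occupancy = [], 0
    while remaining:
        # Ready nodes are those whose predecessors have all been scheduled.
        ready = [n for n in nodes if n in remaining and not remaining[n]]
        if not ready:
            raise ValueError("dependency cycle")
        fits = [n for n in ready
                if occupancy + queue_delta.get(n, 0) <= max_queue]
        if fits:
            # Keep the queue as full as possible without exceeding the limit.
            pick = max(fits, key=lambda n: queue_delta.get(n, 0))
        else:
            # Every ready node would overfill the queue; take the mildest one.
            pick = min(ready, key=lambda n: queue_delta.get(n, 0))
        order.append(pick)
        occupancy = max(0, occupancy + queue_delta.get(pick, 0))
        del remaining[pick]
        for waiting in remaining.values():
            waiting.discard(pick)
    return order
```

With three loads that each feed one use and a queue limit of two, this sketch issues two loads, then interleaves uses and the third load so the modeled occupancy never exceeds two.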
Abstract:
Apparatus, methods, and computer program products are disclosed that improve the operation of a computer that uses a top-of-stack cache by reducing the number of overflow and underflow traps generated during the execution of a program. The invention maintains a predictor value that controls the number of stack elements that are spilled from, or filled to, the top-of-stack cache in response to an overflow trap or an underflow trap (respectively). The predictor reflects the history of overflow traps and underflow traps.
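One way such a predictor could behave is sketched below, assuming a saturating counter that rises on overflow traps and falls on underflow traps. The class name, the counter range, and the mapping from counter value to element count are all hypothetical; the abstract does not specify them.

```python
class StackCachePredictor:
    """Saturating history counter that sizes spills and fills (illustrative)."""

    def __init__(self, max_level=3):
        self.max_level = max_level
        self.level = 0  # positive: recent overflows; negative: recent underflows

    def on_overflow(self):
        """Record an overflow trap and return how many elements to spill."""
        self.level = min(self.level + 1, self.max_level)
        return 1 + max(self.level, 0)

    def on_underflow(self):
        """Record an underflow trap and return how many elements to fill."""
        self.level = max(self.level - 1, -self.max_level)
        return 1 + max(-self.level, 0)
```

A run of overflow traps makes each later spill larger, so a steadily growing stack triggers fewer traps; a single underflow after that run still fills only one element because the history remains overflow-dominated.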
Abstract:
In a set of registers, each individually addressable by register operations using a corresponding register identification, at least one register of the set of registers is an extended register having multiple storage locations. Values stored in the multiple storage locations are accessed, for example, according to the order in which they have been stored. Less than all of the multiple storage locations are accessible by a register operation at a given time. Older versions of software that do not recognize extended registers identify the extended register as having only one storage location. An extended register can be, for example, a stack register, a queue register, or a mixed register and values stored in the multiple storage locations are read and stored according to the characteristics of the register.
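As a software model only, an extended register might be sketched as follows. The class name `ExtendedRegister` and its `mode` parameter are assumptions; the point is that a single register identification gives access to one of several stored values, chosen by storage order.

```python
from collections import deque


class ExtendedRegister:
    """One register name backed by several storage locations (illustrative)."""

    def __init__(self, mode="stack"):
        if mode not in ("stack", "queue"):
            raise ValueError("mode must be 'stack' or 'queue'")
        self.mode = mode
        self._slots = deque()

    def write(self, value):
        """Store a value in the next storage location."""
        self._slots.append(value)

    def read(self):
        """Return and remove the newest value (stack) or the oldest (queue)."""
        if self.mode == "stack":
            return self._slots.pop()
        return self._slots.popleft()
```

Only the value at the accessible end is visible to a single `read`, which mirrors the statement that fewer than all storage locations are accessible by one register operation.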
Abstract:
A system and method are provided for improved handling of data in a cache memory system (105) for caching data transferred between a processor (110) capable of executing a program and a main-memory (115). The cache memory system (105) has at least one cache (135) with several cache-lines (160) capable of caching data therein. In the method, a cache address space is provided for each cache (135) and special instructions are generated and inserted into the program to directly control caching of data in at least one of the cache-lines (160). Special instructions received in the cache memory system (105) are then executed to cache the data. The special instructions can be generated by a compiler during compiling of the program. Where the cache memory system (105) includes a set-associative-cache having a number of sets each with several cache-lines (160), the method can further include the step of determining which cache-line in a set to flush to main-memory (115) before caching new data to the set.
Abstract:
One embodiment of the present invention provides a system for compiling source code into executable code that performs prefetching for memory operations within critical sections of code that are subject to mutual exclusion. The system operates by compiling a source code module containing programming language instructions into an executable code module containing instructions suitable for execution by a processor. Next, the system identifies a critical section within the executable code module by identifying a region of code between a mutual exclusion lock operation and a mutual exclusion unlock operation. The system schedules explicit prefetch instructions into the critical section in advance of associated memory operations. In one embodiment, the system identifies the critical section of code by using a first macro to perform the mutual exclusion lock operation, wherein the first macro additionally activates prefetching. The system also uses a second macro to perform the mutual exclusion unlock operation, wherein the second macro additionally deactivates prefetching.
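The scheduling step can be sketched as a small pass over a toy instruction list. The tuple encoding (`("lock",)`, `("load", addr)`, and so on), the function name `add_prefetches`, and the choice to hoist every prefetch to the start of the critical section are assumptions for illustration; nested locks are not handled.

```python
def add_prefetches(instructions):
    """Insert prefetches after each lock for memory operations before the unlock."""
    out = []
    i = 0
    while i < len(instructions):
        instr = instructions[i]
        out.append(instr)
        i += 1
        if instr[0] != "lock":
            continue
        # Collect the critical section body up to the matching unlock.
        end = i
        while end < len(instructions) and instructions[end][0] != "unlock":
            end += 1
        body = instructions[i:end]
        # One prefetch per distinct address, in first-use order.
        addresses = []
        for op in body:
            if op[0] in ("load", "store") and op[1] not in addresses:
                addresses.append(op[1])
        out.extend(("prefetch", addr) for addr in addresses)
        out.extend(body)
        i = end  # the unlock, if present, is copied on the next iteration
    return out
```

Memory operations outside a lock/unlock pair are left untouched, so prefetching is active only inside the critical section.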
Abstract:
A processor includes a set of registers, each individually addressable using a corresponding register identification, and plural virtual registers, each individually addressable using a corresponding virtual register identification. The processor transfers values between the set of registers and the plural virtual registers under control of a transfer operation. The processor can include a virtual register cache configured to store multiple sets of virtual register values, such that each of the multiple sets of virtual register values corresponds to a different context. Each of the plural virtual registers can include a valid bit that is reset on a context switch and set when a value is loaded from the virtual register cache. The processor can include a virtual register translation look-aside buffer for tracking the location of each set of virtual register values associated with each context.
Abstract:
A system and method are provided for efficiently prefetching data in a pointer linked data structure (140). In one embodiment, a data processing system (100) is provided including a processor (110) capable of executing a program, a main-memory (115) and a prefetch engine (175) configured to prefetch data from a plurality of locations in main-memory in response to a prefetch request from the processor. When the data in main-memory (115) has a linked-data-structure having a number of nodes (145) each with data (150) stored therein, the prefetch engine (175) is configured to traverse the linked-data-structure and prefetch data from the nodes. The prefetch engine (175) is configured to determine from data contained in a prefetched first node (145A) and an offset value a new starting address for a second node (145B) to be prefetched. In one embodiment, the prefetch engine (175) includes a number of sets of prefetch registers (180), one set of prefetch registers for each prefetch request from processor (110) that is yet to be completed. Each set of prefetch registers (180) includes (i) a prefetch address register (190); (ii) an offset register (195); (iii) a termination register (200); (iv) a status register (205); and (v) a returned data register (210).
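The traversal can be modeled in a few lines, assuming main memory is a dictionary from address to word and that the word at `node address + offset` holds the next node's address. The function name `prefetch_chain`, the `terminator` value, and the `limit` parameter are hypothetical stand-ins for the address, offset, and termination registers named above.

```python
def prefetch_chain(memory, start, offset, terminator=0, limit=8):
    """Follow next-pointers from start and return the node addresses prefetched."""
    fetched = []
    addr = start
    while addr != terminator and len(fetched) < limit:
        fetched.append(addr)          # model prefetching the node at addr
        addr = memory[addr + offset]  # next node address read from this node
    return fetched
```

For example, with nodes at addresses 100, 200, and 300 whose next-pointers sit one word past each node, a request starting at 100 with offset 1 walks all three nodes and stops at the terminator.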
Abstract:
A system is provided that allows a programmer to specify a set of constraints that the programmer has adhered to in writing code so that a compiler is able to assume the set of constraints in disambiguating memory references within the code. The system operates by receiving an identifier for a set of constraints on memory references that the programmer has adhered to in writing the code. The system uses the identifier to select a disambiguation technique from a set of disambiguation techniques. Note that each disambiguation technique is associated with a different set of constraints on memory references. The system uses the selected disambiguation technique to identify memory references within the code that can alias with each other.
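The selection step might look like the following sketch. The identifiers `"none"` and `"strict_types"`, the two alias tests, and the dictionary-based reference records are invented for illustration; real compilers expose such choices through their own options.

```python
def alias_any(a, b):
    """No constraints promised: any two references may alias."""
    return True


def alias_same_type(a, b):
    """Programmer promises that references of different types never alias."""
    return a["type"] == b["type"]


# Each constraint identifier selects one disambiguation technique.
TECHNIQUES = {"none": alias_any, "strict_types": alias_same_type}


def aliasing_pairs(refs, constraint_id):
    """Return the pairs of reference names that may alias under the constraints."""
    may_alias = TECHNIQUES[constraint_id]
    return [(a["name"], b["name"])
            for i, a in enumerate(refs)
            for b in refs[i + 1:]
            if may_alias(a, b)]
```

A stronger promise from the programmer yields fewer possible alias pairs, which gives the compiler more freedom to reorder memory operations.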
Abstract:
Additional memory hardware in a computer system which is distinct in function from the main memory system architecture permits the storage and retrieval of prefetch addresses and allows the compiler to more efficiently generate prefetch instructions for execution while traversing pointer-based or recursive data structures. The additional memory hardware makes up a content addressable memory (CAM) or a hash table/array memory that is relatively close in cycle time to the CPU and relatively small when compared to the main memory system. The additional CAM hardware permits the compiler to write data access loops which remember the addresses for each node visited while traversing the linked data structure by providing storage space to hold a prefetch address or a set of prefetch addresses. Since the additional CAM is separate from the main memory system and acts as an alternate cache for holding the prefetch addresses, it prevents the overwriting of desired information in the regular cache and thus leaves the regular cache unpolluted. Furthermore, rather than having the addresses for the entire memory system stored in the CAM, only the addresses of those data nodes traversed along the pointer-based data structure are stored and thus remembered, which allows the size of the CAM to remain relatively small and access to the CAM by the CPU to remain relatively fast.
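A software model of such a data access loop is sketched below, assuming the small memory maps a node address to the address of the node visited `distance` steps later on the previous traversal. The class `PrefetchCAM`, its oldest-first eviction, and the `traverse` helper are hypothetical; the abstract does not fix what is stored per node or how entries are replaced.

```python
from collections import OrderedDict


class PrefetchCAM:
    """Small, bounded table of node address -> prefetch address (illustrative)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._entries = OrderedDict()

    def remember(self, key, prefetch_addr):
        """Store or refresh an entry, evicting the oldest one when full."""
        self._entries.pop(key, None)
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)
        self._entries[key] = prefetch_addr

    def lookup(self, key):
        """Return the remembered prefetch address, or None on a miss."""
        return self._entries.get(key)


def traverse(addresses, cam, distance=2):
    """Visit nodes in order; return the prefetch hints found in the table."""
    hints = []
    for i, addr in enumerate(addresses):
        hint = cam.lookup(addr)
        if hint is not None:
            hints.append(hint)  # model issuing a prefetch for this address
        if i >= distance:
            # Remember which node followed the one visited `distance` steps ago.
            cam.remember(addresses[i - distance], addr)
    return hints
```

The first traversal of a structure finds no hints and only fills the table; a repeat traversal of the same nodes can then issue prefetches ahead of the pointer chase without placing anything in the regular cache.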