摘要:
Mechanisms and techniques operate in a computerized device to execute critical code without interference from interruptions. Critical code is registered for invocation of a critical execution manager in the event of an interruption to the critical code. The critical code is then executed until an interruption to the critical code occurs. After handling the interruption, a critical execution manager is invoked and the critical execution manager detects if an interference signal indicates a reset value. If the interference signal indicates the reset value, the critical execution manager performs a reset operation on the critical code to reset a current state of the critical code to allow execution of the critical code while avoiding interference from handling the interruption and returns to execution of the critical code using the current state of the critical code.
摘要:
An object-oriented compiler/interpreter allocates monitor records for use in implementing synchronized operations on objects. When a synchronization operation is to be performed on an object, a thread that is to perform the operation “inflates” the object's monitor by placing into its header a pointer to the monitor record as well as an indication of the monitor's inflated status. When a thread is to release its lock on an object, it first consults a reference-count field in the monitor record to determine whether any other threads are synchronized on the object. It then dissociates the object from the monitor record. The dissociation is not atomic with the reference-count check, so the releasing thread checks the reference count again. If that count indicates that further objects had employed the monitor record to synchronize on the object in the interim, then the unlocking thread wakes all waiting threads.
摘要:
A code generating system generates, from code in a program, native code that is executable by a computer system. The code generating system may be included in a just-in-time compiler used to generate native code that is executable by a computer system, from a program in Java Byte Code form, and specifically generates, in response to Java Byte Code representative of a synchronization statement that synchronizes access by multiple threads of execution to at least one variable contained in the Java Byte code, one or more native code instructions that implements a wait-free synchronization methodology to synchronization access to the at least one variable. Since the instructions which implement the wait-free synchronization methodology do not require calls to the operating system, they can generally be processed more rapidly than other synchronization techniques which do require operating system calls.
摘要:
A microprocessor in a computer system processes an instruction stream comprising instructions of a plurality of instruction types including an information retrieval instruction type. The microprocessor comprises a register set, a pending fault flag set, a functional unit, an information retrieval subsystem, and a control subsystem. The register set comprises a plurality of registers, each register for storing information. The pending fault flag set comprises a plurality of pending fault flags each associated with one of said registers, each pending fault flag having selected conditions including a pending fault condition and a no pending fault condition. The functional unit performs processing operations in response to information input thereto. The information retrieval subsystem initiates an information retrieval operation to retrieve of information from said information storage subsystem for storage in a register. The control subsystem controls the other elements of the microprocessor in response to the instructions in the instruction stream. In response to an instruction in the instruction stream of the information retrieval type, the control subsystem enables the information retrieval subsystem to initiate an information retrieval operation, and conditions the pending fault flag associated with said one of said registers to the pending fault condition in response to detection of a fault condition during the information retrieval operation. In response to an instruction in the instruction stream of another type, the control subsystem identifies a selected one of said registers as a source register, and enables information to be transferred from said source register to the functional unit for processing if the pending fault flag associated with said source register is in the no pending fault condition.
摘要:
In transactional memory systems, transactional aborts due to conflicts between concurrent threads may cause system performance degradation. A compiler may attempt to minimize runtime abort rates by performing code transformations and/or other optimizations on a transactional memory program in an attempt to minimize store-commit intervals. The compiler may employ store deferral, hoisting of long-latency operations from within a transaction body and/or store-commit interval, speculative hoisting of long-latency operations, and/or redundant store squashing optimizations. The compiler may perform optimizing transformations on source code and/or on any intermediate representation thereof (e.g., parse trees, un-optimized assembly code, etc.). The compiler may preemptively avoid naïve target code constructions. The compiler may perform static and/or dynamic analysis of a program in order to determine which, if any, transformations should be applied and/or may dynamically recompile code sections at runtime, based on execution analysis.
摘要:
In shared-memory computer systems, threads may communicate with one another using shared memory. A receiving thread may poll a message target location repeatedly to detect the delivery of a message. Such polling may cause excessive cache coherency traffic and/or congestion on various system buses and/or other interconnects. A method for inter-processor communication may reduce such bus traffic by reducing the number of reads performed and/or the number of cache coherency messages necessary to pass messages. The method may include a thread reading the value of a message target location once, and determining that this value has been modified by detecting inter-processor messages, such as cache coherence messages, indicative of such modification. In systems that support transactional memory, a thread may use transactional memory primitives to detect the cache coherence messages. This may be done by starting a transaction, reading the target memory location, and spinning until the transaction is aborted.
摘要:
Hardware-based transactional memory mechanisms, such as Speculative Lock Elision (SLE), may allow multiple threads to concurrently execute critical sections protected by the same lock as speculative transactions. Such transactions may abort due to contention or due to misidentification of code as a critical section. In various embodiments, speculative execution mechanisms may be augmented with software and/or hardware contention management mechanisms to reduce abort rates. Speculative execution hardware may send a hardware interrupt signal to notify software components of a speculative execution event (e.g., abort). Software components may respond by implementing concurrency-throttling mechanisms and/or by determining a mode of execution (e.g., speculative, non-speculative) for a given section and communicating that determination to the hardware speculative execution mechanisms, e.g., by writing it into a lock predictor cache. Subsequently, hardware speculative execution mechanisms may determine a preferred mode of execution for the section by reading the corresponding entry from the lock predictor cache.
摘要:
Multi-threaded, transactional memory systems may allow concurrent execution of critical sections as speculative transactions. These transactions may abort due to contention among threads. Hardware feedback mechanisms may detect information about aborts and provide that information to software, hardware, or hybrid software/hardware contention management mechanisms. For example, they may detect occurrences of transactional aborts or conditions that may result in transactional aborts, and may update local readable registers or other storage entities (e.g., performance counters) with relevant contention information. This information may include identifying data (e.g., information outlining abort relationships between the processor and other specific physical or logical processors) and/or tallied data (e.g., values of event counters reflecting the number of aborted attempts by the current thread or the resources consumed by those attempts). This contention information may be accessible by contention management mechanisms to inform contention management decisions (e.g. whether to revert transactions to mutual exclusion, delay retries, etc.).
摘要:
A lock-clustering compiler is configured to compile program code for a software transactional memory system. The compiler determines that a group of data structures are accessed together within one or more atomic memory transactions defined in the program code. In response to determining that the group is accessed together, the compiler creates an executable version of the program code that includes clustering code, which is executable to associate the data structures of the group with the same software transactional memory lock. The lock is usable by the software transactional memory system to coordinate concurrent transactional access to the group of data structures by multiple concurrent threads.
摘要:
The system and methods described herein may be used to implement NUMA-aware locks that employ lock cohorting. These lock cohorting techniques may reduce the rate of lock migration by relaxing the order in which the lock schedules the execution of critical code sections by various threads, allowing lock ownership to remain resident on a single NUMA node longer than under strict FIFO ordering, thus reducing coherence traffic and improving aggregate performance. A NUMA-aware cohort lock may include a global shared lock that is thread-oblivious, and multiple node-level locks that provide cohort detection. The lock may be constructed from non-NUMA-aware components (e.g., spin-locks or queue locks) that are modified to provide thread-obliviousness and/or cohort detection. Lock ownership may be passed from one thread that holds the lock to another thread executing on the same NUMA node without releasing the global shared lock.