Abstract:
Solutions to a value recycling problem facilitate implementations of computer programs that may execute as multithreaded computations in multiprocessor computers, as well as implementations of related shared data structures. Some exploitations allow non-blocking, shared data structures to be implemented using standard dynamic allocation mechanisms (such as malloc and free). Some exploitations allow non-blocking, indeed even lock-free or wait-free, implementations of dynamic storage allocation for shared data structures. In some exploitations, our techniques provide a way to manage dynamically allocated memory in a non-blocking manner without depending on garbage collection. While exploitations of solutions to the value recycling problem that we propose include management of dynamic storage allocation wherein values managed and recycled tend to include values that encode pointers, they are not limited thereto. Indeed, the techniques are more generally applicable to management of values in a multithreaded computation. For example, value recycling techniques may be exploited, in some cases, apart from dynamic storage allocation, to allow a multithreaded computation to avoid the classic ABA hazard.
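For context, the sketch below shows the classic ABA hazard that value recycling techniques are meant to rule out, on a Treiber-style stack. This is a minimal illustration under assumed names, not code from the described system: between loading `old_top` and the CAS, another thread may pop `old_top`, free or reuse its memory, and push a node with the same address, so the CAS succeeds even though the `old_top->next` it installs may be stale.

```cpp
#include <atomic>

struct Node { Node* next; };
std::atomic<Node*> top{nullptr};

// Unsafe pop on a Treiber-style stack: exhibits the ABA hazard if popped
// nodes are freed and their addresses reused while other threads still
// hold references to them.
Node* unsafe_pop() {
    Node* old_top = top.load();
    while (old_top != nullptr &&
           !top.compare_exchange_weak(old_top, old_top->next)) {
        // on failure, compare_exchange_weak reloads old_top and we retry
    }
    return old_top;   // a value-recycling scheme would defer reuse of this node
}
```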
Abstract:
A hybrid Single-Compare-Single-Store (SCSS) operation may exploit best-effort hardware transactional memory (HTM) for good performance in the case that it succeeds, and may transparently resort to software-mediated transactions if the hardware transactional mechanisms fail. The SCSS operation may compare a value in a control location to a specified expected value, and if they match, may store a new value in a separate data location. The control value may include a global lock, a transaction status indicator, and/or a portion of an ownership record, in different embodiments. If another transaction in progress owns the data location, the SCSS operation may abort the other transaction or may help it complete by copying the other transaction's write set into its own write set before acquiring ownership. A hybrid SCSS operation, which is usually nonblocking, may be applied to building software transactional memories (STMs) and/or hybrid transactional memories (HyTMs), in some embodiments.
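As a rough illustration of the hardware fast path described above, the sketch below performs the compare-on-control/store-on-data step inside a best-effort hardware transaction using x86 RTM intrinsics. The function name, types, and return convention are assumptions for illustration only, and the software-mediated fallback is not shown.

```cpp
#include <immintrin.h>   // x86 RTM intrinsics (_xbegin/_xend/_xabort); needs -mrtm
#include <atomic>

// Sketch of an SCSS-like hardware fast path: atomically, if `control` holds
// `expected_ctl`, store `new_val` into the separate `data` location.
// Returns +1 on success, 0 if the compare failed, and -1 if the hardware
// transaction could not complete (the caller would retry via a software path).
int scss_fast_path(std::atomic<long>& control, long expected_ctl,
                   std::atomic<long>& data, long new_val) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        if (control.load(std::memory_order_relaxed) != expected_ctl)
            _xabort(0x01);                      // explicit abort: compare failed
        data.store(new_val, std::memory_order_relaxed);
        _xend();                                // compare and store committed atomically
        return 1;
    }
    if ((status & _XABORT_EXPLICIT) && _XABORT_CODE(status) == 0x01)
        return 0;                               // control held an unexpected value
    return -1;                                  // HTM failure: use the software path
}
```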
Abstract:
Many conventional lock-free data structures exploit techniques that are possible only because state-of-the-art 64-bit processors are still running 32-bit operating systems and applications. As software catches up to hardware, “64-bit-clean” lock-free data structures, which cannot use such techniques, are needed. We present several 64-bit-clean lock-free implementations, including load-linked/store-conditional variables of arbitrary size, a FIFO queue, and a freelist. In addition to being portable to 64-bit software (or, more generally, to full-architectural-width pointer operations), our implementations also improve on existing techniques in that they are (or can be) space-adaptive and do not require a priori knowledge of the number of threads that will access them.
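To make the load-linked/store-conditional (LL/SC) semantics concrete, the sketch below is a coarse-lock specification of an LL/SC variable of arbitrary size, together with a typical retry loop written against it. It illustrates only the semantics the 64-bit-clean constructions provide, not their algorithm; the explicit link handle is a simplification introduced here.

```cpp
#include <mutex>

// Semantic sketch: store_conditional succeeds only if no successful store
// intervened since the matching load_linked.
template <typename T>
class LLSCVar {
    std::mutex m_;
    T value_{};
    unsigned long version_ = 0;
public:
    T load_linked(unsigned long& link) {
        std::lock_guard<std::mutex> g(m_);
        link = version_;
        return value_;
    }
    bool store_conditional(const T& v, unsigned long link) {
        std::lock_guard<std::mutex> g(m_);
        if (version_ != link) return false;   // someone stored since our load_linked
        value_ = v;
        ++version_;
        return true;
    }
};

// Typical lock-free-style retry loop written against such a variable:
struct Point { long x, y; };                  // wider than one machine word
void increment_x(LLSCVar<Point>& p) {
    for (;;) {
        unsigned long link;
        Point cur = p.load_linked(link);
        cur.x += 1;
        if (p.store_conditional(cur, link)) return;
    }
}
```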
Abstract:
A set of structures and techniques is described herein whereby an exemplary concurrent shared object, namely a shared skip list, can be implemented in a lock-free manner. Indeed, we have developed a number of interesting variants of a lock-free shared skip list, including variants that may be employed to provide a lock-free shared dictionary. In some variants, a key-value dictionary is implemented.
Abstract:
One embodiment of the present invention provides a system that avoids locks by transactionally executing critical sections. During operation, the system receives a program which includes one or more critical sections that are protected by locks. Next, the system modifies the program so that the critical sections that are protected by locks are executed transactionally without acquiring locks associated with the critical sections. More specifically, the program is modified so that: (1) during transactional execution of a critical section, the program first determines if a lock associated with the critical section is held by another process and, if so, aborts the transactional execution; (2) if the transactional execution of the critical section completes without encountering an interfering data access from another process, the program commits changes made during the transactional execution and optionally resumes normal non-transactional execution of the program past the critical section; and (3) if an interfering data access from another process is encountered during transactional execution of the critical section, the program discards changes made during the transactional execution and attempts to re-execute the critical section zero or more times.
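The sketch below renders steps (1)-(3) using x86 RTM intrinsics as the transactional mechanism. The lock word, retry bound, and fallback are illustrative assumptions, not the patented implementation.

```cpp
#include <immintrin.h>   // x86 RTM intrinsics; needs -mrtm and TSX-capable hardware
#include <atomic>
#include <functional>

std::atomic<int> lock_word{0};           // 0 = free, nonzero = held by some thread

void run_critical_section(const std::function<void()>& body,
                          const std::function<void()>& locked_fallback) {
    const int kMaxAttempts = 8;
    for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
        if (_xbegin() == _XBEGIN_STARTED) {
            // (1) If the lock is currently held, another process is inside the
            //     critical section non-transactionally: abort the transaction.
            if (lock_word.load(std::memory_order_relaxed) != 0)
                _xabort(0x01);
            body();                      // execute the critical section transactionally
            _xend();                     // (2) commit if no interfering access occurred
            return;
        }
        // (3) The hardware discarded all transactional changes; fall through
        //     and re-execute the critical section (up to kMaxAttempts times).
    }
    locked_fallback();                   // e.g., acquire the original lock and run body()
}
```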
Abstract:
A transactional memory implementation has been developed that is capable of coordinating concurrent hardware transactional memory (HTM) and software transactional memory (STM) transactions over a unified transactional memory space. Some implementations employ hardware transactional memory, if available or suitable, to improve performance. Some exploitations include a hardware transactional memory in which, or for which, hardware-mediated transactions are augmented to include within their transactional scope (or mechanism) one or more additional transactional locations that facilitate coordination with concurrently executing software-mediated transactions (if any).
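The sketch below shows only the coordination idea, not a particular HyTM design: the hardware path adds one extra location (here, a hypothetical count of active software transactions) to its transactional read set, so a concurrently executing software transaction that updates that location will abort the hardware transaction and force it to coordinate or fall back. The names and the single-location update are illustrative assumptions.

```cpp
#include <immintrin.h>   // x86 RTM intrinsics; needs -mrtm
#include <atomic>

std::atomic<int> active_software_txns{0};        // illustrative STM metadata

bool try_hardware_update(std::atomic<long>& loc, long expected, long desired) {
    if (_xbegin() != _XBEGIN_STARTED)
        return false;                            // caller falls back to the STM path
    if (active_software_txns.load(std::memory_order_relaxed) != 0)
        _xabort(0x01);                           // software transactions in flight
    if (loc.load(std::memory_order_relaxed) != expected)
        _xabort(0x02);
    loc.store(desired, std::memory_order_relaxed);
    _xend();                                     // commit the hardware-mediated update
    return true;
}
```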
Abstract:
Solutions to a value recycling problem that we define herein facilitate implementations of computer programs that may execute as multithreaded computations in multiprocessor computers, as well as implementations of related shared data structures. Some exploitations of the techniques described herein allow non-blocking, shared data structures to be implemented using standard dynamic allocation mechanisms (such as malloc and free). Indeed, we present several exemplary realizations of dynamic-sized, non-blocking shared data structures that are not prevented from future memory reclamation by thread failures and that depend (in some implementations) only on widely-available hardware support for synchronization. Some exploitations of the techniques described herein allow non-blocking, indeed even lock-free or wait-free, implementations of dynamic storage allocation for shared data structures. A class of general solutions to value recycling is described in the context of an illustration we call the Repeat Offender Problem (ROP), including illustrative Application Program Interfaces (APIs) defined in terms of the ROP terminology. Furthermore, specific solutions, implementations, and algorithms, including a Pass-The-Buck (PTB) implementation, are described.
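The interface below is one plausible rendering of a guard-based API in the ROP terminology. The operation names follow the Repeat Offender Problem formulation (HireGuard, FireGuard, PostGuard, Liberate), but the exact signatures here are illustrative assumptions rather than the patented interface.

```cpp
#include <vector>

template <typename T>
struct RopGuardApi {
    using Guard = int;                                 // opaque guard handle
    virtual Guard hireGuard() = 0;                     // obtain a guard for the calling thread
    virtual void  fireGuard(Guard g) = 0;              // release a guard no longer needed
    virtual void  postGuard(Guard g, T* v) = 0;        // announce that v must not be recycled
    // Hand in values removed from a data structure; the returned subset is now
    // safe to recycle (e.g., pass to free); the rest remains in custody.
    virtual std::vector<T*> liberate(std::vector<T*> removed) = 0;
    virtual ~RopGuardApi() = default;
};
```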
Abstract:
The design of nonblocking linked data structures using single-location synchronization primitives such as compare-and-swap (CAS) is a complex affair that often requires severe restrictions on the way pointers are used. One way to address this problem is to provide stronger synchronization operations, for example, ones that atomically modify one memory location while simultaneously verifying the contents of others. We provide a simple and highly efficient nonblocking implementation of such an operation: an atomic k-word-compare single-swap operation (KCSS). Our implementation is obstruction-free. As a result, it is highly efficient in the uncontended case and relies on contention management mechanisms in the contended cases. It allows linked data structure manipulation without the complexity and restrictions of other solutions. Additionally, as a building block of some implementations of our techniques, we have developed the first nonblocking software implementation of load-linked/store-conditional that does not severely restrict word size.
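To pin down what KCSS guarantees, the sketch below is a specification only, expressed with a coarse lock for clarity; the actual implementation described above is obstruction-free and lock-free of this kind of global serialization. Semantics: atomically, if every location a[i] holds expected[i], write new_val into a[0] and return true; otherwise leave all locations unchanged and return false.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

std::mutex kcss_spec_lock;   // coarse lock used purely to express atomicity

bool kcss_spec(const std::vector<long*>& a,
               const std::vector<long>& expected,
               long new_val) {
    std::lock_guard<std::mutex> g(kcss_spec_lock);
    for (std::size_t i = 0; i < a.size(); ++i)
        if (*a[i] != expected[i]) return false;   // some compared location differs
    *a[0] = new_val;                              // single swap, on the first location
    return true;
}
```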
Abstract:
We teach a powerful approach that greatly simplifies the design of non-blocking mechanisms and data structures, in part by largely separating the issues of correctness and progress. At a high level, our methodology includes designing an “obstruction-free” implementation of the desired mechanism or data structure, which may then be combined with a contention management mechanism whose role is to facilitate the conditions under which progress of the obstruction-free implementation is assured. In general, the contention management mechanism is semantically separable from an obstruction-free concurrent shared/sharable object implementation to which it is (or may be) applied. In some cases, the contention management mechanism may actually be coded separately from the obstruction-free implementation. We elaborate herein on the notions of obstruction-freedom and contention management, and various possibilities for combining the two. In addition, we include descriptions of some exemplary applications to particular concurrent software mechanisms and data structure implementations.
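The sketch below illustrates the separation of concerns, under assumed names and a simple randomized exponential backoff policy: `attempt` performs one try of some obstruction-free operation and reports whether it succeeded or was obstructed, while the contention manager decides only how long to back off between tries. Neither component needs to know the other's internals.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <random>
#include <thread>

bool run_with_backoff(const std::function<bool()>& attempt, int max_tries = 64) {
    std::mt19937 rng{std::random_device{}()};
    long ceiling_ns = 100;                               // initial backoff ceiling
    for (int i = 0; i < max_tries; ++i) {
        if (attempt()) return true;                      // obstruction-free op completed
        std::uniform_int_distribution<long> wait(0, ceiling_ns);
        std::this_thread::sleep_for(std::chrono::nanoseconds(wait(rng)));
        ceiling_ns = std::min(ceiling_ns * 2, 1000000L); // cap the ceiling at 1 ms
    }
    return false;                                        // caller may escalate or report failure
}
```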
Abstract:
In shared-memory computer systems, threads may communicate with one another using shared memory. A receiving thread may poll a message target location repeatedly to detect the delivery of a message. Such polling may cause excessive cache coherency traffic and/or congestion on various system buses and/or other interconnects. A method for inter-processor communication may reduce such bus traffic by reducing the number of reads performed and/or the number of cache coherency messages necessary to pass messages. The method may include a thread reading the value of a message target location once, and determining that this value has been modified by detecting inter-processor messages, such as cache coherence messages, indicative of such modification. In systems that support transactional memory, a thread may use transactional memory primitives to detect the cache coherence messages. This may be done by starting a transaction, reading the target memory location, and spinning until the transaction is aborted.
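The sketch below follows the described technique using x86 RTM as the transactional-memory primitive: read the target once, then spin inside a transaction whose read set contains the target, so a remote write aborts the transaction via the cache-coherence protocol rather than requiring repeated reads. The mailbox location and return protocol are illustrative assumptions.

```cpp
#include <immintrin.h>   // x86 RTM intrinsics; needs -mrtm and TSX-capable hardware
#include <atomic>

std::atomic<long> mailbox{0};            // hypothetical message target location

long wait_for_message(long last_seen) {
    for (;;) {
        long v = mailbox.load();                       // one ordinary read
        if (v != last_seen) return v;                  // message already delivered
        if (_xbegin() == _XBEGIN_STARTED) {
            if (mailbox.load(std::memory_order_relaxed) == last_seen) {
                // Spin inside the transaction without re-reading the mailbox:
                // a remote write to it invalidates our cache line, which aborts
                // the transaction and resumes execution at _xbegin above.
                for (;;) _mm_pause();
            }
            _xend();                                   // value changed before we spun
        }
        // Aborted (message arrival, interrupt, capacity, ...): loop and re-check.
    }
}
```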