摘要:
We present a methodology for transforming concurrent data structure implementations that depend on garbage collection to equivalent implementations that do not. Assuming the existence of garbage collection makes it easier to design implementations of concurrent data structures, particularly because it eliminates the well-known ABA problem. However, this assumption limits their applicability. Our results demonstrate that, for a significant class of data structures, designers can first tackle the easier problem of an implementation that does depend on garbage collection, and then apply our methodology to achieve a garbage-collection-independent implementation. Our methodology is based on the well-known reference counting technique, and employs the double compare-and-swap operation.
摘要:
The systems and methods described herein may be used to implement a shared dynamic-sized data structure using hardware transactional memory to simplify and/or improve memory management of the data structure. An application (or thread thereof) may indicate (or register) the intended use of an element of the data structure and may initialize the value of the data structure element. Thereafter, another thread or application may use hardware transactions to access the data structure element while confirming that the data structure element is still part of the dynamic data structure and/or that memory allocated to the data structure element has not been freed. Various indicators may be used determine whether memory allocated to the element can be freed.
摘要:
The transactional memory system described herein may apply a mix of read validation techniques to validate read operations (e.g., invisible reads and/or semi-visible reads) in different transactions, or to validate different read operations within a single transaction (including reads of the same location). The system may include mechanisms to dynamically determine that a read validation technique should be replaced by a different technique for reads of particular locations or for all subsequent reads, and/or to dynamically adjust the balance between different read validation techniques to manage costs. Some of the read validation techniques may be supported by hardware transactional memory (HTM). The system may delay acquisition of ownership records for reading, and may acquire two or more ownership records back-to-back (e.g., within a single hardware transaction). The user code of a software transaction may be divided into multiple segments, some of which may be executed within a hardware transaction.
摘要:
The system and methods described herein may reduce read/write fence latencies and cache pressure related to STM metadata accesses. These techniques may leverage locality information (as reflected by the value of a respective locale guard) associated with each of a plurality of data partitions (locales) in a shared memory to elide various operations in transactional read/write fences when transactions access data in locales owned by their threads. The locale state may be disabled, free, exclusive, or shared. For a given memory access operation of an atomic transaction targeting an object in the shared memory, the system may implement the memory access operation using a contention mediation mechanism selected based on the value of the locale guard associated with the locale in which the target object resides. For example, a traditional read/write fence may be employed in some memory access operations, while other access operations may employ an optimized read/write fence.
摘要:
We explore techniques for designing nonblocking algorithms that do not require advance knowledge of the number of processes that participate, whose time complexity and space consumption both adapt to various measures, rather than being based on predefined worst-case scenarios, and that cannot be prevented from future memory reclamation by process failures. These techniques can be implemented using widely available hardware synchronization primitives. We present our techniques in the context of solutions to the well-known Collect problem. We also explain how our techniques can be exploited to achieve other results with similar properties; these include long-lived renaming and dynamic memory management for nonblocking data structures.
摘要:
The design of nonblocking linked data structures using single-location synchronization primitives such as compare-and-swap (CAS) is a complex affair that often requires severe restrictions on the way pointers are used. One way to address this problem is to provide stronger synchronization operations, for example, ones that atomically modify one memory location while simultaneously verifying the contents of others. We provide a simple and highly efficient nonblocking implementation of such an operation: an atomic k-word-compare single-swap operation (KCSS). Our implementation is obstruction-free. As a result, it is highly efficient in the uncontended case and relies on contention management mechanisms in the contended cases. It allows linked data structure manipulation without the complexity and restrictions of other solutions. Additionally, as a building block of some implementations of our techniques, we have developed the first nonblocking software implementation of load-linked/store-conditional that does not severely restrict word size.
摘要:
A methodology has been discovered for transforming garbage collection-dependent algorithms, shared object implementations and/or concurrent software mechanisms into a form that does not presume the existence of an independent, or execution environment provided, garbage collector. Algorithms, shared object implementations and/or mechanisms designed or transformed using techniques described herein provide explicit reclamation of storage using lock-free pointer operations. Transformations can be applied to lock-free algorithms and shared object implementations and preserve lock-freedom of such algorithms and implementations. As a result, existing and future lock-free algorithms and shared object implementations that depend on a garbage-collected execution environment can be exploited in environments that do not provide garbage collection. Furthermore, algorithms and shared object implementations that employ explicit reclamation of storage using lock-free pointer operations such as described herein may be employed in the implementation of a garbage collector itself.
摘要:
A method for inserting an object into a concurrent set including obtaining a key associated with the object, traversing the concurrent set using a first thread containing the key, identifying a first insertion point while traversing the concurrent set, where the first insertion point is before a current node and after a predecessor node, obtaining a first lock for the predecessor node after identifying the first insertion point, validating the predecessor node and the current node after obtaining the lock, inserting a new node into the concurrent set after validating, where the new node is associated with the object, and releasing the first lock after inserting the new node.
摘要:
Transactional programming promises to substantially simplify the development and maintenance of correct, scalable, and efficient concurrent programs. Designs for supporting transactional programming using transactional memory implemented in hardware, software, and a mixture of the two have emerged recently. However, certain features and capabilities that would be desirable for debugging programs executed using transactional memory are absent from conventional debuggers. Breakpointing is one example of a capability not well supported when conventional debugging technology is applied to transactional memory. We describe techniques by which a debugger may instrument code (or by which a TM library may provide functionality) to direct execution of an atomic block to a code path that facilitates breakpoint handling.
摘要:
We propose a class of mechanisms to support a new style of synchronization that offers simple and efficient solutions to several existing problems for which existing solutions are complicated, expensive, and/or otherwise inadequate. In general, the proposed mechanisms allow a program to read from a first memory location (called the “flagged” location), and to then continue execution, storing values to zero or more other memory locations such that these stores take effect (i.e., become visible in the memory system) only while the flagged memory location does not change. In some embodiments, the mechanisms further allow the program to determine when the first memory location has changed. We call the proposed mechanisms conditional multi-store synchronization mechanisms.