Abstract:
Approaches for more efficiently executing calls to native code from within a managed execution environment are described. The techniques involve attempting to execute a native call, such as a call to a C function from within Java code, using a single hardware transaction. The hardware transaction covers not only the native code itself but also the various transitional operations needed to move between managed execution mode and native execution mode. If the hardware transaction is successful, at least some of the operations that would normally be performed during transitions between modes may be omitted or simplified. If the hardware transaction is unsuccessful, the native call may be performed as it normally would be, outside of a hardware transaction.
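A minimal sketch of the fast/slow-path structure this abstract describes, written against Intel's RTM intrinsics (_xbegin/_xend) as a stand-in for the hardware transactions it assumes; the transition helpers and the state variable are hypothetical placeholders for a managed runtime's internals, not any real JVM API:

```c
#include <immintrin.h>   /* _xbegin/_xend; compile with -mrtm */

/* Hypothetical stand-ins for a managed runtime's state-transition
 * machinery; a real runtime would issue fences, safepoint checks, and
 * handshakes in these slow-path versions. */
static int thread_state;                      /* 0 = managed, 1 = native */
static void full_transition_to_native(void)  { thread_state = 1; }
static void full_transition_to_managed(void) { thread_state = 0; }

/* Attempt the whole call, including the mode transitions, inside one
 * hardware transaction; fall back to the ordinary path on abort. */
int call_native_fast(int (*native_fn)(int), int arg) {
    if (_xbegin() == _XBEGIN_STARTED) {
        thread_state = 1;            /* cheap transactional state write */
        int result = native_fn(arg);
        thread_state = 0;
        _xend();                     /* commit: full transitions elided */
        return result;
    }
    full_transition_to_native();     /* abort path: normal transitions */
    int result = native_fn(arg);
    full_transition_to_managed();
    return result;
}
```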
Abstract:
Transactional reader-writer locks may leverage available hardware transactional memory (HTM) to simplify the procedures of the reader-writer lock algorithm and to eliminate a requirement for type-stable memory. An HTM-based reader-writer lock may include an ordered list of client-provided nodes, each of which represents a thread that holds (or desires to acquire) the lock, and a tail pointer. The locking and unlocking procedures invoked by readers and writers may access the tail pointer or particular ones of the nodes in the list, using various combinations of transactions and non-transactional accesses, to insert nodes into the list or to remove nodes from the list. A reader or writer that owns a node at the head of the list (or a reader whose node is preceded in the list only by other readers' nodes) may access a critical section of code or shared resource.
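A sketch of the insertion step under these assumptions: the node layout, field names, and handoff protocol are illustrative rather than the patent's, and only the enqueue (transactional, with a non-transactional fallback) is shown:

```c
#include <immintrin.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative client-provided node; the real fields may differ. */
typedef struct rw_node {
    struct rw_node *_Atomic next;
    int is_writer;                   /* 0 = reader, 1 = writer */
    _Atomic int granted;
} rw_node;

typedef struct { rw_node *_Atomic tail; } htm_rwlock;

/* Append a client-provided node to the ordered list, using a small
 * hardware transaction when possible and an atomic swap otherwise. */
void rw_lock_enqueue(htm_rwlock *l, rw_node *me, int is_writer) {
    me->is_writer = is_writer;
    atomic_store(&me->next, NULL);
    atomic_store(&me->granted, 0);
    rw_node *pred;
    if (_xbegin() == _XBEGIN_STARTED) {
        pred = atomic_load_explicit(&l->tail, memory_order_relaxed);
        atomic_store_explicit(&l->tail, me, memory_order_relaxed);
        _xend();
    } else {
        pred = atomic_exchange(&l->tail, me);  /* non-transactional path */
    }
    if (pred == NULL) {
        atomic_store(&me->granted, 1);         /* head of list: proceed */
        return;
    }
    atomic_store(&pred->next, me);
    /* A reader could also proceed once only readers precede it; that
     * scan, and the unlock path, are omitted from this sketch. */
    while (!atomic_load(&me->granted)) ;       /* spin until handed off */
}
```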
Abstract:
In shared-memory computer systems, threads may communicate with one another using shared memory. A receiving thread may poll a message target location repeatedly to detect the delivery of a message. Such polling may cause excessive cache coherency traffic and/or congestion on various system buses and/or other interconnects. A method for inter-processor communication may reduce such bus traffic by reducing the number of reads performed and/or the number of cache coherency messages necessary to pass messages. The method may include a thread reading the value of a message target location once, and determining that this value has been modified by detecting inter-processor messages, such as cache coherence messages, indicative of such modification. In systems that support transactional memory, a thread may use transactional memory primitives to detect the cache coherence messages. This may be done by starting a transaction, reading the target memory location, and spinning until the transaction is aborted.
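The final sentence maps almost directly onto Intel RTM intrinsics; a sketch, assuming RTM hardware and ignoring the spurious aborts (interrupts, capacity) a real implementation would have to re-arm after or fall back from:

```c
#include <immintrin.h>   /* _xbegin; compile with -mrtm */

/* Block until another thread writes *target, without re-reading it in
 * a loop: the remote write invalidates our transactional read set and
 * aborts the transaction, which acts as the wakeup signal. */
long wait_for_message(volatile long *target) {
    if (_xbegin() == _XBEGIN_STARTED) {
        (void)*target;   /* one read puts the line in the read set */
        for (;;) ;       /* spin inside the transaction until aborted */
    }
    /* Reached after the abort (or if the transaction failed to start,
     * in which case a production version would poll conventionally). */
    return *target;
}
```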
Abstract:
A system and method for supporting targeted stores in a shared-memory multiprocessor. A targeted store enables a first processor to push a cache line to be stored in a cache memory of a second processor. This eliminates the need for multiple cache-coherence operations to transfer the cache line from the first processor to the second processor. More specifically, the disclosed embodiments provide a system that notifies a waiting thread when a targeted store is directed to monitored memory locations. During operation, the system receives a targeted store which is directed to a specific cache in a shared-memory multiprocessor system. In response, the system examines a destination address for the targeted store to determine whether the targeted store is directed to a monitored memory location which is being monitored for a thread associated with the specific cache. If so, the system informs the thread about the targeted store.
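Since no shipping ISA exposes targeted stores, the sketch below models the described flow with hypothetical primitives (targeted_store, monitor_location, wait_for_targeted_store); the stubs degrade to an ordinary store plus polling so the sketch runs, whereas the patented hardware would push the cache line and notify the monitoring thread directly:

```c
#include <stdint.h>

/* Hypothetical primitives modeling the patent's mechanism in software;
 * these names do not exist in any real ISA or library. */
static volatile uint64_t mailbox;
static volatile int      delivered;

static void targeted_store(volatile uint64_t *addr, uint64_t v, int cache_id) {
    (void)cache_id;          /* hardware would route the line here */
    *addr = v;
    delivered = 1;           /* hardware: notification to the monitor */
}

static void monitor_location(volatile uint64_t *addr) { (void)addr; delivered = 0; }
static void wait_for_targeted_store(void) { while (!delivered) ; }

/* Receiver: register interest, then wait until a targeted store lands. */
uint64_t receive(void) {
    monitor_location(&mailbox);
    wait_for_targeted_store();   /* woken by the monitoring logic */
    return mailbox;              /* line already resident in our cache */
}

/* Sender: push the message into the receiver's cache in one operation,
 * avoiding the usual invalidate-then-fetch coherence exchange. */
void send(uint64_t msg, int receiver_cache) {
    targeted_store(&mailbox, msg, receiver_cache);
}
```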
Abstract:
A computer comprising multiple processors and non-uniform memory implements multiple threads that perform a lock operation using a shared lock structure that includes a pointer to the tail of a first-in-first-out (FIFO) queue of threads waiting to acquire the lock. To acquire the lock, a thread allocates a data structure and appends it to the FIFO queue. To release the lock, the releasing thread selects and notifies a waiting thread to which control is transferred, preferring a thread that executes on the same processor socket as the thread holding the lock. Threads deferred during this selection are kept on a secondary queue that is maintained entirely within the waiting threads' own data structures, so that no additional memory is required within the lock structure. If no thread executing on the same processor socket is waiting for the lock, entries in the secondary queue are transferred back to the FIFO queue, preserving FIFO order.
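A sketch of the underlying FIFO queue lock (MCS-style) with the per-node socket field the selection step would consult; the socket-preferring handoff and the node-resident secondary queue are noted in comments but elided from the code, and all names are illustrative:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Queue node extended with the waiter's socket id; the patented lock
 * threads its secondary queue through nodes like these. */
typedef struct qnode {
    struct qnode *_Atomic next;
    _Atomic int locked;
    int socket;                      /* socket the waiter runs on */
} qnode;

typedef struct { qnode *_Atomic tail; } numa_lock;

void numa_acquire(numa_lock *l, qnode *me, int my_socket) {
    me->socket = my_socket;
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, 1);
    qnode *pred = atomic_exchange(&l->tail, me);  /* append to FIFO */
    if (pred == NULL) return;                     /* lock was free */
    atomic_store(&pred->next, me);
    while (atomic_load(&me->locked)) ;            /* spin on own node */
}

void numa_release(numa_lock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expect = me;
        if (atomic_compare_exchange_strong(&l->tail, &expect, NULL))
            return;                               /* no waiters remain */
        while ((succ = atomic_load(&me->next)) == NULL) ; /* late arrival */
    }
    /* The patented variant would prefer a successor on me->socket here,
     * moving skipped waiters onto a secondary queue kept inside their
     * own nodes and splicing it back when no same-socket waiter exists;
     * that selection logic is elided from this sketch. */
    atomic_store(&succ->locked, 0);               /* hand off the lock */
}
```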
Abstract:
A data object has a lock and a condition indicator associated with it. Based at least partly on detecting a first setting of the condition indicator, a reader stores, in an element of a readers structure, an indication that it has obtained read access to the data object, and then reads the data object without acquiring the lock. A writer detects the first setting and replaces it with a second setting, indicating that the lock is to be acquired by readers before reading the data object. Prior to performing a write on the data object, the writer verifies that one or more elements of the readers structure have been cleared.
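A sketch of both paths, assuming a fixed-size array as the readers structure, a boolean as the condition indicator, and a pthread rwlock as the underlying lock; all names and sizes are illustrative:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SLOTS 64   /* illustrative size for the readers structure */

typedef struct {
    pthread_rwlock_t lock;        /* init with PTHREAD_RWLOCK_INITIALIZER */
    _Atomic bool fast_reads_ok;   /* the "condition indicator" */
    _Atomic int readers[SLOTS];   /* the "readers structure" */
} guarded;

void read_object(guarded *g, int slot, void (*read_fn)(void)) {
    if (atomic_load(&g->fast_reads_ok)) {
        atomic_store(&g->readers[slot], 1);     /* announce the read */
        if (atomic_load(&g->fast_reads_ok)) {   /* re-check the indicator */
            read_fn();                          /* read without the lock */
            atomic_store(&g->readers[slot], 0);
            return;
        }
        atomic_store(&g->readers[slot], 0);     /* setting changed: retreat */
    }
    pthread_rwlock_rdlock(&g->lock);            /* slow path: take the lock */
    read_fn();
    pthread_rwlock_unlock(&g->lock);
}

void write_object(guarded *g, void (*write_fn)(void)) {
    atomic_store(&g->fast_reads_ok, false); /* second setting: lock required */
    for (int i = 0; i < SLOTS; i++)         /* verify elements are cleared */
        while (atomic_load(&g->readers[i])) ;
    pthread_rwlock_wrlock(&g->lock);
    write_fn();
    pthread_rwlock_unlock(&g->lock);
    atomic_store(&g->fast_reads_ok, true);  /* restore the first setting */
}
```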
Abstract:
A computer comprising one or more processors and memory may implement multiple threads that perform a lock operation using a data structure comprising an allocation field and a grant field. Upon entry to a lock operation, a thread allocates a ticket by atomically copying the ticket value contained in the allocation field and incrementing the allocation field. The thread compares the allocated ticket to the grant field. If they are unequal, the thread determines the number of waiting threads. If that number is above a threshold, the thread enters a long-term wait operation, which comprises determining a location for a long-term wait value and waiting on changes to that value. If the number is below the threshold, or once the long-term wait operation is complete, the thread waits for the grant value to equal its ticket, which indicates that the lock has been allocated to it.
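A sketch under assumed parameters: the threshold, the wait-array size, and the hashing of tickets to long-term wait locations are all illustrative choices, not the patent's:

```c
#include <stdatomic.h>

#define THRESHOLD  8    /* illustrative long-term-wait threshold */
#define WAIT_SLOTS 64   /* illustrative size of the wait array */

typedef struct {
    _Atomic unsigned allocation;   /* next ticket to hand out */
    _Atomic unsigned grant;        /* ticket currently holding the lock */
} ticket_lock;

/* Long-term waiters spin on a hashed per-ticket location rather than
 * the shared grant field, keeping traffic off that word. */
static _Atomic unsigned wait_array[WAIT_SLOTS];

void ticket_acquire(ticket_lock *l) {
    unsigned my = atomic_fetch_add(&l->allocation, 1); /* copy + increment */
    unsigned waiting = my - atomic_load(&l->grant);    /* queue position */
    if (waiting >= THRESHOLD) {
        /* Long-term wait: spin on a private location until granted. */
        _Atomic unsigned *slot = &wait_array[my % WAIT_SLOTS];
        while (atomic_load(slot) != my) ;
    }
    while (atomic_load(&l->grant) != my) ;  /* short-term wait on grant */
}

void ticket_release(ticket_lock *l) {
    unsigned next = atomic_load(&l->grant) + 1;
    atomic_store(&l->grant, next);
    /* Also publish to the long-term location for a thread parked there. */
    atomic_store(&wait_array[next % WAIT_SLOTS], next);
}
```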
Abstract:
Compact and scalable mutual exclusion techniques are implemented for multiple executing threads. A thread may acquire a lock by swapping a pointer to itself into a tail field of a lock data structure. If the swap operation returns a null value, the lock is acquired. If the swap operation does not return a null value, the thread may wait to obtain the lock from a predecessor thread. The thread may wait until a grant field in a data structure for the predecessor thread stores a pointer to the lock, signaling that the thread may acquire the lock.
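A sketch of this handoff protocol, assuming one context per thread whose single grant field is signaled with the lock's address; names are illustrative:

```c
#include <stdatomic.h>
#include <stddef.h>

/* One context per thread; its grant field is shared across every lock
 * the thread interacts with, keeping each lock itself one word. */
typedef struct thr_ctx { _Atomic(void *) grant; } thr_ctx;
typedef struct { _Atomic(thr_ctx *) tail; } compact_lock;

static _Thread_local thr_ctx self;

void compact_acquire(compact_lock *l) {
    thr_ctx *pred = atomic_exchange(&l->tail, &self);  /* swap into tail */
    if (pred == NULL) return;                          /* null: acquired */
    /* Wait until the predecessor stores this lock's address into its
     * grant field, signaling the handoff, then acknowledge it. */
    while (atomic_load(&pred->grant) != (void *)l) ;
    atomic_store(&pred->grant, NULL);
}

void compact_release(compact_lock *l) {
    thr_ctx *expect = &self;
    if (atomic_compare_exchange_strong(&l->tail, &expect, NULL))
        return;                                 /* no successor waiting */
    atomic_store(&self.grant, (void *)l);       /* offer lock by address */
    while (atomic_load(&self.grant) != NULL) ;  /* wait for successor ack */
}
```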
Abstract:
A computer comprising one or more processors and memory may implement multiple threads performing mutually exclusive lock acquisition operations on disjoint ranges of a shared resource, each using atomic compare-and-swap (CAS) operations. A linked list of currently locked ranges is maintained, and, upon entry to a lock acquisition operation, a thread waits for all locked ranges overlapping the desired range to be released, then inserts a descriptor for the desired range into the linked list using a single CAS operation. To release a locked range, a thread executes a single fetch-and-add (FAA) operation. The scheme may be extended to support simultaneous exclusive and non-exclusive access by allowing overlapping ranges to be locked for non-exclusive access and by performing an additional validation after locking to resolve any conflict that is detected.
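A sketch of the exclusive-access case, assuming head insertion, a mark bit in each descriptor's next field to record release, and no unlinking or reclamation of released descriptors; all of these are simplifications of the full scheme:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Descriptor for a locked range; the low bit of next is set once the
 * range has been released. */
typedef struct range_node {
    uint64_t start, end;             /* locked range [start, end) */
    _Atomic uintptr_t next;
} range_node;

typedef struct { _Atomic uintptr_t head; } range_lock;

static bool overlaps(const range_node *n, uint64_t s, uint64_t e) {
    return n->start < e && s < n->end;
}

range_node *range_acquire(range_lock *l, uint64_t s, uint64_t e) {
    range_node *me = malloc(sizeof *me);   /* error check elided */
    me->start = s;
    me->end = e;
    for (;;) {
        uintptr_t head = atomic_load(&l->head);
        /* Wait for every conflicting, still-held range to be released. */
        bool conflict = false;
        for (uintptr_t p = head; p != 0; ) {
            range_node *n = (range_node *)p;
            uintptr_t nx = atomic_load(&n->next);
            if (!(nx & 1) && overlaps(n, s, e)) { conflict = true; break; }
            p = nx & ~(uintptr_t)1;          /* strip the release mark */
        }
        if (conflict) continue;              /* spin and rescan */
        atomic_store(&me->next, head);       /* link behind current head */
        uintptr_t expect = head;
        if (atomic_compare_exchange_strong(&l->head, &expect,
                                           (uintptr_t)me))
            return me;                       /* single-CAS insertion */
    }
}

void range_release(range_node *me) {
    atomic_fetch_add(&me->next, 1);   /* single FAA sets the mark bit */
}
```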