摘要:
The system and methods described herein may be used to implement a scalable, hierarchal, queue-based lock using flat combining. A thread executing on a processor core in a cluster of cores that share a memory may post a request to acquire a shared lock in a node of a publication list for the cluster using a non-atomic operation. A combiner thread may build an ordered (logical) local request queue that includes its own node and nodes of other threads (in the cluster) that include lock requests. The combiner thread may splice the local request queue into a (logical) global request queue for the shared lock as a sub-queue. A thread whose request has been posted in a node that has been combined into a local sub-queue and spliced into the global request queue may spin on a lock ownership indicator in its node until it is granted the shared lock.
摘要:
In traditional transactional locking systems, such as Transactional Locking with Read-Write locks (TLRW), threads may frequently update lock metadata, causing system performance degradation. A system and method for implementing transactional locking using reader-lists (TLRL) may associate a respective reader-list with each stripe of data in a shared memory system. Before reading a given stripe as part of a transaction, a thread may add itself to the stripe's reader-list, if the thread is not already on the reader-list. A thread may leave itself on a reader-list after finishing the transaction. Before a thread modifies a stripe, the modifying thread may acquire a write-lock for the stripe. The writer thread may indicate to each reader thread on the stripe's reader-list that if the reader thread is executing a transaction, the reader thread should abort. The indication may include setting an invalidation flag for the reader. The writer thread may clear the reader-list of a stripe it modified.
摘要:
A computer system includes multiple processing threads that execute in parallel. The multiple processing threads have access to a global environment including different types of metadata enabling the processing threads to carry out simultaneous execution depending on a currently selected type of lock mode. A mode controller monitoring the processing threads initiates switching from one type of lock mode to another depending on current operating conditions such as an amount of contention amongst the multiple processing threads to modify the shared data. The mode controller can switch from one lock mode another regardless of whether any of the multiple processes are in the midst of executing a respective transaction. A most efficient lock mode can be selected to carry out the parallel transactions. In certain cases, switching of lock modes causes one or more of the processing threads to abort and retry a respective transaction according to the new mode.
摘要:
A system and method for concurrency control may use slotted read-write locks. A slotted read-write lock is a lock data structure associated with a shared memory area, wherein the slotted read-write lock indicates whether any thread has a read-lock and/or a write-lock for the shared memory area. Multiple threads may concurrently have the read-lock but only one thread can have the write-lock at any given time. The slotted read-write lock comprises multiple slots, each associated with a single thread. To acquire the slotted read-write lock for reading, a thread assigned to a slot performs a store operation to the slot and then attempts to determine that no other thread holds the slotted read-write lock for writing. To acquire the slotted read-write lock for writing, a thread assigned to a slot sets its write-bit and then attempts to determine that the write-lock is not held.
摘要:
The present disclosure describes a unique way for each of multiple processes to operate in parallel and use the same shared data without causing corruption to the shared data. For example, during a commit phase, a corresponding transaction can attempt to increment a globally accessible version information variable and store a current value of the globally accessible version information variable for updating version information associated with modified data regardless of whether an associated attempt by the corresponding transaction to modify the globally accessible version information variable was successful. As an alternative mode, a corresponding transaction can merely read and store a current value of the globally accessible version information variable without attempting to update the globally accessible version information variable before such use. In yet another application, a parallel processing environment implements a combination of both aforementioned modes depending on a self-abort rate of the transaction.
摘要:
NUMA-aware reader-writer locks may leverage lock cohorting techniques to band together writer requests from a single NUMA node. The locks may relax the order in which the lock schedules the execution of critical sections of code by reader threads and writer threads, allowing lock ownership to remain resident on a single NUMA node for long periods, while also taking advantage of parallelism between reader threads. Threads may contend on node-level structures to get permission to acquire a globally shared reader-writer lock. Writer threads may follow a lock cohorting strategy of passing ownership of the lock in write mode from one thread to a cohort writer thread without releasing the shared lock, while reader threads from multiple NUMA nodes may simultaneously acquire the shared lock in read mode. The reader-writer lock may follow a writer-preference policy, a reader-preference policy or a hybrid policy.
摘要:
The systems and methods described herein may enable a processor core to run at higher speeds than other processor cores in the same package. A thread executing on one processor core may begin waiting for another thread to complete a particular action (e.g., to release a lock). In response to determining that other threads are waiting, the thread/core may enter an inactive state. A data structure may store information indicating which threads are waiting on which other threads. In response to determining that a quorum of threads/cores are in an inactive state, one of the threads/cores may enter a turbo mode in which it executes at a higher speed than the baseline speed for the cores. A thread holding a lock and executing in turbo mode may perform work delegated by waiting threads at the higher speed. A thread may exit the inactive state when the waited-for action is completed.
摘要:
The system and methods described herein may be used to implement NUMA-aware locks that employ lock cohorting. These lock cohorting techniques may reduce the rate of lock migration by relaxing the order in which the lock schedules the execution of critical code sections by various threads, allowing lock ownership to remain resident on a single NUMA node longer than under strict FIFO ordering, thus reducing coherence traffic and improving aggregate performance. A NUMA-aware cohort lock may include a global shared lock that is thread-oblivious, and multiple node-level locks that provide cohort detection. The lock may be constructed from non-NUMA-aware components (e.g., spin-locks or queue locks) that are modified to provide thread-obliviousness and/or cohort detection. Lock ownership may be passed from one thread that holds the lock to another thread executing on the same NUMA node without releasing the global shared lock.
摘要:
The system described herein may track references to a shared object by concurrently executing threads using a reference tracking data structure that includes an owner field and an array of byte-addressable per-thread entries, each including a per-thread reference counter and a per-thread counter lock. Slotted threads assigned to a given array entry may increment or decrement the per-thread reference counter in that entry in response to referencing or dereferencing the shared object. Unslotted threads may increment or decrement a shared unslotted reference counter. A thread may update the data structure and/or examine it to determine whether the number of references to the shared object is zero or non-zero using a blocking-optimistic or a non-blocking mechanism. A checking thread may acquire ownership of the data structure, obtain an instantaneous snapshot of all counters, and return a value indicating whether the number of references to the shared object is zero or non-zero.
摘要:
NUMA-aware reader-writer locks may leverage lock cohorting techniques to band together writer requests from a single NUMA node. The locks may relax the order in which the lock schedules the execution of critical sections of code by reader threads and writer threads, allowing lock ownership to remain resident on a single NUMA node for long periods, while also taking advantage of parallelism between reader threads. Threads may contend on node-level structures to get permission to acquire a globally shared reader-writer lock. Writer threads may follow a lock cohorting strategy of passing ownership of the lock in write mode from one thread to a cohort writer thread without releasing the shared lock, while reader threads from multiple NUMA nodes may simultaneously acquire the shared lock in read mode. The reader-writer lock may follow a writer-preference policy, a reader-preference policy or a hybrid policy.