Abstract:
A lightweight, concurrent detection mechanism avoids global thread suspension by operating at runtime alongside the threads under examination. A particular configuration combines a dependency ("waits-for") snapshot with a progression check to determine whether purportedly deadlocked threads are advancing. Thread blocking is enumerated in a table or graph that denotes the dependencies between threads and the corresponding resources. For identified circular dependencies, a subsequent transition, or progression, check ratifies the potential deadlock. The progression check analyzes a transition counter corresponding to each thread. The transition counter indicates a change of state for the thread in question, and hence instruction execution, an activity a blocked thread does not perform. Deadlock is therefore ratified if the transition counters associated with the threads in the potential deadlock have not advanced.
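
A minimal Java sketch of this two-phase idea follows, assuming a waits-for map (blocked thread id to the id of the thread it waits on) and per-thread transition counters maintained elsewhere, for example by runtime instrumentation; both structures, and all names here, are illustrative assumptions rather than the patented implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

final class DeadlockDetector {
    private final Map<Long, Long> waitsFor;          // blocked thread -> thread it waits on
    private final Map<Long, AtomicLong> transitions; // per-thread state-transition counters

    DeadlockDetector(Map<Long, Long> waitsFor, Map<Long, AtomicLong> transitions) {
        this.waitsFor = waitsFor;
        this.transitions = transitions;
    }

    // Phase 1: follow waits-for edges from each thread; revisiting a node
    // means a circular dependency (the returned set may include a lead-in
    // tail as well as the cycle itself).
    Set<Long> findCycle() {
        for (Long start : waitsFor.keySet()) {
            Set<Long> path = new HashSet<>();
            Long cur = start;
            while (cur != null && path.add(cur)) cur = waitsFor.get(cur);
            if (cur != null) return path;
        }
        return Set.of();
    }

    // Phase 2: the progression check. Snapshot each suspect's transition
    // counter, wait, and re-read; any advance means that thread executed
    // instructions, so the apparent deadlock was transient.
    boolean confirmDeadlock(Set<Long> suspects, long settleMillis) throws InterruptedException {
        Map<Long, Long> snapshot = new HashMap<>();
        for (Long t : suspects) snapshot.put(t, transitions.get(t).get());
        Thread.sleep(settleMillis);
        for (Long t : suspects) {
            if (transitions.get(t).get() != snapshot.get(t)) return false; // progressed
        }
        return true; // no counter advanced: deadlock ratified
    }
}
```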
Abstract:
A technique provides a remote serialization guarantee within a computerized system. The technique involves (i) receiving a serialization command from a first thread running on a first processor of the computerized system; (ii) running, on a second processor, a second thread up to a serialization point; and (iii) outputting a serialization result to the first thread in response to the serialization command. The serialization result indicates that the second thread has run up to the serialization point. Such operation enables the first and second threads to robustly coordinate access to a shared resource: the first thread incurs both the burden of employing a MEMBAR instruction and the burden of providing the remote serialization command when attempting to access the shared resource, while the second thread runs no MEMBAR instruction when attempting to access the shared resource and therefore runs more efficiently.
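
The shape of this asymmetric protocol can be sketched as below; the flag names are invented, a single requester is assumed, and, since in Java even volatile accesses carry some ordering cost, this only models the division of burden the abstract describes.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicBoolean;

final class RemoteSerializer {
    private final AtomicBoolean request = new AtomicBoolean(false);
    private final AtomicBoolean done = new AtomicBoolean(false);

    // First thread: pays for the full fence and the remote command, then
    // waits for the serialization result.
    void requestAndAwait() {
        VarHandle.fullFence();                    // the requester's MEMBAR burden
        done.set(false);
        request.set(true);                        // the remote serialization command
        while (!done.get()) Thread.onSpinWait();  // serialization result arrives
    }

    // Second thread: called at each serialization point it reaches in its
    // normal flow (e.g., a loop back-edge); the common case is one cheap read.
    void serializationPoint() {
        if (request.get() && request.compareAndSet(true, false)) {
            done.set(true);                       // report: ran up to this point
        }
    }
}
```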
Abstract:
Apparatus, methods, and program products are disclosed that provide a technology that implicitly isolates a portion of a transactional memory that is shared between multiple threads for exclusive use by an isolating thread without the possibility of other transactions modifying the isolated portion of the transactional memory.
Abstract:
Mechanisms and techniques operate in a computerized device to enable or disable speculative execution of instructions, such as reordering of load and store instructions, in a multiprocessing computerized device. The mechanisms and techniques provide a speculative execution controller that can detect a multiaccess memory condition between the first and second processors, such as concurrent access to shared data pages via page table entries. This can be done by monitoring page table entry accesses by other processors. The speculative execution controller sets a value of a speculation indicator in the memory system based on the multiaccess memory condition. If the value of the speculation indicator indicates that speculative execution of instructions is allowed in the computerized device, the speculative execution controller allows speculative execution of instructions in at least one of the first and second processors in the computerized device. If the value of the speculation indicator indicates that speculative execution of instructions is not allowed in the computerized device, the speculative execution controller does not allow speculative execution of instructions.
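
The controller logic can be pictured with the following toy model, in which a map from page number to a per-processor access mask stands in for page-table-entry monitoring and a boolean stands in for the speculation indicator; all of this is an illustrative assumption, since the real mechanism lives in hardware and operating-system state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

final class SpeculationController {
    private final Map<Long, Integer> pageAccessMask = new ConcurrentHashMap<>();
    private final AtomicBoolean speculationIndicator = new AtomicBoolean(true);

    // Record that processor `cpu` accessed `page`; when the accumulated mask
    // shows two different processors sharing the page (a multiaccess memory
    // condition), clear the speculation indicator.
    void onPageAccess(long page, int cpu) {
        int mask = pageAccessMask.merge(page, 1 << cpu, (a, b) -> a | b);
        if (Integer.bitCount(mask) > 1) speculationIndicator.set(false);
    }

    // Consulted before a processor is allowed to reorder loads and stores.
    boolean speculationAllowed() {
        return speculationIndicator.get();
    }
}
```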
Abstract:
Mechanisms and techniques operate in a computerized device to enable or disable speculative execution of instructions such as load instructions on one or more processors in the computerized device. The mechanisms and techniques can execute a set of instructions on a processor in the computerized device and can detect a value of a speculation indicator. If the value of the speculation indicator indicates that speculative execution of load instructions is allowed in the computerized device, the mechanisms and techniques allow speculative execution of load instructions in the processor, whereas if the value of the speculation indicator indicates that speculative execution of load instructions is not allowed in the computerized device, the mechanisms and techniques do not allow speculative execution of load instructions in the processor. An instruction in code can turn on and off the speculation indicator, which can be one or more bits in a control register or in page table entries associated with pages of memory. Under certain conditions, speculative execution correction mechanisms can be enabled, disabled or removed from a processor.
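
The on/off instruction can likewise be pictured with a simulated control-register bit; this is purely a model, as no Java-level facility controls hardware speculation, and the names are invented.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SpeculationBit {
    private static final AtomicInteger CONTROL = new AtomicInteger(1); // bit 0: loads may speculate

    static void disableSpeculativeLoads() { CONTROL.updateAndGet(v -> v & ~1); }
    static void enableSpeculativeLoads()  { CONTROL.updateAndGet(v -> v | 1); }
    static boolean loadsMaySpeculate()    { return (CONTROL.get() & 1) != 0; }

    public static void main(String[] args) {
        disableSpeculativeLoads();   // analogous to the "turn off" instruction
        // ... region whose loads must not execute speculatively ...
        enableSpeculativeLoads();    // analogous to the "turn on" instruction
        System.out.println(loadsMaySpeculate());
    }
}
```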
Abstract:
NUMA-aware reader-writer locks may leverage lock cohorting techniques to band together writer requests originating from a single NUMA node. Such a lock may relax the order in which it schedules the execution of critical sections of code by reader threads and writer threads, allowing lock ownership to remain resident on a single NUMA node for long periods, while also taking advantage of parallelism between reader threads. Threads may contend on node-level structures to get permission to acquire a globally shared reader-writer lock. Writer threads may follow a lock cohorting strategy of passing ownership of the lock in write mode from one thread to a cohort writer thread without releasing the shared lock, while reader threads from multiple NUMA nodes may simultaneously acquire the shared lock in read mode. The reader-writer lock may follow a writer-preference policy, a reader-preference policy, or a hybrid policy.
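
One way to sketch the writer-cohorting path in Java is shown below, using a Semaphore as the thread-oblivious global lock (any thread may release it) and a ReentrantLock per node; reader handling is deliberately simplified to a single shared counter, whereas the abstract contemplates node-level reader structures, and all names are illustrative.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

final class CohortRWLock {
    private final Semaphore globalWrite = new Semaphore(1); // thread-oblivious write permission
    private final ReentrantLock readerGate = new ReentrantLock();
    private int readerCount = 0;                            // guarded by readerGate
    private final ReentrantLock[] nodeLock;                 // one lock per NUMA node
    private final boolean[] handedOff;                      // global permit left held for this node

    CohortRWLock(int nodes) {
        nodeLock = new ReentrantLock[nodes];
        handedOff = new boolean[nodes];
        for (int i = 0; i < nodes; i++) nodeLock[i] = new ReentrantLock();
    }

    void readLock() throws InterruptedException {
        readerGate.lock();
        try {
            if (readerCount == 0) globalWrite.acquire();    // first reader excludes writers
            readerCount++;
        } finally { readerGate.unlock(); }
    }

    void readUnlock() {
        readerGate.lock();
        try {
            if (--readerCount == 0) globalWrite.release();  // last reader readmits writers
        } finally { readerGate.unlock(); }
    }

    void writeLock(int node) throws InterruptedException {
        nodeLock[node].lock();                              // contend with same-node writers first
        if (handedOff[node]) { handedOff[node] = false; return; } // inherited from a cohort writer
        globalWrite.acquire();
    }

    void writeUnlock(int node) {
        if (nodeLock[node].hasQueuedThreads()) {
            handedOff[node] = true;                         // pass write ownership within the node
        } else {
            globalWrite.release();                          // no cohort waiting: release globally
        }
        nodeLock[node].unlock();
    }
}
```

Because a node's writers can keep the global permit among themselves while readers wait, this particular sketch behaves like the writer-preference policy mentioned above; a production cohort lock would bound the hand-off streak to avoid starving readers and other nodes.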
Abstract:
Transactional Lock Elision (TLE) may allow multiple threads to concurrently execute critical sections as speculative transactions. Transactions may abort due to various reasons. To avoid starvation, transactions may revert to execution using mutual exclusion when transactional execution fails. Because threads may revert to mutual exclusion in response to the mutual exclusion of other threads, a positive feedback loop may form in times of high congestion, causing a “lemming effect”. To regain the benefits of concurrent transactional execution, the system may allow one or more threads awaiting a given lock to be released from the wait queue and instead attempt transactional execution. A gang release may allow a subset of waiting threads to be released simultaneously. The subset may be chosen dependent on the number of waiting threads, historical abort relationships between threads, analysis of transactions of each thread, sensitivity of each thread to abort, and/or other thread-local or global criteria.
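
The gang-release mechanics might be sketched as follows; tryTransactionally() is a placeholder for hardware transactional execution that Java does not expose (here it always fails), the release-half policy is invented for illustration, and lost-wakeup corner cases are glossed over.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class GangReleaseTLE {
    private final ReentrantLock fallback = new ReentrantLock();
    private final Semaphore retryGate = new Semaphore(0);   // parked waiters
    private final AtomicInteger waiters = new AtomicInteger();

    // Placeholder: a real implementation would attempt hardware transactional
    // execution of the critical section; always failing here simply forces
    // the fallback paths below to be exercised.
    private boolean tryTransactionally(Runnable criticalSection) {
        return false;
    }

    void critical(Runnable criticalSection) {
        while (true) {
            if (tryTransactionally(criticalSection)) return;  // speculative fast path
            if (fallback.tryLock()) {                         // mutual-exclusion fallback
                try { criticalSection.run(); }
                finally { fallback.unlock(); releaseGang(); }
                return;
            }
            waiters.incrementAndGet();
            retryGate.acquireUninterruptibly();               // wait for a (gang) release
            waiters.decrementAndGet();                        // then loop and retry speculatively
        }
    }

    // Release a subset of the waiters at once so they re-attempt transactions
    // together rather than serializing on the lock. Releasing half is an
    // arbitrary policy; the abstract lists richer selection criteria.
    private void releaseGang() {
        int n = waiters.get();
        if (n > 0) retryGate.release(Math.max(1, n / 2));
    }
}
```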
Abstract:
The systems and methods described herein may enable a processor core to run at higher speeds than other processor cores in the same package. A thread executing on one processor core may begin waiting for another thread to complete a particular action (e.g., to release a lock). In response to determining that it must wait, the thread/core may enter an inactive state. A data structure may store information indicating which threads are waiting on which other threads. In response to determining that a quorum of threads/cores are in an inactive state, one of the threads/cores may enter a turbo mode in which it executes at a higher speed than the baseline speed for the cores. A thread holding a lock and executing in turbo mode may perform work delegated by waiting threads at the higher speed. A thread may exit the inactive state when the waited-for action is completed.
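
The bookkeeping half of this scheme, independent of any actual frequency scaling, could look like the following sketch; enterTurbo()/exitTurbo() are placeholders for a platform facility (e.g., a P-state request) that Java cannot invoke directly, and all names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.LockSupport;

final class TurboCoordinator {
    private final ConcurrentHashMap<Thread, Thread> waitingOn = new ConcurrentHashMap<>();
    private final int totalCores;

    TurboCoordinator(int totalCores) { this.totalCores = totalCores; }

    // A waiting thread records which thread it waits on, then goes inactive.
    void awaitAction(Thread holder) {
        Thread me = Thread.currentThread();
        waitingOn.put(me, holder);
        while (waitingOn.containsKey(me)) LockSupport.park(); // inactive state
    }

    // The lock holder: if a quorum of the other cores' threads are inactive,
    // speed up, perform its own and any delegated work, then wake its waiters.
    void completeAction(Runnable work) {
        Thread me = Thread.currentThread();
        if (waitingOn.size() >= totalCores - 1) enterTurbo();
        try { work.run(); }                        // work runs at the higher speed
        finally { exitTurbo(); }
        waitingOn.forEach((waiter, holder) -> {    // waited-for action completed
            if (holder == me) { waitingOn.remove(waiter); LockSupport.unpark(waiter); }
        });
    }

    private void enterTurbo() { /* platform-specific frequency request */ }
    private void exitTurbo()  { /* restore the baseline frequency */ }
}
```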
Abstract:
The system and methods described herein may be used to implement NUMA-aware locks that employ lock cohorting. These lock cohorting techniques may reduce the rate of lock migration by relaxing the order in which the lock schedules the execution of critical code sections by various threads, allowing lock ownership to remain resident on a single NUMA node longer than under strict FIFO ordering, thus reducing coherence traffic and improving aggregate performance. A NUMA-aware cohort lock may include a global shared lock that is thread-oblivious, and multiple node-level locks that provide cohort detection. The lock may be constructed from non-NUMA-aware components (e.g., spin-locks or queue locks) that are modified to provide thread-obliviousness and/or cohort detection. Lock ownership may be passed from one thread that holds the lock to another thread executing on the same NUMA node without releasing the global shared lock.
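
A compact sketch of that composition is given below, using a Semaphore for the thread-oblivious global lock (acquirable by one thread, releasable by another) and ReentrantLock's queue inspection for cohort detection; the hand-off bound of 64 is an arbitrary illustrative choice.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

final class CohortLock {
    private final Semaphore global = new Semaphore(1); // thread-oblivious: any thread may release
    private final ReentrantLock[] local;               // node-level locks
    private final boolean[] handedOff;                 // global permit retained for this node?
    private final int[] streak;                        // consecutive local hand-offs

    CohortLock(int nodes) {
        local = new ReentrantLock[nodes];
        handedOff = new boolean[nodes];
        streak = new int[nodes];
        for (int i = 0; i < nodes; i++) local[i] = new ReentrantLock();
    }

    void lock(int node) throws InterruptedException {
        local[node].lock();                            // contend within the node first
        if (handedOff[node]) { handedOff[node] = false; return; } // inherited from cohort
        global.acquire();
    }

    void unlock(int node) {
        // Cohort detection: hand off within the node only if a local thread is
        // queued, and bound the streak so other nodes are not starved.
        if (local[node].hasQueuedThreads() && streak[node]++ < 64) {
            handedOff[node] = true;
        } else {
            streak[node] = 0;
            global.release();
        }
        local[node].unlock();
    }
}
```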
Abstract:
A method, system, and medium are disclosed for facilitating communication between multiple concurrent threads of execution using a multi-lane concurrent bag. The bag comprises a plurality of independently-accessible concurrent intermediaries (lanes) that are each configured to store data elements. The bag provides an insert function executable to insert a given data element into the bag by selecting one of the intermediaries and inserting the data element into the selected intermediary. The bag also provides a consume function executable to consume a data element from the bag by choosing one of the intermediaries and consuming (removing and returning) a data element stored in the chosen intermediary. The bag guarantees that execution of the consume function consumes a data element if the bag is non-empty and permits multiple threads to execute the insert or consume functions concurrently.
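
A minimal sketch of such a bag follows, with ConcurrentLinkedDeque instances as the lanes and random lane selection as one illustrative policy; note that the single scan in consume() only approximates the non-emptiness guarantee under heavy concurrency.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

final class MultiLaneBag<E> {
    private final ConcurrentLinkedDeque<E>[] lanes;

    @SuppressWarnings("unchecked")
    MultiLaneBag(int laneCount) {
        lanes = new ConcurrentLinkedDeque[laneCount];
        for (int i = 0; i < laneCount; i++) lanes[i] = new ConcurrentLinkedDeque<>();
    }

    // Insert: select one of the intermediaries (here, at random) and add
    // the element to it.
    void insert(E e) {
        lanes[ThreadLocalRandom.current().nextInt(lanes.length)].add(e);
    }

    // Consume: choose a starting lane, then scan every lane so that a
    // non-empty bag yields an element; returns null if the bag appeared
    // empty throughout the scan.
    E consume() {
        int start = ThreadLocalRandom.current().nextInt(lanes.length);
        for (int i = 0; i < lanes.length; i++) {
            E e = lanes[(start + i) % lanes.length].pollFirst();
            if (e != null) return e;
        }
        return null;
    }
}
```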