Abstract:
A system and method for transactional memory using read-write locks is disclosed. Each of a plurality of shared memory areas is associated with a respective read-write lock, which includes a read-lock portion indicating whether any thread has a read-lock for read-only access to the memory area and a write-lock portion indicating whether any thread has a write-lock for write access to the memory area. A thread executing a group of memory access operations as an atomic transaction acquires the proper read or write permissions before performing a memory operation. To perform a read access, the thread attempts to obtain the corresponding read-lock and succeeds if no other thread holds a write-lock for the memory area. To perform a write access, the thread attempts to obtain the corresponding write-lock and succeeds if no other thread holds a write-lock or read-lock for the memory area.
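As an illustration only, the read-lock and write-lock portions can be packed into a single word per shared memory area. The C sketch below assumes a 64-bit lock word with bit 0 as the write-lock portion and the remaining bits as a reader count; this encoding and the rw_try_read_lock/rw_try_write_lock names are assumptions, not the disclosed format.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* One lock word per shared memory area: bit 0 = write-lock portion,
     * bits 1..63 = count of threads holding the read-lock portion. */
    typedef atomic_uint_fast64_t rwlock_t;

    #define WRITE_BIT  ((uint_fast64_t)1)
    #define READER_ONE ((uint_fast64_t)2)

    /* Read access: succeeds only if no thread holds the write-lock. */
    static bool rw_try_read_lock(rwlock_t *l) {
        uint_fast64_t v = atomic_load(l);
        while ((v & WRITE_BIT) == 0) {
            if (atomic_compare_exchange_weak(l, &v, v + READER_ONE))
                return true;              /* read-lock acquired */
        }
        return false;                      /* a writer holds the lock: abort or retry */
    }

    /* Write access: succeeds only if no thread holds a read-lock or write-lock. */
    static bool rw_try_write_lock(rwlock_t *l) {
        uint_fast64_t expected = 0;
        return atomic_compare_exchange_strong(l, &expected, WRITE_BIT);
    }

    static void rw_read_unlock(rwlock_t *l)  { atomic_fetch_sub(l, READER_ONE); }
    static void rw_write_unlock(rwlock_t *l) { atomic_fetch_and(l, ~WRITE_BIT); }
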
Abstract:
A Scalable NonZero Indicator (SNZI) object in a concurrent computing application may include a shared data portion (e.g., a counter portion) and a shared nonzero indicator portion, and/or may be an element in a hierarchy of SNZI objects that filters changes in non-root nodes to a root node. SNZI objects may be accessed by software applications through an API that includes a query operation to return the value of the nonzero indicator, and arrive (increment) and depart (decrement) operations. Modifications of the data portion and/or the indicator portion may be performed using atomic read-modify-write type operations. Some SNZI objects may support a reset operation. A shared data object may be set to an intermediate value, or an announce bit may be set, to indicate that a modification is in progress that affects its corresponding indicator value. Another process or thread seeing this indication may “help” complete the modification before proceeding.
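A minimal, non-hierarchical sketch of the arrive/depart/query API, assuming the surplus counter and the nonzero indicator are packed into one C11 atomic word so that a single read-modify-write keeps them consistent; the hierarchy of SNZI objects, intermediate values, announce bit, helping, and reset operation described above are all omitted.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Bit 0 holds the nonzero indicator; the remaining bits hold the surplus
     * (arrivals minus departures). */
    typedef atomic_uint_fast64_t snzi_t;

    #define IND_BIT ((uint_fast64_t)1)
    #define ONE     ((uint_fast64_t)2)

    void snzi_arrive(snzi_t *s) {                /* increment */
        uint_fast64_t v = atomic_load(s);
        for (;;) {
            uint_fast64_t n = (v + ONE) | IND_BIT;   /* surplus > 0 => indicator set */
            if (atomic_compare_exchange_weak(s, &v, n)) return;
        }
    }

    void snzi_depart(snzi_t *s) {                /* decrement */
        uint_fast64_t v = atomic_load(s);
        for (;;) {
            uint_fast64_t n = v - ONE;
            if ((n >> 1) == 0) n &= ~IND_BIT;        /* surplus back to 0 => clear indicator */
            if (atomic_compare_exchange_weak(s, &v, n)) return;
        }
    }

    bool snzi_query(snzi_t *s) {                 /* true iff surplus is nonzero */
        return (atomic_load(s) & IND_BIT) != 0;
    }
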
Abstract:
The system and methods described herein may be used to detect and tolerate atomicity violations between concurrent code blocks and/or to generate code that is executable to detect and tolerate such violations. A compiler may transform program code in which the potential for atomicity violations exists into alternate code that tolerates these potential violations. For example, the compiler may inflate critical sections, transform non-critical sections into critical sections, or coalesce multiple critical sections into a single critical section. The techniques described herein may utilize an auxiliary lock state for locks on critical sections to enable detection of atomicity violations in program code by enabling the system to distinguish between program points at which lock acquisition and release operations appeared in the original program, and the points at which these operations actually occur when executing the transformed program code. Filtering and analysis techniques may reduce false positives induced by the transformations.
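For illustration, the following C fragment shows one of the transformations named above, coalescing two adjacent critical sections into a single critical section, using ordinary pthread mutexes; the variable names are hypothetical, and the auxiliary lock state that records where the original acquire and release operations appeared is indicated only by a comment.

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int balance, audit_count;

    /* Original program: two critical sections with an unprotected gap between
     * them, so another thread may observe an intermediate state. */
    void original(int amount) {
        pthread_mutex_lock(&m);
        balance += amount;
        pthread_mutex_unlock(&m);
        /* gap: a concurrent update here is a potential atomicity violation */
        pthread_mutex_lock(&m);
        audit_count++;
        pthread_mutex_unlock(&m);
    }

    /* Transformed program: the two critical sections are coalesced into one.
     * The points where the original release/acquire appeared would be noted in
     * an auxiliary lock state so violations can still be detected and reported
     * against the original program points. */
    void transformed(int amount) {
        pthread_mutex_lock(&m);
        balance += amount;
        /* auxiliary lock state: original unlock/lock pair occurred here */
        audit_count++;
        pthread_mutex_unlock(&m);
    }
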
Abstract:
A method, apparatus, and computer program product for providing page-protection-based memory access barrier traps is presented. A value for a user-mode bit (u-bit) is computed for each extant virtual page in an address space, the u-bit indicating that an object on the virtual page is being moved by a Garbage Collector process. An instruction is executed which causes an access protection fault. When the access protection fault is encountered, the state of the u-bit for the virtual page associated with the fault is consulted. Additionally, the access protection fault is translated into a user-trap (utrap), and the utrap is serviced when the u-bit is set.
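A rough user-level emulation of the idea in C, assuming POSIX mprotect and a SIGSEGV handler as a stand-in for the utrap mechanism; the u_bit table, the page bookkeeping, and the function names are illustrative assumptions, not the disclosed implementation.

    #include <signal.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAX_PAGES 4096
    static _Bool u_bit[MAX_PAGES];          /* set while the GC is moving objects on a page */
    static uintptr_t heap_base;
    static long page_size;

    /* SIGSEGV handler standing in for the user-trap (utrap) service routine. */
    static void utrap_handler(int sig, siginfo_t *si, void *ctx) {
        (void)sig; (void)ctx;
        uintptr_t addr = (uintptr_t)si->si_addr;
        size_t page = (addr - heap_base) / (size_t)page_size;
        if (page >= MAX_PAGES)
            return;                          /* not our heap: a real handler would chain */
        if (u_bit[page]) {
            /* u-bit set: the access hit a page whose objects are being moved;
             * wait for / help the collector, then clear the u-bit (elided). */
            u_bit[page] = 0;
        }
        /* Restore access so the faulting instruction can be retried. */
        mprotect((void *)(heap_base + page * (size_t)page_size),
                 (size_t)page_size, PROT_READ | PROT_WRITE);
    }

    void install_barrier(void *heap, size_t len) {
        page_size = sysconf(_SC_PAGESIZE);
        heap_base = (uintptr_t)heap;
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = utrap_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        mprotect(heap, len, PROT_NONE);      /* accesses now fault into the handler */
    }
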
Abstract:
A method for managing a memory, including obtaining a number of indices and a cache line size of a cache memory, computing a cache page size by multiplying the number of indices by the cache line size, calculating a greatest common divisor (GCD) of the cache page size and a first size class, incrementing, in response to the GCD of the cache page size and the first size class exceeding the cache line size, the first size class to generate an updated first size class, calculating a GCD of the cache page size and the updated first size class, creating, in response to the GCD of the cache page size and the updated first size class being less than the cache line size, a first superblock in the memory including a first plurality of blocks of the updated first size class, and creating a second superblock in the memory.
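A sketch of the size-class adjustment step in C, assuming the size class is incremented in units of the allocator's alignment (16 bytes here); the abstract specifies only that the size class is incremented while the GCD exceeds the cache line size, so the increment step and the example values are assumptions.

    #include <stddef.h>
    #include <stdio.h>

    static size_t gcd(size_t a, size_t b) {
        while (b != 0) { size_t t = a % b; a = b; b = t; }
        return a;
    }

    /* Grow a size class until blocks of that size spread across cache indices,
     * i.e. until gcd(cache_page, size) no longer exceeds one cache line. */
    size_t adjust_size_class(size_t size, size_t num_indices,
                             size_t line_size, size_t align) {
        size_t cache_page = num_indices * line_size;   /* cache page size */
        while (gcd(cache_page, size) > line_size)      /* GCD exceeds a cache line */
            size += align;                             /* increment the size class */
        return size;
    }

    int main(void) {
        /* e.g. 64 indices x 64-byte lines => 4096-byte cache page; a 2048-byte
         * size class shares a 2048-byte GCD with it, so it is bumped to 2064. */
        printf("%zu\n", adjust_size_class(2048, 64, 64, 16));
        return 0;
    }
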
Abstract:
A system and method for concurrency control may use slotted read-write locks. A slotted read-write lock is a lock data structure associated with a shared memory area, wherein the slotted read-write lock indicates whether any thread has a read-lock and/or a write-lock for the shared memory area. Multiple threads may concurrently have the read-lock but only one thread can have the write-lock at any given time. The slotted read-write lock comprises multiple slots, each associated with a single thread. To acquire the slotted read-write lock for reading, a thread assigned to a slot performs a store operation to the slot and then attempts to determine that no other thread holds the slotted read-write lock for writing. To acquire the slotted read-write lock for writing, a thread assigned to a slot sets its write-bit and then attempts to determine that the write-lock is not held.
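An illustrative C11 sketch of a slotted read-write lock with one padded slot per registered thread; sequentially consistent atomics supply the store-then-check ordering the protocol relies on, while slot assignment and any fallback path for unslotted threads are assumed to be handled elsewhere. The names and the fixed slot count are assumptions.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define SLOTS 64

    /* One slot per registered thread, padded to avoid false sharing. */
    typedef struct {
        atomic_int reading;            /* nonzero while the slot's thread reads */
        atomic_int writing;            /* the slot's write-bit */
        char pad[64 - 2 * sizeof(atomic_int)];
    } slot_t;

    typedef struct { slot_t slot[SLOTS]; } srwlock_t;

    /* Read acquire: store to our own slot, then verify no thread holds the
     * write-lock.  Sequentially consistent atomics order the store before
     * the scan. */
    bool srw_try_read(srwlock_t *l, int me) {
        atomic_store(&l->slot[me].reading, 1);
        for (int i = 0; i < SLOTS; i++)
            if (atomic_load(&l->slot[i].writing)) {
                atomic_store(&l->slot[me].reading, 0);   /* back off */
                return false;
            }
        return true;
    }

    /* Write acquire: set our write-bit, then verify no other writer or reader. */
    bool srw_try_write(srwlock_t *l, int me) {
        atomic_store(&l->slot[me].writing, 1);
        for (int i = 0; i < SLOTS; i++) {
            if (i == me) continue;
            if (atomic_load(&l->slot[i].writing) || atomic_load(&l->slot[i].reading)) {
                atomic_store(&l->slot[me].writing, 0);   /* back off */
                return false;
            }
        }
        return true;
    }

    void srw_read_release(srwlock_t *l, int me)  { atomic_store(&l->slot[me].reading, 0); }
    void srw_write_release(srwlock_t *l, int me) { atomic_store(&l->slot[me].writing, 0); }
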
Abstract:
A method for providing applications with a current time value includes receiving a trap for an application to access a time memory page, creating, in a memory map corresponding to the application, a mapping between an address space of the application and the time memory page in response to the trap, accessing, based on the trap, a hardware clock to obtain a time value, and updating the time memory page with the time value. The application reads the time value from the time memory page using the memory map.
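A simplified user-space analogue in C, assuming the time memory page is an anonymous shared mapping guarded by a sequence counter; the trap-driven lazy mapping step is represented only by the create function, and clock_gettime stands in for the hardware clock access. All names are illustrative.

    #include <stdatomic.h>
    #include <sys/mman.h>
    #include <time.h>

    /* Layout of the time memory page: a sequence counter plus the time value. */
    typedef struct {
        atomic_uint     seq;                   /* odd while an update is in progress */
        struct timespec now;
    } time_page_t;

    /* In the described system the page is mapped into the application's address
     * space when its first access traps; here it is simply a shared mapping. */
    time_page_t *time_page_create(void) {
        return mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    }

    /* Updater (trap/clock service): read the clock and publish it to the page. */
    void time_page_update(time_page_t *p) {
        atomic_fetch_add(&p->seq, 1);          /* seq becomes odd: update in progress */
        clock_gettime(CLOCK_REALTIME, &p->now);
        atomic_fetch_add(&p->seq, 1);          /* seq even again: value is stable */
    }

    /* Application: read the time value from the page without a system call. */
    struct timespec time_page_read(time_page_t *p) {
        unsigned s1, s2;
        struct timespec t;
        do {
            s1 = atomic_load(&p->seq);
            t  = p->now;
            s2 = atomic_load(&p->seq);
        } while (s1 != s2 || (s1 & 1));        /* retry if an update raced with us */
        return t;
    }
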
Abstract:
The systems and methods described herein may enable a processor core to run at higher speeds than other processor cores in the same package. A thread executing on one processor core may begin waiting for another thread to complete a particular action (e.g., to release a lock). In response to determining that other threads are waiting, the thread/core may enter an inactive state. A data structure may store information indicating which threads are waiting on which other threads. In response to determining that a quorum of threads/cores are in an inactive state, one of the threads/cores may enter a turbo mode in which it executes at a higher speed than the baseline speed for the cores. A thread holding a lock and executing in turbo mode may perform work delegated by waiting threads at the higher speed. A thread may exit the inactive state when the waited-for action is completed.
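The following C sketch covers only the bookkeeping side, assuming a fixed thread count, a waiting_on table recording which thread each waiter is waiting for, and a simple quorum check; the low-power wait and the actual clock-speed (turbo) transition are platform specific and appear only as stubs and comments.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sched.h>

    #define NTHREADS 8
    #define QUORUM   (NTHREADS - 1)        /* e.g. all other threads/cores inactive */

    /* waiting_on[t] == -1 means thread t is active; otherwise it records which
     * thread t is waiting for (e.g. the current lock holder). */
    static atomic_int waiting_on[NTHREADS];
    static atomic_int inactive_count;

    static void waiters_init(void) {
        for (int t = 0; t < NTHREADS; t++) atomic_store(&waiting_on[t], -1);
    }

    /* Waiter: record who we wait on and enter an inactive state. */
    static void enter_inactive(int me, int holder) {
        atomic_store(&waiting_on[me], holder);
        atomic_fetch_add(&inactive_count, 1);
        while (atomic_load(&waiting_on[me]) != -1)
            sched_yield();                 /* stand-in for a low-power monitor/wait */
        atomic_fetch_sub(&inactive_count, 1);
    }

    /* Lock holder: if enough cores are idle there is headroom to run this core
     * above the baseline speed and perform delegated work there. */
    static bool may_enter_turbo(void) {
        return atomic_load(&inactive_count) >= QUORUM;
    }

    /* Holder completes the waited-for action and releases its waiters. */
    static void wake_waiters(int holder) {
        for (int t = 0; t < NTHREADS; t++) {
            int h = holder;
            atomic_compare_exchange_strong(&waiting_on[t], &h, -1);
        }
    }
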
Abstract:
The system described herein may track references to a shared object by concurrently executing threads using a reference tracking data structure that includes an owner field and an array of byte-addressable per-thread entries, each including a per-thread reference counter and a per-thread counter lock. Slotted threads assigned to a given array entry may increment or decrement the per-thread reference counter in that entry in response to referencing or dereferencing the shared object. Unslotted threads may increment or decrement a shared unslotted reference counter. A thread may update the data structure and/or examine it to determine whether the number of references to the shared object is zero or non-zero using a blocking-optimistic or a non-blocking mechanism. A checking thread may acquire ownership of the data structure, obtain an instantaneous snapshot of all counters, and return a value indicating whether the number of references to the shared object is zero or non-zero.
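A simplified C11 sketch of the data structure, assuming 64 slots and using a fully blocking checking path (acquire ownership, lock every entry, then sum) in place of the blocking-optimistic and non-blocking mechanisms described above; field and function names are illustrative.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define SLOTS 64

    /* Byte-addressable per-thread entry: a reference counter and a counter lock. */
    typedef struct {
        atomic_schar count;            /* per-thread reference counter */
        atomic_schar lock;             /* per-thread counter lock (0 = free) */
    } entry_t;

    typedef struct {
        atomic_int  owner;             /* id of the checking thread, or -1 */
        entry_t     slot[SLOTS];
        atomic_long unslotted;         /* shared counter for unslotted threads */
    } reftrack_t;

    static void entry_lock(entry_t *e) {
        signed char free = 0;
        while (!atomic_compare_exchange_weak(&e->lock, &free, 1)) free = 0;
    }

    /* Slotted threads update only their own entry, under its counter lock. */
    void ref_acquire(reftrack_t *r, int me) {
        entry_lock(&r->slot[me]);
        atomic_fetch_add(&r->slot[me].count, 1);
        atomic_store(&r->slot[me].lock, 0);
    }
    void ref_release(reftrack_t *r, int me) {
        entry_lock(&r->slot[me]);
        atomic_fetch_sub(&r->slot[me].count, 1);
        atomic_store(&r->slot[me].lock, 0);
    }

    /* Unslotted threads fall back to the shared counter. */
    void ref_acquire_unslotted(reftrack_t *r) { atomic_fetch_add(&r->unslotted, 1); }
    void ref_release_unslotted(reftrack_t *r) { atomic_fetch_sub(&r->unslotted, 1); }

    /* Checking thread: acquire ownership, lock each entry so slotted updates
     * quiesce, take a snapshot of all counters, and report zero/non-zero. */
    bool refs_are_zero(reftrack_t *r, int me) {
        int none = -1;
        while (!atomic_compare_exchange_weak(&r->owner, &none, me)) none = -1;

        for (int i = 0; i < SLOTS; i++) entry_lock(&r->slot[i]);
        long total = atomic_load(&r->unslotted);
        for (int i = 0; i < SLOTS; i++) total += atomic_load(&r->slot[i].count);
        for (int i = 0; i < SLOTS; i++) atomic_store(&r->slot[i].lock, 0);

        atomic_store(&r->owner, -1);
        return total == 0;
    }
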
Abstract:
The present disclosure describes a unique way for each of multiple processes to operate in parallel and use the same shared data without corrupting it. For example, during a commit phase, a corresponding transaction can attempt to increment a globally accessible version information variable and store the current value of that variable for updating version information associated with modified data, regardless of whether the transaction's attempt to modify the variable was successful. As an alternative mode, a corresponding transaction can merely read and store the current value of the globally accessible version information variable, without attempting to update it before such use. In yet another application, a parallel processing environment implements a combination of both aforementioned modes depending on the self-abort rate of the transaction.
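A minimal C11 sketch of the two commit-time modes, assuming a single global version word; how the sampled value is written into per-datum version metadata, and how a transaction's self-abort rate would select between the modes, is left out. Function names are hypothetical.

    #include <stdatomic.h>
    #include <stdint.h>

    static atomic_uint_fast64_t global_version;   /* globally accessible version variable */

    /* Mode 1: attempt to increment the global version, but stamp modified data
     * with the current value even if the increment lost the race to another
     * committing transaction. */
    uint_fast64_t commit_version_mode1(void) {
        uint_fast64_t v = atomic_load(&global_version);
        if (atomic_compare_exchange_strong(&global_version, &v, v + 1))
            return v + 1;          /* our increment succeeded */
        return v;                  /* lost the race: the CAS reloaded v; use it anyway */
    }

    /* Mode 2: read-only sampling; no attempt to advance the global version. */
    uint_fast64_t commit_version_mode2(void) {
        return atomic_load(&global_version);
    }

A hybrid environment could select mode 1 or mode 2 per transaction based on its observed self-abort rate, as the abstract indicates.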