Abstract:
The present disclosure describes a unique way for each of multiple processes to operate in parallel using (e.g., reading, modifying, and writing to) the same shared data without causing corruption to the shared data. For example, each of multiple processes utilizes current and past data values associated with a global counter or clock for purposes of determining whether any shared variables used to produce a respective transaction outcome were modified (by another process) when executing a respective transaction. If a respective process detects that shared data used by respective process was modified during a transaction, the process can abort and retry the transaction rather than cause data corruption by storing locally maintained results associated with the transaction to a globally shared data space.
Abstract:
For each of multiple processes executing in parallel, as long as corresponding version information associated with a respective set of one or more shared variables used for computational purposes has not changed during execution of a respective transaction, results of the respective transaction can be globally committed to memory without causing data corruption. If version information associated with one or more respective shared variables (used to produce the transaction results) happens to change during a process of generating respective results, then a respective process can identify that another process modified the one or more respective shared variables during execution and that its transaction results should not be committed to memory. In this latter case, the transaction repeats itself until it is able to commit respective results without causing data corruption.
Abstract:
A computer system includes multiple processing threads that execute in parallel. The multiple processing threads have access to a global environment including different types of metadata enabling the processing threads to carry out simultaneous execution depending on a currently selected type of lock mode. A mode controller monitoring the processing threads initiates switching from one type of lock mode to another depending on current operating conditions such as an amount of contention amongst the multiple processing threads to modify the shared data. The mode controller can switch from one lock mode another regardless of whether any of the multiple processes are in the midst of executing a respective transaction. A most efficient lock mode can be selected to carry out the parallel transactions. In certain cases, switching of lock modes causes one or more of the processing threads to abort and retry a respective transaction according to the new mode.
Abstract:
Cache logic associated with a respective one of multiple processing threads executing in parallel updates corresponding data fields of a cache to uniquely mark its contents. The marked contents represent a respective read set for a transaction. For example, at an outset of executing a transaction, a respective processing thread chooses a data value to mark contents of the cache used for producing a transaction outcome for the processing thread. Upon each read of shared data from main memory, the cache stores a copy of the data and marks it as being used during execution of the processing thread. If uniquely marked contents of a respective cache line happen to be displaced (e.g., overwritten) during execution of a processing thread, then the transaction is aborted (rather than being committed to main memory) because there is a possibility that another transaction overwrote a shared data value used during the respective transaction.
Abstract:
A computer system includes multiple processing threads that execute in parallel. The multiple processing threads have access to a global environment including i) shared data utilized by the multiple processing threads, ii) a globally accessible register or buffer of version information that changes each time a respective one of the multiple processing threads modifies the shared data, and iii) respective lock information indicating whether one of the multiple processing threads has locked the shared data preventing other processing threads from modifying the shared data. To prevent data corruption, each of the processing threads aborts if a given processing thread detects a change in the version information or another processing thread has a lock on the shared data. This technique is well suited for use in applications such as processing threads that support a high number of reads with a corresponding number of fewer respective writes to shared data.
Abstract:
Systems and methods for integrating multiple best effort hardware transactional support mechanisms, such as Read Set Monitoring (RSM) and Best Effort Hardware Transactional Memory (BEHTM), in a single transactional memory implementation are described. The best effort mechanisms may be integrated such that the overhead associated with support of multiple mechanisms may be reduced and/or the performance of the resulting transactional memory implementations may be improved over those that include any one of the mechanisms, or an un-integrated collection of multiple such mechanisms. Two or more of the mechanisms may be employed concurrently or serially in a single attempt to execute a transaction, without aborting or retrying the transaction. State maintained or used by a first mechanism may be shared with or transferred to another mechanism for use in execution of the transaction. This transfer may be performed automatically by the integrated mechanisms (e.g., without user, programmer, or software intervention).
Abstract:
A partitioned ticket lock may control access to a shared resource, and may include a single ticket value field and multiple grant value fields. Each grant value may be the sole occupant of a respective cache line, an event count or sequencer instance, or a sub-lock. The number of grant values may be configurable and/or adaptable during runtime. To acquire the lock, a thread may obtain a value from the ticket value field using a fetch-and-increment type operation, and generate an identifier of a particular grant value field by applying a mathematical or logical function to the obtained ticket value. The thread may be granted the lock when the value of that grant value field matches the obtained ticket value. Releasing the lock may include computing a new ticket value, generating an identifier of another grant value field, and storing the new ticket value in the other grant value field.
Abstract:
In modern multi-threaded environments, threads often work cooperatively toward providing collective or aggregate throughput for an application as a whole. Optimizing in the small for “thread local” common path latency is often but not always the best approach for a concurrent system composed of multiple cooperating threads. Some embodiments provide a technique for augmenting traditional code emission with thread-aware policies and optimization strategies for a multi-threaded application. During operation, the system obtains information about resource contention between executing threads of the multi-threaded application. The system analyzes the resource contention information to identify regions of the code to be optimized. The system recompiles these identified regions to produce optimized code, which is then stored for subsequent execution.
Abstract:
The system and methods described herein may be used to implement a scalable, hierarchal, queue-based lock using flat combining. A thread executing on a processor core in a cluster of cores that share a memory may post a request to acquire a shared lock in a node of a publication list for the cluster using a non-atomic operation. A combiner thread may build an ordered (logical) local request queue that includes its own node and nodes of other threads (in the cluster) that include lock requests. The combiner thread may splice the local request queue into a (logical) global request queue for the shared lock as a sub-queue. A thread whose request has been posted in a node that has been combined into a local sub-queue and spliced into the global request queue may spin on a lock ownership indicator in its node until it is granted the shared lock.
Abstract:
A method for managing a memory, including obtaining a number of indices and a cache line size of a cache memory, computing a cache page size by multiplying the number of indices by the cache line size, calculating a greatest common denominator (GCD) of the cache page size and a first size class, incrementing, in response to the GCD of the cache page size and the first size class exceeding the cache line size, the first size class to generate an updated first size class, calculating a GCD of the cache page size and the updated first size class, creating, in response to the GCD of the cache page size and the updated first size class being less than the cache line size, a first superblock in the memory including a first plurality of blocks of the updated first size class, and creating a second superblock in the memory.