Abstract:
Transactional reader-writer locks may leverage available hardware transactional memory (HTM) to simplify the procedures of the reader-writer lock algorithm and to eliminate a requirement for type stable memory An HTM-based reader-writer lock may include an ordered list of client-provided nodes, each of which represents a thread that holds (or desires to acquire) the lock, and a tail pointer. The locking and unlocking procedures invoked by readers and writers may access the tail pointer or particular ones of the nodes in the list using various combinations of transactions and non-transactional accesses to insert nodes into the list or to remove nodes from the list. A reader or writer that owns a node at the head of the list (or a reader whose node is preceded in the list only by other readers' nodes) may access a critical section of code or shared resource.
Abstract:
The systems and methods described herein may implement scalable statistics counters that are adaptive to the amount of contention for the counters. The counters may be accessible within transactions. Methods for determining whether or when to increment the counters in response to initiation of an increment operation and/or methods for updating the counters may be selected dependent on current, recent, or historical amounts of contention. Various contention management policies or retry conditions may be applied to select between multiple methods. One counter may include a precise counter portion that is incremented under low contention and a probabilistic counter portion that is updated under high contention. Amounts by which probabilistic counters are incremented may be contention-dependent. Another counter may include a node identifier portion that encourages consecutive increments by threads on a single node only when under contention. Another counter may be inflated in response to contention for the counter.
Abstract:
Generic Concurrency Restriction (GCR) may divide a set of threads waiting to acquire a lock into two sets: an active set currently able to contend for the lock, and a passive set waiting for an opportunity to join the active set and contend for the lock. The number of threads in the active set may be limited to a predefined maximum or even a single thread. Generic Concurrency Restriction may be implemented as a wrapper around an existing lock implementation. Generic Concurrency Restriction may, in some embodiments, be unfair (e.g., to some threads) over the short term, but may improve the overall throughput of the underlying multithreaded application via passivation of a portion of the waiting threads.
Abstract:
Parameter permutation is performed for federated learning to train a machine learning model. Parameter permutation is performed by client systems of a federated machine learning system on updated parameters of a machine learning model that have been updated as part of training using local training data. An intra-model shuffling technique is performed at the client systems according to a shuffling pattern. Then, the encoded parameters are provided to an aggregation server using Private Information Retrieval (PIR) queries generated according to the shuffling pattern.
Abstract:
NUMA-aware reader-writer locks may leverage lock cohorting techniques that introduce a synthetic level into the lock hierarchy (e.g., one whose nodes do not correspond to the system topology). The synthetic level may include a global reader lock and a global writer lock. A writer thread may acquire a node-level writer lock, then the global writer lock, and then the top-level lock, after which it may access a critical section protected by the lock. The writer may release the lock (if an upper bound on consecutive writers has been met), or may pass the lock to another writer (on the same node or a different node, according to a fairness policy). A reader may acquire the global reader lock (whether or not node-level reader locks are present), and then the top-level lock. However, readers may only hold these locks long enough to increment reader counts associated with them.
Abstract:
A data object has a lock and a condition indicator associated with it. Based at least partly on detecting a first setting of the condition indicator, a reader stores an indication that the reader has obtained read access to the data object in an element of a readers structure and reads the data object without acquiring the lock. A writer detects the first setting and replaces it with a second setting, indicating that the lock is to be acquired by readers before reading the data object. Prior to performing a write on the data object, the writer verifies that one or more elements of the readers structure have been cleared.
Abstract:
A computer comprising multiple processors and non-uniform memory implements multiple threads that perform a lock operation using a shared lock structure that includes a pointer to a tail of a first-in-first-out (FIFO) queue of threads waiting to acquire the lock. To acquire the lock, a thread allocates and appends a data structure to the FIFO queue. The lock is released by selecting and notifying a waiting thread to which control is transferred, with the thread selected executing on the same processor socket as the thread controlling the lock. A secondary queue of threads is managed for threads deferred during the selection process and maintained within the data structures of the waiting threads such that no memory is required within the lock structure. If no threads executing on the same processor socket are waiting for the lock, entries in the secondary queue are transferred to the FIFO queue preserving FIFO order.
Abstract:
NUMA-aware reader-writer locks may leverage lock cohorting techniques that introduce a synthetic level into the lock hierarchy (e.g., one whose nodes do not correspond to the system topology). The synthetic level may include a global reader lock and a global writer lock. A writer thread may acquire a node-level writer lock, then the global writer lock, and then the top-level lock, after which it may access a critical section protected by the lock. The writer may release the lock (if an upper bound on consecutive writers has been met), or may pass the lock to another writer (on the same node or a different node, according to a fairness policy). A reader may acquire the global reader lock (whether or not node-level reader locks are present), and then the top-level lock. However, readers may only hold these locks long enough to increment reader counts associated with them.
Abstract:
A data object has a lock and a condition indicator associated with it. Based at least partly on detecting a first setting of the condition indicator, a reader stores an indication that the reader has obtained read access to the data object in an element of a readers structure and reads the data object without acquiring the lock. A writer detects the first setting and replaces it with a second setting, indicating that the lock is to be acquired by readers before reading the data object. Prior to performing a write on the data object, the writer verifies that one or more elements of the readers structure have been cleared.
Abstract:
A reader of a set of data accessors that includes readers and writer detects that a particular lock of a first collection of non-global locks associated with a data object of a computing environment is held by another accessor. After checking a blocking indicator, the reader uses a second lock (which is not part of the first collection) to obtain read access to the data object and implements its reads without acquiring the particular lock. Prior to implementing a write on the data object, a writer acquires at least some locks of the first collection, and sets the blocking indicator to prevent readers from using the second lock to obtain read access to the data object.