Systems and Methods for Safely Subscribing to Locks Using Hardware Extensions

    Publication No.: US20240028424A1

    Publication Date: 2024-01-25

    Application No.: US18478820

    Filing Date: 2023-09-29

    CPC classification number: G06F9/526 G06F9/467 G06F9/3851 G06F9/30087

    Abstract: Transactional Lock Elision allows hardware transactions to execute unmodified critical sections protected by the same lock concurrently, by subscribing to the lock and verifying that it is available before committing the transaction. A “lazy subscription” optimization, which delays lock subscription, can potentially cause behavior that cannot occur when the critical sections are executed under the lock. Hardware extensions may provide mechanisms to ensure that lazy subscriptions are safe (e.g., that they result in correct behavior). Prior to executing a critical section transactionally, its lock and subscription code may be identified (e.g., by writing their locations to special registers). Prior to committing the transaction, the thread executing the critical section may verify that the correct lock was correctly subscribed to. If not, or if locations identified by the special registers have been modified, the transaction may be aborted. Nested critical sections associated with different lock types may invoke different subscription code.
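The commit-time verification the abstract describes (identify the lock and the subscription code up front, then check the subscription before committing) can be sketched as a software simulation. The class, function names, dict-based lock, and string return values below are all illustrative; the actual mechanism relies on hardware registers and transactional memory that plain Python cannot express:

```python
class TxContext:
    """Simulated per-transaction state; the real design keeps these in
    special hardware registers (the attribute names here are made up)."""
    def __init__(self):
        self.lock_register = None          # lock identified for this critical section
        self.subscription_register = None  # subscription code identified up front
        self.subscribed_lock = None        # lock the transaction actually subscribed to

def identify(ctx, lock, subscription_code):
    # Before executing the critical section transactionally, record
    # which lock and which subscription code it should use.
    ctx.lock_register = lock
    ctx.subscription_register = subscription_code

def subscribe(ctx, lock):
    # The (possibly lazy) lock subscription performed inside the transaction.
    ctx.subscribed_lock = lock

def try_commit(ctx):
    # Commit-time verification: abort unless the correct lock was
    # subscribed to and that lock is currently available.
    if ctx.subscribed_lock is not ctx.lock_register:
        return "abort: wrong lock subscribed"
    if ctx.lock_register["held"]:
        return "abort: lock held"
    return "commit"
```

In this sketch a transaction that lazily subscribed to the wrong lock, or whose lock turns out to be held, fails the check and would be aborted rather than committed with potentially incorrect results.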

    Reducing synchronization of tasks in latency-tolerant task-parallel systems

    Publication No.: US10678588B2

    Publication Date: 2020-06-09

    Application No.: US15597460

    Filing Date: 2017-05-17

    Abstract: Techniques are provided for reducing synchronization of tasks in a task scheduling system. A task queue includes multiple tasks, some of which require an I/O operation while other tasks require data stored locally in memory. A single thread is assigned to process tasks in the task queue. The thread determines if a task at the head of the task queue requires an I/O operation. If so, then the thread generates an I/O request, submits the I/O request, and may place the task at (or toward) the end of the task queue. When the task reaches the head of the task queue again, the thread determines if data requested by the I/O request is available yet. If so, then the thread processes the request. Otherwise, the thread may place the task at (or toward) the end of the task queue again.
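The single-threaded queue discipline described above can be sketched as follows; the task-dict fields, the `io_is_ready` callback, and re-queueing at the exact tail (rather than merely toward it) are simplifying assumptions, not details from the patent:

```python
from collections import deque

def run_tasks(tasks, io_is_ready):
    """Single-threaded scheduler sketch: instead of blocking on I/O, a task
    submits an asynchronous request and moves to the back of the queue."""
    queue = deque(tasks)
    completed = []
    while queue:
        task = queue.popleft()
        if task.get("needs_io") and not task.get("io_submitted"):
            task["io_submitted"] = True   # generate and submit the I/O request
            queue.append(task)            # revisit once it reaches the head again
        elif task.get("needs_io") and not io_is_ready(task["id"]):
            queue.append(task)            # requested data still not available
        else:
            completed.append(task["id"])  # data is local: process the task
    return completed
```

Because the thread never blocks on a pending I/O request, it keeps making progress on memory-resident tasks while the requests complete in the background.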

    Fine-grained scheduling of work in runtime systems

    Publication No.: US10430243B2

    Publication Date: 2019-10-01

    Application No.: US15887793

    Filing Date: 2018-02-02

    Abstract: A runtime system distributes work between multiple threads in multi-socket shared-memory machines and may support fine-grained scheduling of parallel loops. The runtime system may implement a request combining technique in which a representative thread requests work on behalf of other threads. The request combining technique may be asynchronous; a thread may execute work while waiting to obtain additional work via the request combining technique. Loops can be nested within one another, and the runtime system may provide control over the way in which hardware contexts are allocated to the loops at the different levels. An “inside out” approach may be used for nested loops in which a loop indicates how many levels are nested inside it, rather than a conventional “outside in” approach to nesting.
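The request-combining idea can be illustrated with a small sketch in which one representative fetches loop iterations from a shared source for a whole group of waiting threads in a single synchronized step. The function name, the per-thread `chunk` parameter, and the deque standing in for the shared iteration space are assumptions for illustration:

```python
from collections import deque

def combine_and_fetch(waiters, work_source, chunk):
    """Request combining (illustrative): one representative thread asks the
    shared source for enough iterations to cover every waiting thread, then
    distributes them, so only one request touches the shared source."""
    needed = min(len(waiters) * chunk, len(work_source))
    grabbed = [work_source.popleft() for _ in range(needed)]
    # Hand out contiguous chunks of the combined batch to each waiter.
    return {w: grabbed[i * chunk:(i + 1) * chunk]
            for i, w in enumerate(waiters)}
```

Compared with each thread contending on `work_source` separately, the representative performs one combined request, which is the synchronization reduction the abstract describes.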

    Persistent Multi-Word Compare-and-Swap
    Invention Application

    Publication No.: US20190258508A1

    Publication Date: 2019-08-22

    Application No.: US16275175

    Filing Date: 2019-02-13

    Abstract: A computer system including one or more processors and persistent, word-addressable memory implements a persistent atomic multi-word compare-and-swap operation. On entry, the operation is provided with a list of persistent memory locations of words to be updated, the respective expected current values contained in those locations, and the respective new values to write to them. The operation atomically compares the existing contents of the persistent memory locations to the respective expected values and, should they all match, updates the locations with the new values and returns a successful status. Should the contents of any persistent memory location not match its respective expected value, the operation returns a failed status. The operation is performed such that the system can recover from any failure or interruption by restoring the list of persistent memory locations.
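The interface of such a multi-word compare-and-swap can be sketched as below. This is only a functional simulation: a lock stands in for the atomicity machinery, a dict stands in for persistent word-addressable memory, and crash recovery is not modeled at all:

```python
import threading

_mcas_lock = threading.Lock()  # stand-in for the real atomicity/persistence machinery

def pmcas(memory, updates):
    """Simulated multi-word compare-and-swap. `updates` is a list of
    (location, expected, new) triples. All words must still hold their
    expected values for any of them to be written."""
    with _mcas_lock:
        if any(memory[loc] != expected for loc, expected, _ in updates):
            return False  # some word no longer holds its expected value
        for loc, _, new in updates:
            memory[loc] = new
        return True
```

The all-or-nothing contract is the essential point: either every listed word is updated, or none is and the caller learns the operation failed.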

    Comprehensive contention-based thread allocation and placement

    Publication No.: US11861272B2

    Publication Date: 2024-01-02

    Application No.: US15691530

    Filing Date: 2017-08-30

    CPC classification number: G06F30/20 G06F9/5066

    Abstract: A system configured to implement Comprehensive Contention-Based Thread Allocation and Placement, may generate a description of a workload from multiple profiling runs and may combine this workload description with a description of the machine's hardware to model the workload's performance over alternative thread placements. For instance, the system may generate a machine description based on executing stress applications and machine performance counters monitoring various performance indicators during execution of a synthetic workload. Such a system may also generate a workload description based on profiling sessions and the performance counters. Additionally, behavior of a workload with a proposed thread placement may be modeled based on the machine description and workload description and a prediction of the workload's resource demands and/or performance may be generated.
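One very reduced way to picture how a machine description and a workload description combine to score a proposed thread placement: treat each socket as having a measured resource capacity (e.g. memory bandwidth) and each thread as having a profiled demand, then estimate contention per socket. The single-resource model, the dict shapes, and the slowdown formula are illustrative assumptions, far simpler than the patented modeling:

```python
def predict_slowdown(socket_capacity, thread_demand, placement):
    """Toy contention model: sum each thread's profiled demand on the
    socket it is placed on, and report the worst oversubscription ratio
    (1.0 means no socket's capacity is exceeded)."""
    per_socket = {}
    for thread, socket in placement.items():
        per_socket[socket] = per_socket.get(socket, 0.0) + thread_demand[thread]
    return max(max(demand / socket_capacity[s], 1.0)
               for s, demand in per_socket.items())
```

Comparing this score across alternative placements is the kind of decision such a model supports: the placement that spreads demanding threads across sockets scores better than one that packs them together.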

    LANGUAGE INTEROPERABLE RUNTIME ADAPTABLE DATA COLLECTIONS

    Publication No.: US20230214407A1

    Publication Date: 2023-07-06

    Application No.: US18174535

    Filing Date: 2023-02-24

    Abstract: Adaptive data collections may include various types of data arrays, sets, bags, maps, and other data structures. A simple interface for each adaptive collection may provide access via a unified API to adaptive implementations of the collection. A single adaptive data collection may include multiple, different adaptive implementations. A system configured to implement adaptive data collections may include the ability to adaptively select between various implementations, either manually or automatically, and to map a given workload to differing hardware configurations. Additionally, hardware resource needs of different configurations may be predicted from a small number of workload measurements. Adaptive data collections may provide language interoperability, such as by leveraging runtime compilation to build adaptive data collections and to compile and optimize implementation code and user code together. Adaptive data collections may also provide language independence, such that implementation code may be written once and subsequently used from multiple programming languages.
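The core idea of one collection hiding several implementations behind a unified API can be sketched with a toy map that automatically switches representation as it grows. The two representations, the size threshold, and the switching rule are illustrative choices, not details from the patent:

```python
class AdaptiveMap:
    """Toy adaptive collection: a single put/get API backed by either an
    association list (cheap when small) or a hash map (better when large),
    with an automatic switch past an illustrative size threshold."""
    THRESHOLD = 8

    def __init__(self):
        self._impl = []  # start as a list of (key, value) pairs

    def put(self, key, value):
        if isinstance(self._impl, list):
            for i, (k, _) in enumerate(self._impl):
                if k == key:
                    self._impl[i] = (key, value)
                    return
            self._impl.append((key, value))
            if len(self._impl) > self.THRESHOLD:
                self._impl = dict(self._impl)  # adapt: switch implementation
        else:
            self._impl[key] = value

    def get(self, key, default=None):
        if isinstance(self._impl, list):
            return next((v for k, v in self._impl if k == key), default)
        return self._impl.get(key, default)
```

Callers only ever see `put` and `get`; the representation change is invisible, which is what lets the selection be driven manually, automatically, or from workload measurements.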

    Coordinated Garbage Collection in Distributed Systems

    Publication No.: US20220066927A1

    Publication Date: 2022-03-03

    Application No.: US17525384

    Filing Date: 2021-11-12

    Abstract: Fast modern interconnects may be exploited to control when garbage collection is performed on the nodes (e.g., virtual machines, such as JVMs) of a distributed system in which the individual processes communicate with each other and in which the heap memory is not shared. A garbage collection coordination mechanism (a coordinator implemented by a dedicated process on a single node or distributed across the nodes) may obtain or receive state information from each of the nodes and apply one of multiple supported garbage collection coordination policies to reduce the impact of garbage collection pauses, dependent on that information. For example, if the information indicates that a node is about to collect, the coordinator may trigger a collection on all of the other nodes (e.g., synchronizing collection pauses for batch-mode applications where throughput is important) or may steer requests to other nodes (e.g., for interactive applications where request latencies are important).
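The two coordination policies mentioned in the abstract can be sketched as a single decision function over per-node state. The heap-occupancy representation, the 0.9 trigger level, and the returned action dicts are illustrative assumptions:

```python
def coordinate(node_states, policy):
    """Sketch of a GC coordinator. `node_states` maps node name to heap
    occupancy fraction; a node at or above the (illustrative) 0.9 trigger
    level is treated as about to collect."""
    about_to_collect = [n for n, occ in node_states.items() if occ >= 0.9]
    if not about_to_collect:
        return {"action": "none"}
    if policy == "synchronize":
        # Batch-mode workloads: trigger collection on every node at once,
        # aligning the pauses so throughput suffers only one stall.
        return {"action": "collect", "nodes": sorted(node_states)}
    if policy == "steer":
        # Interactive workloads: route requests away from nodes that are
        # about to pause, keeping request latencies low.
        return {"action": "route_away_from", "nodes": about_to_collect}
    raise ValueError(f"unknown policy: {policy}")
```

Which policy a coordinator applies would depend on the application's goals, mirroring the throughput-versus-latency distinction drawn in the abstract.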
