Adaptive contention-aware thread placement for parallel runtime systems

    Publication No.: US10133602B2

    Publication Date: 2018-11-20

    Application No.: US14626754

    Filing Date: 2015-02-19

    Abstract: An adaptive contention-aware thread scheduler may place software threads for pairs of applications on the same socket of a multi-socket machine for execution in parallel. Initial placements may be based on profile data that characterizes the machine and its behavior when multiple applications execute on the same socket. The profile data may be collected during execution of other applications. It may identify performance counters within the cores of the processor sockets whose values are suitable for predicting whether the performance of a pair of applications will suffer when executed together on the same socket (e.g., values indicative of their demands for particular shared resources). During execution, the scheduler may examine the performance counters (or performance metrics derived therefrom) and make different placement decisions (e.g., placing an application with high demand for resources of one type together with an application with low demand for those resources).
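
    The pairing rule in the last sentence can be pictured as a complementary-demand heuristic. The sketch below is illustrative, not the patented scheduler: the single normalized demand metric and the function name are assumptions standing in for the performance-counter-derived metrics the abstract describes.

```python
def pair_by_complementary_demand(demands):
    """Pair applications so high-demand apps share a socket with low-demand ones.

    demands: dict mapping application name -> normalized demand for a shared
    resource (e.g., memory bandwidth); higher means more contention-prone.
    Returns a list of (app, app) pairs, one pair per socket.
    """
    ranked = sorted(demands, key=demands.get)  # ascending demand
    pairs = []
    while len(ranked) >= 2:
        low = ranked.pop(0)    # least demanding remaining application
        high = ranked.pop(-1)  # most demanding remaining application
        pairs.append((low, high))
    return pairs
```

    For example, with demands {a: 0.9, b: 0.1, c: 0.5, d: 0.7}, the heuristic pairs b with a and c with d, so neither socket hosts two high-demand applications.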

    REDUCING SYNCHRONIZATION OF TASKS IN LATENCY-TOLERANT TASK-PARALLEL SYSTEMS

    Publication No.: US20170249187A1

    Publication Date: 2017-08-31

    Application No.: US15597460

    Filing Date: 2017-05-17

    Abstract: Techniques are provided for reducing synchronization of tasks in a task scheduling system. A task queue includes multiple tasks, some of which require an I/O operation while other tasks require data stored locally in memory. A single thread is assigned to process tasks in the task queue. The thread determines if a task at the head of the task queue requires an I/O operation. If so, then the thread generates an I/O request, submits the I/O request, and may place the task at (or toward) the end of the task queue. When the task reaches the head of the task queue again, the thread determines if data requested by the I/O request is available yet. If so, then the thread processes the task. Otherwise, the thread may place the task at (or toward) the end of the task queue again.
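
    The queue discipline described above can be modeled in a few lines. This is a minimal single-threaded sketch, assuming a (name, needs_io) task representation and an io_ready callback that stand in for real asynchronous I/O submission and completion checks.

```python
from collections import deque

def run_queue(tasks, io_ready, max_passes=100):
    """Process tasks on one thread; tasks whose I/O is pending are re-queued.

    tasks: deque of (name, needs_io) tuples.
    io_ready: callable(name) -> True once the task's I/O result is available.
    Returns the list of task names in completion order.
    """
    completed = []
    submitted = set()
    passes = 0
    while tasks and passes < max_passes:
        passes += 1
        name, needs_io = tasks.popleft()
        if not needs_io:
            completed.append(name)          # data is local: process now
        elif name not in submitted:
            submitted.add(name)             # issue the I/O request once
            tasks.append((name, True))      # defer to the back of the queue
        elif io_ready(name):
            completed.append(name)          # I/O finished: process the task
        else:
            tasks.append((name, True))      # still waiting: defer again
    return completed
```

    Because the single thread never blocks on I/O, it keeps making progress on in-memory tasks while deferred tasks cycle toward the back of the queue.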

    Dynamic Co-Scheduling of Hardware Contexts for Parallel Runtime Systems on Shared Machines

    Publication No.: US20170116033A1

    Publication Date: 2017-04-27

    Application No.: US15402140

    Filing Date: 2017-01-09

    Abstract: Multi-core computers may implement a resource management layer between the operating system and resource-management-enabled parallel runtime systems. The resource management components and runtime systems may collectively implement dynamic co-scheduling of hardware contexts when executing multiple parallel applications, using a spatial scheduling policy that grants high priority to one application per hardware context and a temporal scheduling policy for re-allocating unused hardware contexts. The runtime systems may receive resources on a varying number of hardware contexts as demands of the applications change over time, and the resource management components may co-ordinate to leave one runnable software thread for each hardware context. Periodic check-in operations may be used to determine (at times convenient to the applications) when hardware contexts should be re-allocated. Over-subscription of worker threads may reduce load imbalances between applications. A co-ordination table may store per-hardware-context information about resource demands and allocations.
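
    The spatial and temporal policies over a co-ordination table might look roughly like the two-pass sketch below. The primary ownership list, the per-application demand counts, and the pass structure are illustrative assumptions, not the patented mechanism.

```python
def allocate_contexts(n_contexts, primary, demands):
    """Sketch of spatial + temporal hardware-context allocation.

    primary: list of length n_contexts naming the high-priority application
             for each context (spatial policy).
    demands: dict mapping application -> number of contexts it can use now.
    Contexts left unused by their high-priority owner are re-allocated to
    applications with unmet demand (temporal policy).
    """
    remaining = dict(demands)
    table = [None] * n_contexts
    # Spatial pass: each context goes to its high-priority app if it has demand.
    for i, app in enumerate(primary):
        if remaining.get(app, 0) > 0:
            table[i] = app
            remaining[app] -= 1
    # Temporal pass: hand unused contexts to apps with leftover demand.
    for i in range(n_contexts):
        if table[i] is None:
            for app, need in remaining.items():
                if need > 0:
                    table[i] = app
                    remaining[app] -= 1
                    break
    return table
```

    For instance, on four contexts where A owns the first two but only demands one, the idle second context is re-allocated to B for as long as B has unmet demand.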

    Coordinated Garbage Collection in Distributed Systems
    Invention application; status: pending (published)

    Publication No.: US20160070593A1

    Publication Date: 2016-03-10

    Application No.: US14723425

    Filing Date: 2015-05-27

    Abstract: Fast modern interconnects may be exploited to control when garbage collection is performed on the nodes (e.g., virtual machines, such as JVMs) of a distributed system in which the individual processes communicate with each other and in which the heap memory is not shared. A garbage collection coordination mechanism (a coordinator implemented by a dedicated process on a single node or distributed across the nodes) may obtain or receive state information from each of the nodes and apply one of multiple supported garbage collection coordination policies to reduce the impact of garbage collection pauses, dependent on that information. For example, if the information indicates that a node is about to collect, the coordinator may trigger a collection on all of the other nodes (e.g., synchronizing collection pauses for batch-mode applications where throughput is important) or may steer requests to other nodes (e.g., for interactive applications where request latencies are important).
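
    The two coordination policies in the closing example can be reduced to a decision function over per-node state. The 0.9 occupancy threshold, the policy names, and the returned action sets below are invented for illustration.

```python
def coordinate_gc(node_states, policy):
    """Sketch of two GC coordination policies over per-node heap occupancy.

    node_states: dict mapping node -> heap occupancy in [0, 1]; a node near
                 1.0 is about to collect (threshold is illustrative).
    policy: 'sync'  -> batch mode: synchronize collection pauses everywhere;
            'steer' -> interactive mode: route requests away from collectors.
    Returns which nodes to trigger collection on and which to route work to.
    """
    THRESHOLD = 0.9
    about_to_collect = {n for n, occ in node_states.items() if occ >= THRESHOLD}
    if not about_to_collect:
        return {'collect': set(), 'route_to': set(node_states)}
    if policy == 'sync':
        # Synchronize pauses: trigger a collection on every node at once.
        return {'collect': set(node_states), 'route_to': set()}
    # Steering: keep serving requests from nodes that are not about to pause.
    healthy = set(node_states) - about_to_collect
    return {'collect': about_to_collect, 'route_to': healthy}
```

    Under 'sync' a single near-full node drags every node into a shared pause (good for throughput-oriented batch jobs); under 'steer' requests flow to the remaining healthy nodes (good for latency-sensitive interactive services).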

    Dynamic Co-Scheduling of Hardware Contexts for Parallel Runtime Systems on Shared Machines
    Invention application; status: granted

    Publication No.: US20150339158A1

    Publication Date: 2015-11-26

    Application No.: US14285513

    Filing Date: 2014-05-22

    Abstract: Multi-core computers may implement a resource management layer between the operating system and resource-management-enabled parallel runtime systems. The resource management components and runtime systems may collectively implement dynamic co-scheduling of hardware contexts when executing multiple parallel applications, using a spatial scheduling policy that grants high priority to one application per hardware context and a temporal scheduling policy for re-allocating unused hardware contexts. The runtime systems may receive resources on a varying number of hardware contexts as demands of the applications change over time, and the resource management components may co-ordinate to leave one runnable software thread for each hardware context. Periodic check-in operations may be used to determine (at times convenient to the applications) when hardware contexts should be re-allocated. Over-subscription of worker threads may reduce load imbalances between applications. A co-ordination table may store per-hardware-context information about resource demands and allocations.

    Language interoperable runtime adaptable data collections

    Publication No.: US11593398B2

    Publication Date: 2023-02-28

    Application No.: US17067479

    Filing Date: 2020-10-09

    Abstract: Adaptive data collections may include various types of data arrays, sets, bags, maps, and other data structures. A simple interface for each adaptive collection may provide access via a unified API to adaptive implementations of the collection. A single adaptive data collection may include multiple, different adaptive implementations. A system configured to implement adaptive data collections may include the ability to adaptively select between various implementations, either manually or automatically, and to map a given workload to differing hardware configurations. Additionally, hardware resource needs of different configurations may be predicted from a small number of workload measurements. Adaptive data collections may provide language interoperability, such as by leveraging runtime compilation to build adaptive data collections and to compile and optimize implementation code and user code together. Adaptive data collections may also provide language independence, such that implementation code may be written once and subsequently used from multiple programming languages.
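
    One way to picture an adaptive collection is a map that switches representation as its workload grows, behind one unchanged API. The size-8 promotion threshold and the two concrete representations below are illustrative assumptions; real adaptive collections select among many implementations and may use runtime compilation to optimize them together with user code.

```python
class AdaptiveMap:
    """Sketch: one map API over two implementations, chosen by workload size."""

    def __init__(self, expected_size=0):
        # Small maps: association list (low constant overhead).
        # Large maps: hash table (O(1) lookup). Threshold 8 is illustrative.
        self._impl = [] if expected_size < 8 else {}

    def put(self, key, value):
        if isinstance(self._impl, list):
            # Replace any existing binding, then append the new one.
            self._impl = [(k, v) for k, v in self._impl if k != key]
            self._impl.append((key, value))
            if len(self._impl) >= 8:          # adapt: promote to hash table
                self._impl = dict(self._impl)
        else:
            self._impl[key] = value

    def get(self, key, default=None):
        if isinstance(self._impl, list):
            return next((v for k, v in self._impl if k == key), default)
        return self._impl.get(key, default)
```

    Callers only ever see put and get; the representation change is invisible, which is the property that lets the system re-select implementations as measurements come in.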

    Coordinated garbage collection in distributed systems

    Publication No.: US11200164B2

    Publication Date: 2021-12-14

    Application No.: US16864042

    Filing Date: 2020-04-30

    Abstract: Fast modern interconnects may be exploited to control when garbage collection is performed on the nodes (e.g., virtual machines, such as JVMs) of a distributed system in which the individual processes communicate with each other and in which the heap memory is not shared. A garbage collection coordination mechanism (a coordinator implemented by a dedicated process on a single node or distributed across the nodes) may obtain or receive state information from each of the nodes and apply one of multiple supported garbage collection coordination policies to reduce the impact of garbage collection pauses, dependent on that information. For example, if the information indicates that a node is about to collect, the coordinator may trigger a collection on all of the other nodes (e.g., synchronizing collection pauses for batch-mode applications where throughput is important) or may steer requests to other nodes (e.g., for interactive applications where request latencies are important).

    Fine-grained scheduling of work in runtime systems

    Publication No.: US11157321B2

    Publication Date: 2021-10-26

    Application No.: US16586743

    Filing Date: 2019-09-27

    Abstract: A runtime system for distributing work between multiple threads in multi-socket shared memory machines that may support fine-grained scheduling of parallel loops. The runtime system may implement a request combining technique in which a representative thread requests work on behalf of other threads. The request combining technique may be asynchronous; a thread may execute work while waiting to obtain additional work via the request combining technique. Loops can be nested within one another, and the runtime system may provide control over the way in which hardware contexts are allocated to the loops at the different levels. An “inside out” approach may be used for nested loops in which a loop indicates how many levels are nested inside it, rather than a conventional “outside in” approach to nesting.
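
    The request-combining idea can be sketched without real threads: instead of every waiter contending on the shared work source, one representative makes a single pass and hands out work on behalf of all of them. The flat-list work source and the thread-id list are simplifying assumptions; the actual mechanism is asynchronous and lock-free in ways this sketch omits.

```python
def combining_fetch(global_work, requests):
    """Sketch of request combining for a shared loop-iteration pool.

    global_work: list of loop iterations still available (shared source).
    requests: list of thread ids waiting for work.
    The first requester acts as representative: one pass over the shared
    source distributes one item to each waiter, replacing per-thread
    contention with a single combined access.
    """
    handed_out = {}
    for tid in requests:               # single pass by the representative
        if not global_work:
            break                      # source exhausted: remaining waiters get nothing
        handed_out[tid] = global_work.pop(0)
    return handed_out
```

    The payoff is that N waiting threads cost one traversal of the shared structure rather than N synchronized accesses, which matters on multi-socket machines where cross-socket synchronization is expensive.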

    Systems and methods for safely subscribing to locks using hardware extensions

    Publication No.: US10521277B2

    Publication Date: 2019-12-31

    Application No.: US14736123

    Filing Date: 2015-06-10

    Abstract: Transactional Lock Elision allows hardware transactions to execute unmodified critical sections protected by the same lock concurrently, by subscribing to the lock and verifying that it is available before committing the transaction. A “lazy subscription” optimization, which delays lock subscription, can potentially cause behavior that cannot occur when the critical sections are executed under the lock. Hardware extensions may provide mechanisms to ensure that lazy subscriptions are safe (e.g., that they result in correct behavior). Prior to executing a critical section transactionally, its lock and subscription code may be identified (e.g., by writing their locations to special registers). Prior to committing the transaction, the thread executing the critical section may verify that the correct lock was correctly subscribed to. If not, or if locations identified by the special registers have been modified, the transaction may be aborted. Nested critical sections associated with different lock types may invoke different subscription code.
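
    The pre-commit verification step can be reduced to a small decision function. The argument names below are illustrative, and the hardware-transaction machinery (special registers, transactional abort, nested sections) is elided.

```python
def safe_to_commit(expected_lock, subscribed_lock, lock_free):
    """Sketch of the pre-commit check for safe lazy lock subscription.

    expected_lock: the lock identified before the critical section began
                   (e.g., recorded via a special register).
    subscribed_lock: the lock the subscription code actually read.
    lock_free: whether that lock was observed to be available.
    Returns True to commit the transaction; False to abort it.
    """
    if subscribed_lock is not expected_lock:
        return False   # wrong lock was subscribed to: must abort
    return lock_free   # commit only if the correct lock was free
```

    Aborting when the check fails is what prevents the incorrect behaviors that unchecked lazy subscription could otherwise introduce.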
