System and method for implementing NUMA-aware reader-writer locks
    41.
    Granted patent
    System and method for implementing NUMA-aware reader-writer locks (in force)

    Publication number: US08966491B2

    Publication date: 2015-02-24

    Application number: US13458868

    Filing date: 2012-04-27

    CPC classification number: G06F9/526 G06F2209/523

    Abstract: NUMA-aware reader-writer locks may leverage lock cohorting techniques to band together writer requests from a single NUMA node. The locks may relax the order in which the lock schedules the execution of critical sections of code by reader threads and writer threads, allowing lock ownership to remain resident on a single NUMA node for long periods, while also taking advantage of parallelism between reader threads. Threads may contend on node-level structures to get permission to acquire a globally shared reader-writer lock. Writer threads may follow a lock cohorting strategy of passing ownership of the lock in write mode from one thread to a cohort writer thread without releasing the shared lock, while reader threads from multiple NUMA nodes may simultaneously acquire the shared lock in read mode. The reader-writer lock may follow a writer-preference policy, a reader-preference policy or a hybrid policy.

    System and method for reducing serialization in transactional memory using gang release of blocked threads
    42.
    Granted patent
    System and method for reducing serialization in transactional memory using gang release of blocked threads (in force)

    Publication number: US08789057B2

    Publication date: 2014-07-22

    Application number: US12327659

    Filing date: 2008-12-03

    CPC classification number: G06F9/466 G06F9/4843 G06F9/4881 G06F9/52 G06F9/528

    Abstract: Transactional Lock Elision (TLE) may allow multiple threads to concurrently execute critical sections as speculative transactions. Transactions may abort due to various reasons. To avoid starvation, transactions may revert to execution using mutual exclusion when transactional execution fails. Because threads may revert to mutual exclusion in response to the mutual exclusion of other threads, a positive feedback loop may form in times of high congestion, causing a “lemming effect”. To regain the benefits of concurrent transactional execution, the system may allow one or more threads awaiting a given lock to be released from the wait queue and instead attempt transactional execution. A gang release may allow a subset of waiting threads to be released simultaneously. The subset may be chosen dependent on the number of waiting threads, historical abort relationships between threads, analysis of transactions of each thread, sensitivity of each thread to abort, and/or other thread-local or global criteria.

    System and method for enabling turbo mode in a processor
    43.
    Granted patent
    System and method for enabling turbo mode in a processor (in force)

    Publication number: US08775837B2

    Publication date: 2014-07-08

    Application number: US13213833

    Filing date: 2011-08-19

    CPC classification number: G06F9/526 G06F1/3228 G06F1/324 G06F9/485 Y02D10/126

    Abstract: The systems and methods described herein may enable a processor core to run at higher speeds than other processor cores in the same package. A thread executing on one processor core may begin waiting for another thread to complete a particular action (e.g., to release a lock). In response to determining that other threads are waiting, the thread/core may enter an inactive state. A data structure may store information indicating which threads are waiting on which other threads. In response to determining that a quorum of threads/cores are in an inactive state, one of the threads/cores may enter a turbo mode in which it executes at a higher speed than the baseline speed for the cores. A thread holding a lock and executing in turbo mode may perform work delegated by waiting threads at the higher speed. A thread may exit the inactive state when the waited-for action is completed.

    System and method for NUMA-aware locking using lock cohorts
    44.
    Granted patent
    System and method for NUMA-aware locking using lock cohorts (in force)

    Publication number: US08694706B2

    Publication date: 2014-04-08

    Application number: US13458871

    Filing date: 2012-04-27

    CPC classification number: G06F9/526

    Abstract: The system and methods described herein may be used to implement NUMA-aware locks that employ lock cohorting. These lock cohorting techniques may reduce the rate of lock migration by relaxing the order in which the lock schedules the execution of critical code sections by various threads, allowing lock ownership to remain resident on a single NUMA node longer than under strict FIFO ordering, thus reducing coherence traffic and improving aggregate performance. A NUMA-aware cohort lock may include a global shared lock that is thread-oblivious, and multiple node-level locks that provide cohort detection. The lock may be constructed from non-NUMA-aware components (e.g., spin-locks or queue locks) that are modified to provide thread-obliviousness and/or cohort detection. Lock ownership may be passed from one thread that holds the lock to another thread executing on the same NUMA node without releasing the global shared lock.
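
The hand-off described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `CohortLock`, `_Node`, `node_id` and `max_handoffs` are hypothetical, the waiter counter is only one possible form of cohort detection, and the bound on consecutive hand-offs is an assumption added here so that other nodes are not starved. The global lock is a plain `threading.Lock`, which any thread may release, so it serves as the thread-oblivious component.

```python
import threading


class _Node:
    """Per-node state (hypothetical helper for this sketch)."""

    def __init__(self):
        self.local = threading.Lock()  # node-level lock
        self.meta = threading.Lock()   # protects `waiters` only
        self.waiters = 0               # cohort detection: threads queued on this node
        self.has_global = False        # guarded by `local`
        self.handoffs = 0              # guarded by `local`


class CohortLock:
    def __init__(self, nodes, max_handoffs=64):
        # A threading.Lock may be released by a thread other than the one
        # that acquired it, which is what lets ownership move between threads.
        self._global = threading.Lock()
        self._nodes = [_Node() for _ in range(nodes)]
        self._max_handoffs = max_handoffs  # assumed fairness bound

    def acquire(self, node_id):
        node = self._nodes[node_id]
        with node.meta:
            node.waiters += 1
        node.local.acquire()
        with node.meta:
            node.waiters -= 1
        if not node.has_global:
            # No cohort member passed the global lock on; compete for it.
            self._global.acquire()
            node.has_global = True
            node.handoffs = 0

    def release(self, node_id):
        node = self._nodes[node_id]
        with node.meta:
            cohort_waiting = node.waiters > 0
        if cohort_waiting and node.handoffs < self._max_handoffs:
            # Keep the global lock; the next thread on this node inherits it.
            node.handoffs += 1
        else:
            node.has_global = False
            self._global.release()
        node.local.release()
```

A caller would wrap its critical section in `lock.acquire(node_id)` and `lock.release(node_id)`, where `node_id` is whatever index the application uses for the NUMA node of the calling thread. Python itself does not expose that mapping, so the sketch leaves it to the caller.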

    Multi-lane concurrent bag for facilitating inter-thread communication
    45.
    Granted patent
    Multi-lane concurrent bag for facilitating inter-thread communication (in force)

    Publication number: US08689237B2

    Publication date: 2014-04-01

    Application number: US13241015

    Filing date: 2011-09-22

    Abstract: A method, system, and medium are disclosed for facilitating communication between multiple concurrent threads of execution using a multi-lane concurrent bag. The bag comprises a plurality of independently-accessible concurrent intermediaries (lanes) that are each configured to store data elements. The bag provides an insert function executable to insert a given data element into the bag by selecting one of the intermediaries and inserting the data element into the selected intermediary. The bag also provides a consume function executable to consume a data element from the bag by choosing one of the intermediaries and consuming (removing and returning) a data element stored in the chosen intermediary. The bag guarantees that execution of the consume function consumes a data element if the bag is non-empty and permits multiple threads to execute the insert or consume functions concurrently.
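
As a rough illustration of the insert and consume functions described above, the following Python sketch uses one lock-protected deque per lane. The names `MultiLaneBag`, `EMPTY` and the random lane choice are assumptions made for this example, not details from the patent. The sketch sweeps every lane once, so it finds an element whenever one is present and no other thread is modifying the bag during the sweep; it does not attempt to reproduce the patent's guarantee under concurrent inserts.

```python
import random
import threading
from collections import deque

EMPTY = object()  # sentinel returned when no lane holds an element


class MultiLaneBag:
    def __init__(self, lanes=4):
        # Each lane is an independently locked intermediary.
        self._lanes = [deque() for _ in range(lanes)]
        self._locks = [threading.Lock() for _ in range(lanes)]

    def insert(self, item):
        # Select one lane and insert into it; other lanes stay untouched.
        i = random.randrange(len(self._lanes))
        with self._locks[i]:
            self._lanes[i].append(item)

    def consume(self):
        # Start at a random lane and visit each lane at most once.
        n = len(self._lanes)
        start = random.randrange(n)
        for k in range(n):
            i = (start + k) % n
            with self._locks[i]:
                if self._lanes[i]:
                    return self._lanes[i].popleft()
        return EMPTY
```

Because each operation locks only the lane it touches, threads working on different lanes do not block one another, which is the point of splitting the bag into lanes.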

    System and method for tracking references to shared objects using byte-addressable per-thread reference counters
    46.
    Granted patent
    System and method for tracking references to shared objects using byte-addressable per-thread reference counters (in force)

    Publication number: US08677076B2

    Publication date: 2014-03-18

    Application number: US12750455

    Filing date: 2010-03-30

    CPC classification number: G06F12/0261

    Abstract: The system described herein may track references to a shared object by concurrently executing threads using a reference tracking data structure that includes an owner field and an array of byte-addressable per-thread entries, each including a per-thread reference counter and a per-thread counter lock. Slotted threads assigned to a given array entry may increment or decrement the per-thread reference counter in that entry in response to referencing or dereferencing the shared object. Unslotted threads may increment or decrement a shared unslotted reference counter. A thread may update the data structure and/or examine it to determine whether the number of references to the shared object is zero or non-zero using a blocking-optimistic or a non-blocking mechanism. A checking thread may acquire ownership of the data structure, obtain an instantaneous snapshot of all counters, and return a value indicating whether the number of references to the shared object is zero or non-zero.
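
A simplified, blocking-only Python sketch of the counter layout in this abstract follows. The class name `RefTracker` and its methods are hypothetical, the owner field and the non-blocking path are omitted, and the checking thread obtains its snapshot simply by holding every counter lock at once, which is an assumption of this sketch rather than the mechanism the patent describes.

```python
import threading


class RefTracker:
    def __init__(self, slots):
        # One counter and one counter lock per slotted thread.
        self._counts = [0] * slots
        self._locks = [threading.Lock() for _ in range(slots)]
        # Shared counter for threads that have no slot.
        self._unslotted = 0
        self._unslotted_lock = threading.Lock()

    def add(self, delta, slot=None):
        """Apply +1 (reference) or -1 (dereference) to the caller's counter."""
        if slot is None:
            with self._unslotted_lock:
                self._unslotted += delta
        else:
            with self._locks[slot]:
                self._counts[slot] += delta

    def is_zero(self):
        """Return True if the total number of references is zero."""
        locks = self._locks + [self._unslotted_lock]
        for lock in locks:  # fixed order, so concurrent checkers cannot deadlock
            lock.acquire()
        try:
            # All counters are frozen here, so the sum is a consistent snapshot.
            return sum(self._counts) + self._unslotted == 0
        finally:
            for lock in reversed(locks):
                lock.release()
```

Slotted threads touch only their own entry on the common path, so they do not contend with each other; only the comparatively rare zero check pays the cost of visiting every counter.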

    System and Method for Implementing NUMA-Aware Reader-Writer Locks
    47.
    Patent application
    System and Method for Implementing NUMA-Aware Reader-Writer Locks (in force)

    Publication number: US20130290967A1

    Publication date: 2013-10-31

    Application number: US13458868

    Filing date: 2012-04-27

    CPC classification number: G06F9/526 G06F2209/523

    Abstract: NUMA-aware reader-writer locks may leverage lock cohorting techniques to band together writer requests from a single NUMA node. The locks may relax the order in which the lock schedules the execution of critical sections of code by reader threads and writer threads, allowing lock ownership to remain resident on a single NUMA node for long periods, while also taking advantage of parallelism between reader threads. Threads may contend on node-level structures to get permission to acquire a globally shared reader-writer lock. Writer threads may follow a lock cohorting strategy of passing ownership of the lock in write mode from one thread to a cohort writer thread without releasing the shared lock, while reader threads from multiple NUMA nodes may simultaneously acquire the shared lock in read mode. The reader-writer lock may follow a writer-preference policy, a reader-preference policy or a hybrid policy.

    System and method for optimizing a code section by forcing a code section to be executed atomically
    48.
    Granted patent
    System and method for optimizing a code section by forcing a code section to be executed atomically (in force)

    Publication number: US08533699B2

    Publication date: 2013-09-10

    Application number: US13077793

    Filing date: 2011-03-31

    CPC classification number: G06F9/467 G06F8/443 G06F9/30087 G06F12/0261

    Abstract: Systems and methods for optimizing code may use transactional memory to optimize one code section by forcing another code section to execute atomically. Application source code may be analyzed to identify instructions in one code section that only need to be executed if there exists the possibility that another code section (e.g., a critical section) could be partially executed or that its results could be affected by interference. In response to identifying such instructions, alternate code may be generated that forces the critical section to be executed as an atomic transaction, e.g., using best-effort hardware transactional memory. This alternate code may replace the original code or may be included in an alternate execution path that can be conditionally selected for execution at runtime. The alternate code may elide the identified instructions (which are rendered unnecessary by the transaction) by removing them, or by including them in the alternate execution path.

    System and Method for Mitigating the Impact of Branch Misprediction When Exiting Spin Loops
    49.
    Patent application
    System and Method for Mitigating the Impact of Branch Misprediction When Exiting Spin Loops (in force)

    Publication number: US20130198499A1

    Publication date: 2013-08-01

    Application number: US13362903

    Filing date: 2012-01-31

    CPC classification number: G06F9/30058 G06F9/30079 G06F9/325 G06F9/3848

    Abstract: A computer system may recognize a busy-wait loop in program instructions at compile time and/or may recognize busy-wait looping behavior during execution of program instructions. The system may recognize that an exit condition for a busy-wait loop is specified by a conditional branch type instruction in the program instructions. In response to identifying the loop and the conditional branch type instruction that specifies its exit condition, the system may influence or override a prediction made by a dynamic branch predictor, resulting in a prediction that the exit condition will be met and that the loop will be exited regardless of any observed branch behavior for the conditional branch type instruction. The looping instructions may implement waiting for an inter-thread communication event to occur or for a lock to become available. When the exit condition is met, the loop may be exited without incurring a misprediction delay.

    System and method for implementing hierarchical queue-based locks using flat combining
    50.
    Granted patent
    System and method for implementing hierarchical queue-based locks using flat combining (in force)

    Publication number: US08458721B2

    Publication date: 2013-06-04

    Application number: US13152079

    Filing date: 2011-06-02

    CPC classification number: G06F9/526

    Abstract: The system and methods described herein may be used to implement a scalable, hierarchal, queue-based lock using flat combining. A thread executing on a processor core in a cluster of cores that share a memory may post a request to acquire a shared lock in a node of a publication list for the cluster using a non-atomic operation. A combiner thread may build an ordered (logical) local request queue that includes its own node and nodes of other threads (in the cluster) that include lock requests. The combiner thread may splice the local request queue into a (logical) global request queue for the shared lock as a sub-queue. A thread whose request has been posted in a node that has been combined into a local sub-queue and spliced into the global request queue may spin on a lock ownership indicator in its node until it is granted the shared lock.
