Instructions and logic to provide atomic range modification operations

    Publication number: US10528345B2

    Publication date: 2020-01-07

    Application number: US14671914

    Application date: 2015-03-27

    Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment, an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient, the range modification is performed and one or more modified range indices of the set are stored back to memory. Otherwise, an error signal is set. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.
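
The lock, check, modify, store, unlock sequence in this abstract can be sketched in software. The following Python sketch is an illustration only: it emulates the hardware locking with a `threading.Lock`, and the class `Range`, the field names `next` and `end`, and the methods `add_next` and `sub_end` are hypothetical names, not taken from the patent.

```python
import threading


class Range:
    """Half-open range [next, end) whose indices are guarded by a lock.

    The lock stands in for the hardware locking of the range indices;
    all names here are illustrative assumptions.
    """

    def __init__(self, next_index, end_index):
        self.next = next_index
        self.end = end_index
        self._lock = threading.Lock()

    def add_next(self, n=1):
        """Try to claim n items from the front; return (first_index, ok)."""
        with self._lock:                 # lock access to the range indices
            size = self.end - self.next  # load the indices, compute the size
            if size < n:                 # too small: report an error instead
                return self.next, False
            first = self.next
            self.next = first + n        # store the modified index back
            return first, True           # the lock is released on exit

    def sub_end(self, n=1):
        """Try to claim n items from the back; return (new_end, ok)."""
        with self._lock:
            if self.end - self.next < n:
                return self.end, False
            self.end -= n
            return self.end, True
```

With `r = Range(0, 3)`, `r.add_next(2)` claims indices 0 and 1 and returns `(0, True)`; a second `r.add_next(2)` finds only one index left and returns `(2, False)` without modifying the range.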

    Technologies for fast synchronization barriers for many-core processing

    Publication number: US09760410B2

    Publication date: 2017-09-12

    Application number: US14568890

    Application date: 2014-12-12

    Inventor: Arch D. Robison

    CPC classification number: G06F9/522

    Abstract: Technologies for multithreaded synchronization include a computing device having a many-core processor. Each processor core includes multiple hardware threads. A hardware thread executed by a processor core enters a synchronization barrier and synchronizes with other hardware threads executed by the same processor core. After synchronization, the hardware thread synchronizes with a source hardware thread that may be executed by a different processor core. The source hardware thread may be assigned using an n-way shuffle of all hardware threads, where n is the number of hardware threads per processor core. The hardware thread resynchronizes with the other hardware threads executed by the same processor core. The hardware thread alternately synchronizes with the source hardware thread and the other hardware threads executed by the same processor core until all hardware threads have synchronized. The computing device may reduce a Boolean value over the synchronization barrier. Other embodiments are described and claimed.
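
The alternation between intra-core and inter-core synchronization can be illustrated with a small simulation. This is a sketch under stated assumptions, not the patented implementation: it tracks which threads each thread has "heard from" using Python sets instead of real hardware threads, and the shuffle formula in the hypothetical helper `shuffle_source` is one textbook definition of an n-way perfect shuffle, assumed here rather than taken from the patent.

```python
def shuffle_source(t, n, total):
    """Source thread for thread t under an n-way perfect shuffle.

    Assumed definition: split the threads into n equal piles and
    interleave them, so position t draws from pile t % n.
    """
    return (t % n) * (total // n) + t // n


def rounds_to_synchronize(n, total):
    """Count inter-core rounds until every thread has heard from all threads.

    n is the number of hardware threads per core; total must be a
    multiple of n. Threads core*n .. core*n + n - 1 share a core.
    """
    if total % n != 0:
        raise ValueError("total must be a multiple of n")
    known = [{t} for t in range(total)]  # known[t]: threads t has heard from
    for rounds in range(total + 1):
        # Intra-core step: threads on the same core pool what they know.
        for core in range(total // n):
            members = range(core * n, (core + 1) * n)
            pooled = set().union(*(known[m] for m in members))
            for m in members:
                known[m] = pooled
        if all(len(k) == total for k in known):
            return rounds
        # Inter-core step: each thread also learns what its source knows.
        known = [known[t] | known[shuffle_source(t, n, total)]
                 for t in range(total)]
    raise RuntimeError("threads did not all synchronize")
```

For 16 threads with 4 per core, `rounds_to_synchronize(4, 16)` returns 1; for 8 threads with 2 per core, `rounds_to_synchronize(2, 8)` returns 2. In both cases the number of inter-core rounds grows with the base-n logarithm of the core count rather than with the thread count.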

    USER-LEVEL FORK AND JOIN PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
    Type: Invention application; legal status: in force

    Publication number: US20160283245A1

    Publication date: 2016-09-29

    Application number: US14671475

    Application date: 2015-03-27

    Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level thread fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.
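
As a loose software analogy for the fork and join behavior described in this abstract, the sketch below starts one worker per "processor element" at a common entry point and waits for all of them to finish. It is an assumption-laden illustration: ordinary Python threads stand in for processor elements, the callable `entry` stands in for the instruction address indicated by the fork instruction, and the function name `user_level_fork` is hypothetical.

```python
import threading


def user_level_fork(entry, num_elements):
    """Run entry(i) on num_elements worker threads, then join them all.

    Returns the list of results, one per worker, in worker order.
    """
    results = [None] * num_elements

    def run(i):
        # Each worker begins at the same entry point with its own index.
        results[i] = entry(i)

    workers = [threading.Thread(target=run, args=(i,))
               for i in range(num_elements)]
    for w in workers:   # fork: start every worker
        w.start()
    for w in workers:   # join: wait until every worker has finished
        w.join()
    return results
```

For example, `user_level_fork(lambda i: i * i, 4)` returns `[0, 1, 4, 9]`.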

