Systems, apparatuses, and methods for chained fused multiply add

    Publication number: US11487541B2

    Publication date: 2022-11-01

    Application number: US17107134

    Application date: 2020-11-30

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
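The chained accumulation this abstract describes can be illustrated with a minimal Python sketch. The function name `chained_fma` and its parameters are hypothetical and do not come from the patent. The sketch models only the data flow (each iteration multiplies one packed source by one sub-element of the scalar value and adds the product to the running result). It does not model element sizes, register encoding, or the single rounding step of a true fused operation.

```python
def chained_fma(initial, sources, scalar_sub_elements):
    """Lane-wise chained multiply-accumulate (illustrative only).

    initial             -- list of lane values used as the starting accumulator
    sources             -- list of packed sources, each a list of lane values
    scalar_sub_elements -- one multiplier per packed source (assumed layout)
    """
    if len(sources) != len(scalar_sub_elements):
        raise ValueError("need exactly one scalar sub-element per packed source")

    acc = list(initial)  # the first iteration adds to the initial value
    for src, s in zip(sources, scalar_sub_elements):
        if len(src) != len(acc):
            raise ValueError("every packed source must have one element per lane")
        # Later iterations add to the result of the previous iteration.
        acc = [a + x * s for a, x in zip(acc, src)]
    return acc


result = chained_fma([1, 1], [[1, 2], [3, 4]], [10, 100])
```

With these inputs, lane 0 accumulates 1 + 1*10 + 3*100 and lane 1 accumulates 1 + 2*10 + 4*100.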

    Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

    Publication number: US10275243B2

    Publication date: 2019-04-30

    Application number: US15201442

    Application date: 2016-07-02

    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
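The interrupt-and-resume behavior in this abstract can be sketched in software as follows. This is a loose analogy in Python, not the patented hardware mechanism. The function `matmul_resumable`, the row-granular progress counter, and the `interrupted` callback are all assumptions made for illustration.

```python
def matmul_resumable(a, b, result, progress=0, interrupted=lambda: False):
    """Multiply a by b into `result`, starting at row `progress`.

    Returns the number of result rows completed so far. That count plays the
    role of the completion progress indicator: passing it back in as
    `progress` resumes the multiplication where it stopped.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    row = progress
    while row < rows:
        if interrupted():
            return row  # record progress instead of finishing
        result[row] = [
            sum(a[row][k] * b[k][j] for k in range(inner)) for j in range(cols)
        ]
        row += 1
    return row


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
out = [[0, 0], [0, 0]]

# Simulate an interruption that arrives after the first row is stored.
signals = iter([False, True])
done = matmul_resumable(a, b, out, interrupted=lambda: next(signals))
partial = [row[:] for row in out]

# Resume from the saved progress indicator.
done_after_resume = matmul_resumable(a, b, out, progress=done)
```

Tracking progress per row is a design choice of this sketch. The abstract only says the indicator reflects the amount of work completed before the interruption.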

    Systems, apparatuses, and methods for chained fused multiply add

    Publication number: US10146535B2

    Publication date: 2018-12-04

    Application number: US15299420

    Application date: 2016-10-20

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

    Generational Thread Scheduler
    Invention application (pending, published)

    Publication number: US20170031729A1

    Publication date: 2017-02-02

    Application number: US15290375

    Application date: 2016-10-11

    CPC classification number: G06F9/52 G06F2209/5014

    Abstract: Disclosed herein is a generational thread scheduler. One embodiment may be used with processor multithreading logic to execute threads of executable instructions, and a shared resource to be allocated fairly among the threads of executable instructions contending for access to the shared resource. Generational thread scheduling logic may allocate the shared resource efficiently and fairly by granting a first requesting thread access to the shared resource, allocating a reservation for the shared resource to each other requesting thread of the executing threads, and then blocking the first thread from re-requesting the shared resource until every other thread that has been allocated a reservation has been granted access to the shared resource. Generation tracking state may be cleared when each requesting thread of the generation that was allocated a reservation has had its request satisfied.
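The fairness rule in this abstract can be illustrated with a small Python sketch. The class `GenerationalScheduler` and its methods are hypothetical names, and the model is deliberately simplified: reservations are served in request order, and any thread that requests while a generation is in progress simply joins that generation. The patent's scheduling logic is hardware and may differ on both points.

```python
class GenerationalScheduler:
    """Toy model of generational scheduling for one shared resource."""

    def __init__(self):
        self.reserved = []   # threads holding a reservation, in request order
        self.served = set()  # threads already granted access this generation

    def request(self, thread_id):
        """Record a reservation and return True.

        Returns False when the thread was already served in the current
        generation (it is blocked from re-requesting) or already holds a
        reservation.
        """
        if thread_id in self.served or thread_id in self.reserved:
            return False
        self.reserved.append(thread_id)
        return True

    def grant_next(self):
        """Grant the resource to the oldest reservation, or return None."""
        if not self.reserved:
            return None
        thread_id = self.reserved.pop(0)
        self.served.add(thread_id)
        if not self.reserved:
            # Every reservation in this generation has been satisfied,
            # so the generation tracking state is cleared.
            self.served.clear()
        return thread_id


sched = GenerationalScheduler()
sched.request("A")
sched.request("B")
first = sched.grant_next()          # "A" is granted first
blocked = sched.request("A")        # "A" may not re-request while "B" waits
second = sched.grant_next()         # "B" is granted; the generation ends
allowed_again = sched.request("A")  # "A" may request in the new generation
```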

