MASKED MULTI-LANE INSTRUCTION HAVING BOTH FAST AND SLOW EXECUTION PATHS

    公开(公告)号:US20210096857A1

    公开(公告)日:2021-04-01

    申请号:US16585973

    申请日:2019-09-27

    Abstract: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of a plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes an instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.

    MUTLI-MODAL GATHER OPERATION
    2.
    发明申请

    公开(公告)号:US20210096858A1

    公开(公告)日:2021-04-01

    申请号:US16586247

    申请日:2019-09-27

    Abstract: An apparatus includes a plurality of load buses and a load store unit that includes a plurality of load ports to access the plurality of load buses. The load store unit performs a gather operation to concurrently gather a plurality of subsets of data from a memory via the plurality of load buses in a first mode. The apparatus also includes a register that is partitioned into a plurality of portions to hold the plurality of subsets of data provided by the load store unit. The load store unit ignores exceptions or faults while performing the gather operation in the first mode and transitions to a second mode in response to an exception or fault. Two lanes are dispatched to concurrently perform the gather operation per clock cycle in the first mode and a single lane is dispatched to perform the gather operation per clock cycle in the second mode.

Patent Agency Ranking