MASKED MULTI-LANE INSTRUCTION HAVING BOTH FAST AND SLOW EXECUTION PATHS

    Publication number: US20210096857A1

    Publication date: 2021-04-01

    Application number: US16585973

    Filing date: 2019-09-27

    Abstract: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of a plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes an instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
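
The two execution modes described in this abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: the `Memory` class, the `MemoryFault` exception, the `masked_simd_add` function, and the choice of an add as the SIMD operation are all hypothetical names and assumptions introduced for illustration only.

```python
class MemoryFault(Exception):
    """Hypothetical fault raised when an access touches an unmapped address."""


class Memory:
    """Toy word-addressed memory with a set of unmapped (faulting) addresses."""

    def __init__(self, data, unmapped=()):
        self.data = dict(data)
        self.unmapped = set(unmapped)

    def load(self, addr, count=1):
        addrs = range(addr, addr + count)
        if any(a in self.unmapped for a in addrs):
            raise MemoryFault(addr)
        return [self.data.get(a, 0) for a in addrs]


def masked_simd_add(mem, base, mask, acc):
    """Add memory operands to acc for enabled lanes; return (result, path)."""
    lanes = len(mask)
    try:
        # First mode (fast): one load attempts to fetch the whole block.
        block = mem.load(base, lanes)
        path = "fast"
    except MemoryFault:
        # Second mode (slow): one separate load per enabled lane.
        # Disabled lanes are never accessed, so they cannot fault.
        block = [mem.load(base + i)[0] if mask[i] else 0 for i in range(lanes)]
        path = "slow"
    result = [a + b if m else a for a, b, m in zip(acc, block, mask)]
    return result, path
```

In this model, a fault caused only by a disabled lane is absorbed by the slow path, while a fault on an enabled lane still propagates to the caller from the per-lane load.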

    DIFFERENTIAL PIPELINE DELAYS IN A COPROCESSOR

    Publication number: US20240045694A1

    Publication date: 2024-02-08

    Application number: US18211007

    Filing date: 2023-06-16

    CPC classification number: G06F9/3867 G06F9/3836

    Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.

    DISTRIBUTED SCHEDULER PROVIDING EXECUTION PIPE BALANCE

    Publication number: US20210073056A1

    Publication date: 2021-03-11

    Application number: US16568038

    Filing date: 2019-09-11

    Abstract: A processor includes a plurality of execution pipes and a distributed scheduler coupled to the plurality of execution pipes. The distributed scheduler includes a first queue to buffer instruction operations from a front end of an instruction pipeline of the processor and a plurality of second queues, wherein each second queue is to buffer instruction operations allocated from the first queue for a corresponding separate subset of execution pipes of the plurality of execution pipes. The distributed scheduler further includes a queue controller to select an allocation mode from a plurality of allocation modes based on whether at least one indicator of an imbalance at the distributed scheduler is detected, and further to control the distributed scheduler to allocate instruction operations from the first queue among the plurality of second queues in accordance with the selected allocation mode.
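
The mode selection described here can be sketched as a toy model. The abstract does not name the allocation modes or the imbalance indicator, so everything specific below is an assumption made for illustration: the `DistributedScheduler` class, the two modes (`round_robin` and `least_full`), and the occupancy-difference threshold used as the imbalance indicator.

```python
from collections import deque


class DistributedScheduler:
    """Toy model: one first queue feeding several second queues."""

    def __init__(self, num_queues=2, imbalance_threshold=2):
        self.first_queue = deque()
        self.second_queues = [deque() for _ in range(num_queues)]
        self.threshold = imbalance_threshold  # assumed imbalance indicator
        self._next = 0  # round-robin pointer

    def select_mode(self):
        """Pick an allocation mode from the occupancy spread of the second queues."""
        sizes = [len(q) for q in self.second_queues]
        if max(sizes) - min(sizes) >= self.threshold:
            return "least_full"
        return "round_robin"

    def allocate(self):
        """Move one operation from the first queue; return (mode, queue index)."""
        if not self.first_queue:
            return None
        mode = self.select_mode()
        if mode == "round_robin":
            idx = self._next
            self._next = (self._next + 1) % len(self.second_queues)
        else:
            idx = min(range(len(self.second_queues)),
                      key=lambda i: len(self.second_queues[i]))
        self.second_queues[idx].append(self.first_queue.popleft())
        return mode, idx
```

While the second queues stay balanced, the model alternates between them. Once one queue runs ahead by the threshold, allocation switches to steering new operations toward the emptier queue.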

    SETTING VALUES OF PORTIONS OF REGISTERS BASED ON BIT VALUES

    Publication number: US20190190536A1

    Publication date: 2019-06-20

    Application number: US15842027

    Filing date: 2017-12-14

    CPC classification number: H03M7/20 G06F16/166

    Abstract: A processor employs a set of bits to indicate values of portions of registers of a register file. In response to a specified instruction indicating an expected change of instruction types to be executed, the processor sets one or more of the bits and, for subsequent instructions, interprets corresponding portions of the registers as having a specified value (e.g., zero). By employing the set of bits to set the values of the register portions, rather than setting the individual portions of the registers to the specified value, the processor conserves processor resources (e.g., power) when the processor transitions between executing instructions of different types.
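
The idea of marking register portions as zero, rather than physically clearing them, can be sketched as follows. This is a minimal model under stated assumptions: the `RegisterFile` class, its method names, the split into an upper and a lower portion, and the rule that a later full-width write clears the marker bit are all hypothetical and are not taken from the abstract.

```python
class RegisterFile:
    """Toy register file with one 'upper portion is zero' bit per register."""

    def __init__(self, num_regs, width=256, low_width=128):
        self.values = [0] * num_regs
        self.upper_zero = [False] * num_regs  # the set of indicator bits
        self.full_mask = (1 << width) - 1
        self.low_mask = (1 << low_width) - 1

    def write(self, reg, value):
        """Full-width write; assumed to clear the indicator bit."""
        self.values[reg] = value & self.full_mask
        self.upper_zero[reg] = False

    def mark_upper_zero(self):
        """Models the specified instruction: set bits, leave stored data alone."""
        self.upper_zero = [True] * len(self.values)

    def read(self, reg):
        """Interpret the upper portion as zero when the indicator bit is set."""
        value = self.values[reg]
        return value & self.low_mask if self.upper_zero[reg] else value
```

Setting the indicator bits touches one flag per register instead of rewriting the upper portion of every register, which is the resource saving the abstract points to.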

    SHADOW LATCHES IN A SHADOW-LATCH CONFIGURED REGISTER FILE FOR THREAD STORAGE

    Publication number: US20210132985A1

    Publication date: 2021-05-06

    Application number: US16668469

    Filing date: 2019-10-30

    Abstract: A processing system includes a processor core and a scheduler coupled to the processor core. The processing system executes a first active thread and a second active thread in the processor core and detects a swap event for the first active thread or the second active thread. Based on the swap event, using a shadow-latch configured fixed mapping system, the processing system replaces either the first active thread or the second active thread with a shadow-based thread, the shadow-based thread being stored in a shadow-latch configured register file.

    REGISTER RENAMING AFTER A NON-PICKABLE SCHEDULER QUEUE

    Publication number: US20210117196A1

    Publication date: 2021-04-22

    Application number: US16660495

    Filing date: 2019-10-22

    Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
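
The flow from the non-pickable queue through the renamer to a pickable queue can be sketched in software. This is an illustrative model only: the `Renamer` class, the `forward_load` helper, the dictionary representation of a load operation, and the FIFO free-list policy are assumptions, not details stated in the abstract.

```python
from collections import deque


class Renamer:
    """Toy renamer that allocates physical register numbers from a free list."""

    def __init__(self, num_physical):
        self.free_list = deque(range(num_physical))  # available PRNs
        self.rename_map = {}  # architectural register -> current PRN

    def rename(self, arch_dest):
        """Allocate a PRN for arch_dest; return (new PRN, previous PRN or None)."""
        if not self.free_list:
            raise RuntimeError("no free physical registers")
        prn = self.free_list.popleft()
        previous = self.rename_map.get(arch_dest)
        self.rename_map[arch_dest] = prn
        return prn, previous

    def release(self, prn):
        """Return a PRN to the free list once it is no longer needed."""
        self.free_list.append(prn)


def forward_load(nsq, renamer, pickable_queue):
    """Take a load op from the NSQ, rename it, and place it in a pickable queue."""
    op = nsq.popleft()
    prn, _ = renamer.rename(op["dest"])
    renamed = dict(op, dest_prn=prn)
    pickable_queue.append(renamed)
    return renamed
```

Because renaming happens only when the operation leaves the non-pickable queue, operations waiting there do not hold physical registers in this model.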

    DIFFERENTIAL PIPELINE DELAYS IN A COPROCESSOR

    Publication number: US20190179643A1

    Publication date: 2019-06-13

    Application number: US15837974

    Filing date: 2017-12-11

    Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.

    BIT WIDTH RECONFIGURATION USING A SHADOW-LATCH CONFIGURED REGISTER FILE

    Publication number: US20210096862A1

    Publication date: 2021-04-01

    Application number: US16585817

    Filing date: 2019-09-27

    Abstract: A processor includes a front-end with an instruction set that operates at a first bit width and a floating point unit coupled to receive the instruction set in the processor that operates at the first bit width. The floating point unit operates at a second bit width and, based upon a bit width assessment of the instruction set provided to the floating point unit, the floating point unit employs a shadow-latch configured floating point register file to perform bit width reconfiguration. The shadow-latch configured floating point register file includes a plurality of regular latches and a plurality of shadow latches for storing data that is to be either read from or written to the shadow latches. The bit width reconfiguration enables the floating point unit that operates at the second bit width to operate on the instruction set received at the first bit width.

    MULTI-MODAL GATHER OPERATION

    Publication number: US20210096858A1

    Publication date: 2021-04-01

    Application number: US16586247

    Filing date: 2019-09-27

    Abstract: An apparatus includes a plurality of load buses and a load store unit that includes a plurality of load ports to access the plurality of load buses. The load store unit performs a gather operation to concurrently gather a plurality of subsets of data from a memory via the plurality of load buses in a first mode. The apparatus also includes a register that is partitioned into a plurality of portions to hold the plurality of subsets of data provided by the load store unit. The load store unit ignores exceptions or faults while performing the gather operation in the first mode and transitions to a second mode in response to an exception or fault. Two lanes are dispatched to concurrently perform the gather operation per clock cycle in the first mode and a single lane is dispatched to perform the gather operation per clock cycle in the second mode.

    POWER CONSERVATION IN A COPROCESSOR

    Publication number: US20190179396A1

    Publication date: 2019-06-13

    Application number: US15837918

    Filing date: 2017-12-11

    Abstract: A pipeline includes a first portion configured to process a first subset of bits of an instruction and a second portion configured to process a second subset of the bits of the instruction. A first clock mesh is configured to provide a first clock signal to the first portion of the pipeline. A second clock mesh is configured to provide a second clock signal to the second portion of the pipeline. The first and second clock meshes selectively provide the first and second clock signals based on characteristics of in-flight instructions that have been dispatched to the pipeline but not yet retired. In some cases, a physical register file is configured to store values of bits representative of instructions. Only the first subset is stored in the physical register file in response to the value of the zero high bit indicating that the second subset is equal to zero.
