METHODS, SYSTEMS, AND APPARATUSES TO OPTIMIZE PARTIAL FLAG UPDATING INSTRUCTIONS VIA DYNAMIC TWO-PASS EXECUTION IN A PROCESSOR

    公开(公告)号:US20220206792A1

    公开(公告)日:2022-06-30

    申请号:US17134108

    申请日:2020-12-24

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement dynamic two-pass execution of a partial flag updating instruction in a processor are described. In one embodiment, a hardware processor core includes a decoder circuit to decode instructions into a set of one or more micro-operations, an execution circuit to execute the micro-operations decoded for the instructions, a data register to store data, a flag register to store a plurality of flags, and a reservation station circuit coupled between the decoder circuit and the execution circuit, the reservation station circuit to, in response to an indicator bit set to a multiple pass mode for a single micro-operation in a reservation station entry, perform a first dispatch of the single micro-operation to the execution circuit, when a source data operand in the data register is ready for execution and a source flag operand in the flag register is not ready for execution, to generate a data resultant, and a second dispatch of the single micro-operation to the execution circuit when both the source data operand in the data register and the source flag operand in the flag register are ready for execution to generate a flag resultant based on one or more of the plurality of flags in the flag register.

    METHODS, SYSTEMS, AND APPARATUSES FOR A SCALABLE RESERVATION STATION IMPLEMENTING A SINGLE UNIFIED SPECULATION STATE PROPAGATION AND EXECUTION WAKEUP MATRIX CIRCUIT IN A PROCESSOR

    公开(公告)号:US20220206793A1

    公开(公告)日:2022-06-30

    申请号:US17134154

    申请日:2020-12-24

    Abstract: Systems, methods, and apparatuses relating to a scalable reservation station circuit implementing a single unified speculation state propagation and execution wakeup matrix in a processor are described. In one embodiment, a hardware processor core includes a decoder circuit to decode one or more instructions into a first micro-operation to load data from a data cache, a second micro-operation dependent on the first micro-operation, and a third micro-operation dependent on the second micro-operation; an execution circuit to execute the first micro-operation, the second micro-operation, and the third micro-operation; and a reservation station circuit comprising a load speculation tracker circuit and coupled between the decoder circuit and the execution circuit, the load speculation tracker circuit to, for a reservation station entry of the third micro-operation, track progress of the first micro-operation in the data cache to generate a cancellation indication for the third micro-operation in response to a miss of the data in the data cache for the first micro-operation, wherein the load speculation tracker circuit is to begin to track the progress of the first micro-operation in the data cache in response to a dispatch of the first micro-operation into the data cache.

    METHODS, SYSTEMS, AND APPARATUSES FOR SCALABLE PORT-BINDING FOR ASYMMETRIC EXECUTION PORTS AND ALLOCATION WIDTHS OF A PROCESSOR

    公开(公告)号:US20220100569A1

    公开(公告)日:2022-03-31

    申请号:US17033739

    申请日:2020-09-26

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement scalable port-binding for asymmetric execution ports and allocation widths of a processor are described. In one embodiment, a hardware processor core includes a decoder circuit to decode instructions into sets of one or more micro-operations, an instruction decode queue to store the sets of one or more micro-operations, a plurality of different types of execution circuits that each comprise a respective input port and a respective input queue, and an allocation circuit comprising a plurality of allocation lanes coupled to the instruction decode queue and to the input ports of the plurality of different types of execution circuits, wherein the allocation circuit is to, for an input of micro-operations on the plurality of allocation lanes, generate a sorted list of occupancy of the input queues of each input port, generate a pre-binding mapping of the input ports of the plurality of different types of execution circuits to the plurality of allocation lanes in a circular order according to the sorted list, when a type of micro-operation from an allocation lane does not match a type of execution circuit of an input port in the pre-binding mapping, slide the pre-binding mapping so that the input port maps to a next allocation lane having a matching type of micro-operation to generate a final mapping of the input ports of the plurality of different types of execution circuits to the plurality of allocation lanes, and bind the input ports of the plurality of different types of execution circuits to the plurality of allocation lanes according to the final mapping.

Patent Agency Ranking