METHODS, SYSTEMS, AND APPARATUSES TO OPTIMIZE CROSS-LANE PACKED DATA INSTRUCTION IMPLEMENTATION ON A PARTIAL WIDTH PROCESSOR WITH A MINIMAL NUMBER OF MICRO-OPERATIONS

    Publication number: US20220206791A1

    Publication date: 2022-06-30

    Application number: US17134100

    Filing date: 2020-12-24

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement a cross-lane packed data instruction on a partial (e.g., half) width processor with a minimal number of micro-operations are described. In one embodiment, a hardware processor core includes a decoder circuit to decode a single packed data instruction into only a first micro-operation and a second micro-operation, a packed data execution circuit to execute the first micro-operation and the second micro-operation, and a reservation station circuit coupled between the decoder circuit and the packed data execution circuit, the reservation station circuit comprising a first reservation station entry for the first micro-operation to store a first set of fields that indicate three or more input sources and a first destination, and a second reservation station entry for the second micro-operation to store a second set of fields to indicate three or more input sources and a second destination.
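The decode step described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuitry: it assumes an 8-element vector on a 4-element (half-width) datapath and a permute-style cross-lane instruction, and the names `MicroOp`, `decode_cross_lane_permute`, `execute`, and `run_permute` are hypothetical. Each of the two micro-operations names three half-width input sources and one destination, mirroring the reservation station fields the abstract mentions.

```python
from dataclasses import dataclass

# Assumption for illustration: 8-element vectors on a 4-element datapath.
HALF = 4


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical reservation-station entry: three sources, one destination."""
    sources: tuple
    dest: str


def decode_cross_lane_permute():
    """Decode one full-width permute into exactly two micro-operations."""
    return (
        MicroOp(sources=("src.lo", "src.hi", "idx.lo"), dest="dst.lo"),
        MicroOp(sources=("src.lo", "src.hi", "idx.hi"), dest="dst.hi"),
    )


def execute(uop, regs):
    """Produce one half of the result; each index may select from either source half."""
    lo, hi, idx = (regs[name] for name in uop.sources)
    full = lo + hi  # both source halves are readable by the same micro-op
    regs[uop.dest] = [full[i % (2 * HALF)] for i in idx]


def run_permute(src, idx):
    """Split full-width operands into halves, run both micro-ops, rejoin the result."""
    regs = {
        "src.lo": src[:HALF], "src.hi": src[HALF:],
        "idx.lo": idx[:HALF], "idx.hi": idx[HALF:],
    }
    for uop in decode_cross_lane_permute():
        execute(uop, regs)
    return regs["dst.lo"] + regs["dst.hi"]
```

In this model, two-source micro-operations would not be enough, because each result half may need elements from both halves of the source in addition to its own index half. That gives one plausible reading of why the entries carry three or more input sources.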

    PROCESSOR CIRCUITRY TO PERFORM A FUSED MULTIPLY-ADD

    Publication number: US20240354057A1

    Publication date: 2024-10-24

    Application number: US18523186

    Filing date: 2023-11-29

    CPC classification number: G06F7/523

    Abstract: Techniques and mechanisms for circuitry to support the performance of a fused multiply-add (FMA) operation with one or more denormal numbers are described. In some embodiments, a processor is operable to execute an FMA instruction comprising or otherwise identifying two multiplicands and an addend. Such execution includes performing one-way alignment of an addend significand based on a difference between respective exponent values of the two multiplicands. The alignment is performed in parallel with operations by a multiplier circuit based on respective significand values of the two multiplicands. Subtraction of a J-bit correction value is performed in the multiplier circuit to avoid or mitigate execution delay. In another embodiment, first circuitry of a processor executes an FMA instruction, wherein components of the first circuitry are shared with second circuitry of the processor, and wherein the second circuitry supports the execution of a floating-point multiplication instruction.
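For readers unfamiliar with what "fused" means here, the following sketch contrasts a fused multiply-add, which rounds once after the exact product and sum, with a separate multiply followed by an add, which rounds twice. It models only the arithmetic result, not the alignment or J-bit correction circuitry the abstract describes. The helper name `fused_multiply_add` is hypothetical, and exact rational arithmetic stands in for the hardware's wide internal significand.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Operands chosen so the exact product 1 - 2**-60 is not representable as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = a * b + c                  # product is rounded to 1.0 before the add
fused = fused_multiply_add(a, b, c)  # the small residual survives the single rounding
```

With these operands, the unfused form loses the residual entirely, while the fused form keeps it. Avoiding this kind of double-rounding loss is the usual motivation for providing FMA as a single instruction.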
