PROCESSOR CIRCUITRY TO PERFORM A FUSED MULTIPLY-ADD

    Publication number: US20240354057A1

    Publication date: 2024-10-24

    Application number: US18523186

    Filing date: 2023-11-29

    CPC classification number: G06F7/523

    Abstract: Techniques and mechanisms for circuitry to support the performance of a fused multiply-add (FMA) operation with one or more denormal numbers. In some embodiments, a processor is operable to execute an FMA instruction comprising or otherwise identifying two multiplicands and an addend. Such execution includes performing one-way alignment of an addend significand based on a difference between respective exponent values of the two multiplicands. The alignment is performed in parallel with operations by a multiplier circuit based on respective significand values of the two multiplicands. Subtraction of a J-bit correction value is performed in the multiplier circuit to avoid or mitigate execution delay. In another embodiment, first circuitry of a processor executes an FMA instruction, wherein components of the first circuitry are shared with second circuitry of the processor, and wherein the second circuitry supports the execution of a floating-point multiplication instruction.
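
    Illustrative sketch (not taken from the patent): a minimal Python software model of a fused multiply-add, assuming IEEE 754 double-precision inputs. The helper names decompose and fma_sketch are hypothetical. The model multiplies the integer significands exactly, aligns the addend to the product using the exponent difference, and rounds only once at the end. Unlike the one-way alignment named in the abstract, it shifts whichever operand keeps the integer arithmetic exact, so it illustrates the arithmetic rather than the circuit.

```python
import math
from fractions import Fraction


def decompose(x):
    """Return integers (m, e) such that x == m * 2**e exactly."""
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1 (or m == 0)
    return int(m * (1 << 53)), e - 53  # scale the significand to an integer


def fma_sketch(a, b, c):
    """Model a * b + c with a single rounding step (hypothetical helper)."""
    ma, ea = decompose(a)
    mb, eb = decompose(b)
    mc, ec = decompose(c)

    prod = ma * mb        # exact product of the two significands
    ep = ea + eb          # exponent of the product

    # Align addend and product to a common exponent before adding.
    shift = ec - ep
    if shift >= 0:
        total, e = prod + (mc << shift), ep
    else:
        total, e = (prod << -shift) + mc, ec

    # Converting an exact Fraction to float rounds once, to nearest.
    return float(Fraction(total) * Fraction(2) ** e)
```

    With a = 1 + 2**-30, b = 1 - 2**-30 and c = -1.0, the unfused expression a * b + c rounds the product to 1.0 and returns 0.0, while fma_sketch(a, b, c) returns -2**-60.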

    Methods, systems, and apparatuses to optimize partial flag updating instructions via dynamic two-pass execution in a processor

    Publication number: US12039329B2

    Publication date: 2024-07-16

    Application number: US17134108

    Filing date: 2020-12-24

    CPC classification number: G06F9/223 G06F9/30145

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement dynamic two-pass execution of a partial flag updating instruction in a processor are described. In one embodiment, a hardware processor core includes a decoder circuit to decode instructions into a set of one or more micro-operations, an execution circuit to execute the micro-operations decoded for the instructions, a data register to store data, a flag register to store a plurality of flags, and a reservation station circuit coupled between the decoder circuit and the execution circuit, the reservation station circuit to, in response to an indicator bit set to a multiple pass mode for a single micro-operation in a reservation station entry, perform a first dispatch of the single micro-operation to the execution circuit, when a source data operand in the data register is ready for execution and a source flag operand in the flag register is not ready for execution, to generate a data resultant, and a second dispatch of the single micro-operation to the execution circuit when both the source data operand in the data register and the source flag operand in the flag register are ready for execution to generate a flag resultant based on one or more of the plurality of flags in the flag register.
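
    Illustrative sketch (not taken from the patent): a small Python model of the dispatch decision described in the abstract. The names Entry and next_dispatch are hypothetical, and the "both" case (data and flags ready at the same time, so one dispatch suffices) is an assumption rather than something the abstract states.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One reservation station entry holding a single micro-operation."""
    multi_pass: bool          # indicator bit: multiple pass mode enabled
    data_done: bool = False   # data resultant already produced
    flags_done: bool = False  # flag resultant already produced


def next_dispatch(entry, data_ready, flags_ready):
    """Return "data", "flags", "both", or None for this cycle."""
    if entry.flags_done or not data_ready:
        return None
    if flags_ready:
        # Second pass if the data pass already ran, otherwise a single pass.
        kind = "flags" if entry.data_done else "both"
        entry.data_done = entry.flags_done = True
        return kind
    if entry.multi_pass and not entry.data_done:
        # First pass: data source ready, flag source not ready yet.
        entry.data_done = True
        return "data"
    return None
```

    In this model a multi-pass entry can release its data resultant to dependent micro-operations before the flag source arrives, while a single-pass entry waits for both sources.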

    METHODS, SYSTEMS, AND APPARATUSES TO OPTIMIZE CROSS-LANE PACKED DATA INSTRUCTION IMPLEMENTATION ON A PARTIAL WIDTH PROCESSOR WITH A MINIMAL NUMBER OF MICRO-OPERATIONS

    Publication number: US20220206791A1

    Publication date: 2022-06-30

    Application number: US17134100

    Filing date: 2020-12-24

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement a cross-lane packed data instruction on a partial (e.g., half) width processor with a minimal number of micro-operations are described. In one embodiment, a hardware processor core includes a decoder circuit to decode a single packed data instruction into only a first micro-operation and a second micro-operation, a packed data execution circuit to execute the first micro-operation and the second micro-operation, and a reservation station circuit coupled between the decoder circuit and the packed data execution circuit, the reservation station circuit comprising a first reservation station entry for the first micro-operation to store a first set of fields that indicate three or more input sources and a first destination, and a second reservation station entry for the second micro-operation to store a second set of fields to indicate three or more input sources and a second destination.
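
    Illustrative sketch (not taken from the patent): one way to picture a cross-lane operation split into exactly two half-width micro-operations, each reading three input sources. The choice of a full-width permute as the example instruction, and the names permute_half and cross_lane_permute, are assumptions made for illustration.

```python
def permute_half(src_lo, src_hi, idx_half):
    """One half-width micro-operation.

    Reads three sources (low half, high half, and half of the index
    vector) and writes one half-width destination.
    """
    full = src_lo + src_hi
    return [full[i] for i in idx_half]


def cross_lane_permute(src, idx):
    """Model a full-width permute as two half-width micro-operations."""
    h = len(src) // 2
    lo, hi = src[:h], src[h:]
    dst_lo = permute_half(lo, hi, idx[:h])  # first micro-operation
    dst_hi = permute_half(lo, hi, idx[h:])  # second micro-operation
    return dst_lo + dst_hi
```

    Because each micro-operation sees both source halves, any element can reach either destination half without extra shuffle micro-operations.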

    Instruction and logic for a matrix scheduler

    Publication number: US09851976B2

    Publication date: 2017-12-26

    Application number: US14581101

    Filing date: 2014-12-23

    CPC classification number: G06F9/3838

    Abstract: A processor includes a core and a scheduler. The scheduler includes first and second dependency matrices and a ready determination unit. The scheduler also includes logic to queue a first parent operation, a second parent operation, and a child operation that includes first and second sources dependent on the first and second parent operations. The scheduler also includes logic to store physical addresses of the first and second sources of the child operation respectively in the first and second dependency matrices. Further, the scheduler includes logic to perform tag comparisons between the physical addresses of the destinations of the first and second parent operations and the respective physical addresses of the first and second sources of the child operation. In addition, the ready determination unit includes logic to determine that the child operation is ready for dispatch based on the tag comparisons.
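
    Illustrative sketch (not taken from the patent): a minimal Python model of tag-comparison wakeup for a child operation with two sources. The names ChildEntry, broadcast, and ready are hypothetical, and the two dependency matrices are reduced here to one stored tag per source.

```python
from dataclasses import dataclass, field


@dataclass
class ChildEntry:
    """Queued child operation with the physical tags of its two sources."""
    src_tags: tuple
    matched: list = field(default_factory=lambda: [False, False])


def broadcast(child, parent_dest_tag):
    """Compare a parent's destination tag against each child source tag."""
    for i, tag in enumerate(child.src_tags):
        if tag == parent_dest_tag:
            child.matched[i] = True


def ready(child):
    """The child may dispatch once both source tags have matched."""
    return all(child.matched)
```

    For a child with sources tagged 12 and 40, ready stays false after the parent writing tag 12 broadcasts and becomes true only after the parent writing tag 40 broadcasts as well.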

    ACCELERATING KECCAK ALGORITHMS
    Document type: Published application

    Publication number: US20240211253A1

    Publication date: 2024-06-27

    Application number: US18145744

    Filing date: 2022-12-22

    CPC classification number: G06F9/30029 G06F9/3016 G06F9/3802

    Abstract: A method comprises fetching, by fetch circuitry, an encoded parity instruction comprising at least one opcode, a first source identifier for a first source, a second source identifier for a second source, a third source identifier for a third source, and a destination identifier for a destination; decoding, by decode circuitry, the encoded parity instruction to generate a decoded parity instruction; and executing, by execution circuitry, the decoded parity instruction to retrieve operands representing a first register from the first source, a second register from the second source, a third register from the third source, and an index from the third source, perform an XOR operation of four words of data from the first register and a single word of data from the second register in a position represented by the index to generate a parity value, and store the parity value in the first register in a position represented by the index.
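
    Illustrative sketch (not taken from the patent): in the Keccak theta step, the parity of a column is the XOR of its five 64-bit lanes. The Python function below shows that five-way XOR, assuming the four words and the single word in the abstract correspond to the five lanes of one column. The function name column_parity and that mapping are assumptions.

```python
MASK64 = (1 << 64) - 1


def column_parity(four_words, single_word):
    """XOR four lanes from one register with one lane from another."""
    if len(four_words) != 4:
        raise ValueError("expected exactly four words")
    parity = single_word & MASK64
    for word in four_words:
        parity ^= word & MASK64  # accumulate the column parity lane by lane
    return parity
```

    For example, column_parity([1, 2, 4, 8], 16) returns 31, since each input sets a distinct bit.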

    Power logic for memory address conversion
    Document type: Granted patent
    Legal status: In force

    Publication number: US09330022B2

    Publication date: 2016-05-03

    Application number: US13926564

    Filing date: 2013-06-25

    CPC classification number: G06F12/1036 G06F9/3001 G06F9/32

    Abstract: In an embodiment, a processor includes a plurality of cores. Each core includes conversion power logic to receive an instruction including an untranslated memory address, determine whether a code segment (CS) base address is equal to zero, and in response to a determination that the CS base address is equal to zero, execute the instruction using the untranslated memory address. Other embodiments are described and claimed.
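
    Illustrative sketch (not taken from the patent): a minimal Python model of the zero-base check, assuming the conversion being skipped is the addition of the CS base address to the untranslated address. The function name resolve_address, the 32-bit default width, and the returned converted flag are hypothetical.

```python
def resolve_address(cs_base, untranslated, bits=32):
    """Return (address, converted).

    converted is False when the CS base is zero and the untranslated
    address is used directly, so the adder is not exercised.
    """
    mask = (1 << bits) - 1
    if cs_base == 0:
        return untranslated & mask, False
    return (cs_base + untranslated) & mask, True
```

    Under this assumption, resolve_address(0, 0x1234) returns (0x1234, False), while resolve_address(0x1000, 0x234) returns (0x1234, True).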

