Abstract:
A computer processor including a single fused-unfused floating point multiply-add (FMA) module computes the result of the operation A*B+C for floating point numbers for fused multiply-add rounding operations and unfused multiply-add rounding operations. In one embodiment, a fused multiply-add rounding implementation is augmented with additional hardware which calculates an unfused multiply-add rounding result without adding additional pipeline stages. In one embodiment, a computation by the fused-unfused floating point multiply-add (FMA) module is initiated using a single opcode which determines whether a fused multiply-add rounding result or unfused multiply-add rounding result is generated.
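The difference between the two rounding behaviors can be illustrated in software. The sketch below is an assumption-laden model, not the patented hardware: the function name `multiply_add` and its `fused` flag are hypothetical stand-ins for the single opcode that selects the result type. The fused path rounds once, after computing A*B+C exactly with rational arithmetic; the unfused path rounds the product first and then rounds the sum.

```python
from fractions import Fraction


def multiply_add(a: float, b: float, c: float, fused: bool) -> float:
    """Model A*B+C with either fused or unfused rounding (hypothetical helper).

    fused=True  : the exact value of a*b+c is rounded once to a double.
    fused=False : a*b is rounded to a double, then the sum with c is rounded again.
    """
    if fused:
        # Fractions are exact, so the only rounding happens in float().
        exact = Fraction(a) * Fraction(b) + Fraction(c)
        return float(exact)
    # Ordinary double arithmetic rounds after the multiply and after the add.
    return a * b + c


# Inputs chosen so the rounded product loses the low-order bits:
# (1 + 2^-27) * (1 - 2^-27) = 1 - 2^-54, which rounds to 1.0 as a double.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

fused_result = multiply_add(a, b, c, fused=True)     # keeps the -2^-54 term
unfused_result = multiply_add(a, b, c, fused=False)  # product rounds to 1.0 first
```

With these inputs the two modes give different answers, which is why a single module that can produce either result has to track both rounding points.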
Abstract:
Embodiments of the present invention provide a processor that merges stores in an N-entry first-in-first-out (FIFO) store queue. In these embodiments, the processor starts by executing instructions before a checkpoint is generated. When executing instructions before the checkpoint is generated, the processor is configured to perform limited or no merging of stores into existing entries in the store queue. Then, upon detecting a predetermined condition, the processor is configured to generate a checkpoint. After generating the checkpoint, the processor is configured to continue to execute instructions. When executing instructions after the checkpoint is generated, the processor is configured to freely merge subsequent stores into post-checkpoint entries in the store queue.
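The merging policy described above can be sketched as a small software model. Everything here is an illustrative assumption: the names `StoreQueue`, `Entry`, `generate_checkpoint`, and `drain` are hypothetical, "limited or no merging" is modeled as no merging at all, and each entry holds a single address and value rather than a cache line.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    addr: int
    value: int
    post_checkpoint: bool  # True if the entry was allocated after the checkpoint


class StoreQueue:
    """Toy N-entry FIFO store queue (hypothetical model, not the real design)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()
        self.checkpointed = False

    def generate_checkpoint(self) -> None:
        # Called when the predetermined condition is detected.
        self.checkpointed = True

    def store(self, addr: int, value: int) -> None:
        if self.checkpointed:
            # After the checkpoint: merge freely, but only into post-checkpoint entries.
            for entry in self.entries:
                if entry.post_checkpoint and entry.addr == addr:
                    entry.value = value
                    return
        # Before the checkpoint (or no matching entry): allocate a new FIFO entry.
        if len(self.entries) >= self.capacity:
            raise OverflowError("store queue full")
        self.entries.append(Entry(addr, value, self.checkpointed))

    def drain(self) -> tuple:
        # Retire the oldest store in FIFO order.
        entry = self.entries.popleft()
        return entry.addr, entry.value


queue = StoreQueue(capacity=4)
queue.store(0x10, 1)         # pre-checkpoint: new entry
queue.store(0x10, 2)         # pre-checkpoint: same address, still a new entry
queue.generate_checkpoint()
queue.store(0x10, 3)         # post-checkpoint: new entry (pre-checkpoint ones are not merge targets)
queue.store(0x10, 4)         # post-checkpoint: merged into the previous entry
```

In this model the two pre-checkpoint stores occupy separate entries, while the two post-checkpoint stores to the same address share one.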