Mechanism to speed-up multithreaded execution by register file write port reallocation
    1.
    Granted patent (status: in force)

    Publication number: US09207995B2

    Publication date: 2015-12-08

    Application number: US13170003

    Filing date: 2011-06-27

    IPC classification: G06F9/30 G06F9/52 G06F9/38

    Abstract: Various systems and processes may be used to speed up multi-threaded execution. In certain implementations, a system and process may include the ability to write results of a first group of execution units associated with a first register file into the first register file using a first write port of the first register file, and to write results of a second group of execution units associated with a second register file into the second register file using a first write port of the second register file. The system, apparatus, and process may also include the ability to connect, in a shared register file mode, results of the second group of execution units to a second write port of the first register file, and to connect, in a split register file mode, results of a part of the first group of execution units to the second write port of the first register file.
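
    Illustrative sketch (not taken from the patent): a toy Python model of the write-port routing described in the abstract above. The `Mode` names, the functions `write_port_sources` and `apply_writes`, and the execution-unit labels A0, A1 and B0 are hypothetical; the model assumes one execution unit per port and covers only the connections the abstract names.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "shared"  # single thread: both register files hold the same state
    SPLIT = "split"    # two threads: each register file holds its own thread


def write_port_sources(mode):
    """Map each (register file, write port) to the execution unit driving it.

    Hypothetical configuration: units A0 and A1 form the group associated with
    RF_A; unit B0 forms the group associated with RF_B.
    """
    sources = {
        ("RF_A", 0): "A0",  # first write port of RF_A: first group
        ("RF_B", 0): "B0",  # first write port of RF_B: second group
    }
    if mode is Mode.SHARED:
        # Copy the second group's results into RF_A so it stays complete.
        sources[("RF_A", 1)] = "B0"
    else:
        # RF_A no longer mirrors RF_B, so its second port is reallocated
        # to part of the first group, adding write bandwidth for that thread.
        sources[("RF_A", 1)] = "A1"
    return sources


def apply_writes(mode, results, files):
    """Commit one cycle of results through the routed write ports.

    results: execution-unit name -> (register index, value)
    files:   register-file name -> {register index: value}
    """
    for (rf_name, _port), unit in write_port_sources(mode).items():
        if unit in results:
            reg, value = results[unit]
            files[rf_name][reg] = value
```

    In split mode, unit A1's result reaches RF_A through the reallocated second port; in shared mode the same port carries B0's result instead, so RF_A keeps a complete copy of the single thread's state.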

    Apparatus and method for calculating an SHA-2 hash function in a general purpose processor
    2.
    Granted patent (status: in force)

    Publication number: US09164725B2

    Publication date: 2015-10-20

    Application number: US13181678

    Filing date: 2011-07-13

    IPC classification: G06F7/00 H04L9/32

    Abstract: Various systems, apparatuses, processes, and/or products may be used to calculate an SHA-2 hash function in a general-purpose processor. In some implementations, a system, apparatus, process, and/or product may include the ability to calculate at least one SHA-2 sigma function by using an execution unit adapted for performing a processor instruction, the execution unit including an integrated circuit primarily designed for calculating the SHA-2 sigma function(s), and to calculate the SHA-2 hash function with general-purpose hardware processing components of the processor based on the sigma function(s). In certain implementations, the integrated circuit can calculate the SHA-2 sigma function(s) within a single instruction, allowing faster calculation of the SHA-2 hash function.
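
    For reference, the sketch below is a plain-Python SHA-256 (the 32-bit member of the SHA-2 family, per FIPS 180-4) written so that the four sigma functions are isolated from the general-purpose operations that consume them. It assumes SHA-256 rather than SHA-512 and only marks where a dedicated single-instruction sigma unit, as the abstract describes, would slot in; it is not the patented circuit, and the helper names are illustrative.

```python
import hashlib  # used only for the self-check at the bottom
import math

MASK = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


# SHA-256 sigma functions (FIPS 180-4, section 4.1.2). These are the
# operations the abstract assigns to a dedicated execution unit.
def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def _primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


def _icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


_P = _primes(64)
# Round constants and initial hash: fractional bits of cube/square roots of primes.
K = [_icbrt(p << 96) & MASK for p in _P]
H0 = [math.isqrt(p << 64) & MASK for p in _P[:8]]


def sha256(message: bytes) -> bytes:
    # Padding: 0x80, zeros up to 56 mod 64, then the 64-bit big-endian bit length.
    pad = b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    data = bytes(message) + pad + (len(message) * 8).to_bytes(8, "big")
    h = list(H0)
    for start in range(0, len(data), 64):
        block = data[start:start + 64]
        w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
        for t in range(16, 64):  # message schedule uses small sigma0/1
            w.append((small_sigma1(w[t - 2]) + w[t - 7]
                      + small_sigma0(w[t - 15]) + w[t - 16]) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):  # compression rounds use big Sigma0/1
            ch = (e & f) ^ (~e & g)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t1 = (hh + big_sigma1(e) + ch + K[t] + w[t]) & MASK
            t2 = (big_sigma0(a) + maj) & MASK
            hh, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return b"".join(x.to_bytes(4, "big") for x in h)


if __name__ == "__main__":
    assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
```

    Everything outside the four sigma helpers (the additions, Ch, Maj and the word rotation between rounds) is ordinary integer work, which mirrors the abstract's split between a sigma-specific unit and general-purpose hardware.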

    METHOD AND DATA PROCESSING UNIT FOR CALCULATING AT LEAST ONE MULTIPLY-SUM OF TWO CARRY-LESS MULTIPLICATIONS OF TWO INPUT OPERANDS, DATA PROCESSING PROGRAM AND COMPUTER PROGRAM PRODUCT
    3.
    Patent application (status: in force)

    Publication number: US20120150933A1

    Publication date: 2012-06-14

    Application number: US13183639

    Filing date: 2011-07-15

    IPC classification: G06F7/52 G06F7/50

    Abstract: Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data buses for the input operands and an output data bus for an overall calculation result, each bus having a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure, and calculate at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. Depending on the mode of operation, a certain number of multiply-sums may be output as the overall calculation result, using the full width of the output data bus.
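
    A minimal sketch, assuming operands are plain non-negative Python integers: `clmul` forms a carry-less (GF(2) polynomial) product, `multiply_sum` XORs two such products, and `clmul_hierarchical` shows how a wider product can be assembled from lower-level products, with the middle term being an intermediate multiply-sum. The function names and the 8-bit base width are assumptions; the 2n-bit buses and mode-dependent output selection are not modeled.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: partial products are combined with XOR, not addition."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def multiply_sum(a0: int, b0: int, a1: int, b1: int) -> int:
    """Multiply-sum of two carry-less products; addition in GF(2) is XOR."""
    return clmul(a0, b0) ^ clmul(a1, b1)


def clmul_hierarchical(a: int, b: int, width: int) -> int:
    """Carry-less product of two width-bit operands built from half-width ones.

    Assumes a, b < 2**width. At each level the middle term is an
    intermediate multiply-sum of two lower-level carry-less products.
    """
    if width <= 8:
        return clmul(a, b)  # lowest level: direct carry-less multiply
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    high = clmul_hierarchical(a_hi, b_hi, half)
    low = clmul_hierarchical(a_lo, b_lo, half)
    middle = (clmul_hierarchical(a_hi, b_lo, half)
              ^ clmul_hierarchical(a_lo, b_hi, half))
    return (high << width) ^ (middle << half) ^ low
```

    Writing a = a_hi·x^h ⊕ a_lo and b = b_hi·x^h ⊕ b_lo, the product is a_hi·b_hi·x^(2h) ⊕ (a_hi·b_lo ⊕ a_lo·b_hi)·x^h ⊕ a_lo·b_lo, so the cross term at each higher level is exactly a multiply-sum of two lower-level carry-less products.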

    Method and data processing unit for calculating at least one multiply-sum of two carry-less multiplications of two input operands, data processing program and computer program product
    4.
    Granted patent (status: in force)

    Publication number: US08903882B2

    Publication date: 2014-12-02

    Application number: US13183639

    Filing date: 2011-07-15

    IPC classification: G06F7/38 G06F7/53

    Abstract: Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data buses for the input operands and an output data bus for an overall calculation result, each bus having a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure, and calculate at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. Depending on the mode of operation, a certain number of multiply-sums may be output as the overall calculation result, using the full width of the output data bus.

    APPARATUS AND METHOD FOR CALCULATING AN SHA-2 HASH FUNCTION IN A GENERAL PURPOSE PROCESSOR
    5.
    Patent application (status: in force)

    Publication number: US20120128149A1

    Publication date: 2012-05-24

    Application number: US13181678

    Filing date: 2011-07-13

    IPC classification: H04L9/28

    Abstract: Various systems, apparatuses, processes, and/or products may be used to calculate an SHA-2 hash function in a general-purpose processor. In some implementations, a system, apparatus, process, and/or product may include the ability to calculate at least one SHA-2 sigma function by using an execution unit adapted for performing a processor instruction, the execution unit including an integrated circuit primarily designed for calculating the SHA-2 sigma function(s), and to calculate the SHA-2 hash function with general-purpose hardware processing components of the processor based on the sigma function(s). In certain implementations, the integrated circuit can calculate the SHA-2 sigma function(s) within a single instruction, allowing faster calculation of the SHA-2 hash function.

    MECHANISM TO SPEED-UP MULTITHREADED EXECUTION BY REGISTER FILE WRITE PORT REALLOCATION
    6.
    Patent application (status: in force)

    Publication number: US20120110271A1

    Publication date: 2012-05-03

    Application number: US13170003

    Filing date: 2011-06-27

    IPC classification: G06F12/00

    Abstract: Various systems and processes may be used to speed up multi-threaded execution. In certain implementations, a system and process may include the ability to write results of a first group of execution units associated with a first register file into the first register file using a first write port of the first register file, and to write results of a second group of execution units associated with a second register file into the second register file using a first write port of the second register file. The system, apparatus, and process may also include the ability to connect, in a shared register file mode, results of the second group of execution units to a second write port of the first register file, and to connect, in a split register file mode, results of a part of the first group of execution units to the second write port of the first register file.

    Zero Indication Forwarding for Floating Point Unit Power Reduction
    7.
    Patent application (status: lapsed)

    Publication number: US20120284548A1

    Publication date: 2012-11-08

    Application number: US13552327

    Filing date: 2012-07-18

    IPC classification: G06F1/00

    Abstract: A method and system for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one of multiple operands is a zero operand, before the operand is forwarded to an execution component that completes a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers gating of a clock signal. Gating the clock signal disables one or more of the processing stages and/or devices that perform the mathematical operation. With those stages and/or devices disabled, the correct result of the mathematical operation is still computed on a reduced data path. A disabled device may be powered off until it is again required by subsequent operations.
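
    The sketch below models the idea for a floating-point multiply, which is an assumption (the abstract says only "mathematical operation"). It reads "unordered" as a NaN operand, in the IEEE 754 comparison sense, and uses counters as a stand-in for clock gating; `early_zero_check` and `Multiplier` are hypothetical names.

```python
import math


def early_zero_check(a: float, b: float):
    """Operand check done before the multiply is issued (sketch).

    Returns (gate_clock, result). gate_clock is True when an operand is zero
    or unordered (NaN); result is then produced without the full multiplier.
    """
    if math.isnan(a) or math.isnan(b):
        return True, math.nan  # unordered operand: result is a NaN
    if a == 0.0 or b == 0.0:
        if math.isinf(a) or math.isinf(b):
            return True, math.nan  # 0 * inf is an invalid operation
        # Zero result whose sign is the XOR of the operand signs.
        negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
        return True, -0.0 if negative else 0.0
    return False, None


class Multiplier:
    def __init__(self):
        self.gated_ops = 0  # operations that bypassed the full datapath
        self.full_ops = 0   # operations that clocked every multiplier stage

    def multiply(self, a: float, b: float) -> float:
        gate_clock, result = early_zero_check(a, b)
        if gate_clock:
            self.gated_ops += 1  # multiplier stages would stay clock-gated
            return result
        self.full_ops += 1
        return a * b  # stands in for the full multiplier pipeline
```

    Zero and NaN operands resolve from the early check alone, so only ordinary finite products exercise the modeled full datapath; in hardware, that skipped work is where the power saving would come from.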

    Zero indication forwarding for floating point unit power reduction
    8.
    Granted patent (status: lapsed)

    Publication number: US08578196B2

    Publication date: 2013-11-05

    Application number: US13552327

    Filing date: 2012-07-18

    IPC classification: G06F1/00

    Abstract: A method and system for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one of multiple operands is a zero operand, before the operand is forwarded to an execution component that completes a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers gating of a clock signal. Gating the clock signal disables one or more of the processing stages and/or devices that perform the mathematical operation. With those stages and/or devices disabled, the correct result of the mathematical operation is still computed on a reduced data path. A disabled device may be powered off until it is again required by subsequent operations.

    Zero indication forwarding for floating point unit power reduction
    9.
    Granted patent (status: lapsed)

    Publication number: US08255726B2

    Publication date: 2012-08-28

    Application number: US12176191

    Filing date: 2008-07-18

    IPC classification: G06F1/00

    Abstract: A method, system and computer program product for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one of multiple operands is a zero operand, before the operand is forwarded to an execution component that completes a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers gating of a clock signal. Gating the clock signal disables one or more of the processing stages and/or devices that perform the mathematical operation. With those stages and/or devices disabled, the correct result of the mathematical operation is still computed on a reduced data path. A disabled device may be powered off until it is again required by subsequent operations.

    Reducing register file leakage current within a processor
    10.
    Granted patent (status: lapsed)

    Publication number: US07509511B1

    Publication date: 2009-03-24

    Application number: US12116085

    Filing date: 2008-05-06

    IPC classification: G06F1/32

    Abstract: A method for reducing leakage current within a register file of a processor is disclosed. The register file within the processor is partitioned into at least two power domains, each of which can be powered independently. At least one of the power domains includes at least as many physical registers as there are architected registers defined in the instruction set architecture of the processor. In response to an idle condition occurring within the processor, all architected register file entries are consolidated into one of the power domains that will not be powered off, and the power domains that do not contain any architected register file entries after consolidation are powered off. Afterwards, in response to detecting the end of the idle condition, all of the power domains are powered back on.
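
    A minimal sketch of the consolidation step, assuming a renamed register file with exactly two equal-sized domains and a simple architected-to-physical map. `PartitionedRegisterFile` and its methods are hypothetical names chosen for illustration, not the patent's design.

```python
class PartitionedRegisterFile:
    """Toy renamed register file split into two independently powered domains.

    Hypothetical model: physical register p belongs to domain p // per_domain,
    and `rename` maps each architected register to a physical register.
    """

    def __init__(self, num_arch: int, per_domain: int):
        if per_domain < num_arch:
            raise ValueError("a domain must hold every architected register")
        self.per_domain = per_domain
        self.values = [0] * (2 * per_domain)
        self.powered = [True, True]
        # Start with architected registers spread over both domains.
        self.rename = {r: 2 * r for r in range(num_arch)}

    def domain_of(self, phys: int) -> int:
        return phys // self.per_domain

    def _check(self, arch: int) -> int:
        phys = self.rename[arch]
        if not self.powered[self.domain_of(phys)]:
            raise RuntimeError("access to a powered-off domain")
        return phys

    def read(self, arch: int):
        return self.values[self._check(arch)]

    def write(self, arch: int, value) -> None:
        self.values[self._check(arch)] = value

    def enter_idle(self, keep: int = 0) -> None:
        """Consolidate architected state into domain `keep`, power off the other."""
        base = keep * self.per_domain
        used = {p for p in self.rename.values() if self.domain_of(p) == keep}
        free = [p for p in range(base, base + self.per_domain) if p not in used]
        for arch, phys in list(self.rename.items()):
            if self.domain_of(phys) != keep:
                target = free.pop(0)
                self.values[target] = self.values[phys]  # copy the live value
                self.rename[arch] = target
        off = 1 - keep
        self.powered[off] = False
        start = off * self.per_domain
        # Unpowered cells lose their contents.
        self.values[start:start + self.per_domain] = [None] * self.per_domain

    def exit_idle(self) -> None:
        """End of idle: power every domain back on."""
        self.powered = [True, True]
```

    After `enter_idle()`, every architected register is still readable from the kept domain while the other domain is off; `exit_idle()` restores power so renaming can use both domains again.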
