Active power dissipation detection based on erroneus clock gating equations
    1.
    Granted patent
    Legal status: in force

    Publication No.: US09495490B2

    Publication date: 2016-11-15

    Application No.: US13550207

    Filing date: 2012-07-16

    IPC classification: G06F17/50

    CPC classification: G06F17/5022 G06F2217/78

    Abstract: A method detects active power dissipation in an integrated circuit. The method includes receiving a hardware design for the integrated circuit having one or more clock domains, wherein the hardware design comprises a local clock buffer for a clock domain, wherein the local clock buffer is configured to receive a clock signal and an actuation signal. The method includes adding instrumentation logic to the design for the clock domain, wherein the instrumentation logic is configured to compare a first value of the actuation signal determined at a beginning point of a test period to a second value of the actuation signal determined at a time when the clock domain is in an idle condition. The method includes detecting that the clock domain includes unintended active power dissipation, in response to the first value of the actuation signal not being equal to the second value of the actuation signal.
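
The comparison described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `GatingMonitor`, the function `run_trace`, and the trace format of `(actuation, idle)` samples are hypothetical names introduced here, not taken from the patent, and real instrumentation logic would live in the hardware design rather than in Python.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GatingMonitor:
    """Hypothetical per-clock-domain monitor (illustrative names only)."""
    baseline: Optional[int] = None  # actuation value latched at start of test period
    flagged: bool = False           # set when unintended activity is suspected

    def start_test(self, actuation: int) -> None:
        # Latch the first value of the actuation signal.
        self.baseline = actuation
        self.flagged = False

    def sample(self, actuation: int, idle: bool) -> None:
        # Compare only while the domain is idle; a mismatch suggests the
        # clock gating equation left the local clock buffer enabled.
        if idle and self.baseline is not None and actuation != self.baseline:
            self.flagged = True


def run_trace(trace: List[Tuple[int, bool]]) -> bool:
    """Assumption: the first sample marks the beginning of the test period."""
    monitor = GatingMonitor()
    monitor.start_test(trace[0][0])
    for actuation, idle in trace[1:]:
        monitor.sample(actuation, idle)
    return monitor.flagged
```

For example, `run_trace([(0, True), (1, False), (1, True)])` flags the domain because the actuation signal is still asserted once the domain has returned to idle.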

    ACTIVE POWER DISSIPATION DETECTION BASED ON ERRONOUS CLOCK GATING EQUATIONS
    2.
    Patent application
    Legal status: in force

    Publication No.: US20140019780A1

    Publication date: 2014-01-16

    Application No.: US13550207

    Filing date: 2012-07-16

    IPC classification: G06F1/24 G06F1/00

    CPC classification: G06F17/5022 G06F2217/78

    Abstract: A method detects active power dissipation in an integrated circuit. The method includes receiving a hardware design for the integrated circuit having one or more clock domains, wherein the hardware design comprises a local clock buffer for a clock domain, wherein the local clock buffer is configured to receive a clock signal and an actuation signal. The method includes adding instrumentation logic to the design for the clock domain, wherein the instrumentation logic is configured to compare a first value of the actuation signal determined at a beginning point of a test period to a second value of the actuation signal determined at a time when the clock domain is in an idle condition. The method includes detecting that the clock domain includes unintended active power dissipation, in response to the first value of the actuation signal not being equal to the second value of the actuation signal.

    Mechanism to speed-up multithreaded execution by register file write port reallocation
    5.
    Granted patent
    Legal status: in force

    Publication No.: US09207995B2

    Publication date: 2015-12-08

    Application No.: US13170003

    Filing date: 2011-06-27

    IPC classification: G06F9/30 G06F9/52 G06F9/38

    Abstract: Various systems and processes may be used to speed up multi-threaded execution. In certain implementations, a system and process may include the ability to write results of a first group of execution units associated with a first register file into the first register file using a first write port of the first register file and write results of a second group of execution units associated with a second register file into the second register file using a first write port of the second register file. The system, apparatus, and process may also include the ability to connect, in a shared register file mode, results of the second group of execution units to a second write port of the first register file and connect, in a split register file mode, results of a part of the first group of execution units to the second write port of the first register file.
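
The mode-dependent routing of the second write port can be pictured as a multiplexer. The sketch below is a software analogy only: `Mode`, `route_write_ports`, the port labels, and the choice of which units form the "part" of the first group (here, the last `split_part` units) are all assumptions made for illustration and are not stated in the abstract.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Mode(Enum):
    SHARED = "shared"  # shared register file mode
    SPLIT = "split"    # split register file mode


def route_write_ports(
    mode: Mode,
    group0_results: List[str],
    group1_results: List[str],
    split_part: int = 1,
) -> Dict[Tuple[str, str], List[str]]:
    """Return which results drive each write port (hypothetical model)."""
    ports = {
        ("rf0", "wp0"): group0_results,  # first write port of first register file
        ("rf1", "wp0"): group1_results,  # first write port of second register file
    }
    if mode is Mode.SHARED:
        # Shared mode: the second group's results also reach register file 0.
        ports[("rf0", "wp1")] = group1_results
    else:
        # Split mode: the freed port carries part of the first group's results.
        # Assumption: "part" means the last `split_part` execution units.
        ports[("rf0", "wp1")] = group0_results[-split_part:]
    return ports
```

In this model the second write port of the first register file is never idle: it either mirrors the other group's results or provides extra write bandwidth to the local group.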

    Apparatus and method for calculating an SHA-2 hash function in a general purpose processor
    6.
    Granted patent
    Legal status: in force

    Publication No.: US09164725B2

    Publication date: 2015-10-20

    Application No.: US13181678

    Filing date: 2011-07-13

    IPC classification: G06F7/00 H04L9/32

    Abstract: Various systems, apparatuses, processes, and/or products may be used to calculate an SHA-2 hash function in a general-purpose processor. In some implementations, a system, apparatus, process, and/or product may include the ability to calculate at least one SHA-2 sigma function by using an execution unit adapted for performing a processor instruction, the execution unit including an integrated circuit primarily designed for calculating the SHA-2 sigma function(s), and calculating the SHA-2 hash function with general-purpose hardware processing components of the processor based on the sigma function(s). In certain implementations, the calculation of the SHA-2 sigma function(s) can be performed by the integrated circuit within a single instruction, allowing for a faster calculation of the SHA-2 hash function.
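
For orientation, the sigma functions of SHA-256 (one member of the SHA-2 family) are defined in FIPS 180-4 as XORs of rotations and shifts. The sketch below shows them in plain Python together with the message-schedule step that consumes them. The function names are illustrative; in the scheme the abstract describes, each sigma call would correspond to one dedicated instruction while the additions run on general-purpose hardware. That mapping is an interpretation, not a statement from the patent.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


# SHA-256 sigma functions as defined in FIPS 180-4.
def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def big_sigma0(x: int) -> int:
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x: int) -> int:
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def expand_schedule(block_words):
    """Expand 16 message words to the 64-word SHA-256 schedule.

    Only the sigma calls would map to the special instruction;
    the modular additions are ordinary integer operations.
    """
    w = list(block_words)
    for t in range(16, 64):
        w.append(
            (small_sigma1(w[t - 2]) + w[t - 7] + small_sigma0(w[t - 15]) + w[t - 16])
            & MASK32
        )
    return w
```

Each sigma function costs several rotate, shift, and XOR operations when built from general-purpose instructions, which is why collapsing one into a single instruction shortens the hash's critical path.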

    SPLITABLE AND SCALABLE NORMALIZER FOR VECTOR DATA
    7.
    Patent application
    Legal status: in force

    Publication No.: US20150067298A1

    Publication date: 2015-03-05

    Application No.: US14016607

    Filing date: 2013-09-03

    IPC classification: G06F15/78

    Abstract: A hardware circuit component is configured to support vector operations in a scalar data path. The hardware circuit component is configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component is configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component is configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component is configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position.
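
A behavioural sketch may help picture a normalizer that works on one wide scalar or on two independent halves. The code below is an assumption-laden model: it treats normalization as a left shift by the leading-zero count, builds the shift from power-of-two stages (coarse first), and uses a 64-bit path split into two 32-bit halves. None of the names or widths come from the abstract, and the duplicated coarse shift multiplexers it mentions are not modelled.

```python
from typing import Tuple


def staged_shift_left(value: int, amount: int, width: int) -> int:
    """Left shift through power-of-two stages, coarsest stage first.

    Assumes width is a power of two and 0 <= amount < width.
    """
    mask = (1 << width) - 1
    shift = width >> 1
    while shift:
        if amount & shift:  # this stage's multiplexer selects the shifted input
            value = (value << shift) & mask
        shift >>= 1
    return value


def normalize(value: int, width: int) -> Tuple[int, int]:
    """Scalar mode: shift left until the top bit is set; return (result, shift)."""
    value &= (1 << width) - 1
    if value == 0:
        return 0, width
    leading_zeros = width - value.bit_length()
    return staged_shift_left(value, leading_zeros, width), leading_zeros


def normalize_vector(value: int, width: int = 64) -> Tuple[int, Tuple[int, int]]:
    """Vector mode: normalize the left and right halves independently."""
    half = width // 2
    mask = (1 << half) - 1
    hi, hi_shift = normalize(value >> half, half)
    lo, lo_shift = normalize(value & mask, half)
    return (hi << half) | lo, (hi_shift, lo_shift)
```

In vector mode no bit crosses the boundary between the halves, which is the property a shared hardware shifter has to enforce with extra multiplexers at the bit positions fed from both sides.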

    REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS
    8.
    Patent application
    Legal status: in force

    Publication No.: US20130159666A1

    Publication date: 2013-06-20

    Application No.: US13326249

    Filing date: 2011-12-14

    Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit, and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.

    METHOD AND DATA PROCESSING UNIT FOR CALCULATING AT LEAST ONE MULTIPLY-SUM OF TWO CARRY-LESS MULTIPLICATIONS OF TWO INPUT OPERANDS, DATA PROCESSING PROGRAM AND COMPUTER PROGRAM PRODUCT
    9.
    Patent application
    Legal status: in force

    Publication No.: US20120150933A1

    Publication date: 2012-06-14

    Application No.: US13183639

    Filing date: 2011-07-15

    IPC classification: G06F7/52 G06F7/50

    Abstract: Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data busses for the input operands and an output data bus for an overall calculation result, each bus including a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure and calculate the at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. A certain number of multiply-sums may be output as an overall calculation result, depending on the mode of operation, using the full width of said output data bus.
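
Carry-less multiplication is polynomial multiplication over GF(2), where addition is XOR, so a "multiply-sum" of two products is their XOR. The sketch below illustrates that arithmetic and one way a wider product can be assembled from narrower lower-level products. The function names and the particular half-splitting scheme are assumptions for illustration; the abstract does not specify how the patent's hierarchy is wired.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers (GF(2) polynomials)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the shifted multiplicand, no carries
        a <<= 1
        b >>= 1
    return result


def multiply_sum(a0: int, b0: int, a1: int, b1: int) -> int:
    """XOR-sum of two carry-less products."""
    return clmul(a0, b0) ^ clmul(a1, b1)


def clmul_hierarchical(a: int, b: int, n: int) -> int:
    """2n-bit carry-less product built from n-bit products (the lower level).

    The middle term is itself a multiply-sum of two lower-level products.
    """
    mask = (1 << n) - 1
    a_lo, a_hi = a & mask, a >> n
    b_lo, b_hi = b & mask, b >> n
    low = clmul(a_lo, b_lo)
    high = clmul(a_hi, b_hi)
    middle = multiply_sum(a_lo, b_hi, a_hi, b_lo)
    return low ^ (middle << n) ^ (high << (2 * n))
```

Because the higher level only XORs and shifts lower-level products, the same lower-level multipliers can serve either one wide product or several narrower multiply-sums, which is consistent with the mode-dependent output the abstract mentions.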

    Method and data processing unit for calculating at least one multiply-sum of two carry-less multiplications of two input operands, data processing program and computer program product
    10.
    Granted patent
    Legal status: in force

    Publication No.: US08903882B2

    Publication date: 2014-12-02

    Application No.: US13183639

    Filing date: 2011-07-15

    IPC classification: G06F7/38 G06F7/53

    Abstract: Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data busses for the input operands and an output data bus for an overall calculation result, each bus including a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure and calculate the at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. A certain number of multiply-sums may be output as an overall calculation result, depending on the mode of operation, using the full width of said output data bus.
