Systems and methods for transferring data to maintain preferred slot positions in a bi-endian processor
    1.
    Granted invention patent
    Systems and methods for transferring data to maintain preferred slot positions in a bi-endian processor (status: in force)

    Publication No.: US08145804B2

    Publication date: 2012-03-27

    Application No.: US12563756

    Filing date: 2009-09-21

    IPC classification: G06F13/28

    CPC classification: G06F9/30007 G06F9/3824

    Abstract: A bi-endian multiprocessor system having multiple processing elements, each of which includes a processor core, a local memory and a memory flow controller. The memory flow controller transfers data between the local memory and data sources external to the processing element. If the processing element and the data source implement data representations having the same endian-ness, each multi-word line of data is stored in the local memory in the same word order as in the data source. If the processing element and the data source implement data representations having different endian-ness, the words of each multi-word line of data are transposed when data is transferred between local memory and the data source. The processing element may incorporate circuitry to add doublewords, wherein the circuitry can alternately carry bits from a first word to a second word or vice versa, depending upon whether the words in lines of data are transposed.
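
The transfer rule and the doubleword add described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the four-word line, the 32-bit word width, the reading of "transposed" as a reversal of word order, the placement of the high word first in the untransposed layout, and the names `transfer_line` and `add_doubleword` are all assumptions made for illustration.

```python
WORDS_PER_LINE = 4        # assumption: one line holds four words
MASK32 = 0xFFFFFFFF       # assumption: words are 32 bits wide


def transfer_line(words, same_endianness):
    """Return the line as it would be stored after a transfer.

    Same endian-ness: word order is preserved.
    Different endian-ness: word order is reversed (our reading of "transposed").
    """
    if len(words) != WORDS_PER_LINE:
        raise ValueError("expected a full line of words")
    return list(words) if same_endianness else list(reversed(words))


def add_doubleword(a, b, transposed):
    """Add two doublewords, each stored as a pair of 32-bit words.

    Assumption: in the untransposed layout index 0 holds the high word and
    index 1 the low word; in the transposed layout the roles are swapped.
    The carry therefore propagates in opposite directions in the two cases.
    """
    lo, hi = (0, 1) if transposed else (1, 0)
    low_sum = a[lo] + b[lo]
    carry = low_sum >> 32
    result = [0, 0]
    result[lo] = low_sum & MASK32
    result[hi] = (a[hi] + b[hi] + carry) & MASK32
    return result
```

In this sketch the same adder logic serves both layouts; only the direction of the carry changes, which is the behavior the last sentence of the abstract attributes to the doubleword circuitry.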

    Systems and Methods for Transferring Data to Maintain Preferred Slot Positions in a Bi-endian Processor
    2.
    Invention patent application
    Systems and Methods for Transferring Data to Maintain Preferred Slot Positions in a Bi-endian Processor (status: in force)

    Publication No.: US20110072170A1

    Publication date: 2011-03-24

    Application No.: US12563756

    Filing date: 2009-09-21

    IPC classification: G06F13/28 G06F12/00 G06F12/08

    CPC classification: G06F9/30007 G06F9/3824

    Abstract: A bi-endian multiprocessor system having multiple processing elements, each of which includes a processor core, a local memory and a memory flow controller. The memory flow controller transfers data between the local memory and data sources external to the processing element. If the processing element and the data source implement data representations having the same endian-ness, each multi-word line of data is stored in the local memory in the same word order as in the data source. If the processing element and the data source implement data representations having different endian-ness, the words of each multi-word line of data are transposed when data is transferred between local memory and the data source. The processing element may incorporate circuitry to add doublewords, wherein the circuitry can alternately carry bits from a first word to a second word or vice versa, depending upon whether the words in lines of data are transposed.

    Circuit design optimization of integrated circuit based clock gated memory elements
    3.
    Granted invention patent
    Circuit design optimization of integrated circuit based clock gated memory elements (status: in force)

    Publication No.: US07676778B2

    Publication date: 2010-03-09

    Application No.: US11773412

    Filing date: 2007-07-04

    IPC classification: G06F17/50

    CPC classification: G06F17/505 G06F2217/62

    Abstract: A novel method for optimizing the design of digital circuits containing clock gated memory elements. The method unclock gates memory elements by adding necessary feedback loops. Logic functions of memory element outputs in the circuit are viewed as a whole, rather than as separate functions for each input. Detection of duplicate unclock gated memory elements is then effected by identifying identical canonical representations of said unclock gated memory elements. Identified duplicate clock gated memory elements can then be eliminated from the original digital circuit. Further optimization can be accomplished by applying standard logic optimization algorithms to all unclock gated memory elements in said digital circuit. The resulting optimized circuit is clock gated and replaces the original clock gated circuit in said digital circuit.
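
The "unclock gating by adding feedback loops" step rests on a standard equivalence: a register whose clock is gated by an enable behaves like a free-running register fed by a multiplexer that recirculates its own output when the enable is low. The Python sketch below simulates both forms cycle by cycle. It is only an illustration of that equivalence, not the patented method; the `Register` class and the functions `run_gated` and `run_ungated` are hypothetical names.

```python
class Register:
    """A single memory element that captures its input on each clock edge."""

    def __init__(self, q=0):
        self.q = q

    def clock(self, d):
        self.q = d


def run_gated(data, enables, init=0):
    """Clock-gated form: the clock edge reaches the register only when enabled."""
    reg = Register(init)
    trace = []
    for d, en in zip(data, enables):
        if en:
            reg.clock(d)
        trace.append(reg.q)
    return trace


def run_ungated(data, enables, init=0):
    """Unclock-gated form: clocked every cycle, with a feedback mux on the input."""
    reg = Register(init)
    trace = []
    for d, en in zip(data, enables):
        reg.clock(d if en else reg.q)  # feedback loop holds the old value
        trace.append(reg.q)
    return trace
```

Both functions produce the same output trace for any input sequence, which is why the ungated form can stand in for the gated one while duplicates are detected and logic is optimized.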

    Formally deriving a minimal clock-gating scheme
    4.
    Granted invention patent
    Formally deriving a minimal clock-gating scheme (status: in force)

    Publication No.: US07849428B2

    Publication date: 2010-12-07

    Application No.: US12107940

    Filing date: 2008-04-23

    IPC classification: G06F17/50

    Abstract: The present invention provides a fully automatic method for obtaining a circuit having minimized power consumption due to clock-gating. A circuit design to be optimized is modified to a reduced power modified design and associated with a clock gating scheme. Verification tools compare the modified design with the original design to a predetermined trigger-events to determine if the modified design can be used. Further modifications may be made iteratively until an optimal design is achieved.

    MULTI-CYCLE REGISTER FILE BYPASS
    5.
    Invention patent application
    MULTI-CYCLE REGISTER FILE BYPASS (status: under examination, published)

    Publication No.: US20090249035A1

    Publication date: 2009-10-01

    Application No.: US12058043

    Filing date: 2008-03-28

    IPC classification: G06F9/30

    Abstract: A method of reducing latency in instruction processing in a system, includes calculating a result of a first execution unit, storing the result of the first execution unit in a register file, forwarding the result of the first execution unit, through the bypass unit, to a second execution unit, the second execution unit conducting an instruction dependent on the result, forwarding the result of the first execution unit, from the bypass unit, to a third execution unit, without accessing the register file, the third execution unit conducting an instruction dependent on the result, wherein the execution units can extract the result of the first execution unit through the bypass unit until the new result is calculated, wherein after the new result is calculated, the execution units can access the result of the first execution unit through the register file.

    Instruction set architecture with instruction characteristic bit indicating a result is not of architectural importance
    7.
    Granted invention patent
    Instruction set architecture with instruction characteristic bit indicating a result is not of architectural importance (status: lapsed)

    Publication No.: US08266411B2

    Publication date: 2012-09-11

    Application No.: US12366169

    Filing date: 2009-02-05

    IPC classification: G06F9/30

    Abstract: Instead of having a processor with an instruction set architecture (ISA) that includes fixed architected operands, an improved processor supports additional characteristic bits for computing instructions (e.g., a multiply-add, load/store instructions). Such additional bits for the certain instructions influence the processing of these instructions by the processor. Also, a new instruction is introduced for further usage of the proposed method. Typically these additional characteristic bits as well as the instruction can be automatically generated by compilers to provide relatively well-suited instruction sequences for the processor.

    Method and Apparatus for Performing Equivalence Checking on Circuit Designs Having Differing Clocking and Latching Schemes
    9.
    Invention patent application
    Method and Apparatus for Performing Equivalence Checking on Circuit Designs Having Differing Clocking and Latching Schemes (status: lapsed)

    Publication No.: US20080209287A1

    Publication date: 2008-08-28

    Application No.: US11679234

    Filing date: 2007-02-27

    IPC classification: G01R31/28

    CPC classification: G06F17/504

    Abstract: A method for performing equivalence checking on logic circuit designs is disclosed. Within a composite netlist of an original version and a modified version of a logic circuit design, all level-sensitive sequential elements sensitized by a clock=0 are converted into buffers, and all level-sensitive sequential elements sensitized by a clock=1 are converted into level-sensitive registers. A subset of edge-sensitive sequential elements are selectively transformed into level-sensitive sequential elements by removing edge detection logic from the subset of the edge-sensitive sequential elements. A clock to the resulting sequential elements is then set to a logical "1" to verify the sequential equivalence of the transformed netlist.

    Electronic circuit for implementing a permutation operation
    10.
    Invention patent application
    Electronic circuit for implementing a permutation operation (status: lapsed)

    Publication No.: US20070011220A1

    Publication date: 2007-01-11

    Application No.: US11390791

    Filing date: 2006-03-28

    IPC classification: G06F17/15

    CPC classification: G06F7/766

    Abstract: A crossbar (20) circuit with multiplexer (22A, 22B) circuits implemented in a polygonal form on a chip. The crossbar can be used for implementing a permutation of input bits (24A, 24B) controlled by a bit vector (25). Horizontal and vertical wiring lengths in the crossbar (20) are reduced by stacking the operand latches (24A, 24B, 25) and horizontal or vertical multiplexers (22A, 22B). This implementation decreases the latency of the crossbar and avoids latches to store intermediated results, thus reducing area and power consumption.
