Hyperprocessor
    1.
    Granted patent
    Hyperprocessor (status: in force)

    Publication No.: US07533382B2

    Publication date: 2009-05-12

    Application No.: US10283653

    Filing date: 2002-10-30

    Applicant: Faraydon O. Karim

    Inventor: Faraydon O. Karim

    IPC classification: G06F9/46 G06F15/00

    Abstract: A hyperprocessor includes a control processor controlling tasks executed by a plurality of processor cores, each of which may include multiple execution units, or special hardware units. The control processor schedules tasks according to control threads for the tasks created during compilation and comprising a hardware context including register files, a program counter and status bits for the respective task. The tasks are dispatched to the processor cores or special hardware units for parallel, sequential, out-of-order or speculative execution. A universal register file contains data to be operated on by the task, and an interconnect couples at least the processor cores or special hardware units to each other and to the universal register file, allowing each node to communicate with any other node.

    Method and device for computing incremental checksums

    Publication No.: US06643821B2

    Publication date: 2003-11-04

    Application No.: US09726927

    Filing date: 2000-11-30

    IPC classification: G06F11/00

    CPC classification: H03M13/096

    Abstract: A method and a computing system compute an incremental checksum corresponding to a data packet. The incremental checksum is computed within one processor cycle of a processor. A first register (102) stores first checksum information corresponding to a data packet. A second register (104) stores second checksum information corresponding to old information being deleted from the data packet. A third register (106) stores third checksum information corresponding to new information being added to the data packet. An incremental checksum circuit (100), electrically connected to the first register (102), to the second register (104), and to the third register (106), provides resulting checksum information corresponding to the data packet after deleting the old information from the data packet and adding the new information to the data packet. The resulting checksum information is selectively stored in the first register (102).
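
    The three-register update described in this abstract can be sketched in software. The abstract does not name a checksum algorithm, so the sketch below assumes the 16-bit one's-complement Internet checksum and the update rule from RFC 1624; the function names are hypothetical and are not taken from the patent.

```python
def fold16(value):
    """Fold carries back into the low 16 bits (one's-complement addition)."""
    while value >> 16:
        value = (value & 0xFFFF) + (value >> 16)
    return value


def checksum(words):
    """Full recomputation: complement of the one's-complement sum of all words."""
    total = 0
    for word in words:
        total = fold16(total + word)
    return ~total & 0xFFFF


def incremental_checksum(packet_checksum, old_word, new_word):
    """Update a checksum when old_word is replaced by new_word.

    Assumed rule (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m').
    The three arguments play the role of the three registers in the abstract.
    """
    total = (~packet_checksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold16(total) & 0xFFFF


# Hypothetical packet words, chosen only for illustration.
packet = [0x4500, 0x0054, 0xABCD, 0x4001]
before = checksum(packet)
updated = incremental_checksum(before, old_word=0x4001, new_word=0x3F01)
packet[3] = 0x3F01
print(hex(updated), hex(checksum(packet)))
```

    The update touches only three 16-bit values regardless of packet length, which is what makes a single-cycle hardware implementation plausible.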

    System for multiple error detection with single and double bit error correction
    3.
    Granted patent
    System for multiple error detection with single and double bit error correction (status: expired)

    Publication No.: US4589112A

    Publication date: 1986-05-13

    Application No.: US574221

    Filing date: 1984-01-26

    Applicant: Faraydon O. Karim

    Inventor: Faraydon O. Karim

    IPC classification: G06F11/10 H03M13/00 H03M13/13

    CPC classification: H03M13/13

    Abstract: A system for detecting multiple errors that may occur during transfer of data and for correcting up to two of these errors simultaneously. The system has a component for calculating a number of check bits associated with the data word. Also provided is a component for grouping all data bits into base groups and multiple groups, the sum of the number of base groups and multiple groups being equal to the number of check bits. Up to two weights are assigned for each data bit. The system distributes the data bits among the groups according to the weights assigned thereto. Also provided is a component for generating a check bit for each of the groups and for padding the data word with the check bits to form an appended data word. A generator creates a predetermined number of syndrome bits, the number being the number of check bits. Finally, a decoder is provided for decoding the syndrome bits to identify the erroneous bits in the data word.

    Multiprocessing apparatus, system and method
    4.
    Patent application
    Multiprocessing apparatus, system and method (status: under examination, published)

    Publication No.: US20090133022A1

    Publication date: 2009-05-21

    Application No.: US11985481

    Filing date: 2007-11-15

    Applicant: Faraydon O. Karim

    Inventor: Faraydon O. Karim

    IPC classification: G06F9/46

    CPC classification: G06F9/4843

    Abstract: An apparatus to isolate a main memory in a multiprocessor computer is provided. The apparatus includes a master processor and a management device communicating with the master processor. One or more slave processors communicate with the master processor and the management device. A volatile memory also communicates with the management device, and the main memory communicates with the volatile memory. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules that allow a reader to quickly ascertain the subject matter of the disclosure contained herein. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

    System independent and scalable packet buffer management architecture for network processors
    5.
    Granted patent
    System independent and scalable packet buffer management architecture for network processors (status: in force)

    Publication No.: US07468985B2

    Publication date: 2008-12-23

    Application No.: US10290766

    Filing date: 2002-11-08

    IPC classification: H04L12/28 H04L12/56 G06F9/26

    Abstract: A circular buffer storing packets for processing by one or more network processors employs an empty buffer address register identifying where a next received packet should be stored, a next packet address register identifying the next packet to be processed, and a packet-processing address register within each network processor identifying the packet being processed by that network processor. The n-bit addresses to the buffer are mapped or masked from/to the m-bit packet-processing address registers by software, allowing the buffer size to be fully scalable. A dedicated packet retrieval instruction supported by the network processor(s) retrieves a new packet for processing using the next packet address register and copies that into the associated packet-processing address register for use in subsequent accesses. Buffer management is thus independent of the network processor architecture.
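
    The register roles named in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: packets occupy fixed-size slots, address wrap-around is done with a bit mask, and the class names, `get_packet` (standing in for the dedicated retrieval instruction) and `release` are hypothetical names that do not come from the patent.

```python
class Processor:
    """Stand-in for a network processor with one packet-processing address register."""

    def __init__(self):
        self.packet_addr = None


class PacketBuffer:
    """Circular buffer with 2**address_bits fixed-size slots (an assumption)."""

    def __init__(self, address_bits):
        self.mask = (1 << address_bits) - 1   # masks addresses to the buffer size
        self.slots = [None] * (1 << address_bits)
        self.empty_buffer_addr = 0            # where the next received packet goes
        self.next_packet_addr = 0             # next packet waiting to be processed

    def receive(self, packet):
        if self.slots[self.empty_buffer_addr] is not None:
            raise BufferError("buffer full")
        self.slots[self.empty_buffer_addr] = packet
        self.empty_buffer_addr = (self.empty_buffer_addr + 1) & self.mask

    def get_packet(self, processor):
        """Copy the next packet address into the processor's own register."""
        addr = self.next_packet_addr
        if self.slots[addr] is None:
            return None                       # nothing waiting
        processor.packet_addr = addr
        self.next_packet_addr = (addr + 1) & self.mask
        return self.slots[addr]

    def release(self, processor):
        """Free the slot once the processor has finished with its packet."""
        self.slots[processor.packet_addr] = None
        processor.packet_addr = None


buf = PacketBuffer(address_bits=2)
cpu = Processor()
buf.receive("packet-0")
print(buf.get_packet(cpu), cpu.packet_addr)
```

    Changing `address_bits` resizes the buffer without touching the processor model, which mirrors the scalability claim in the abstract.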

    Fetch branch architecture for reducing branch penalty without branch prediction
    6.
    Granted patent
    Fetch branch architecture for reducing branch penalty without branch prediction (status: in force)

    Publication No.: US07010675B2

    Publication date: 2006-03-07

    Application No.: US09917290

    Filing date: 2001-07-27

    IPC classification: G06F9/30

    CPC classification: G06F9/3804 G06F9/3842

    Abstract: In lieu of branch prediction, a merged fetch-branch unit operates in parallel with the decode unit within a processor. Upon detection of a branch instruction within a group of one or more fetched instructions, any instructions preceding the branch are marked regular instructions, the branch instruction is marked as such, and any instructions following the branch are marked sequential instructions. Within two cycles, sequential instructions following the last fetched instruction are retrieved and marked, target instructions beginning at the branch target address are retrieved and marked, and the branch is resolved. Either the sequential or target instructions are then dropped depending on the branch resolution, incurring a fixed, one-cycle branch penalty.

    Method and apparatus for floating point normalization
    7.
    Granted patent
    Method and apparatus for floating point normalization (status: expired)

    Publication No.: US5384723A

    Publication date: 1995-01-24

    Application No.: US205123

    Filing date: 1994-02-28

    CPC classification: G06F5/012 G06F5/015

    Abstract: A method and apparatus for performing normalization of floating point numbers using a much smaller width register than would normally be required for the data operands which can be processed. As the registers are smaller, the number of circuits required to achieve the normalization is reduced, resulting in a decrease in the chip area required to perform such operation. The normalization circuitry was streamlined to efficiently operate on the more prevalent type of data being presented to the floating point unit. Data types and/or operations which statistically occur less frequently require multiple cycles of the normalization function. It was found that for the more prevalent data types and/or operations, the width of the registers required was substantially less than the width required for the less frequent data types and/or operations. Instead of expanding the register width to accommodate these lesser occurrences, the data is broken into smaller portions and normalized using successive cycles of the normalization circuitry. Thus, by sacrificing speed for the lesser occurring events, a significant savings was realized in the number of circuits required to implement normalization. As the slower speed operations occur infrequently, the overall performance of the normalization function is minimally impacted. Thus, considerable savings in integrated circuit real estate is achieved with minimal impact to the overall throughput of the system.

    Octagonal interconnection network for linking processing nodes on an SOC device and method of operating same
    8.
    Granted patent
    Octagonal interconnection network for linking processing nodes on an SOC device and method of operating same (status: in force)

    Publication No.: US07218616B2

    Publication date: 2007-05-15

    Application No.: US10090899

    Filing date: 2002-03-05

    Applicant: Faraydon O. Karim

    Inventor: Faraydon O. Karim

    IPC classification: H04L12/56

    Abstract: An octagonal interconnection network for routing data packets. The interconnection network comprises: 1) eight switching circuits for transferring data packets with each other; 2) eight sequential data links bidirectionally coupling the eight switching circuits in sequence to thereby form an octagonal ring configuration; and 3) four crossing data links, wherein a first crossing data link bidirectionally couples a first switching circuit to a fifth switching circuit, a second crossing data link bidirectionally couples a second switching circuit to a sixth switching circuit, a third crossing data link bidirectionally couples a third switching circuit to a seventh switching circuit, and a fourth crossing data link bidirectionally couples a fourth switching circuit to an eighth switching circuit.
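
    The topology in this abstract (eight ring links plus four crossing links) is small enough to check by enumeration. The sketch below builds it as an adjacency map and measures hop counts with a breadth-first search. Node numbering from 0 to 7 and the function names are assumptions made for illustration; the patent numbers its switching circuits first through eighth.

```python
from collections import deque

NODES = 8


def build_octagon():
    """Adjacency map: ring links n <-> n+1 and crossing links n <-> n+4."""
    links = {node: set() for node in range(NODES)}
    for node in range(NODES):
        neighbour = (node + 1) % NODES          # sequential (ring) link
        links[node].add(neighbour)
        links[neighbour].add(node)
    for node in range(NODES // 2):
        opposite = node + NODES // 2            # crossing link to the opposite node
        links[node].add(opposite)
        links[opposite].add(node)
    return links


def hop_counts(links, source):
    """Breadth-first search: minimum number of links from source to each node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in links[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


octagon = build_octagon()
worst_case = max(max(hop_counts(octagon, n).values()) for n in range(NODES))
print("links per node:", {len(v) for v in octagon.values()})
print("worst-case hops:", worst_case)
```

    Under this model every node has three links and any node reaches any other in at most two hops.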

    Method and device for computing the number of bits set to one in an arbitrary length word
    9.
    Granted patent
    Method and device for computing the number of bits set to one in an arbitrary length word (status: in force)

    Publication No.: US06795839B2

    Publication date: 2004-09-21

    Application No.: US09727135

    Filing date: 2000-11-30

    IPC classification: G06F7/00

    CPC classification: G06F7/607

    Abstract: A method and a bit counting device (100) count bits set to one in a data word of arbitrary size. The bit counting device (100) includes a first data register (110) for storing a data word, an offset register (112) for storing an offset value, a second data register (120), and a one-cycle shifter (114), electrically connected to the first data register (110), to the second data register (120), and to the offset register (112), for shifting the data word by a value stored in the offset register (112) and storing the shifted data word in the second data register (120). The device (100) also includes a third data register (124) and at least one carry save adder (CSA) device (122) organized in a tree structure, and electrically connected to the second data register (120) and to the third data register (124), for counting the number of bits set to one in the data word stored in the second data register (120) and storing in the third data register (124) a value representing the count of bits set to one in the data word.
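
    The shift-then-count flow in this abstract can be modelled at bit level. The sketch below assumes a right shift that discards the low `offset` bits (the abstract does not state the direction) and reduces the bits with full adders used as 3:2 carry-save compressors. It illustrates the general technique, not the circuit claimed in the patent, and the function names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 compressor: three bits of equal weight -> (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def count_ones(word, width, offset=0):
    """Count the bits set to one in `word` after shifting it right by `offset`."""
    shifted = (word >> offset) & ((1 << width) - 1)
    # columns[k] holds the bits of weight 2**k that still have to be added.
    columns = [[(shifted >> i) & 1 for i in range(width)]]
    k = 0
    while k < len(columns):
        column = columns[k]
        while len(column) > 1:
            a = column.pop()
            b = column.pop()
            c = column.pop() if column else 0
            sum_bit, carry_bit = full_adder(a, b, c)
            column.insert(0, sum_bit)           # same weight, goes back in line
            if k + 1 == len(columns):
                columns.append([])
            columns[k + 1].append(carry_bit)    # carry has twice the weight
        k += 1
    # Each column now holds at most one bit: the binary digits of the count.
    return sum(column[0] << k for k, column in enumerate(columns) if column)


print(count_ones(0b10110110, width=8))
print(count_ones(0xFF, width=8, offset=3))
```

    Each compressor step preserves the weighted sum of the bits, so the final columns hold the count in binary.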

    Floating point arithmetic unit with size efficient pipelined multiply-add architecture
    10.
    Granted patent
    Floating point arithmetic unit with size efficient pipelined multiply-add architecture (status: expired)

    Publication No.: US5241493A

    Publication date: 1993-08-31

    Application No.: US807697

    Filing date: 1991-12-16

    Abstract: An architecture and method relating to a floating point operation which performs the mathematical computation of A*B+C. The multiplication is accomplished in two or more stages, each stage involving corresponding sets of partial products and concurrently accomplished incremental summations. A pipelined architecture provides for the summation of the least significant bits of an intermediate product with operand C at a stage preceding entry into a full adder. Thereby, a significant portion of the full adder can be replaced by a simpler and smaller incrementer circuit. Partitioning of the multiplication operation into two or more partial product operations proportionally reduces the size of the multiplier required. Pipelining and concurrent execution of multiplication and addition operations in the multiplier provides in two cycles the results of the mathematical operation A*B+C while using a full adder of three-quarters normal size.