SPIN: a sequential pipeline neurocomputer
    71.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US5337395A

    Publication date: 1994-08-09

    Application No.: US681842

    Filing date: 1991-04-08

    IPC classification: G06N3/04 G06F15/18

    CPC classification: G06N3/04

    Abstract: A neural network architecture consisting of input weight multiplications, product summation, neural state calculations, and complete connectivity among the neuron processing elements. Neural networks are modelled using a sequential pipelined neurocomputer that achieves high performance with minimum hardware by sequentially processing each neuron in the completely connected network model. An N-neuron network is implemented using multipliers, a pipelined adder tree structure, and activation functions. The activation functions are provided by a single activation function module through which the N input product summations are passed sequentially. One bus provides N×N communications by sequentially providing N neuron values to the multiplier registers. A tag compare function ensures that the neuron values reach their corresponding multipliers. The neuron information includes a source tag and a valid signal. Higher performance is provided by connecting a number of the neurocomputers in parallel.
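
    The data flow in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented design: the sigmoid activation, the dictionary bus message, and the names `adder_tree` and `spin_step` are hypothetical, and all N sums are computed from the old register values before any register is reloaded.

```python
import math


def adder_tree(products):
    """Sum values pairwise, level by level, like a binary adder tree."""
    level = list(products)
    while len(level) > 1:
        if len(level) % 2:  # pad an odd level with a zero input
            level.append(0.0)
        level = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
    return level[0] if level else 0.0


def sigmoid(x):
    """Assumed activation function; the abstract does not name one."""
    return 1.0 / (1.0 + math.exp(-x))


def spin_step(weights, registers):
    """One network update: N neurons pass one at a time through one activation module."""
    n = len(registers)
    bus = []
    for i in range(n):  # neurons are processed sequentially
        products = [weights[i][j] * registers[j] for j in range(n)]
        value = sigmoid(adder_tree(products))
        # Each neuron value travels on the single bus with a source tag and a valid signal.
        bus.append({"tag": i, "value": value, "valid": True})

    new_registers = list(registers)
    for message in bus:
        for j in range(n):
            # Tag compare: multiplier register j loads only the value tagged j.
            if message["valid"] and message["tag"] == j:
                new_registers[j] = message["value"]
    return new_registers
```

    Calling `spin_step([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0])` updates both registers using a single shared activation module, which is the hardware saving the abstract emphasises.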

    Virtual neurocomputer architectures for neural networks
    72.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US5243688A

    Publication date: 1993-09-07

    Application No.: US702260

    Filing date: 1991-05-17

    IPC classification: G06N3/063 G06N3/10

    CPC classification: G06N3/10 G06N3/063

    Abstract: The architectures for a scalable neural processor (SNAP) and a Triangular Scalable Neural Array Processor (T-SNAP) are expanded to handle network simulations where the number of neurons to be modeled exceeds the number of physical neurons implemented. This virtual neural processing is described for three general virtual architectural approaches for handling the virtual neurons: one for SNAP, one for T-SNAP, and a third that applies to both SNAP and T-SNAP.

    Scalable neural array processor and method
    73.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US5148515A

    Publication date: 1992-09-15

    Application No.: US740266

    Filing date: 1991-08-05

    IPC classification: G06N3/063 G06N3/10

    CPC classification: G06N3/063 G06N3/10

    Abstract: An Array Processor and Method for a Scalable Neural Array Processor (SNAP) permits computing as a dynamic and highly parallel computationally intensive system typically consisting of input weight multiplications, product summation, neural state calculations, and complete connectivity among the neurons. The Scalable Neural Array Processor (SNAP) uses a unique intercommunication scheme within an array structure that provides high performance for completely connected network models such as the Hopfield model. SNAP's packaging and expansion capabilities are addressed, demonstrating SNAP's scalability to larger networks. The array processor is scalable. It has an array of function elements and a plurality of orthogonal horizontal and vertical processing elements for communication, computation and reduction. In a first computation state, this structure permits the generation of a set of output values, and in a first communication state the processing elements produce, responsive to the output values, first reduction values. In a second computation state, processing elements, responsive to the first reduction values, generate vertical output values, and in a second communication state the vertical output values are communicated back to the inputs of the function elements. In a third computation state, responsive to the vertical output values, a second set of output values is generated by said function elements, and in a third communication state the horizontal processing elements produce second reduction values. In a fourth computation state the horizontal processing elements generate horizontal output values, and in a fourth communication state the horizontal processing elements communicate the horizontal output values back to the inputs of the function elements.

    Methods and Apparatus for Scalable Array Processor Interrupt Detection and Response
    75.
    Type: Patent application
    Status: No longer in force

    Publication No.: US20120173849A1

    Publication date: 2012-07-05

    Application No.: US13417490

    Filing date: 2012-03-12

    IPC classification: G06F9/38 G06F9/312

    Abstract: Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processor containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing element exception interrupts are supported, and low latency interrupt processing is also provided for embedded systems where real time signal processing is required. Further, a hierarchical interrupt structure is used, allowing a generalized debug approach using debug interrupts and a dynamic debug monitor mechanism.

    Manifold array processor
    76.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US06892291B2

    Publication date: 2005-05-10

    Application No.: US10036789

    Filing date: 2001-12-21

    Abstract: An array processor includes processing elements arranged in clusters which are, in turn, combined in a rectangular array. Each cluster is formed of processing elements which preferably communicate with the processing elements of at least two other clusters. Additionally, each inter-cluster communication path is mutually exclusive, that is, each path carries either north and west, south and east, north and east, or south and west communications. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path. That is, communications from a cluster which communicates to the north and east with another cluster may be combined in one path, thus eliminating half the wiring required for the path. Additionally, the length of the longest communication path is not directly determined by the overall dimension of the array, as it is in conventional torus arrays. Rather, the longest communications path is limited only by the inter-cluster spacing. In one implementation, transpose elements of an N×N torus are combined in clusters and communicate with one another through intra-cluster communications paths. Since transpose elements have direct connections to one another, transpose operation latency is eliminated in this approach. Additionally, each PE may have a single transmit port and a single receive port. As a result, the individual PEs are decoupled from the topology of the array.
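
    The idea of placing transpose elements in the same cluster can be illustrated with a small Python sketch. The grouping rule used here, cluster index (i + j) mod N, is an assumption chosen because it provably puts PE(i, j) and PE(j, i) together; the abstract itself does not state a formula, and the names `cluster_id` and `build_clusters` are hypothetical.

```python
from collections import defaultdict


def cluster_id(i, j, n):
    """Assumed grouping rule: PEs on the same wrapped anti-diagonal share a cluster."""
    return (i + j) % n


def build_clusters(n):
    """Group the PEs of an n x n array into n clusters of n PEs each."""
    clusters = defaultdict(list)
    for i in range(n):
        for j in range(n):
            clusters[cluster_id(i, j, n)].append((i, j))
    return dict(clusters)


def transpose_is_local(n):
    """Check that every PE(i, j) shares a cluster with its transpose PE(j, i)."""
    return all(
        cluster_id(i, j, n) == cluster_id(j, i, n)
        for i in range(n)
        for j in range(n)
    )
```

    Because i + j equals j + i, `transpose_is_local(n)` holds for every n, so under this rule a transpose only ever needs intra-cluster communication.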

    Methods and apparatus for loading a very long instruction word memory
    77.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US06883088B1

    Publication date: 2005-04-19

    Application No.: US10758348

    Filing date: 2004-01-15

    Abstract: The ManArray processor is a scalable indirect VLIW array processor that defines two preferred architectures for indirect VLIW memories. One approach treats the VIM as one composite block of memory using one common address interface to access any VLIW stored in the VIM. The second approach treats the VIM as made up of multiple smaller VIMs, each individually associated with the functional units and each individually addressable for loading and reading during XV execution. The VIM memories, contained in each processing element (PE), are accessible by the same type of LV and XV Short Instruction Words (SIWs) as in a single processor instantiation of the indirect VLIW architecture. In the ManArray architecture, the control processor, also called a sequence processor (SP), fetches the instructions from the SIW memory and dispatches them to itself and the PEs. By using the LV instruction, VLIWs can be loaded into VIMs in the SP and the PEs. Since the LV instruction is supplied by the SP through the instruction stream, when VLIWs are being loaded into any VIM no other processing takes place. In addition, as defined in the ManArray architecture, when the SP is processing SIWs, such as control and other sequential code, the PE array is not executing any instructions. Techniques are provided herein to independently load the VIMs concurrently with SIW or iVLIW execution on the SP or on the PEs, thereby allowing the load latency to be hidden by the computation.

    Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor
    78.
    Type: Granted invention patent
    Status: No longer in force

    Publication No.: US06851041B2

    Publication date: 2005-02-01

    Application No.: US10254012

    Filing date: 2002-09-24

    Abstract: A pipelined data processing unit includes an instruction sequencer and n functional units capable of executing n operations in parallel. The instruction sequencer includes a random access memory for storing very-long-instruction-words (VLIWs) used in operations involving the execution of two or more functional units in parallel. Each VLIW comprises a plurality of short-instruction-words (SIWs), where each SIW corresponds to a unique type of instruction associated with a unique functional unit. VLIWs are composed in the VLIW memory by loading and concatenating SIWs in each address, or entry. VLIWs are executed via the execute-VLIW (XV) instruction. The iVLIWs can be compressed at a VLIW memory address by use of a mask field contained within the XV1 instruction, which specifies which functional units are enabled, or disabled, during the execution of the VLIW. The mask can be changed each time the XV1 instruction is executed, effectively modifying the VLIW every time it is executed. The VLIW memory (VIM) can be further partitioned into separate memories, each associated with a function decode-and-execute unit. With a second execute-VLIW instruction, XV2, each functional unit's VIM can be independently addressed, thereby removing duplicate SIWs within the functional unit's VIM. This provides a further optimization of the VLIW storage, thereby allowing the use of smaller VLIW memories in cost-sensitive applications.
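
    The XV1 mask mechanism can be pictured with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the five unit names in `UNITS`, the bit ordering of the mask, the dictionary used as VLIW memory, and the function names `load_vliw` and `execute_vliw` do not come from the abstract.

```python
# Hypothetical functional-unit names; mask bit k enables UNITS[k].
UNITS = ("store", "load", "alu", "mau", "dsu")


def load_vliw(vim, address, siws):
    """Compose a VLIW at `address` by placing each SIW in its functional unit's slot."""
    entry = vim.setdefault(address, {})
    for unit, siw in siws.items():
        if unit not in UNITS:
            raise ValueError(f"unknown functional unit: {unit}")
        entry[unit] = siw


def execute_vliw(vim, address, mask):
    """Return the SIWs issued by one execute-VLIW: only units whose mask bit is set."""
    entry = vim[address]
    return [
        (unit, entry[unit])
        for bit, unit in enumerate(UNITS)
        if (mask >> bit) & 1 and unit in entry
    ]


# One stored VLIW, executed twice with different masks.
vim = {}
load_vliw(vim, 0, {"load": "ld", "alu": "add", "mau": "mpy"})
alu_only = execute_vliw(vim, 0, 0b00100)
three_units = execute_vliw(vim, 0, 0b01110)
```

    The same VIM entry yields different parallel operations on each execution, which is why one stored VLIW can stand in for several and the memory can be smaller.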

    Merged control/process element processor for executing VLIW simplex instructions with SISD control/SIMD process mode bit
    79.
    Type: Granted invention patent
    Status: In force

    Publication No.: US06606699B2

    Publication date: 2003-08-12

    Application No.: US09783156

    Filing date: 2001-02-14

    IPC classification: G06F15/80

    Abstract: An apparatus for concurrently executing controller single instruction single data (SISD) instructions and single instruction multiple data (SIMD) processing element instructions comprising a combined controller and processing element. At least first and second simplex instructions each comprise a mode of operation bit, said mode of operation bit in the first simplex instruction specifying a controller SISD operation for execution by the controller, and the mode of operation bit in the second simplex instruction specifying a processing element SIMD operation for execution by the processing element. A very long instruction word (VLIW) contains said at least first and second simplex instructions.
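
    A rough Python sketch of how a per-instruction mode bit could steer the simplex instructions of one VLIW is given below. The bit polarity (set means SIMD), the `SimplexInstruction` record, and the `dispatch` function are hypothetical; the abstract only states that the bit selects between a controller SISD operation and a processing element SIMD operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimplexInstruction:
    opcode: str
    simd: bool  # mode-of-operation bit; polarity is an assumption


def dispatch(vliw):
    """Split one VLIW into controller (SISD) and processing-element (SIMD) operations."""
    controller_ops = [s.opcode for s in vliw if not s.simd]
    pe_ops = [s.opcode for s in vliw if s.simd]
    return controller_ops, pe_ops


# A VLIW mixing a controller operation with two PE operations.
vliw = [
    SimplexInstruction("branch", simd=False),
    SimplexInstruction("mpy", simd=True),
    SimplexInstruction("add", simd=True),
]
controller_ops, pe_ops = dispatch(vliw)
```

    Both lists come from the same instruction word, so controller and processing-element work can be issued together.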
