Method and apparatus for providing user-defined interfaces for a configurable processor
    1.
    Granted invention patent
    Method and apparatus for providing user-defined interfaces for a configurable processor (in force)

    Publication number: US08539399B1

    Publication date: 2013-09-17

    Application number: US11829063

    Filing date: 2007-07-26

    IPC classification: G06F17/50

    Abstract: A technique that improves both processor performance and associated data bandwidth through user-defined interfaces that can be added to a configurable and extensible microprocessor core. These interfaces can be used to communicate status or control information and to achieve synchronization between the processor and any external device including other processors. These interfaces can also be used to achieve data transfer at the rate of one data element per interface in every clock cycle. This technique makes it possible to design multiprocessor SOC systems with high-speed data transfer between processors without using the memory subsystem. Such a system and design methodology offers a complete shift from the standard bus-based architecture and allows designers to treat processors more like true computational units, so that designers can more effectively utilize programmable solutions rather than design dedicated hardware. This can have dramatic effects not only in the performance and bandwidth achieved by designs, but also in the time to market and reuse of such designs.

    Method and apparatus for providing user-defined interfaces for a configurable processor
    2.
    Granted invention patent
    Method and apparatus for providing user-defined interfaces for a configurable processor (in force)

    Publication number: US07664928B1

    Publication date: 2010-02-16

    Application number: US11039757

    Filing date: 2005-01-19

    IPC classification: G06F15/00

    Abstract: A technique that improves both processor performance and associated data bandwidth through user-defined interfaces that can be added to a configurable and extensible microprocessor core. These interfaces can be used to communicate status or control information and to achieve synchronization between the processor and any external device including other processors. These interfaces can also be used to achieve data transfer at the rate of one data element per interface in every clock cycle. This technique makes it possible to design multiprocessor SOC systems with high-speed data transfer between processors without using the memory subsystem. Such a system and design methodology offers a complete shift from the standard bus-based architecture and allows designers to treat processors more like true computational units, so that designers can more effectively utilize programmable solutions rather than design dedicated hardware. This can have dramatic effects not only in the performance and bandwidth achieved by designs, but also in the time to market and reuse of such designs.

    Vector co-processor for configurable and extensible processor architecture
    3.
    Granted invention patent
    Vector co-processor for configurable and extensible processor architecture (in force)

    Publication number: US07376812B1

    Publication date: 2008-05-20

    Application number: US10145380

    Filing date: 2002-05-13

    IPC classification: G06F15/00 G06F15/76

    Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.

    Method and system for managing memory in a multiprocessor system
    5.
    Granted invention patent
    Method and system for managing memory in a multiprocessor system (in force)

    Publication number: US07500068B1

    Publication date: 2009-03-03

    Application number: US11426538

    Filing date: 2006-06-26

    CPC classification: G06F12/0817 G06F12/0813

    Abstract: A method and system for managing memory in a multiprocessor system includes defining a plurality of processor coherence domains within a system coherence domain of the multiprocessor system. The processor coherence domains each include a plurality of processors and a processor memory. Shared access to data in the processor memory of each processor coherence domain is provided only to elements of the multiprocessor system within the processor coherence domain. Non-shared access to data in the processor memory of each processor coherence domain is provided to elements of the multiprocessor system within and outside of the processor coherence domain.
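
    The access rule in this abstract can be sketched as a single predicate. This is a minimal illustration only, not the patented implementation; the function name `access_allowed` and the use of plain domain identifiers are assumptions made for the example.

```python
def access_allowed(requester_domain, memory_domain, shared):
    """Return True if a requester may access data in a processor memory.

    requester_domain: identifier of the coherence domain the requester is in
                      (hypothetical representation for this sketch).
    memory_domain:    identifier of the coherence domain that owns the memory.
    shared:           True for shared access, False for non-shared access.
    """
    if shared:
        # Shared access is granted only inside the owning coherence domain.
        return requester_domain == memory_domain
    # Non-shared access is granted both inside and outside the domain.
    return True
```

    Under these assumptions, `access_allowed("A", "B", shared=True)` is rejected, while the same request with `shared=False` is granted.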

    System and method for memory arbitration
    6.
    Granted invention patent
    System and method for memory arbitration (in force)

    Publication number: US06816947B1

    Publication date: 2004-11-09

    Application number: US09909705

    Filing date: 2001-07-20

    IPC classification: G06F12/00

    Abstract: A memory access arbitration scheme is provided where transactions to a shared memory are stored in an arbitration queue. Prior to arbitration, the transactions are compared against the contents of cache memory, to determine which transactions will hit in cache, which will miss and which will be victims. Also prior to arbitration, the entries in the arbitration queue are grouped according to a transaction parameter, such as DRAM bank, Write to Bank, Read to Bank, etc. Arbitration is then performed among those groups which are ready for service. From the group winning arbitration, the oldest transaction is selected for servicing. Preferably, a collapsible queuing structure and method is used, such that once a transaction is serviced, higher order entries ripple down in the queue to make room for new entries while maintaining an oldest to newest relationship among the queue entries.
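
    The grouping, arbitration, and collapsing-queue steps in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented design: the cache hit/miss/victim comparison is omitted, the function name `arbitrate` is hypothetical, and arbitration among ready groups is reduced to a fixed priority order supplied by the caller.

```python
from collections import defaultdict


def arbitrate(queue, ready_groups):
    """Service one transaction from an age-ordered arbitration queue.

    queue:        list of (txn_id, group) tuples, oldest first. It is
                  modified in place when a transaction is serviced.
    ready_groups: group names that are ready for service, listed in the
                  priority order assumed by this sketch.
    Returns the serviced (txn_id, group) tuple, or None if no ready group
    holds a transaction.
    """
    # Group queue positions by transaction parameter (e.g. DRAM bank).
    positions = defaultdict(list)
    for index, (_, group) in enumerate(queue):
        positions[group].append(index)

    # Arbitrate among the ready groups; the first one with entries wins.
    for group in ready_groups:
        if positions[group]:
            oldest = positions[group][0]  # lowest index is the oldest entry
            # Collapsing queue: popping shifts every later entry down one
            # slot, so the oldest-to-newest order is preserved.
            return queue.pop(oldest)
    return None
```

    For example, with a queue of `[("t0", "bank1"), ("t1", "bank0"), ("t2", "bank1")]` and only `bank0` ready, `t1` is serviced and the remaining entries keep their relative age order.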

    Method and system for storing data at input/output (I/O) interfaces for a multiprocessor system
    7.
    Granted invention patent
    Method and system for storing data at input/output (I/O) interfaces for a multiprocessor system (in force)

    Publication number: US06795900B1

    Publication date: 2004-09-21

    Application number: US09910363

    Filing date: 2001-07-20

    IPC classification: G06F12/00

    Abstract: A multiprocessor system and method includes a processing sub-system including a plurality of processors in a processor memory system. A network is operable to couple the processing sub-system to an input/output (I/O) sub-system. The I/O sub-system includes a plurality of I/O interfaces each operable to couple a peripheral device to the multiprocessor system. The I/O interfaces each include a local memory operable to store exclusive read-only copies of data from the processor memory system for use by a corresponding peripheral device.

    Alignment and ordering of vector elements for single instruction multiple data processing
    8.
    Granted invention patent
    Alignment and ordering of vector elements for single instruction multiple data processing (no longer in force)

    Publication number: US5933650A

    Publication date: 1999-08-03

    Application number: US947649

    Filing date: 1997-10-09

    Abstract: The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements is selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.
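
    Both operations in this abstract can be modelled in a few lines. This is a minimal sketch, assuming registers are represented as equal-length Python `bytes` objects or lists; the names `align_vector` and `order_elements` are hypothetical and do not come from the patent.

```python
def align_vector(first, second, start_byte):
    """Extract one register-width vector spanning two source registers.

    first, second: equal-length bytes objects modelling the two registers.
    start_byte:    offset in `first` of the first byte of the aligned vector.
    """
    width = len(first)
    if len(second) != width:
        raise ValueError("registers must have the same width")
    if not 0 <= start_byte < width:
        raise ValueError("start_byte must fall inside the first register")
    # Concatenate the registers and take `width` bytes from the start byte,
    # running on into the second register as needed.
    return (first + second)[start_byte:start_byte + width]


def order_elements(first, second, selection):
    """Build a third register from selected elements of two registers.

    selection: indices into the concatenation of `first` and `second`,
               listed in the order wanted in the destination register.
    """
    combined = list(first) + list(second)
    return [combined[index] for index in selection]
```

    Under these assumptions, an interleave of the low halves of two four-element registers is `order_elements([1, 2, 3, 4], [5, 6, 7, 8], [0, 4, 1, 5])`.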

    Method for preventing multi-level cache system deadlock in a multi-processor system
    9.
    Granted invention patent
    Method for preventing multi-level cache system deadlock in a multi-processor system (no longer in force)

    Publication number: US5632025A

    Publication date: 1997-05-20

    Application number: US696788

    Filing date: 1996-08-14

    IPC classification: G06F12/08 G06F12/14

    CPC classification: G06F12/0811

    Abstract: A method for preventing deadlock due to the need for data exclusivity when performing forced atomic instructions in a multi-level cache in a multi-processor system. The system determines whether an aligned multi-byte word containing the data of a forced atomic instruction, such as an integer store operation, is exclusive in a first level cache. If so, the forced atomic instruction is allowed to enter a second level cache pipeline. If not, the forced atomic instruction is prevented from entering the second level cache pipeline and a cache miss and fill operation is initiated to cause the aligned word to be exclusive in the first level cache.
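
    The gating decision in this abstract can be sketched as a small function. This is an illustration under loud assumptions, not the patented mechanism: first level cache state is reduced to a set of aligned word addresses held exclusive, the word size of 8 bytes is arbitrary, the fill is modelled as completing immediately, and the name `issue_forced_atomic` is hypothetical.

```python
def issue_forced_atomic(address, l1_exclusive_words, word_bytes=8):
    """Decide whether a forced atomic instruction may enter the L2 pipeline.

    address:            byte address touched by the instruction.
    l1_exclusive_words: set of aligned word addresses currently exclusive
                        in the first level cache (simplified stand-in for
                        real cache state; modified in place on a fill).
    word_bytes:         size of the aligned multi-byte word (assumed).
    """
    aligned = address - (address % word_bytes)
    if aligned in l1_exclusive_words:
        # The aligned word is exclusive: the instruction may proceed.
        return "enter_l2_pipeline"
    # Not exclusive: keep the instruction out of the L2 pipeline and model
    # the miss-and-fill by marking the word exclusive for a later retry.
    l1_exclusive_words.add(aligned)
    return "miss_and_fill"
```

    In this model a first attempt on a non-exclusive word returns `"miss_and_fill"`, and a retry of the same address returns `"enter_l2_pipeline"`.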

    Method and apparatus for upgrading a central processing unit and existing memory structure in a computer system
    10.
    Granted invention patent
    Method and apparatus for upgrading a central processing unit and existing memory structure in a computer system (no longer in force)

    Publication number: US5586270A

    Publication date: 1996-12-17

    Application number: US129686

    Filing date: 1993-09-30

    IPC classification: G06F13/40 G06F12/08 G06F13/00

    CPC classification: G06F13/4068

    Abstract: A computer system has a first circuit board with a processor for processing information and a slot for receiving an IC card. The slot includes multiple pins for connection to the IC card. The IC card includes a second processor coupled to a second circuit board, where the processor is contained within an outer framing structure. An interface coupled to the circuit board may be coupled to the multiple pins in the slot, such that the second processor in the integrated circuit card is able to control the computer system.
