Programmable accelerator for a programmable processor system
    1.
    Granted patent
    Legal status: In force

    Publication number: US06397240B1

    Publication date: 2002-05-28

    Application number: US09252500

    Filing date: 1999-02-18

    IPC classification: G06F7/00

    Abstract: A programmable multi-mode accelerator is disclosed for use with a programmable processor or microprocessor. The accelerator allows the processor to execute specific algorithms that require low-precision operations at an extremely high rate, such as certain types of finite impulse response (FIR), correlation and Viterbi computations, and extends the digital signal processor's performance into the range required for these low-precision computations. The accelerator can be coupled to the main data path of a programmable processor or microprocessor and can read and write the processor's main register files directly. In an illustrative implementation, the accelerator data path reads its input values (source operands) directly from one main register file of the programmable processor and writes its results back into a second main register file. Because these low-precision algorithms require primarily addition or multiply-add computations, FIR, correlation and Viterbi computations can all use the same adder cells. The accelerator includes a multi-mode adder that can be reconfigured under program control to perform different addition computations. In a first mode, referred to as the “single-add mode,” the adder operates as a 17-input 16-bit adder; this mode can be used for FIR and correlation computations. A second mode, referred to as the “ACS (add-compare-select) mode,” can be used for Viterbi computations. The accelerator has a small instruction set and its own instruction memory; once started by the main data path, it executes its own instruction stream. In addition, the accelerator includes a delay line with delays of z^-1 or z^-2.
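
    A minimal Python sketch of the two adder modes follows, for illustration only. It assumes 16-bit two's-complement wraparound, that the 17th single-add input carries a running partial sum while the other 16 carry ±1-weighted samples (a low-precision FIR or correlation step), and that ACS keeps the smaller path metric. The function names are hypothetical and are not taken from the patent.

```python
MASK16 = 0xFFFF

def to_int16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value

def single_add(operands: list[int]) -> int:
    """Single-add mode: sum up to 17 operands in one pass, wrapped to 16 bits."""
    if len(operands) > 17:
        raise ValueError("single-add mode accepts at most 17 inputs")
    return to_int16(sum(operands))

def fir_pm1(samples: list[int], coeffs: list[int], partial: int = 0) -> int:
    """Low-precision FIR/correlation step with +/-1 coefficients (assumed).

    Each 'multiply' is only a sign selection, so the adder does all the work."""
    if len(samples) != len(coeffs) or len(samples) > 16:
        raise ValueError("expected matching lists of at most 16 taps")
    terms = [x if c > 0 else -x for x, c in zip(samples, coeffs)]
    return single_add(terms + [partial])  # 16 taps + 1 running partial sum

def acs(metric_a: int, branch_a: int, metric_b: int, branch_b: int) -> tuple[int, int]:
    """ACS mode: add branch metrics to two path metrics, compare, select.

    Returns (survivor metric, decision bit); the smaller metric wins (assumed)."""
    cand_a = to_int16(metric_a + branch_a)
    cand_b = to_int16(metric_b + branch_b)
    return (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)
```

    Routing both modes through the same wrapped adder reflects the abstract's point that FIR, correlation and Viterbi computations share adder cells; the z^-1/z^-2 delay line is not modeled here.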

    Method and apparatus for cache space allocation
    2.
    Granted patent
    Legal status: In force

    Publication number: US06874057B2

    Publication date: 2005-03-29

    Application number: US09975763

    Filing date: 2001-10-09

    IPC classification: G06F12/08 G06F12/00

    CPC classification: G06F12/0842

    Abstract: A method and apparatus are disclosed for allocating a section of a cache memory to one or more tasks. A set index value, which identifies a corresponding set in the cache memory, is transformed into a mapped set index value that constrains a given task to its allocated section of the cache. The allocated section can be varied by selecting an appropriate map function. When the map function is embodied as a logical AND function, for example, individual sets can be included in an allocated section by setting a corresponding bit value to binary one. A cache addressing scheme is also disclosed that permits a desired portion of a cache to be selectively allocated to one or more tasks. The desired location and size of the allocated section of cache sets may be specified.
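
    The AND-based map function can be sketched in Python as below. This is an illustrative model only: the power-of-two cache geometry, the optional `task_base` offset used to choose the section's location, and keeping the full line address as the tag are assumptions of the sketch, not details taken from the patent.

```python
def split_address(addr: int, line_bytes: int, num_sets: int) -> tuple[int, int]:
    """Return (tag, set_index) for a cache whose sizes are powers of two.

    The tag keeps the original index bits (an assumption) so that lines folded
    onto the same mapped set by a task's mask remain distinguishable."""
    offset_bits = line_bytes.bit_length() - 1
    line_addr = addr >> offset_bits
    return line_addr, line_addr & (num_sets - 1)

def mapped_set_index(set_index: int, task_mask: int, task_base: int = 0) -> int:
    """Map a set index into the task's allocated section.

    ANDing with task_mask keeps only the index bits whose mask bit is 1, which
    bounds how many sets the task can reach; task_base (hypothetical) then
    selects where that section sits in the cache."""
    return (set_index & task_mask) | task_base
```

    For example, on a 16-set cache a mask of `0b0011` with a base of `0b1000` confines a task to sets 8 through 11.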

    Method and apparatus for reducing cache thrashing
    3.
    Granted patent
    Legal status: In force

    Publication number: US06874056B2

    Publication date: 2005-03-29

    Application number: US09975762

    Filing date: 2001-10-09

    IPC classification: G06F12/12 G06F12/00

    CPC classification: G06F12/121 G06F12/12

    Abstract: A method and apparatus are disclosed for adaptively reducing cache thrashing in a cache memory device. Cache performance is improved by automatically detecting thrashing of a set and then providing one or more augmentation frames as additional cache space. In one embodiment, the augmentation frames are obtained by remapping the blocks that map to a thrashed set onto one or more additional, less-utilized sets. The disclosed cache thrashing reduction system first identifies a set that is likely to be experiencing thrashing, referred to herein as a thrashed set. Once thrashing is detected, the system selects one or more additional sets, referred to herein as augmentation sets, to augment the thrashed set. In this manner, blocks of main memory that were mapped to a thrashed set are now mapped to an expanded group of sets (the thrashed set and its augmentation sets). Finally, when the augmentation sets are no longer likely to be needed to reduce thrashing, they are disassociated from the thrashed set(s).
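
    The detect, augment and release cycle described above can be sketched as a toy Python model. Everything concrete here is an assumption for illustration: thrashing is taken to mean more than `threshold` misses to a home set within one monitoring window, each thrashed set gets a single augmentation set (the least-accessed free set), and the lowest block-address bit picks between the two.

```python
class ThrashReducer:
    """Toy model of set-thrashing detection and set augmentation (assumptions above)."""

    def __init__(self, num_sets: int, threshold: int) -> None:
        self.threshold = threshold
        self.misses = [0] * num_sets
        self.accesses = [0] * num_sets
        self.augment: dict[int, int] = {}  # thrashed set -> augmentation set

    def record(self, home_set: int, hit: bool) -> None:
        """Count an access to home_set and augment it once it starts thrashing."""
        self.accesses[home_set] += 1
        if hit:
            return
        self.misses[home_set] += 1
        if self.misses[home_set] > self.threshold and home_set not in self.augment:
            busy = set(self.augment) | set(self.augment.values()) | {home_set}
            free = [s for s in range(len(self.accesses)) if s not in busy]
            if free:  # pick the least-utilized set still available
                self.augment[home_set] = min(free, key=lambda s: self.accesses[s])

    def target_set(self, home_set: int, block_addr: int) -> int:
        """Set actually used for a block; augmented sets split blocks by one address bit."""
        aug = self.augment.get(home_set)
        if aug is None:
            return home_set
        return aug if block_addr & 1 else home_set

    def end_window(self) -> None:
        """Release augmentation for sets that stopped thrashing, then reset counters."""
        for home in list(self.augment):
            if self.misses[home] <= self.threshold:
                del self.augment[home]
        self.misses = [0] * len(self.misses)
        self.accesses = [0] * len(self.accesses)
```

    A hardware design would also have to write back or invalidate any lines left in an augmentation set when it is released; the sketch only tracks the mapping.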

Digital microprocessor device having dynamically selectable instruction execution intervals
    4.
    Granted patent
    Legal status: Lapsed

    Publication number: US5802360A

    Publication date: 1998-09-01

    Application number: US640590

    Filing date: 1996-05-01

    Abstract: A scheme for variable-delay instructions in a digital processor gives some instructions a variable delay in order to increase performance at different clock frequencies. The variable-delay (VD) feature allows flag-modifying instructions to execute in a different number of clock cycles (1 or 2), depending on the application. In applications that clock the processor below its maximum frequency, flag-modifying instructions execute in one clock cycle. In applications that clock the processor at its maximum frequency, flag-modifying instructions execute in two clock cycles. If the critical path of a processor, and consequently its maximum frequency, is determined by a flag-modifying operation immediately followed by a flag-reading operation, the VD scheme helps increase performance at either frequency. The performance increase is proportional to the difference between the delay of the critical path associated with flag modification and the delays of the other critical paths. At the lower frequency, a given application consumes slightly less energy, and the cost of implementing the scheme is minimal.
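
    The cycle-count rule above can be written as a small Python timing model. It is a sketch under stated assumptions: instructions are abstract labels, `"F"` marks a flag-modifying instruction, every other instruction takes one cycle, and the frequencies used below are hypothetical.

```python
def vd_cycles(trace: list[str], at_max_frequency: bool) -> int:
    """Cycle count for a trace under the VD scheme.

    'F' marks a flag-modifying instruction; every other instruction is assumed
    (for this sketch) to take exactly one cycle."""
    flag_cycles = 2 if at_max_frequency else 1
    return sum(flag_cycles if op == "F" else 1 for op in trace)

def vd_time_ns(trace: list[str], freq_mhz: float, max_freq_mhz: float) -> float:
    """Execution time in nanoseconds at a given clock frequency."""
    cycles = vd_cycles(trace, at_max_frequency=freq_mhz >= max_freq_mhz)
    return cycles * 1000.0 / freq_mhz
```

    With the trace `["F", "R", "A", "A"]`, the model gives 5 cycles (50 ns) at a 100 MHz maximum clock and 4 cycles (80 ns) at 50 MHz: the extra flag cycle is the price of running at the higher frequency.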

    Method and apparatus for adaptive cache frame locking and unlocking
    5.
    Granted patent
    Legal status: Lapsed

    Publication number: US08261022B2

    Publication date: 2012-09-04

    Application number: US09975764

    Filing date: 2001-10-09

    IPC classification: G06F12/00

    Abstract: A method and apparatus are disclosed for locking the most recently accessed frames in a cache memory. The most recently accessed frames in a cache memory are likely to be accessed again by a task in the near future. The most recently used frames may be locked at the beginning of a task switch or interrupt to improve cache performance. The list of most recently used frames is updated as a task executes and may be embodied, for example, as a list of frame addresses or as a flag associated with each frame. If multiple tasks may interrupt each other, a separate list of most recently used frames may be maintained for each task. An adaptive frame unlocking mechanism is also disclosed that automatically unlocks frames that may cause significant performance degradation for a task. The adaptive frame unlocking mechanism monitors the number of times a task experiences a frame miss and unlocks a given frame if the number of frame misses exceeds a predefined threshold.
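
    The locking and adaptive unlocking mechanism can be sketched as a toy Python model. The interpretation here is an assumption: a "frame miss" is taken to mean the running task missed because it could not use a particular locked frame, and the counter is kept per locked frame. The class and method names are hypothetical.

```python
from collections import OrderedDict

class FrameLocker:
    """Toy model of MRU-frame locking with adaptive unlocking (assumptions above)."""

    def __init__(self, mru_size: int, miss_threshold: int) -> None:
        self.mru_size = mru_size
        self.miss_threshold = miss_threshold
        self.mru: dict[str, OrderedDict] = {}  # separate MRU list per task
        self.locked: set[int] = set()
        self.miss_counts: dict[int, int] = {}

    def access(self, task: str, frame: int) -> None:
        """Update the task's MRU list as it executes."""
        mru = self.mru.setdefault(task, OrderedDict())
        mru.pop(frame, None)
        mru[frame] = None  # most recent entry lives at the end
        if len(mru) > self.mru_size:
            mru.popitem(last=False)  # drop the least recently used entry

    def task_switch(self, outgoing_task: str) -> None:
        """Lock the outgoing task's MRU frames at a task switch or interrupt."""
        self.locked |= set(self.mru.get(outgoing_task, ()))

    def frame_miss(self, frame: int) -> bool:
        """Record a miss the running task took because `frame` was locked.

        Unlocks the frame (and returns True) once its miss count exceeds the threshold."""
        if frame not in self.locked:
            return False
        self.miss_counts[frame] = self.miss_counts.get(frame, 0) + 1
        if self.miss_counts[frame] > self.miss_threshold:
            self.locked.discard(frame)
            del self.miss_counts[frame]
            return True
        return False
```

    Keeping one MRU list per task follows the abstract's note that separate lists may be kept when tasks can interrupt each other.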

    Digital microprocessor device having variable-delay division hardware
    6.
    Granted patent
    Legal status: Lapsed

    Publication number: US5805489A

    Publication date: 1998-09-08

    Application number: US646178

    Filing date: 1996-05-07

    Abstract: The present invention is a variable-delay division (VDD) scheme, implementable in hardware, for executing signed and unsigned integer division and remainder operations in a digital processor. The VDD scheme advantageously reuses the multiplication hardware to implement an alignment step of 2 bits per cycle that iteratively aligns the divisor with the dividend, which speeds up the alignment phase of integer division. Quotient bits are then produced at a rate of 1 bit per cycle using the well-known restoring scheme. For 32-bit two's-complement operands, the scheme has a shorter delay than a fixed-delay scheme for most operands.
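
    A behavioral Python sketch of the two phases follows. It covers unsigned operands only (signed division would add sign handling around it, not shown), counts one cycle per alignment step and per quotient bit, and does not model the reuse of multiplier hardware; the function name is hypothetical.

```python
def vdd_divide(dividend: int, divisor: int, width: int = 32) -> tuple[int, int, int]:
    """Unsigned division returning (quotient, remainder, cycles).

    The divisor is aligned with the dividend 2 bits per cycle, then a
    restoring loop produces 1 quotient bit per cycle."""
    mask = (1 << width) - 1
    n, d = dividend & mask, divisor & mask
    if d == 0:
        raise ZeroDivisionError("integer division by zero")
    cycles = 0
    # Alignment phase: shift the divisor left 2 bits per cycle while it still fits.
    shift = 0
    while (d << (shift + 2)) <= n:
        shift += 2
        cycles += 1
    # Now n < d << (shift + 2), so the quotient fits in bits shift+1 .. 0.
    quotient, remainder = 0, n
    for bit in range(shift + 1, -1, -1):
        cycles += 1
        trial = remainder - (d << bit)
        if trial >= 0:  # subtraction succeeded: keep it and set the quotient bit
            remainder = trial
            quotient |= 1 << bit
        # otherwise "restore" by leaving the remainder unchanged
    return quotient, remainder, cycles
```

    Because the cycle count depends on operand magnitudes, small quotients finish early: in this model, 3 divided by 10 takes 2 cycles and 100 divided by 7 takes 5, whereas a fixed-delay radix-2 divider would spend the same number of cycles on every 32-bit operand pair.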

    METHOD AND APPARATUS FOR ADAPTIVE CACHE FRAME LOCKING AND UNLOCKING
    7.
    Patent application
    Legal status: Lapsed

    Publication number: US20130024620A1

    Publication date: 2013-01-24

    Application number: US13559858

    Filing date: 2012-07-27

    IPC classification: G06F12/08

    Abstract: The most recently accessed frames are locked in a cache memory. These frames are likely to be accessed again by a task in the near future and may be locked at the beginning of a task switch or interrupt to improve cache performance. The list of most recently used frames is updated as a task executes and may be embodied as a list of frame addresses or as a flag associated with each frame. If multiple tasks may interrupt each other, a separate list of most recently used frames may be maintained for each task. An adaptive frame unlocking mechanism is also disclosed that automatically unlocks frames that may cause significant performance degradation for a task. The adaptive frame unlocking mechanism monitors the number of times a task experiences a frame miss and unlocks a given frame if the number of frame misses exceeds a predefined threshold.

    Method and apparatus for distributing multi-source/multi-sink control signals among nodes on a chip
    8.
    Granted patent
    Legal status: Lapsed

    Publication number: US06754748B2

    Publication date: 2004-06-22

    Application number: US09785602

    Filing date: 2001-02-16

    IPC classification: G06F1/04

    CPC classification: G06F13/4217

    Abstract: A method and apparatus are described for distributing multi-source/multi-sink control signals among nodes on a chip. Each node on the chip helps return the control signal to an inactive state at the start of each cycle. Because all nodes contribute to this reset, the control signal returns to the inactive state more quickly, near the start of a given cycle, and the remainder of the cycle stays available for a given node to drive the control signal. Each node includes an exemplary pulsed reset block that, for a short interval, discharges the part of the control signal network closest to it and, over time, the rest of the network, returning the network to an inactive state. Once the control signal network has returned to the inactive state, a node may drive the control signal during the remainder of the cycle.
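
    Why distributed reset leaves more of the cycle for driving can be illustrated with a toy Python model. It assumes, purely for illustration, a linear wire split into segments, with discharge spreading one segment per time step outward from every node that takes part in the reset; real timing depends on the circuit and is not modeled. The function names are hypothetical.

```python
def reset_time(num_segments: int, resetting_nodes: list[int]) -> int:
    """Steps until every segment of a linear control-signal wire is discharged.

    Toy assumption: each resetting node discharges its own segment at step 0 and
    the discharge spreads one segment per step in both directions."""
    if not resetting_nodes:
        raise ValueError("at least one node must take part in the reset")
    return max(min(abs(seg - node) for node in resetting_nodes)
               for seg in range(num_segments))

def drive_window(cycle_steps: int, num_segments: int, resetting_nodes: list[int]) -> int:
    """Steps left in the cycle for a node to drive the signal after the reset."""
    return max(0, cycle_steps - reset_time(num_segments, resetting_nodes))
```

    On a 16-segment wire, a single resetting node at one end needs 15 steps, while resetting nodes at every fourth segment need only 3; with a node on every segment the whole cycle remains available for driving.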

    Transverse correlator structure for a rake receiver
    9.
    Granted patent
    Legal status: In force

    Publication number: US06434163B1

    Publication date: 2002-08-13

    Application number: US09169674

    Filing date: 1998-10-10

    IPC classification: H04B7/216

    CPC classification: H04B1/709 G06F17/15

    Abstract: A RAKE receiver for use in a CDMA system is implemented as a transverse correlator in the complex domain. The transverse topology yields a correlator made up of a plurality of serial stages, each formed as a canonical unit of a multiplier, an adder and a memory element. When implemented in the complex domain, the multiplier is replaced by multiplexers, and the hardware may be significantly reduced by multiplexing between the I and Q components.
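
    A behavioral Python sketch of a transverse correlator follows. It assumes complex spreading chips whose real and imaginary parts are each ±1 (so every "multiplication" reduces to sign selections of I and Q, the multiplexer substitution the abstract describes), and it uses the transposed tapped-delay-line form in which each stage is a multiply, an add and one memory element. The further hardware saving from time-multiplexing I and Q is not modeled, and the function names are hypothetical.

```python
def mux_multiply(x: complex, chip: complex) -> complex:
    """Multiply x by a chip whose real and imaginary parts are each +1 or -1.

    No multiplier is needed: each partial product is +/-I or +/-Q chosen by a
    sign (a multiplexer in hardware), then added or subtracted."""
    a, b = x.real, x.imag
    a_c = a if chip.real > 0 else -a  # I * chip_I
    b_d = b if chip.imag > 0 else -b  # Q * chip_Q
    a_d = a if chip.imag > 0 else -a  # I * chip_Q
    b_c = b if chip.real > 0 else -b  # Q * chip_I
    return complex(a_c - b_d, a_d + b_c)

def transverse_correlate(samples: list[complex], code: list[complex]) -> list[complex]:
    """Correlate samples against a spreading code with a chain of K stages.

    Output n equals the sum over k of conj(code[k]) * samples[n - K + 1 + k]."""
    K = len(code)
    taps = [c.conjugate() for c in reversed(code)]
    acc = [0j] * (K - 1)  # memory elements between the stages
    out = []
    for x in samples:
        p = [mux_multiply(x, t) for t in taps]  # same input broadcast to every stage
        out.append(p[0] + (acc[0] if K > 1 else 0))
        # each stage adds its product to the partial sum from the next stage
        acc = [p[j + 1] + (acc[j + 1] if j + 2 < K else 0) for j in range(K - 1)]
    return out
```

    Feeding the code itself followed by zeros produces a correlation peak equal to the sum of the chip energies once the last chip has arrived (8 for a four-chip code of ±1±j values).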

Method and apparatus for uniform and efficient handling of multiple precise events in a processor by including event commands in the instruction set
    10.
    Granted patent
    Legal status: Lapsed

    Publication number: US5761492A

    Publication date: 1998-06-02

    Application number: US646157

    Filing date: 1996-05-07

    IPC classification: G06F9/32 G06F9/38 G06F9/46

    CPC classification: G06F9/3861 G06F9/32

    Abstract: An integrated circuit is disclosed having a digital processor, a decode stage for decoding an instruction from the instruction set, an execute stage coupled to the decode stage for executing the instruction, and event logic coupled to the decode stage and operable to provide an event command to the decode stage to override the instruction. In one embodiment, an integrated circuit having a pipelined processor handles multiple precise events through the decode stage and execute stage by a process that includes detecting a plurality of events and issuing an event command, selecting the highest-priority event from the plurality of events, providing an event vector and a link address for the highest-priority event, and allowing the event vector and the link address to be modified for a higher-priority event until the event command is issued to the execute stage.
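
    The priority-selection steps above can be sketched as a toy Python model of the event logic. The conventions are assumptions, not the patent's: a lower number means higher priority, handler addresses come from a hypothetical vector table, and the link address is supplied by the caller for each detection.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EventCommand:
    priority: int  # lower number = higher priority (assumed convention)
    vector: int    # address of the event handler
    link: int      # address to resume at after the handler

class EventLogic:
    """Toy model of event logic feeding an event command into the decode stage."""

    def __init__(self, vector_table: dict[int, int]) -> None:
        self.vector_table = vector_table  # hypothetical priority -> handler address map
        self.pending: Optional[EventCommand] = None  # command sitting in decode

    def detect(self, events: list[int], link: int) -> None:
        """Select the highest-priority detected event; override a pending command
        only if the new event outranks it (the command has not yet issued)."""
        if not events:
            return
        best = min(events)
        if self.pending is None or best < self.pending.priority:
            self.pending = EventCommand(best, self.vector_table[best], link)

    def issue(self) -> Optional[EventCommand]:
        """Send the pending event command to the execute stage; it is now fixed."""
        command, self.pending = self.pending, None
        return command
```

    Once `issue` has handed the command to the execute stage, later events start a new command instead of modifying the old one, matching the abstract's "until the event command is issued" condition.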