Efficient context saving and restoring in a multi-tasking computing system environment
    1.
    Granted invention patent
    Status: Expired

    Publication No.: US06061711A

    Publication date: 2000-05-09

    Application No.: US699280

    Filing date: 1996-08-19

    Abstract: In a multi-tasking computing system environment, one program is halted and context switched out so that a processor may context switch in a subsequent program for execution. Processor state information exists which reflects the state of the program being context switched out. Storage of this processor state information permits successful resumption of the context switched out program. When the context switched out program is subsequently context switched in, the stored processor information is loaded in preparation for successfully resuming the program at the point at which execution was previously halted. Although large areas of memory can be allocated to processor state information storage, only a portion of this may need to be preserved across a context switch for successfully saving and resuming the context switched out program. Unnecessarily saving and loading all available processor state information can be noticeably inefficient, particularly where relatively large amounts of processor state information exist. In one embodiment, a processor requests a co-processor to context switch out the currently executing program. At a predetermined appropriate point in the executing program, the co-processor responds by halting program execution and saving only the minimal amount of processor state information necessary for successful restoration of the program. The appropriate point is chosen by the application programmer at a location in the executing program that requires preserving a minimal portion of the processor information across a context switch. By saving only a minimal amount of processor information, processor time savings are accumulated across context save and restoration operations.
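
    Illustrative sketch (not taken from the patent text): the idea of honoring a context-switch request only at a programmer-chosen point, and saving only the registers that are live there, can be modeled roughly as follows. The class name CoProcessor, the register count, and the safe_point() interface are assumptions made for illustration.

```python
from dataclasses import dataclass, field

NUM_REGISTERS = 64  # assumed size of the full register file


@dataclass
class CoProcessor:
    """Toy co-processor that context switches only at programmer-marked safe points."""
    registers: list = field(default_factory=lambda: [0] * NUM_REGISTERS)
    switch_requested: bool = False

    def request_context_switch(self):
        # Called by the host processor; the request is honored at the next safe point.
        self.switch_requested = True

    def safe_point(self, live_registers):
        """Marked by the application programmer where few registers are live.

        Returns the saved context (register index -> value) if a switch was
        pending, otherwise None.
        """
        if not self.switch_requested:
            return None
        self.switch_requested = False
        return {r: self.registers[r] for r in live_registers}

    def restore(self, saved):
        # Load only the registers that were preserved across the switch.
        for r, value in saved.items():
            self.registers[r] = value


def demo():
    cop = CoProcessor()
    cop.registers[3] = 111   # loop counter, still needed after resuming
    cop.registers[7] = 222   # output pointer, still needed after resuming
    cop.registers[40] = 999  # scratch value, dead at the safe point

    cop.request_context_switch()
    saved = cop.safe_point(live_registers=[3, 7])

    # Another program runs and clobbers the whole register file.
    cop.registers = [-1] * NUM_REGISTERS

    cop.restore(saved)
    return saved, cop.registers[3], cop.registers[7], cop.registers[40]
```

    In this model the saved context holds two entries instead of all 64 registers, which is where the time saving would come from on both the save and the restore side.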

Resizable and relocatable memory scratch pad as a cache slice
    3.
    Granted invention patent
    Status: Expired

    Publication No.: US5966734A

    Publication date: 1999-10-12

    Application No.: US733818

    Filing date: 1996-10-18

    IPC classification: G06F12/00 G06F12/08 G06F12/12

    Abstract: A cache system supports a resizable software-managed fast scratch pad that is implemented as a cache slice. A processor register indicates the size and base address of the scratch pad. Instructions which facilitate use of the scratch pad include a prefetch instruction which loads multiple lines of data from external memory into the scratch pad and a writeback instruction which writes multiple lines of data from the scratch pad to external memory. The prefetch and writeback instructions are non-blocking instructions to allow instructions following in the program order to be executed while a prefetch or writeback operation is pending.
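
    Illustrative sketch (not taken from the patent text): a software view of a scratch pad described by a base address and a size, with multi-line prefetch and writeback transfers. The 64-byte line size and all names are assumptions, and the non-blocking behavior of the real instructions is not modeled here; transfers complete immediately.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


class ScratchPad:
    """Toy model of a scratch pad carved out of a cache, described by base and size."""

    def __init__(self, base, size):
        self.configure(base, size)

    def configure(self, base, size):
        # Models writing the processor register that holds base address and size.
        if size % LINE_SIZE:
            raise ValueError("size must be a whole number of lines")
        self.base = base
        self.data = bytearray(size)

    def _check(self, memory, mem_addr, pad_offset, n):
        if pad_offset < 0 or pad_offset + n > len(self.data):
            raise ValueError("transfer falls outside the scratch pad")
        if mem_addr < 0 or mem_addr + n > len(memory):
            raise ValueError("transfer falls outside external memory")

    def prefetch(self, memory, mem_addr, pad_offset, num_lines):
        # Copy several lines from external memory into the scratch pad.
        n = num_lines * LINE_SIZE
        self._check(memory, mem_addr, pad_offset, n)
        self.data[pad_offset:pad_offset + n] = memory[mem_addr:mem_addr + n]

    def writeback(self, memory, mem_addr, pad_offset, num_lines):
        # Copy several lines from the scratch pad back to external memory.
        n = num_lines * LINE_SIZE
        self._check(memory, mem_addr, pad_offset, n)
        memory[mem_addr:mem_addr + n] = self.data[pad_offset:pad_offset + n]


def demo():
    memory = bytearray(range(256)) * 4            # 1 KiB of external memory
    pad = ScratchPad(base=0x1000, size=4 * LINE_SIZE)
    pad.prefetch(memory, mem_addr=128, pad_offset=0, num_lines=2)
    pad.data[0] = 0xFF                            # software modifies the pad
    pad.writeback(memory, mem_addr=512, pad_offset=0, num_lines=2)
    return memory
```

    Calling configure() again with a different base or size stands in for resizing or relocating the scratch pad.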

Coordination and synchronization of an asymmetric, single-chip, dual multiprocessor
    4.
    Granted invention patent
    Status: Expired

    Publication No.: US5978838A

    Publication date: 1999-11-02

    Application No.: US703434

    Filing date: 1996-08-26

    Abstract: An integrated multiprocessor architecture simplifies synchronization of multiple processing units. The multiple processing units constitute a general-purpose or control processor and a vector processor which has a single-instruction-multiple-data (SIMD) architecture so that multiple parallel processing units in the vector processor all complete an instruction simultaneously and do not require software synchronization. The control processor controls the vector processor and creates a fork in a program flow by starting the vector processor. An instruction set for the control processor includes special instructions that enable the control processor to access registers of the vector processor, start or halt execution by the vector processor, and test flags written by the vector processor to indicate completion of tasks. The two processors then execute separate program threads in parallel until the control processor stops the vector processor, an exception is encountered, or the vector processor completes its program thread and enters an idle state. An instruction set for the vector processor includes special instructions that interrupt the first processor to indicate a task is complete. A register coupled to and accessible by both processors stores a state bit indicating whether the vector processor is running or idle. The control processor can synchronize the separate program threads by executing a loop which polls the state bit. When the state bit indicates the vector processor is idle, the general-purpose processor can process results from the vector processor and restart the vector processor.
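
    Illustrative sketch (not taken from the patent text): the start, poll, collect, restart cycle can be modeled with a single shared state bit. The simulation is single-threaded and deterministic; calling step() inside the polling loop is a stand-in for the vector processor advancing in parallel. All names are hypothetical.

```python
class VectorProcessor:
    """Toy vector processor with a shared run/idle state bit."""

    def __init__(self):
        self.running = False   # the shared state bit: True = running, False = idle
        self.result = None
        self._work = None

    def start(self, data):
        # Models the control processor's "start vector processor" instruction.
        self._work = iter(data)
        self.result = 0
        self.running = True

    def step(self):
        # One unit of vector work; the processor goes idle when its thread finishes.
        if not self.running:
            return
        try:
            self.result += next(self._work)
        except StopIteration:
            self.running = False


def control_thread(vp, batches):
    """Control processor: fork work to the vector processor, poll, collect, restart."""
    results = []
    for batch in batches:
        vp.start(batch)
        while vp.running:      # polling loop on the state bit
            vp.step()          # stand-in for the vector processor running in parallel
        results.append(vp.result)   # safe to read: the state bit says idle
    return results
```

    The control thread reads the result only after the state bit reports idle, which is the synchronization property the abstract describes.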

Digital signal processor configuration including multiplying units coupled to plural accumulators for enhanced parallel MAC processing
    5.
    Granted invention patent
    Status: In force

    Publication No.: US06230180B1

    Publication date: 2001-05-08

    Application No.: US09172527

    Filing date: 1998-10-14

    Applicant: Moataz A. Mohamed

    Inventor: Moataz A. Mohamed

    IPC classification: G06F9/302

    Abstract: The present invention generally relates to multiply-accumulate units for use in digital signal processors. Each multiply-accumulate unit includes a multiply unit which is coupled with two or more dedicated accumulators. Because of the coupling configuration, when an instruction specifies which accumulator should be used in executing an operation, the instruction need not specify which multiply unit should be utilized. A scheduler containing a digital signal processor's coupling configuration may then identify the multiply unit associated with the accumulator and may then forward the instruction to the identified multiply unit. Multiply-accumulate units can be configured to execute both scalar and vector operations. For executing vector operations, multiply units and their coupled accumulators are configured such that each may be easily grouped with other multiply units and accumulators.
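
    Illustrative sketch (not taken from the patent text): because each accumulator belongs to exactly one multiply unit, a scheduler can derive the unit from the accumulator named in the instruction. The unit names, the two-accumulators-per-unit layout, and the dictionary instruction format are assumptions.

```python
# Assumed coupling configuration: each multiply unit owns two dedicated accumulators.
COUPLING = {
    "MUL0": ("ACC0", "ACC1"),
    "MUL1": ("ACC2", "ACC3"),
}

# Reverse map the scheduler consults: accumulator -> multiply unit.
ACC_TO_MUL = {acc: mul for mul, accs in COUPLING.items() for acc in accs}


def schedule(instruction):
    """Return the multiply unit implied by the accumulator named in the instruction."""
    return ACC_TO_MUL[instruction["acc"]]


def run(program):
    """Execute multiply-accumulate instructions and record which unit ran each one."""
    accumulators = {acc: 0 for acc in ACC_TO_MUL}
    trace = []
    for ins in program:
        unit = schedule(ins)                               # no unit field in the instruction
        accumulators[ins["acc"]] += ins["a"] * ins["b"]    # multiply-accumulate
        trace.append(unit)
    return accumulators, trace
```

    For example, an instruction naming ACC3 is routed to MUL1 without the instruction carrying any multiply-unit field.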

System and method for handling software interrupts with argument passing
    6.
    Granted invention patent
    Status: Expired

    Publication No.: US5996058A

    Publication date: 1999-11-30

    Application No.: US699295

    Filing date: 1996-08-19

    Abstract: A multiprocessor architectural definition provides that a program executing on a first processor interrupts a second processor by executing a software interrupt instruction. The software interrupt instruction includes an argument field for passing information from a program requesting the software interrupt. The argument, along with the opcode, is saved in a register designated for holding the argument. The information communicated via the argument is used in one embodiment to indicate a cause of the interrupt. In an embodiment, the information communicated via the argument designates an interrupt service routine to be activated in the interrupted processor.
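
    Illustrative sketch (not taken from the patent text): a software interrupt word that carries an argument, latched whole into a designated register and then used to select an interrupt service routine. The opcode value, the 16-bit argument width, and the handler table are hypothetical.

```python
OPCODE_SWI = 0x2A          # hypothetical opcode value
ARG_BITS = 16              # hypothetical width of the argument field
ARG_MASK = (1 << ARG_BITS) - 1


def encode_swi(argument):
    """Pack the opcode and the argument into one instruction word."""
    return (OPCODE_SWI << ARG_BITS) | (argument & ARG_MASK)


class InterruptedProcessor:
    """Toy second processor that receives software interrupts from the first."""

    def __init__(self, handlers):
        self.swi_register = 0      # register designated to hold opcode + argument
        self.handlers = handlers   # argument value -> interrupt service routine
        self.log = []

    def raise_swi(self, instruction_word):
        # The whole word (opcode and argument) is latched into the register.
        self.swi_register = instruction_word
        argument = self.swi_register & ARG_MASK
        self.handlers[argument](self)   # the argument designates the service routine


def demo():
    handlers = {
        1: lambda cpu: cpu.log.append("buffer_ready"),
        2: lambda cpu: cpu.log.append("task_done"),
    }
    cpu = InterruptedProcessor(handlers)
    cpu.raise_swi(encode_swi(2))
    cpu.raise_swi(encode_swi(1))
    return cpu
```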

Reconfigurable functional units for implementing a hybrid VLIW-SIMD programming model
    7.
    Granted invention patent
    Status: In force

    Publication No.: US06366998B1

    Publication date: 2002-04-02

    Application No.: US09172315

    Filing date: 1998-10-14

    Applicant: Moataz A. Mohamed

    Inventor: Moataz A. Mohamed

    IPC classification: G06F9/38

    Abstract: The present invention generally relates to a hybrid VLIW-SIMD programming model for a digital signal processor. The hybrid programming model broadcasts a packet of information to a plurality of functional units or processing elements. Each packet contains several instructions having certain characteristics, such as instruction type and instruction length, among others. The hybrid programming model includes functional units which are reconfigurable based upon the instructions within an instruction packet and the availability of the functional units. The model groups the functional units such that the operations specified in the instructions can be efficiently executed and selects which functional units should be utilized for a given operation.
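
    Illustrative sketch (not taken from the patent text): one simple way to picture grouping functional units per instruction is a greedy allocator over the units that are still free in the current cycle. The (name, lanes) instruction format, the greedy policy, and the deferral of instructions that do not fit are all assumptions; the abstract does not specify the grouping algorithm.

```python
def assign_units(packet, num_units):
    """Greedily group free functional units for each instruction in a packet.

    Each instruction is (name, lanes): lanes == 1 behaves like a scalar VLIW
    slot, lanes > 1 gangs several units together SIMD-style. Instructions
    that do not fit in the remaining units are deferred to a later cycle.
    """
    free = list(range(num_units))
    groups, deferred = {}, []
    for name, lanes in packet:
        if lanes <= len(free):
            groups[name] = free[:lanes]   # these units are configured as one group
            free = free[lanes:]
        else:
            deferred.append(name)
    return groups, deferred
```

    With eight units, a packet of a 4-lane add, a scalar multiply, a 4-lane MAC, and a scalar load would place the add on units 0 to 3, the multiply on unit 4, defer the MAC, and place the load on unit 5.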

Processor containing data path units with forwarding paths between two data path units and a unique configuration or register blocks
    8.
    Granted invention patent
    Status: In force

    Publication No.: US06301653B1

    Publication date: 2001-10-09

    Application No.: US09173257

    Filing date: 1998-10-14

    IPC classification: G06F9/34

    Abstract: The present invention provides an efficient method of forwarding and sharing information between functional units and register files in an effort to execute instructions. A digital signal processor includes a plurality of register blocks for storing data operands coupled to a plurality of data path units for executing instructions. Preferably, each register block is coupled to at least two data path units. In addition, the processor preferably has a plurality of forwarding paths which forward information from one data path unit to another. A scheduler efficiently forwards instructions to data path units based on information regarding the configuration of the processor and any restrictions which might be imposed on the scheduler.
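
    Illustrative sketch (not taken from the patent text): a scheduler that knows which data path units each register block is coupled to can pick a free coupled unit for every instruction in a cycle. The coupling table, unit names, and stall policy are assumptions, and the forwarding paths between data path units are not modeled.

```python
# Assumed configuration: each register block is coupled to two data path units.
BLOCK_TO_UNITS = {
    "RB0": ("DP0", "DP1"),
    "RB1": ("DP1", "DP2"),
    "RB2": ("DP2", "DP3"),
}


def schedule_cycle(instructions):
    """Assign each (name, register_block) to a free coupled data path unit.

    Returns (assignments, stalled). An instruction stalls when every unit
    coupled to its register block is already taken in this cycle.
    """
    busy, assignments, stalled = set(), {}, []
    for name, block in instructions:
        unit = next((u for u in BLOCK_TO_UNITS[block] if u not in busy), None)
        if unit is None:
            stalled.append(name)
        else:
            busy.add(unit)
            assignments[name] = unit
    return assignments, stalled
```

    Coupling each block to at least two units is what gives the scheduler a choice: a third instruction on RB0 stalls, but an instruction on RB1 can still fall back to DP2 when DP1 is busy.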

Apparatus and method for an improved performance VLIW processor
    9.
    Granted invention patent
    Status: In force

    Publication No.: US07127588B2

    Publication date: 2006-10-24

    Application No.: US09730039

    Filing date: 2000-12-05

    IPC classification: G06F15/82

    Abstract: In one exemplary embodiment, the disclosed VLIW processor comprises a number of threads where each thread includes a processing unit. For example, there can be two threads, where each of the two threads has its own processing unit. According to this exemplary embodiment, a number of VLIW packets are divided into a number of issue groups. As an example, two VLIW packets are divided into two issue groups each. The first issue group in the first VLIW packet is provided to a first thread for execution in the first thread processing unit during a first clock cycle. Concurrently, the first issue group in the second VLIW packet is provided to a second thread for execution in the second thread processing unit during the same clock cycle, i.e. during the first clock cycle. Moreover, the second issue group in the first VLIW packet is provided to the first thread for execution in the first thread processing unit during a second clock cycle. Concurrently, the second issue group in the second VLIW packet is provided to the second thread for execution in the second thread processing unit during the same clock cycle, i.e. during the second clock cycle. In this manner, various resources of the VLIW processor are efficiently utilized and two VLIW packets are executed during two clock cycles. As such, the processing speed of the VLIW processor is doubled without a significant increase in the power consumed by the VLIW processor.
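
    Illustrative sketch (not taken from the patent text): the cycle-by-cycle issue pattern for two packets split into two issue groups each can be tabulated as below. The function name, the even split of a packet into groups, and the list-of-strings instruction format are assumptions.

```python
def issue_schedule(packets, groups_per_packet=2):
    """Split each VLIW packet into issue groups and issue them thread by thread.

    packets[t] is the list of instructions for thread t. Returns a list of
    cycles; cycles[c][t] is the issue group thread t executes in clock cycle c.
    """
    split = []
    for packet in packets:
        # Ceiling division so the groups cover the whole packet (at least 1 wide).
        size = max(1, -(-len(packet) // groups_per_packet))
        split.append([packet[i:i + size] for i in range(0, len(packet), size)])
    num_cycles = max((len(groups) for groups in split), default=0)
    return [
        [groups[c] if c < len(groups) else [] for groups in split]
        for c in range(num_cycles)
    ]
```

    For two four-instruction packets this yields two cycles: both first issue groups run concurrently in cycle 0 and both second issue groups in cycle 1, so two packets complete in two clock cycles.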

Method for reducing power when fetching instructions in a processor and related apparatus
    10.
    Granted invention patent
    Status: In force

    Publication No.: US06820194B1

    Publication date: 2004-11-16

    Application No.: US09829823

    Filing date: 2001-04-10

    IPC classification: G06F9/32

    CPC classification: G06F9/381 G06F9/3814

    Abstract: In one disclosed embodiment, an instruction loop having at least one instruction is identified. For example, each instruction can be a VLIW packet comprised of several individual instructions. The instructions of the instruction loop are fetched from a program memory. The instructions are then stored in a register queue. For example, the register queue can be implemented with a head pointer which is adjusted to select a register in which to write each instruction that is fetched. It is then determined whether the processor requires execution of the instruction loop, for example, by checking a program counter (PC) value corresponding to each instruction. When the processor requires execution of the instruction loop, the instructions are output from the register queue. For example, the register queue can be implemented with an access pointer which is adjusted to select a register from which to output each instruction that is required.
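
    Illustrative sketch (not taken from the patent text): a small register queue that serves loop instructions without re-reading program memory. As a simplifying assumption, this model does not identify the loop up front; it tags each stored instruction with its PC value and treats a PC match as "the processor requires this loop instruction". The class name and counters are hypothetical.

```python
class LoopBuffer:
    """Toy register queue that holds the instructions of a small loop."""

    def __init__(self, capacity):
        self.pcs = [None] * capacity      # PC value tagged to each register
        self.slots = [None] * capacity    # the stored instructions
        self.head = 0                     # head pointer: next register to write
        self.memory_fetches = 0           # how often program memory was read

    def fetch(self, program, pc):
        """Return the instruction at pc, reading program memory only on a miss."""
        if pc in self.pcs:
            access = self.pcs.index(pc)   # access pointer selects the register
            return self.slots[access]
        self.memory_fetches += 1
        instruction = program[pc]
        self.pcs[self.head] = pc
        self.slots[self.head] = instruction
        self.head = (self.head + 1) % len(self.slots)
        return instruction


def demo():
    program = ["ld", "mac", "st", "br"]   # a four-instruction loop body
    buf = LoopBuffer(capacity=4)
    executed = []
    for _ in range(3):                    # three loop iterations
        for pc in range(len(program)):
            executed.append(buf.fetch(program, pc))
    return buf, executed
```

    In this model twelve instructions execute but program memory is read only four times, which is the source of the power saving the abstract refers to.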
