Dynamically partitioning processing across a plurality of heterogeneous processors
    11.
    发明授权
    Dynamically partitioning processing across a plurality of heterogeneous processors 失效
    跨多个异构处理器的动态分区处理

    公开(公告)号:US08091078B2

    公开(公告)日:2012-01-03

    申请号:US12116628

    申请日:2008-05-07

    IPC分类号: G06F9/45

    摘要: A program is into at least two object files: one object file for each of the supported processor environments. During compilation, code characteristics, such as data locality, computational intensity, and data parallelism, are analyzed and recorded in the object file. During run time, the code characteristics are combined with runtime considerations, such as the current load on the processors and the size of the data being processed, to arrive at an overall value. The overall value is then used to determine which of the processors will be assigned the task. The values are assigned based on the characteristics of the various processors. For example, if one processor is better at handling intensive computations against large streams of data, programs that are highly computationally intensive and process large quantities of data are weighted in favor of that processor. The corresponding object is then loaded and executed on the assigned processor.

    摘要翻译: 一个程序进入至少两个对象文件:一个对象文件,用于每个受支持的处理器环境。 在编译过程中,将数据位置,计算强度和数据并行等代码特征分析并记录在目标文件中。 在运行时间期间,代码特征与运行时考虑相结合,例如处理器上的当前负载和正在处理的数据的大小,以达到总体值。 然后,总体值用于确定哪些处理器将被分配任务。 这些值基于各种处理器的特性分配。 例如,如果一个处理器更好地处理针对大量数据流的密集计算,则高度计算密集的程序和处理大量数据的程序对该处理器进行加权。 然后在分配的处理器上加载和执行相应的对象。

    Hiding Memory Latency
    12.
    发明申请
    Hiding Memory Latency 失效
    隐藏内存延迟

    公开(公告)号:US20080162906A1

    公开(公告)日:2008-07-03

    申请号:US12049293

    申请日:2008-03-15

    IPC分类号: G06F9/30

    CPC分类号: G06F9/322 G06F8/41 G06F9/3851

    摘要: An approach to hiding memory latency in a multi-thread environment is presented. Branch Indirect and Set Link (BISL) and/or Branch Indirect and Set Link if External Data (BISLED) instructions are placed in thread code during compilation at instances that correspond to a prolonged instruction. A prolonged instruction is an instruction that instigates latency in a computer system, such as a DMA instruction. When a first thread encounters a BISL or a BISLED instruction, the first thread passes control to a second thread while the first thread's prolonged instruction executes. In turn, the computer system masks the latency of the first thread's prolonged instruction. The system can be optimized based on the memory latency by creating more threads and further dividing a register pool amongst the threads to further hide memory latency in operations that are highly memory bound.

    摘要翻译: 介绍了一种在多线程环境中隐藏内存延迟的方法。 分支间接和设置链接(BISL)和/或分支间接和设置链接,如果外部数据(BISLED)指令在对应于延长的指令的实例的编译期间被放置在线程代码中。 延长的指令是指示计算机系统中的延迟,例如DMA指令。 当第一个线程遇到BISL或BISLED指令时,第一个线程在第一个线程的延长指令执行时将控制传递给第二个线程。 反过来,计算机系统掩盖了第一个线程延长的指令的延迟。 可以通过创建更多线程并在线程之间进一步划分寄存器池来进一步隐藏高度内存限制的操作中的内存延迟,从而基于内存延迟来优化系统。

    Hiding memory latency
    13.
    发明授权
    Hiding memory latency 失效
    隐藏内存延迟

    公开(公告)号:US07620951B2

    公开(公告)日:2009-11-17

    申请号:US12049293

    申请日:2008-03-15

    IPC分类号: G06F9/46

    CPC分类号: G06F9/322 G06F8/41 G06F9/3851

    摘要: An approach to hiding memory latency in a multi-thread environment is presented. Branch Indirect and Set Link (BISL) and/or Branch Indirect and Set Link if External Data (BISLED) instructions are placed in thread code during compilation at instances that correspond to a prolonged instruction. A prolonged instruction is an instruction that instigates latency in a computer system, such as a DMA instruction. When a first thread encounters a BISL or a BISLED instruction, the first thread passes control to a second thread while the first thread's prolonged instruction executes. In turn, the computer system masks the latency of the first thread's prolonged instruction. The system can be optimized based on the memory latency by creating more threads and further dividing a register pool amongst the threads to further hide memory latency in operations that are highly memory bound.

    摘要翻译: 介绍了一种在多线程环境中隐藏内存延迟的方法。 分支间接和设置链接(BISL)和/或分支间接和设置链接,如果外部数据(BISLED)指令在对应于延长的指令的实例的编译期间被放置在线程代码中。 延长的指令是指示计算机系统中的延迟,例如DMA指令。 当第一个线程遇到BISL或BISLED指令时,第一个线程在第一个线程的延长指令执行时将控制传递给第二个线程。 反过来,计算机系统掩盖了第一个线程延长的指令的延迟。 可以通过创建更多线程并在线程之间进一步划分寄存器池来进一步隐藏高度内存限制的操作中的内存延迟,从而可以基于内存延迟来优化系统。

    Virtual Devices Using a Plurality of Processors
    14.
    发明申请
    Virtual Devices Using a Plurality of Processors 失效
    使用多个处理器的虚拟设备

    公开(公告)号:US20080168443A1

    公开(公告)日:2008-07-10

    申请号:US12049179

    申请日:2008-03-14

    IPC分类号: G06F15/76 G06F9/46 G06F9/30

    CPC分类号: G06F9/4843 G06F9/544

    摘要: An approach is provided to allow virtual devices that use a plurality of processors in a multiprocessor systems, such as the BE environment. Using this method, a synergistic processing unit (SPU) can either be dedicated to performing a particular function (i.e., audio, video, etc.) or a single SPU can be programmed to perform several functions on behalf of the other processors in the system. The application, preferably running in one of the primary (PU) processors, issues IOCTL commands through device drivers that correspond to SPUs. The kernel managing the primary processors responds by sending an appropriate message to the SPU that is performing the dedicated function. Using this method, an SPU can be virtualized for swapping multiple tasks or dedicated to performing a particular task.

    摘要翻译: 提供了一种方法来允许在诸如BE环境的多处理器系统中使用多个处理器的虚拟设备。 使用这种方法,协同处理单元(SPU)可以专用于执行特定功能(即,音频,视频等),或者单个SPU可被编程为代表系统中的其他处理器执行若干功能 。 优选地,在主(PU)处理器之一中运行的应用通过对应于SPU的设备驱动器发出IOCTL命令。 管理主处理器的内核通过向执行专用功能的SPU发送适当的消息来做出响应。 使用此方法,可以将SPU虚拟化用于交换多个任务或专用于执行特定任务。

    Virtual devices using a plurality of processors
    15.
    发明授权
    Virtual devices using a plurality of processors 失效
    使用多个处理器的虚拟设备

    公开(公告)号:US08549521B2

    公开(公告)日:2013-10-01

    申请号:US12049179

    申请日:2008-03-14

    IPC分类号: G06F9/46 G06F13/00 G06F15/173

    CPC分类号: G06F9/4843 G06F9/544

    摘要: An approach is provided to allow virtual devices that use a plurality of processors in a multiprocessor systems, such as the BE environment. Using this method, a synergistic processing unit (SPU) can either be dedicated to performing a particular function (i.e., audio, video, etc.) or a single SPU can be programmed to perform several functions on behalf of the other processors in the system. The application, preferably running in one of the primary (PU) processors, issues IOCTL commands through device drivers that correspond to SPUs. The kernel managing the primary processors responds by sending an appropriate message to the SPU that is performing the dedicated function. Using this method, an SPU can be virtualized for swapping multiple tasks or dedicated to performing a particular task.

    摘要翻译: 提供了一种方法来允许在诸如BE环境的多处理器系统中使用多个处理器的虚拟设备。 使用这种方法,协同处理单元(SPU)可以专用于执行特定功能(即,音频,视频等),或者单个SPU可被编程为代表系统中的其他处理器执行若干功能 。 优选地,在主(PU)处理器之一中运行的应用通过对应于SPU的设备驱动器发出IOCTL命令。 管理主处理器的内核通过向执行专用功能的SPU发送适当的消息来做出响应。 使用此方法,可以将SPU虚拟化用于交换多个任务或专用于执行特定任务。

    Partitioning processor resources based on memory usage
    16.
    发明授权
    Partitioning processor resources based on memory usage 失效
    基于内存使用分配处理器资源

    公开(公告)号:US08032871B2

    公开(公告)日:2011-10-04

    申请号:US12365413

    申请日:2009-02-04

    IPC分类号: G06F9/45

    摘要: Processor resources are partitioned based on memory usage. A compiler determines the extent to which a process is memory-bound and accordingly divides the process into a number of threads. When a first thread encounters a prolonged instruction, the compiler inserts a conditional branch to a second thread. When the second thread encounters a prolonged instruction, a conditional branch to a third thread is executed. This continues until the last thread conditionally branches back to the first thread. An indirect segmented register file is used so that the “return to” and “branch to” logical registers within each thread are the same (e.g., R1 and R2) for each thread. These logical registers are mapped to hardware registers that store actual addresses. The indirect mapping is altered to bypass completed threads. When the last thread completes it may signal an external process.

    摘要翻译: 处理器资源根据内存使用情况进行分区。 编译器确定进程是内存限制的程度,从而将进程划分为多个线程。 当第一个线程遇到延长的指令时,编译器将条件分支插入第二个线程。 当第二个线程遇到延长的指令时,执行到第三个线程的条件分支。 这将持续到最后一个线程有条件地分支回到第一个线程。 使用间接分段寄存器文件,使得每个线程内的“返回”和“分支到”逻辑寄存器对于每个线程是相同的(例如,R1和R2)。 这些逻辑寄存器映射到存储实际地址的硬件寄存器。 间接映射被更改为绕过完成的线程。 当最后一个线程完成时,它可能会发出外部进程信号。

    Virtual devices using a pluarlity of processors
    17.
    发明授权
    Virtual devices using a pluarlity of processors 失效
    使用多种处理器的虚拟设备

    公开(公告)号:US07496917B2

    公开(公告)日:2009-02-24

    申请号:US10670835

    申请日:2003-09-25

    CPC分类号: G06F9/4843 G06F9/544

    摘要: A method is provided to allow virtual devices that use a plurality of processors in a multiprocessor systems, such as the BE environment. Using this method, a synergistic processing unit (SPU) can either be dedicated to performing a particular function (i.e., audio, video, etc.) or a single SPU can be programmed to perform several functions on behalf of the other processors in the system. The application, preferably running in one of the primary (PU) processors, issues IOCTL commands through device drivers that correspond to SPUs. The kernel managing the primary processors responds by sending an appropriate message to the SPU that is performing the dedicated function. Using this method, an SPU can be virtualized for swapping multiple tasks or dedicated to performing a particular task.

    摘要翻译: 提供了一种方法来允许在诸如BE环境的多处理器系统中使用多个处理器的虚拟设备。 使用这种方法,协同处理单元(SPU)可以专用于执行特定功能(即,音频,视频等),或者单个SPU可被编程为代表系统中的其他处理器执行若干功能 。 优选地,在主(PU)处理器之一中运行的应用通过对应于SPU的设备驱动器发出IOCTL命令。 管理主处理器的内核通过向执行专用功能的SPU发送适当的消息来做出响应。 使用此方法,可以将SPU虚拟化用于交换多个任务或专用于执行特定任务。

    System and method for efficient implementation of software-managed cache
    18.
    发明授权
    System and method for efficient implementation of software-managed cache 失效
    用于高效实施软件管理缓存的系统和方法

    公开(公告)号:US07752350B2

    公开(公告)日:2010-07-06

    申请号:US11678141

    申请日:2007-02-23

    IPC分类号: G06F13/28 G06F12/00 G06F9/34

    CPC分类号: G06F12/0802 G06F9/3802

    摘要: A system and method for an efficient implementation of a software-managed cache is presented. When an application thread executes on a simple processor, the application thread uses a conditional data select instruction for eliminating a conditional branch instruction when accessing a software-managed cache. An application thread issues a conditional data select instruction (DMA transfer) after a cache directory lookup, wherein the size of the requested data is dependent upon the outcome of the cache directory lookup. When the cache directory lookup results in a cache hit, the application thread requests a transfer of zero bits of data, which results in a DMA controller (DMAC) performing a no-op instruction. When the cache directory lookup results in a cache miss, the application thread requests a data block transfer the size of a corresponding cache line.

    摘要翻译: 介绍了一种高效实现软件管理缓存的系统和方法。 当应用程序线程在简单处理器上执行时,应用程序线程使用条件数据选择指令,用于在访问软件管理的高速缓存时消除条件分支指令。 应用程序线程在缓存目录查找之后发出条件数据选择指令(DMA传输),其中所请求数据的大小取决于高速缓存目录查找的结果。 当缓存目录查找导致缓存命中时,应用程序线程请求传输零位数据,这导致执行无操作指令的DMA控制器(DMAC)。 当缓存目录查找导致高速缓存未命中时,应用程序线程请求数据块传输相应高速缓存行的大小。

    Partitioning processor resources based on memory usage
    19.
    发明授权
    Partitioning processor resources based on memory usage 失效
    基于内存使用分配处理器资源

    公开(公告)号:US07506325B2

    公开(公告)日:2009-03-17

    申请号:US11050020

    申请日:2005-02-03

    IPC分类号: G06F9/45

    摘要: Processor resources are partitioned based on memory usage. A compiler determines the extent to which a process is memory-bound and accordingly divides the process into a number of threads. When a first thread encounters a prolonged instruction, the compiler inserts a conditional branch to a second thread. When the second thread encounters a prolonged instruction, a conditional branch to a third thread is executed. This continues until the last thread conditionally branches back to the first thread. An indirect segmented register file is used so that the “return to” and “branch to” logical registers within each thread are the same (e.g., R1 and R2)for each thread. These logical registers are mapped to hardware registers that store actual addresses. The indirect mapping is altered to bypass completed threads. When the last thread completes it may signal an external process.

    摘要翻译: 处理器资源根据内存使用情况进行分区。 编译器确定进程是内存限制的程度,从而将进程划分为多个线程。 当第一个线程遇到延长的指令时,编译器将条件分支插入第二个线程。 当第二个线程遇到延长的指令时,执行到第三个线程的条件分支。 这将持续到最后一个线程有条件地分支回到第一个线程。 使用间接分段寄存器文件,使得每个线程内的“返回”和“分支到”逻辑寄存器对于每个线程是相同的(例如,R1和R2)。 这些逻辑寄存器映射到存储实际地址的硬件寄存器。 间接映射被更改为绕过完成的线程。 当最后一个线程完成时,它可能会发出外部进程信号。

    Task queue management of virtual devices using a plurality of processors
    20.
    发明授权
    Task queue management of virtual devices using a plurality of processors 失效
    使用多个处理器的虚拟设备的任务队列管理

    公开(公告)号:US07478390B2

    公开(公告)日:2009-01-13

    申请号:US10670838

    申请日:2003-09-25

    IPC分类号: G06F9/46 G06F13/00 G06F13/24

    CPC分类号: G06F9/505

    摘要: A task queue manager manages the task queues corresponding to virtual devices. When a virtual device function is requested, the task queue manager determines whether an SPU is currently assigned to the virtual device task. If an SPU is already assigned, the request is queued in a task queue being read by the SPU. If an SPU has not been assigned, the task queue manager assigns one of the SPUs to the task queue. The queue manager assigns the task based upon which SPU is least busy as well as whether one of the SPUs recently performed the virtual device function. If an SPU recently performed the virtual device function, it is more likely that the code used to perform the function is still in the SPU's local memory and will not have to be retrieved from shared common memory using DMA operations.

    摘要翻译: 任务队列管理器管理与虚拟设备相对应的任务队列。 当请求虚拟设备功能时,任务队列管理器确定SPU当前是否被分配给虚拟设备任务。 如果已经分配了SPU,则该请求在SPU所读取的任务队列中排队。 如果尚未分配SPU,则任务队列管理器将其中一个SPU分配给任务队列。 队列管理器根据哪个SPU最不忙,以及一个SPU最近是否执行了虚拟设备功能来分配任务。 如果SPU最近执行了虚拟设备功能,则用于执行该功能的代码更有可能仍在SPU的本地存储器中,并且不需要使用DMA操作从共享的公共存储器中检索。