Apparatus and method for partitioning programs between a general purpose core and one or more accelerators
    1.
    Patent application
    Apparatus and method for partitioning programs between a general purpose core and one or more accelerators (In force)

    Publication No.: US20070174828A1

    Publication date: 2007-07-26

    Application No.: US11339592

    Filing date: 2006-01-25

    IPC class: G06F9/45

    CPC class: G06F8/45 G06F8/451 G06F8/456

    Abstract: An apparatus and method for partitioning programs between a general purpose core and one or more accelerators are provided. With the apparatus and method, a compiler front end is provided for converting program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer, which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs are then linked to generate an executable program.
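The flow described in the abstract can be sketched in miniature: walk an IR in which the optimizer has already tagged each operation with a target, group consecutive same-target operations into sub-programs, and insert a synchronization marker at each target switch. The `Op` class, the `"sync"` marker, and the target names are invented for illustration and are not taken from the patent.

```python
class Op:
    """One IR operation, tagged by the interprocedural optimizer."""
    def __init__(self, name, target):
        self.name = name        # operation in the intermediate representation
        self.target = target    # e.g. "core" or an accelerator id

def partition_program(ir_ops):
    """Group consecutive same-target ops into per-target sub-programs,
    appending a sync marker wherever control moves to another target."""
    partitions = []
    current_target, current_ops = None, []
    for op in ir_ops:
        if op.target != current_target:
            if current_ops:
                partitions.append((current_target, current_ops + ["sync"]))
            current_target, current_ops = op.target, []
        current_ops.append(op.name)
    if current_ops:
        partitions.append((current_target, current_ops))
    return partitions
```

Each resulting `(target, sub_program)` pair would then be handed to the backend for that target's instruction set, and the compiled pieces linked into one executable.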


    Software managed cache optimization system and method for multi-processing systems
    2.
    Patent application
    Software managed cache optimization system and method for multi-processing systems (Lapsed)

    Publication No.: US20060123405A1

    Publication date: 2006-06-08

    Application No.: US11002553

    Filing date: 2004-12-02

    IPC class: G06F9/45

    CPC class: G06F8/4442

    Abstract: The present invention provides a method for computer program code optimization for a software managed cache in either a uni-processor or a multi-processor system. A single source file comprising a plurality of array references is received. The plurality of array references is analyzed to identify predictable accesses. The plurality of array references is analyzed to identify secondary predictable accesses. One or more of the plurality of array references is aggregated based on identified predictable accesses and identified secondary predictable accesses to generate aggregated references. The single source file is restructured based on the aggregated references to generate restructured code. Prefetch code is inserted in the restructured code based on the aggregated references. Software cache update code is inserted in the restructured code based on the aggregated references. Explicit cache lookup code is inserted for the remaining unpredictable accesses. Calls to a miss handler for misses in the explicit cache lookup code are inserted. A miss handler is included in the generated code for the program. In the miss handler, a line to evict is chosen based on recent usage and predictability. In the miss handler, appropriate DMA commands are issued for the evicted line and the missing line.
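The explicit-lookup and miss-handler steps at the end of the abstract can be sketched as a tiny software-managed cache: a lookup path for unpredictable accesses, and a miss handler that evicts the least recently used line and records the DMA commands for the evicted and missing lines. This is a minimal sketch under simplified assumptions (eviction by recency only, DMA modeled as a command log); all names are hypothetical.

```python
class SoftwareCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}       # tag -> cached line data
        self.last_use = {}    # tag -> logical time of last access
        self.use_clock = 0
        self.dma_log = []     # stand-in for issued DMA commands

    def lookup(self, tag):
        """Explicit cache lookup inserted for an unpredictable access."""
        if tag in self.lines:
            self.use_clock += 1
            self.last_use[tag] = self.use_clock
            return self.lines[tag]
        return self.miss_handler(tag)

    def miss_handler(self, tag):
        if len(self.lines) >= self.num_lines:
            # choose a line to evict based on recent usage (LRU here)
            victim = min(self.lines, key=lambda t: self.last_use[t])
            self.dma_log.append(("put", victim))   # write back the evicted line
            del self.lines[victim]
        self.dma_log.append(("get", tag))          # fetch the missing line
        self.use_clock += 1
        self.lines[tag] = f"data@{tag}"
        self.last_use[tag] = self.use_clock
        return self.lines[tag]
```

The patent's handler also weighs predictability when choosing the victim; this sketch keeps only the recency half of that policy.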


    THREAD SPECULATIVE EXECUTION AND ASYNCHRONOUS CONFLICT EVENTS
    3.
    Patent application
    THREAD SPECULATIVE EXECUTION AND ASYNCHRONOUS CONFLICT EVENTS (In force)

    Publication No.: US20110209154A1

    Publication date: 2011-08-25

    Application No.: US12711328

    Filing date: 2010-02-24

    IPC class: G06F9/46

    Abstract: In an embodiment, asynchronous conflict events are received during a previous rollback period. Each of the asynchronous conflict events represents a conflict encountered by speculative execution of a first plurality of work units, and the events may be received out of order. During a current rollback period, a first work unit is determined whose speculative execution raised one of the asynchronous conflict events, and the first work unit is older than all other of the first plurality of work units. A second plurality of work units is determined, whose ages are equal to or older than the first work unit, wherein each of the second plurality of work units is assigned to a respective executing thread. Rollbacks of the second plurality of work units are performed. After the rollbacks of the second plurality of work units are performed, speculative executions of the second plurality of work units are initiated in age order, from oldest to youngest.
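The recovery step can be condensed into a few lines, following the abstract's wording: from conflict events that may have arrived out of order, find the oldest conflicting work unit, select every unit at least as old, and restart the selected units oldest-first. Representing age as a timestamp (smaller means older) is an assumption of this sketch, and the names are invented.

```python
def recover(ages, conflict_ids):
    """ages: dict of work-unit id -> timestamp (smaller = older).
    conflict_ids: units whose speculative execution raised an asynchronous
    conflict event; the list order models out-of-order arrival."""
    # first work unit: the oldest one that raised a conflict
    first = min(conflict_ids, key=lambda u: ages[u])
    # second plurality: units equal to or older than the first work unit
    affected = [u for u in ages if ages[u] <= ages[first]]
    # roll back, then restart in age order, oldest to youngest
    restart_order = sorted(affected, key=lambda u: ages[u])
    return first, restart_order
```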


    HYBRID MECHANISM FOR MORE EFFICIENT EMULATION AND METHOD THEREFOR
    5.
    Patent application
    HYBRID MECHANISM FOR MORE EFFICIENT EMULATION AND METHOD THEREFOR (In force)

    Publication No.: US20120089820A1

    Publication date: 2012-04-12

    Application No.: US13311858

    Filing date: 2011-12-06

    IPC class: G06F9/30

    CPC class: G06F9/45504

    Abstract: In a host system, a method for using instruction scheduling to efficiently emulate the operation of a target computing system includes preparing, on the host system, an instruction sequence to interpret an instruction written for execution on the target computing system. Instruction scheduling is performed on the instruction sequence to achieve efficient instruction-level parallelism for the host system. A separate and independent instruction sequence is inserted which, when executed simultaneously with the instruction sequence, copies to a separate location the minimum instruction sequence necessary to execute the intent of an interpreted target instruction, the interpreted target instruction thereby becoming a translation, and modifies the interpreter code such that the next interpretation of the target instruction results in execution of the translated version, thereby removing interpreter overhead.
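The interpret-once-then-patch idea can be illustrated with a toy emulator: the first execution of an opcode goes through the interpreter path, which installs a minimal translated handler in the dispatch table so that every later execution of the same opcode skips the interpreter entirely. The opcode, handler names, and dispatch-table mechanism here are invented for illustration, not taken from the patent.

```python
class HybridEmulator:
    def __init__(self):
        # every opcode initially dispatches to its interpreter routine
        self.dispatch = {"ADD": self.interpret_add}
        self.translated = []   # opcodes that now run as translations

    def interpret_add(self, a, b):
        """Interpreter path: decode and execute, then install the translation."""
        result = a + b
        # patch the dispatch so the next interpretation runs the translated version
        self.dispatch["ADD"] = self.translated_add
        self.translated.append("ADD")
        return result

    def translated_add(self, a, b):
        """Minimal copied sequence realizing the instruction's intent."""
        return a + b

    def run(self, opcode, a, b):
        return self.dispatch[opcode](a, b)
```

In the patent the translation is copied and installed by a separate instruction sequence scheduled to run alongside the interpreter; collapsing that into the handler itself is a simplification of this sketch.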


    Method and apparatus for overlay management within an integrated executable for a heterogeneous architecture
    8.
    Granted patent
    Method and apparatus for overlay management within an integrated executable for a heterogeneous architecture (Lapsed)

    Publication No.: US07222332B2

    Publication date: 2007-05-22

    Application No.: US10280242

    Filing date: 2002-10-24

    IPC class: G06F9/44 G06F9/30

    CPC class: G06F8/453

    Abstract: The present invention provides for creating and employing code and data partitions in a heterogeneous environment. This is achieved by separating source code and data into at least two partitioned sections and at least one unpartitioned section. Generally, a partitioned section is targeted for execution on an independent memory device, such as an attached processor unit. Then, at least two overlay sections are generated from at least one partitioned section. The plurality of partitioned sections are pre-bound to each other. A root module is also created, associated with both the pre-bound plurality of partitions and the overlay sections. The root module is employable to exchange the at least two overlay sections between the first and second execution environments. The pre-bound plurality of partitioned sections is then bound to the at least one unpartitioned section. The binding produces an integrated executable.
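The root module's job of exchanging overlay sections can be sketched as a small resident-overlay manager: only one overlay section fits in the attached processor's local store at a time, so the root module swaps the needed section in before any call into it. The class and its bookkeeping are hypothetical stand-ins for the patent's linker-level mechanism.

```python
class RootModule:
    """Tracks which overlay section currently occupies local store and
    swaps sections in on demand before dispatching calls into them."""
    def __init__(self, overlays):
        self.overlays = overlays   # section name -> callable section code
        self.resident = None       # overlay currently in local store
        self.swaps = []            # record of overlay exchanges performed

    def call(self, name, arg):
        if self.resident != name:
            # exchange overlay sections between the execution environments
            self.swaps.append(name)
            self.resident = name
        return self.overlays[name](arg)
```

A repeated call into the resident overlay costs nothing extra; only a call into a non-resident section triggers an exchange.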


    System and method for managing position independent code using a software framework
    9.
    Patent application
    System and method for managing position independent code using a software framework (Lapsed)

    Publication No.: US20060112368A1

    Publication date: 2006-05-25

    Application No.: US10988288

    Filing date: 2004-11-12

    IPC class: G06F9/44

    CPC class: G06F9/44526

    Abstract: A system and method for managing position independent code using a software framework is presented. The software framework provides the ability to cache multiple plug-ins, which are loaded in a processor's local storage. A processor receives a command or data stream from another processor, which includes information corresponding to a particular plug-in. The processor uses the plug-in identifier to load the plug-in from shared memory into local memory before it is required, in order to minimize latency. When the data stream requests the processor to use the plug-in, the processor retrieves a location offset corresponding to the plug-in and applies the plug-in to the data stream. A plug-in manager manages an entry point table that identifies memory locations corresponding to each plug-in and, therefore, plug-ins may be placed anywhere in a processor's local memory.
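The entry-point-table idea can be sketched directly: a manager preloads plug-ins named in an incoming command stream from shared memory into local store ahead of use, and the table records where each one landed, which is what lets position-independent plug-ins live at any offset. All names here are invented, and "executing" a plug-in is modeled as prepending its code bytes to the data.

```python
class PluginManager:
    def __init__(self, shared_memory):
        self.shared = shared_memory   # plug-in id -> code blob in shared memory
        self.local = bytearray()      # processor's local store
        self.entry_points = {}        # entry point table: plug-in id -> local offset

    def preload(self, plugin_id):
        """Load the plug-in before it is required, hiding transfer latency."""
        if plugin_id not in self.entry_points:
            self.entry_points[plugin_id] = len(self.local)  # wherever space is free
            self.local += self.shared[plugin_id]

    def apply(self, plugin_id, data):
        """Look up the plug-in's offset and apply it to the data stream."""
        offset = self.entry_points[plugin_id]
        code = self.local[offset:offset + len(self.shared[plugin_id])]
        return bytes(code) + data     # stand-in for executing the plug-in
```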


    Executing speculative parallel instructions threads with forking and inter-thread communication
    10.
    Granted patent
    Executing speculative parallel instructions threads with forking and inter-thread communication (Lapsed)

    Publication No.: US5812811A

    Publication date: 1998-09-22

    Application No.: US383331

    Filing date: 1995-02-03

    IPC class: G06F9/30 G06F9/38 G06F9/46

    Abstract: A central processing unit (CPU) in a computer permits speculative parallel execution of more than one instruction thread. The CPU uses Fork-Suspend instructions that are added to the instruction set of the CPU and are inserted in a program prior to run-time to delineate potential future threads for parallel execution. The CPU has an instruction cache with one or more instruction cache ports, a bank of one or more program counters, a bank of one or more dispatchers, a thread management unit that handles inter-thread communications and discards future threads that violate dependencies, a set of architectural registers common to all threads, and a scheduler that schedules parallel execution of the instructions on one or more functional units in the CPU.
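The thread management unit's dependence check can be reduced to one comparison: a future thread forked at a Fork point may only commit if the main thread did not later write a register the future thread already read speculatively. The function below is a loose sketch of that rule with invented register names; the real unit tracks this in hardware per register.

```python
def future_thread_commits(main_writes_after_fork, future_reads):
    """main_writes_after_fork: registers the main thread writes after the Fork point.
    future_reads: registers the speculative future thread read.
    Returns True if the future thread's results can be committed, False if the
    thread management unit must discard it for violating a dependence."""
    violated = bool(set(main_writes_after_fork) & set(future_reads))
    return not violated
```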
