Compiler implemented software cache in which non-aliased explicitly fetched data are excluded
    2.
    Granted patent
    Status: Active

    Publication No.: US08214816B2

    Publication date: 2012-07-03

    Application No.: US12128194

    Filing date: 2008-05-28

    IPC classification: G06F9/45

    CPC classification: G06F8/4442

    Abstract: A compiler-implemented software cache in which non-aliased explicitly fetched data are excluded is provided. With the mechanisms of the illustrative embodiments, a compiler uses a forward data flow analysis to prove that there is no alias between the cached data and explicitly fetched data. Explicitly fetched data that have no alias in the cached data are excluded from the software cache. Explicitly fetched data that have aliases in the cached data are allowed to be stored in the software cache. In this way, there is no runtime overhead to maintain the correctness of the two copies of data. Moreover, the number of lines of the software cache that must be protected from eviction is decreased. This leads to a decrease in the number of computation cycles required by the cache miss handler when evicting cache lines during cache miss handling.
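
The forward data flow analysis described in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy statement forms (`addr`, `copy`, `cached`, `dma`), the function names, and the simplification that every pointer is assigned before it is used. The sketch propagates may-point-to sets forward through a control flow graph until they stop changing, then splits explicitly fetched objects into those that never alias cached data and those that may.

```python
def step(state, stmt):
    """Apply one toy statement to a points-to state (pointer -> set of objects)."""
    op = stmt[0]
    if op == "addr":            # p = &obj
        state[stmt[1]] = {stmt[2]}
    elif op == "copy":          # p = q
        state[stmt[1]] = set(state.get(stmt[2], set()))


def classify_fetches(blocks, succs, entry):
    """Return (excluded, kept): explicitly fetched objects without / with a cached alias."""
    # in_state[b] holds the may-point-to sets on entry to block b.
    in_state = {b: {} for b in blocks}
    work, seen = [entry], set()
    while work:                               # forward worklist iteration to a fixpoint
        b = work.pop()
        seen.add(b)
        state = {p: set(o) for p, o in in_state[b].items()}
        for stmt in blocks[b]:
            step(state, stmt)
        for s in succs.get(b, []):
            changed = s not in seen
            for p, objs in state.items():     # join = set union at merge points
                tgt = in_state[s].setdefault(p, set())
                if not objs <= tgt:
                    tgt |= objs
                    changed = True
            if changed and s not in work:
                work.append(s)

    cached, fetched = set(), set()
    for b in seen:                            # replay each block with the final states
        state = {p: set(o) for p, o in in_state[b].items()}
        for stmt in blocks[b]:
            step(state, stmt)
            if stmt[0] == "cached":           # access through the software cache
                cached |= state.get(stmt[1], set())
            elif stmt[0] == "dma":            # explicit fetch
                fetched |= state.get(stmt[1], set())
    return fetched - cached, fetched & cached


# Hypothetical program: q may point to A or B at the join, r always points to C.
blocks = {
    "entry": [("addr", "p", "A"), ("addr", "q", "B")],
    "then": [("copy", "q", "p")],
    "else": [],
    "join": [("cached", "q"), ("addr", "r", "C"), ("dma", "r"), ("dma", "p")],
}
succs = {"entry": ["then", "else"], "then": ["join"], "else": ["join"]}
excluded, kept = classify_fetches(blocks, succs, "entry")
```

In this example `C` is only ever reached by the explicit fetch, so it can stay out of the software cache, while `A` may also be reached through the cached pointer `q` and therefore remains cacheable. A production analysis would also have to treat unknown pointers conservatively, which this sketch does not do.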

    Performing useful computations while waiting for a line in a system with a software implemented cache
    3.
    Granted patent
    Status: Lapsed

    Publication No.: US07765360B2

    Publication date: 2010-07-27

    Application No.: US12243339

    Filing date: 2008-10-01

    Abstract: Mechanisms for performing useful computations during a software cache reload operation are provided. With the illustrative embodiments, in order to perform software caching, a compiler takes original source code, and while compiling the source code, inserts explicit cache lookup instructions into appropriate portions of the source code where cacheable variables are referenced. In addition, the compiler inserts a cache miss handler routine that is used to branch execution of the code to a cache miss handler if the cache lookup instructions result in a cache miss. The cache miss handler, prior to performing a wait operation for waiting for the data to be retrieved from the backing store, branches execution to an independent subroutine identified by a compiler. The independent subroutine is executed while the data is being retrieved from the backing store such that useful work is performed.
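
The control flow in this abstract (lookup, miss handler, independent work, then wait) can be sketched as follows. This is only an illustration under stated assumptions: the class `SoftwareCache`, its method names, the use of a Python thread to stand in for an asynchronous transfer from the backing store, and the simulated latency are all hypothetical and not taken from the patent.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SoftwareCache:
    """Toy software cache whose miss handler runs independent work before waiting."""

    def __init__(self, backing_store, independent_work):
        self.lines = {}                            # address -> cached value
        self.backing = backing_store               # stands in for the backing store
        self.independent_work = independent_work   # subroutine picked by the "compiler"
        self.pool = ThreadPoolExecutor(max_workers=1)

    def _fetch(self, addr):
        time.sleep(0.01)                           # simulated transfer latency
        return self.backing[addr]

    def lookup(self, addr):
        """Explicit cache lookup, as a compiler would insert at each cacheable reference."""
        if addr in self.lines:
            return self.lines[addr]
        return self._miss_handler(addr)

    def _miss_handler(self, addr):
        pending = self.pool.submit(self._fetch, addr)   # start the reload
        self.independent_work()                         # useful work instead of idling
        self.lines[addr] = pending.result()             # only now wait for the data
        return self.lines[addr]


log = []
cache = SoftwareCache({0x100: 42}, lambda: log.append("independent work done"))
first = cache.lookup(0x100)    # miss: runs the independent subroutine, then waits
second = cache.lookup(0x100)   # hit: no miss handler, no extra work
cache.pool.shutdown()
```

The independent subroutine must not depend on the value being fetched; in the patent's description that choice is made by the compiler, whereas here it is simply passed in by the caller.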

    Compiler Method for Employing Multiple Autonomous Synergistic Processors to Simultaneously Operate on Longer Vectors of Data
    4.
    Patent application
    Status: Active

    Publication No.: US20080229298A1

    Publication date: 2008-09-18

    Application No.: US11686400

    Filing date: 2007-03-15

    IPC classification: G06F9/45

    CPC classification: G06F8/456

    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
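
The division of labor in this abstract, with a main program driving an extracted loop kernel on several processors, might look like the sketch below. It is a hand-written analogy rather than compiler output: the names `kernel` and `run_on_workers`, the element-wise addition used as the loop body, and the use of Python threads in place of synergistic processors are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(a, b, lo, hi):
    """Extracted vectorizable loop body: processes one slice of the long vectors."""
    return [a[i] + b[i] for i in range(lo, hi)]


def run_on_workers(a, b, num_workers):
    """Main-program side: split the index range, dispatch slices, gather results."""
    n = len(a)
    chunk = max(1, (n + num_workers - 1) // num_workers)   # ceiling division
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(lambda r: kernel(a, b, *r), bounds)
        return [x for part in parts for x in part]         # map preserves slice order


a = list(range(10))
b = [10] * 10
result = run_on_workers(a, b, num_workers=4)
```

Each worker sees only its own slice bounds, which mirrors the idea that the loop code is compiled separately from the controlling main program.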

    System and Method to Efficiently Prefetch and Batch Compiler-Assisted Software Cache Accesses
    6.
    Patent application
    Status: Lapsed

    Publication No.: US20080046657A1

    Publication date: 2008-02-21

    Application No.: US11465522

    Filing date: 2006-08-18

    IPC classification: G06F12/00

    Abstract: A system and method to efficiently pre-fetch and batch compiler-assisted software cache accesses are provided. The system and method reduce the overhead associated with software cache directory accesses. With the system and method, the local memory address of the cache line that stores the pre-fetched data is itself cached, such as in a register or well-known location in local memory, so that a later data access does not need to perform address translation and software cache operations and can instead access the data directly from the software cache using the cached local memory address. This saves processor cycles that would otherwise be required to perform the address translation a second time when the data is to be used. Moreover, the system and method directly enable software cache accesses to be effectively decoupled from address translation in order to increase the overlap between computation and communication.
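
The idea of caching the local address produced by a pre-fetch can be sketched as below. The class `PrefetchingCache`, its direct-mapped layout, the line size, and the `lookups` counter are hypothetical choices made for illustration; the sketch also assumes the line is not evicted between the pre-fetch and the later use, which a real implementation would have to guarantee.

```python
class PrefetchingCache:
    """Toy direct-mapped software cache that hands out local addresses."""

    LINE = 4  # words per cache line (illustrative value)

    def __init__(self, memory, num_lines=8):
        self.memory = memory                       # stands in for global memory
        self.num_lines = num_lines
        self.local = [0] * (num_lines * self.LINE) # stands in for the local store
        self.tags = [None] * num_lines             # cache directory
        self.lookups = 0                           # counts directory accesses

    def translate(self, gaddr):
        """Directory lookup plus line fill on a miss; returns the local address."""
        self.lookups += 1
        tag, offset = divmod(gaddr, self.LINE)
        slot = tag % self.num_lines
        if self.tags[slot] != tag:                 # miss: bring the line in
            base = tag * self.LINE
            start = slot * self.LINE
            self.local[start:start + self.LINE] = self.memory[base:base + self.LINE]
            self.tags[slot] = tag
        return slot * self.LINE + offset


memory = list(range(100, 164))         # 64 words, a multiple of the line size
cache = PrefetchingCache(memory)

local_addr = cache.translate(10)       # pre-fetch: translate once, keep the result
# ... other computation can overlap with the transfer here ...
value = cache.local[local_addr]        # later use: direct access, no second lookup
```

Keeping `local_addr` in a variable plays the role of the register or well-known local memory location mentioned in the abstract: the later access reads the local store directly, so the directory is consulted only once.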

    Partitioning programs between a general purpose core and one or more accelerators
    7.
    Granted patent
    Status: Lapsed

    Publication No.: US08375374B2

    Publication date: 2013-02-12

    Application No.: US12127395

    Filing date: 2008-05-27

    IPC classification: G06F9/45

    CPC classification: G06F8/45 G06F8/451 G06F8/456

    Abstract: A mechanism is provided for partitioning programs between a general purpose core and one or more accelerators. With the apparatus and method, a compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs are then linked to thereby generate an executable program.
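
The partitioning step in this abstract can be sketched in a few lines. The placement rule used here (data-parallel regions go to the accelerator, everything else to the core) and the `signal`/`wait` pseudo-instructions are assumptions made purely for illustration; the abstract does not say how the optimizer makes its decisions or what the synchronization instructions look like.

```python
def partition(regions):
    """Split program regions into per-target sub-programs with sync points.

    regions: list of (name, is_data_parallel) in program order.
    Returns a dict mapping target name to its list of instructions.
    """
    parts = {"core": [], "accelerator": []}
    previous = None
    for name, is_data_parallel in regions:
        # Hypothetical placement rule, standing in for the optimizer's decision.
        target = "accelerator" if is_data_parallel else "core"
        if previous is not None and previous != target:
            # Control moves between targets: add coordination to both partitions.
            parts[previous].append(f"signal({target})")
            parts[target].append(f"wait({previous})")
        parts[target].append(name)
        previous = target
    return parts


program = [("init", False), ("fft_loop", True), ("report", False)]
subprograms = partition(program)
```

Each resulting list would then be handed to the backend for its target's instruction set and the compiled pieces linked into one executable, as the abstract describes.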

    Computer program code size partitioning system for multiple memory multi-processing systems
    8.
    Granted patent
    Status: Lapsed

    Publication No.: US08032873B2

    Publication date: 2011-10-04

    Application No.: US12337197

    Filing date: 2008-12-17

    IPC classification: G06F9/45 G06F9/46

    Abstract: The present invention provides for a system for computer program code size partitioning for multiple memory multi-processor systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A program representation based on received computer program code is generated. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one SESE region of less than a certain size (store-size-specific) is identified based on identified SESE regions and the at least one system parameter. Each store-size-specific SESE region is grouped into a node-specific subroutine. The non node-specific parts of the computer program code are modified based on the partitioning into node-specific subroutines. The modified computer program code including each node-specific subroutine is compiled based on a specified node characteristic.
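
The size-based selection and grouping of SESE regions can be sketched as follows. The abstract only says that regions below a certain size are grouped into node-specific subroutines; the greedy packing used here, the function name `group_regions`, and the byte sizes in the example are assumptions for illustration, and the sketch ignores whether grouped regions are contiguous in the program.

```python
def group_regions(sese_regions, local_store_bytes):
    """Pack small SESE regions into subroutines that each fit the local store.

    sese_regions: list of (name, code_size_bytes) in program order.
    Returns a list of subroutines, each a list of region names.
    """
    subroutines, current, used = [], [], 0
    for name, size in sese_regions:
        if size >= local_store_bytes:
            continue                      # too large: stays in non-node-specific code
        if used + size >= local_store_bytes:
            subroutines.append(current)   # current subroutine is full, start a new one
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        subroutines.append(current)
    return subroutines


regions = [("r1", 100), ("r2", 120), ("r3", 400), ("r4", 90)]
grouped = group_regions(regions, local_store_bytes=256)
```

Here `local_store_bytes` plays the role of the system parameter from the abstract, and `r3` is left out because it alone exceeds that limit.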

    Compiler method for employing multiple autonomous synergistic processors to simultaneously operate on longer vectors of data
    9.
    Granted patent
    Status: Active

    Publication No.: US07962906B2

    Publication date: 2011-06-14

    Application No.: US11686400

    Filing date: 2007-03-15

    IPC classification: G06F9/45 G06F15/76

    CPC classification: G06F8/456

    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.

    Performing Useful Computations While Waiting for a Line in a System with a Software Implemented Cache
    10.
    Patent application
    Status: Lapsed

    Publication No.: US20090055588A1

    Publication date: 2009-02-26

    Application No.: US12243339

    Filing date: 2008-10-01

    IPC classification: G06F12/08

    Abstract: Mechanisms for performing useful computations during a software cache reload operation are provided. With the illustrative embodiments, in order to perform software caching, a compiler takes original source code, and while compiling the source code, inserts explicit cache lookup instructions into appropriate portions of the source code where cacheable variables are referenced. In addition, the compiler inserts a cache miss handler routine that is used to branch execution of the code to a cache miss handler if the cache lookup instructions result in a cache miss. The cache miss handler, prior to performing a wait operation for waiting for the data to be retrieved from the backing store, branches execution to an independent subroutine identified by a compiler. The independent subroutine is executed while the data is being retrieved from the backing store such that useful work is performed.
