Method for partitioning programs between a general purpose core and one or more accelerators
    1.
    Type: Granted invention patent
    Status: Active

    Publication No.: US09038040B2

    Publication date: 2015-05-19

    Application No.: US11339592

    Filing date: 2006-01-25

    IPC classification: G06F9/45

    CPC classification: G06F8/45 G06F8/451 G06F8/456

    Abstract: Partitioning programs between a general purpose core and one or more accelerators is provided. A compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs are then linked to thereby generate an executable program.
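
The flow described in this abstract (decide a target for each portion, split into sub-programs, add synchronization, compile each sub-program with its own back end, link) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Region` model, the `vectorizable` flag used as the placement rule, and the `backends` callables are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One portion of the intermediate representation (hypothetical model)."""
    name: str
    vectorizable: bool      # stand-in for whatever analysis drives placement
    calls: tuple = ()       # names of regions this one invokes


def choose_target(region):
    """Toy placement rule: data-parallel regions go to an accelerator."""
    return "accelerator" if region.vectorizable else "core"


def partition(regions):
    """Group regions into one sub-program per target and record every call
    that crosses a partition boundary (a place needing synchronization)."""
    placement = {r.name: choose_target(r) for r in regions}
    subprograms, sync_points = {}, []
    for r in regions:
        subprograms.setdefault(placement[r.name], []).append(r.name)
        for callee in r.calls:
            if placement[callee] != placement[r.name]:
                sync_points.append((r.name, callee))
    return subprograms, sync_points


def build(regions, backends):
    """Compile each sub-program with the back end for its target, then
    'link' by bundling the resulting objects together."""
    subprograms, sync_points = partition(regions)
    objects = [backends[target](names)
               for target, names in sorted(subprograms.items())]
    return {"objects": objects, "sync_points": sync_points}
```

A caller would pass one back-end function per target, for example `build(regions, {"core": compile_for_core, "accelerator": compile_for_accel})`, where both function names are placeholders.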

    Partitioning programs between a general purpose core and one or more accelerators
    2.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08375374B2

    Publication date: 2013-02-12

    Application No.: US12127395

    Filing date: 2008-05-27

    IPC classification: G06F9/45

    CPC classification: G06F8/45 G06F8/451 G06F8/456

    Abstract: A mechanism is provided for partitioning programs between a general purpose core and one or more accelerators. With the apparatus and method, a compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs are then linked to thereby generate an executable program.

    Apparatus and Method for Partitioning Programs Between a General Purpose Core and One or More Accelerators
    3.
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20080256521A1

    Publication date: 2008-10-16

    Application No.: US12127395

    Filing date: 2008-05-27

    IPC classification: G06F9/44

    CPC classification: G06F8/45 G06F8/451 G06F8/456

    Abstract: An apparatus and method for partitioning programs between a general purpose core and one or more accelerators are provided. With the apparatus and method, a compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs are then linked to thereby generate an executable program.

    Compiler Method for Employing Multiple Autonomous Synergistic Processors to Simultaneously Operate on Longer Vectors of Data
    4.
    Type: Invention patent application
    Status: Active

    Publication No.: US20080229298A1

    Publication date: 2008-09-18

    Application No.: US11686400

    Filing date: 2007-03-15

    IPC classification: G06F9/45

    CPC classification: G06F8/456

    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
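
As a rough illustration of the arrangement this abstract describes, the Python sketch below plays both roles: `loop_kernel` stands in for the extracted vectorizable loop, and `main_program` stands in for the main program on the principal processor that hands each worker a slice of the long vector and gathers the results. The function names, the element-wise addition used as the loop body, and the use of a thread pool in place of real synergistic processors are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n, workers):
    """Divide indices 0..n-1 into `workers` contiguous, near-equal chunks."""
    base, extra = divmod(n, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def loop_kernel(a, b, lo, hi):
    """The extracted loop body, run by one worker over its own slice."""
    return [a[i] + b[i] for i in range(lo, hi)]


def main_program(a, b, workers=4):
    """Plays the principal processor: dispatches chunks, then gathers them."""
    ranges = split_ranges(len(a), workers)
    out = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so chunks stay in order.
        for part in pool.map(lambda r: loop_kernel(a, b, *r), ranges):
            out.extend(part)
    return out
```

For example, `main_program(list(range(10)), [1] * 10, workers=3)` splits the ten elements into chunks of four, three, and three.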

    System and Method to Efficiently Prefetch and Batch Compiler-Assisted Software Cache Accesses
    6.
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20080046657A1

    Publication date: 2008-02-21

    Application No.: US11465522

    Filing date: 2006-08-18

    IPC classification: G06F12/00

    Abstract: A system and method to efficiently pre-fetch and batch compiler-assisted software cache accesses are provided. The system and method reduce the overhead associated with software cache directory accesses. With the system and method, the local memory address of the cache line that stores the pre-fetched data is itself cached, such as in a register or well-known location in local memory, so that a later data access does not need to perform address translation and software cache operations and can instead access the data directly from the software cache using the cached local memory address. This saves processor cycles that would otherwise be required to perform the address translation a second time when the data is to be used. Moreover, the system and method directly enable software cache accesses to be effectively decoupled from address translation in order to increase the overlap between computation and communication.
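
The central idea in this abstract, keeping the local address returned by the pre-fetch so the later access skips the directory lookup, can be sketched in Python as follows. The `SoftwareCache` class, its direct-mapped layout, the line size, and the `(line, offset)` handle are assumptions made for illustration, not details from the patent. In this toy model the handle is only valid until the line is evicted.

```python
class SoftwareCache:
    """Toy direct-mapped software cache over a list acting as global memory."""
    LINE = 4  # elements per cache line (arbitrary)

    def __init__(self, global_memory, num_lines=8):
        self.mem = global_memory
        self.lines = [[0] * self.LINE for _ in range(num_lines)]  # local store
        self.tags = [None] * num_lines                            # directory
        self.lookups = 0                                          # translations done

    def prefetch(self, addr):
        """Translate addr, fill the line if needed, and return the local
        address (line index, offset) so the caller can keep it."""
        self.lookups += 1
        tag, offset = divmod(addr, self.LINE)
        idx = tag % len(self.lines)
        if self.tags[idx] != tag:
            base = tag * self.LINE
            self.lines[idx] = self.mem[base:base + self.LINE]
            self.tags[idx] = tag
        return idx, offset

    def load_direct(self, handle):
        """Access data through a saved local address: no directory lookup."""
        idx, offset = handle
        return self.lines[idx][offset]
```

A caller would issue `handle = cache.prefetch(addr)` early, do unrelated work, and later call `cache.load_direct(handle)`, so the translation cost is paid once.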

    Computer program code size partitioning system for multiple memory multi-processing systems
    7.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08032873B2

    Publication date: 2011-10-04

    Application No.: US12337197

    Filing date: 2008-12-17

    IPC classification: G06F9/45 G06F9/46

    Abstract: The present invention provides for a system for computer program code size partitioning for multiple memory multi-processor systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A program representation based on received computer program code is generated. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one SESE region of less than a certain size (store-size-specific) is identified based on identified SESE regions and the at least one system parameter. Each store-size-specific SESE region is grouped into a node-specific subroutine. The non-node-specific parts of the computer program code are modified based on the partitioning into node-specific subroutines. The modified computer program code including each node-specific subroutine is compiled based on a specified node characteristic.
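
One way to picture the size-based selection step is a walk over a tree of nested SESE regions that keeps the outermost regions whose code fits within a node's local store. The Python sketch below assumes that the regions and their sizes have already been computed; the `SESERegion` class, the `store_limit` parameter, and the top-down selection rule are hypothetical choices for illustration and may differ from what the patent claims.

```python
from dataclasses import dataclass, field


@dataclass
class SESERegion:
    """A single-entry-single-exit region with its nested sub-regions."""
    name: str
    code_size: int                              # bytes, nested regions included
    children: list = field(default_factory=list)


def select_regions(region, store_limit):
    """Return the names of the outermost regions that fit in store_limit.

    A region that fits is taken whole (it would become one node-specific
    subroutine); a region that is too large is skipped and its children
    are examined instead.
    """
    if region.code_size <= store_limit:
        return [region.name]
    picked = []
    for child in region.children:
        picked.extend(select_regions(child, store_limit))
    return picked
```

Code left over after selection (regions that are too large and have no children that fit) would stay in the non-node-specific part of the program.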

    Compiler method for employing multiple autonomous synergistic processors to simultaneously operate on longer vectors of data
    8.
    Type: Granted invention patent
    Status: Active

    Publication No.: US07962906B2

    Publication date: 2011-06-14

    Application No.: US11686400

    Filing date: 2007-03-15

    IPC classification: G06F9/45 G06F15/76

    CPC classification: G06F8/456

    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.

    Method to efficiently prefetch and batch compiler-assisted software cache accesses
    9.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07493452B2

    Publication date: 2009-02-17

    Application No.: US11465522

    Filing date: 2006-08-18

    IPC classification: G06F12/00

    Abstract: A method to efficiently pre-fetch and batch compiler-assisted software cache accesses is provided. The method reduces the overhead associated with software cache directory accesses. With the method, the local memory address of the cache line that stores the pre-fetched data is itself cached, such as in a register or well-known location in local memory, so that a later data access does not need to perform address translation and software cache operations and can instead access the data directly from the software cache using the cached local memory address. This saves processor cycles that would otherwise be required to perform the address translation a second time when the data is to be used. Moreover, the system and method directly enable software cache accesses to be effectively decoupled from address translation in order to increase the overlap between computation and communication.

    Compiler Method for Eliminating Redundant Read-Modify-Write Code Sequences in Non-Vectorizable Code
    10.
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20080052688A1

    Publication date: 2008-02-28

    Application No.: US11461571

    Filing date: 2006-08-01

    IPC classification: G06F9/45

    CPC classification: G06F8/443

    Abstract: A computer-implemented method, apparatus, and computer-usable program code are provided for eliminating redundant read-modify-write code sequences in non-vectorizable code. Code is received comprising a sequence of operations. The sequence of operations includes a loop. Non-vectorizable operations are identified within the loop that modify at least one sub-part of a storage location. The non-vectorizable operations are modified to include a single store operation for the number of sub-parts of the storage location.
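
The effect of replacing per-element read-modify-write sequences with a single store per storage location can be shown with a small Python simulation. It assumes a memory that can only load and store whole words of `WIDTH` sub-parts; the `WordMemory` class, the counters, and both fill routines are hypothetical and only illustrate the general idea, not the patented transformation itself.

```python
WIDTH = 4  # sub-parts per storage location (arbitrary)


class WordMemory:
    """Memory that can only load or store whole WIDTH-element words."""

    def __init__(self, n_words):
        self.words = [[0] * WIDTH for _ in range(n_words)]
        self.loads = self.stores = 0

    def load(self, w):
        self.loads += 1
        return list(self.words[w])

    def store(self, w, value):
        self.stores += 1
        self.words[w] = list(value)


def naive_fill(mem, values):
    """One read-modify-write sequence per element written."""
    for i, v in enumerate(values):
        w, lane = divmod(i, WIDTH)
        word = mem.load(w)      # read
        word[lane] = v          # modify one sub-part
        mem.store(w, word)      # write


def merged_fill(mem, values):
    """Collect all sub-parts of a word first, then store the word once."""
    for start in range(0, len(values), WIDTH):
        w = start // WIDTH
        chunk = values[start:start + WIDTH]
        if len(chunk) == WIDTH:
            mem.store(w, chunk)             # full word: no load needed
        else:
            word = mem.load(w)              # partial tail: keep old sub-parts
            word[:len(chunk)] = chunk
            mem.store(w, word)
```

Writing ten values with `naive_fill` issues ten loads and ten stores, while `merged_fill` issues one load and three stores for the same final memory contents.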