Abstract:
A computer-implemented method, system, and computer program product for accessing threadprivate memory for threadprivate variables in a parallel program during program compilation. The method includes aggregating the threadprivate variables in the program, replacing references to the threadprivate variables with indirect references, moving the address load operations of the threadprivate variables, and replacing those address load operations with calls to runtime routines that access the threadprivate memory. The invention enables a compiler to minimize the number of runtime routine calls needed to access the threadprivate variables, thereby improving program performance.
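As a rough, hand-written illustration of the transformation described above, the C sketch below aggregates two threadprivate variables into one block and loads its address once per parallel region; the block layout, the routine name tp_get_block, and the use of OpenMP threadprivate are assumptions of this sketch rather than the patented runtime interface.

    /* Aggregated threadprivate block (a stand-in for what the compiler would build). */
    struct tp_block { int counter; double scale; };
    static struct tp_block tp;
    #pragma omp threadprivate(tp)

    /* Hypothetical stand-in for a runtime routine returning the calling
     * thread's copy of the aggregated block. */
    static struct tp_block *tp_get_block(void) { return &tp; }

    void work(int n)
    {
        #pragma omp parallel
        {
            /* One address load per region instead of one per reference. */
            struct tp_block *p = tp_get_block();
            for (int i = 0; i < n; i++) {
                p->counter += 1;          /* indirect reference */
                p->scale   *= 1.000001;   /* indirect reference */
            }
        }
    }

Caching the block address once per region is what reduces the number of runtime routine calls from one per variable reference to one per region.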
Abstract:
A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated low-level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
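As a rough illustration of the kind of code such a mechanism emits for one regular stream, the C sketch below uses the GCC/Clang __builtin_prefetch builtin rather than the patent's own interface; the line size and prefetch distance are assumed values, standing in for ones the described cost analysis would derive.

    #include <stddef.h>

    /* Illustrative values: 64-byte cache lines, prefetch 8 lines ahead. */
    #define DOUBLES_PER_LINE 8
    #define LINES_AHEAD      8

    double sum_stream(const double *a, size_t n)
    {
        double s = 0.0;
        size_t dist = (size_t)LINES_AHEAD * DOUBLES_PER_LINE;
        for (size_t i = 0; i < n; i++) {
            if (i + dist < n)
                __builtin_prefetch(&a[i + dist], 0, 3);  /* read, high locality */
            s += a[i];
        }
        return s;
    }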
Abstract:
A method and system for modifying instructions forming a loop is provided. The method includes: determining static and dynamic characteristics of the instructions; selecting a modification factor for the instructions based on the number of separate, equivalent sections forming a cache in the processor that is processing the instructions; and, when the instructions satisfy a modification criterion based on the static and dynamic characteristics, modifying the instructions to interleave them within the loop according to the modification factor and those characteristics.
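As a hand-written illustration of the interleaving transformation, the C sketch below unrolls a reduction loop by a modification factor and spreads the work across independent accumulators; the factor of 4 is an assumed value standing in for one derived from the number of equivalent cache sections and the loop's static and dynamic characteristics.

    #include <stddef.h>

    #define FACTOR 4   /* illustrative modification factor */

    double dot(const double *a, const double *b, size_t n)
    {
        double acc[FACTOR] = {0.0};
        size_t i = 0;
        for (; i + FACTOR <= n; i += FACTOR)      /* interleaved loop body */
            for (int k = 0; k < FACTOR; k++)
                acc[k] += a[i + k] * b[i + k];
        for (; i < n; i++)                        /* remainder iterations */
            acc[0] += a[i] * b[i];
        double total = 0.0;
        for (int k = 0; k < FACTOR; k++)
            total += acc[k];
        return total;
    }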
Abstract:
A target operation in a normalized target loop is identified in source code; the operation is susceptible to vectorization and may, after compilation into a vectorized form, seek to operate on data in nonconsecutive physical memory. Hardware instructions are inserted into executable code generated from the source code, directing the system that will run the executable code to create a representation of the data in consecutive physical memory. A vector loop containing the target operation is replaced, in the executable code, with a call to a vector library function that operates on the representation to generate a result identical to the output expected from executing the vector loop containing the target operation. On execution, a representation of the data residing in nonconsecutive physical memory is created in consecutive physical memory, and the vectorized target operation is applied to that representation to process the data.
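As a rough source-level analogue of the gather-then-call pattern (the real mechanism performs it during compilation, on the executable code), the C sketch below copies a strided, nonconsecutive access pattern into a contiguous buffer, applies a vector routine, and writes the results back; vec_exp is a hypothetical library entry point, and the fixed 256-element buffer is an assumption of this sketch.

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical vector-library routine operating on contiguous data. */
    static void vec_exp(double *dst, const double *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = exp(src[i]);
    }

    /* Assumes n <= 256 for this sketch. */
    void exp_strided(double *a, size_t n, size_t stride)
    {
        double tmp[256];                   /* consecutive representation */
        for (size_t i = 0; i < n; i++)     /* gather nonconsecutive data */
            tmp[i] = a[i * stride];
        vec_exp(tmp, tmp, n);              /* vector function on contiguous data */
        for (size_t i = 0; i < n; i++)     /* write results back */
            a[i * stride] = tmp[i];
    }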
Abstract:
A computer-implemented method, apparatus, and computer-usable program code for compiling source code that performs a complex operation followed by a complex reduction operation. A method for generating executable code that performs the complex operation and the complex reduction operation is determined. In response to a determination that a reduced single-instruction, multiple-data (SIMD) method is appropriate, executable code is generated for computing sub-products, reducing the sub-products to intermediate results, and summing the intermediate results to produce a final result.
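As a scalar C illustration of the sub-product and intermediate-result structure for a complex multiply followed by a sum reduction (an actual reduced-SIMD lowering would keep the four partial accumulators in vector registers; the function and variable names are assumptions of this sketch):

    #include <complex.h>
    #include <stddef.h>

    double complex cdot(const double complex *a, const double complex *b, size_t n)
    {
        /* Sub-product accumulators. */
        double rr = 0.0, ii = 0.0, ri = 0.0, ir = 0.0;
        for (size_t i = 0; i < n; i++) {
            rr += creal(a[i]) * creal(b[i]);
            ii += cimag(a[i]) * cimag(b[i]);
            ri += creal(a[i]) * cimag(b[i]);
            ir += cimag(a[i]) * creal(b[i]);
        }
        /* Reduce the intermediate results into the final complex sum. */
        return (rr - ii) + (ri + ir) * I;
    }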
Abstract:
A method and system for reducing or avoiding store misses with a data cache block zero (DCBZ) instruction, in cooperation with the underlying hardware load-stream prefetching support, to help increase effective aggregate bandwidth. The method identifies and classifies unique streams in a loop based on dependency and reuse analysis, and performs loop transformations, such as node splitting, loop distribution, or stream unrolling, to obtain the proper number of streams. Static prediction and run-time profile information are used to guide loop and stream selection. Compile-time loop cost analysis and run-time check code with versioning are used to determine the number of cache lines ahead of each reference for data cache line zeroing and to tolerate the required data alignment relative to data cache lines.
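As a rough, PowerPC-only illustration of zeroing destination cache lines ahead of a store stream, the C sketch below issues dcbz through inline assembly; the 128-byte line size, the two-line look-ahead, and the alignment assumption are illustrative stand-ins for values the described cost analysis, run-time checks, and loop versioning would establish.

    #include <stddef.h>
    #include <stdint.h>

    #define LINE_BYTES   128
    #define LINES_AHEAD  2
    #define PER_LINE     (LINE_BYTES / sizeof(double))

    /* Assumes dst is 128-byte aligned and that every zeroed line is fully
     * overwritten by later iterations; dcbz is a PowerPC instruction. */
    void scale_copy(double *dst, const double *src, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (((uintptr_t)&dst[i] & (LINE_BYTES - 1)) == 0 &&
                i + (LINES_AHEAD + 1) * PER_LINE <= n)
                __asm__ volatile("dcbz 0,%0"
                                 :: "r"((uintptr_t)&dst[i] + LINES_AHEAD * LINE_BYTES)
                                 : "memory");
            dst[i] = src[i] * 2.0;         /* the store stream itself */
        }
    }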
Abstract:
A method, apparatus, and computer instructions for processing instructions. A data dependency graph is built. The data dependency graph is analyzed for recurrences, and unpipelined instructions that lie outside of the recurrences are expanded.
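As a small illustration of the recurrence analysis step, the C sketch below represents a data dependency graph as an adjacency matrix and classifies each instruction by whether it can reach itself through dependence edges (that is, whether it lies on a recurrence); the graph shape is made up, and the expansion of instructions outside recurrences is only indicated, not performed.

    #include <stdio.h>

    #define N 4   /* instructions in the loop body */

    static int edge[N][N];   /* edge[u][v] = 1 if v depends on u */

    static int reaches(int from, int target, int *seen)
    {
        if (seen[from]) return 0;
        seen[from] = 1;
        for (int v = 0; v < N; v++)
            if (edge[from][v] && (v == target || reaches(v, target, seen)))
                return 1;
        return 0;
    }

    int main(void)
    {
        /* 0 -> 1 -> 2 -> 0 forms a recurrence; instruction 3 lies outside it. */
        edge[0][1] = edge[1][2] = edge[2][0] = edge[2][3] = 1;
        for (int u = 0; u < N; u++) {
            int seen[N] = {0};
            printf("instr %d: %s\n", u,
                   reaches(u, u, seen) ? "in recurrence"
                                       : "outside recurrence (expansion candidate)");
        }
        return 0;
    }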