Patent search: ap:("Roch Georges Archambault" OR "Robert James Blainey" OR "Yaoqing Gao" OR "Allan Russell Martin" OR "James Lawrence McInnes" OR "Francis Patrick O'Connell") AND inv:"Roch Georges Archambault" (page 1)

1.

Granted patent
Fine-grained software-directed data prefetching using integrated high-level and low-level code analysis optimizations
Legal status: no longer in force

Publication number: US07669194B2

Publication date: 2010-02-23

Application number: US10926595

Filing date: 2004-08-26

Applicants: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell

Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell

IPC classification: G06F9/44, G06F9/45, G06F9/30

CPC classification: G06F8/4442

Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
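
As a rough illustration of scheduling prefetches for a single stream, the sketch below derives a prefetch distance from an assumed miss latency and loop-body cost, and issues at most one prefetch per cache line so that no line is requested twice. The name `prefetch_plan`, its parameters, and the distance formula are assumptions made for illustration only; the abstract does not describe the patent's actual cost model.

```python
import math


def prefetch_plan(n_elems, elem_size, line_size, latency_cycles, body_cycles):
    """Return (iteration, prefetch_target) pairs for one unit-stride stream.

    Hypothetical cost model: the array is assumed to start on a cache-line
    boundary and every loop iteration is assumed to cost `body_cycles`.
    """
    # Iterations needed to hide one miss (assumed formula, not from the patent).
    distance_iters = math.ceil(latency_cycles / body_cycles)
    # Elements that share one cache line; one prefetch per line is enough.
    per_line = max(1, line_size // elem_size)
    # Round the distance up to a whole number of cache lines.
    distance = math.ceil(distance_iters / per_line) * per_line
    plan = []
    for i in range(0, n_elems, per_line):
        target = i + distance
        if target < n_elems:  # never prefetch past the end of the array
            plan.append((i, target))
    return plan


# 64 eight-byte elements, 128-byte lines, 300-cycle miss, 20-cycle loop body.
print(prefetch_plan(64, 8, 128, 300, 20))
```

Issuing the prefetch only on the first iteration of each line is one simple way to avoid redundant prefetches; a compiler would typically get the same effect by unrolling the loop by the number of elements per line.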

2.

Patent application
Fine-Grained Software-Directed Data Prefetching Using Integrated High-Level and Low-Level Code Analysis Optimizations
Legal status: in force

Publication number: US20100095271A1

Publication date: 2010-04-15

Application number: US12644756

Filing date: 2009-12-22

Applicants: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell

Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell

IPC classification: G06F9/44

CPC classification: G06F8/4442

Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.

3.

Granted patent
Method and system for code modification based on cache structure
Legal status: no longer in force

Publication number: US07530063B2

Publication date: 2009-05-05

Application number: US10855729

Filing date: 2004-05-27

Applicants: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, John David McCalpin, Francis Patrick O'Connell, Pascal Vezolle, Steven Wayne White

Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, John David McCalpin, Francis Patrick O'Connell, Pascal Vezolle, Steven Wayne White

IPC classification: G06F9/45

CPC classification: G06F8/4442

Abstract: A method and system of modifying instructions forming a loop is provided. The method includes: determining static and dynamic characteristics for the instructions; selecting a modification factor for the instructions based on a number of separate equivalent sections forming a cache in a processor which is processing the instructions; and modifying the instructions to interleave the instructions in the loop according to the modification factor and the static and dynamic characteristics when the instructions satisfy a modification criterion based on the static and dynamic characteristics.
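
One possible reading of "interleave the instructions in the loop according to the modification factor" is sketched below: the iteration space is cut into `factor` contiguous chunks, and consecutive steps take one iteration from each chunk in turn. Here `factor` stands in for the number of equivalent cache sections; the function name and this particular interleaving scheme are assumptions for illustration, not the method claimed in the patent.

```python
def interleaved_order(n_iters, factor):
    """Visit iterations 0..n_iters-1 so that consecutive steps come from
    `factor` different contiguous chunks of the iteration space.

    `factor` is a hypothetical modification factor (must be >= 1).
    """
    chunk = -(-n_iters // factor)  # ceiling division: iterations per chunk
    order = []
    for offset in range(chunk):
        for section in range(factor):
            i = section * chunk + offset
            if i < n_iters:  # the last chunk may be shorter
                order.append(i)
    return order


print(interleaved_order(8, 2))
```

Every original iteration is still executed exactly once; only the visiting order changes, which is why a real compiler would need dependence analysis before applying such a reordering.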

4.

Granted patent
Method and apparatus for determining the profitability of expanding unpipelined instructions
Legal status: no longer in force

Publication number: US07506331B2

Publication date: 2009-03-17

Application number: US10930042

Filing date: 2004-08-30

Applicants: Roch Georges Archambault, Robert Frederick Enenkel, Robert William Hay, Allan Russell Martin, James Lawrence McInnes, Ronald Ian McIntosh, Mark Peter Mendell

Inventors: Roch Georges Archambault, Robert Frederick Enenkel, Robert William Hay, Allan Russell Martin, James Lawrence McInnes, Ronald Ian McIntosh, Mark Peter Mendell

IPC classification: G06F9/45

CPC classification: G06F8/443

Abstract: A method, apparatus, and computer instructions for processing instructions. A data dependency graph is built. The data dependency graph is analyzed for recurrences, and unpipelined instructions that lie outside of the recurrences are expanded.
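
The selection step described in the abstract can be sketched on a toy dependency graph: a node is treated as part of a recurrence if it can reach itself along dependence edges, and only unpipelined instructions outside every such cycle are chosen for expansion. The graph encoding, the instruction names, and both function names are hypothetical.

```python
def nodes_in_recurrences(graph):
    """Return the nodes that lie on a cycle of the dependency graph.

    `graph` maps each instruction to the instructions that depend on it,
    including loop-carried dependences (hypothetical encoding).
    """
    def reaches(start, goal):
        stack, seen = list(graph.get(start, [])), set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, []))
        return False

    return {node for node in graph if reaches(node, node)}


def select_for_expansion(graph, unpipelined):
    """Pick the unpipelined instructions that are outside every recurrence."""
    cyclic = nodes_in_recurrences(graph)
    return sorted(node for node in unpipelined if node not in cyclic)


ddg = {
    "load": ["fdiv_a"],
    "fdiv_a": ["store"],
    "store": [],
    "fdiv_b": ["acc"],   # fdiv_b and acc form a loop-carried cycle
    "acc": ["fdiv_b"],
}
print(select_for_expansion(ddg, {"fdiv_a", "fdiv_b"}))
```

A production compiler would more likely compute strongly connected components once instead of running a search per node; the per-node search is used here only to keep the sketch short.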

5.

Granted patent
Aggregate bandwidth through management using insertion of reset instructions for cache-to-cache data transfer
Legal status: no longer in force

Publication number: US07168070B2

Publication date: 2007-01-23

Application number: US10853304

Filing date: 2004-05-25

Applicants: Roch Georges Archambault, Robert James Blainey, Yaoging Gao, Randall Ray Heisch, Steven Wayne White

Inventors: Roch Georges Archambault, Robert James Blainey, Yaoging Gao, Randall Ray Heisch, Steven Wayne White

IPC classification: G06F9/45, G06F13/00

CPC classification: G06F12/0833, G06F9/30047, G06F9/3455, G06F9/383

Abstract: A method and system for reducing or avoiding store misses with a data cache block zero (DCBZ) instruction in cooperation with the underlying hardware load stream prefetching support for helping to increase effective aggregate bandwidth. The method identifies and classifies unique streams in a loop based on dependency and reuse analysis, and performs loop transformations, such as node splitting, loop distribution or stream unrolling to get the proper number of streams. Static prediction and run-time profile information are used to guide loop and stream selection. Compile-time loop cost analysis and run-time check code and versioning are used to determine the number of cache lines ahead of each reference for data cache line zeroing and to tolerate required data alignment relative to data cache lines.
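
The alignment concern mentioned at the end of the abstract comes from the fact that DCBZ zeroes an entire cache block, so it is only safe on lines that the store stream will overwrite completely. A minimal sketch of that constraint, assuming a hypothetical helper `zeroable_lines` and byte addresses given as plain integers, is:

```python
def zeroable_lines(start_addr, nbytes, line_size):
    """Return the start addresses of the cache lines that lie entirely
    inside the stored byte range [start_addr, start_addr + nbytes).

    Partially covered lines at either end are excluded, because zeroing
    them would destroy neighbouring data that the loop does not rewrite.
    """
    first = -(-start_addr // line_size) * line_size        # round start up
    end = (start_addr + nbytes) // line_size * line_size   # round end down
    return list(range(first, end, line_size))


# A 1000-byte store stream starting at byte 100, with 128-byte lines.
print(zeroable_lines(100, 1000, 128))
```

The patent abstract handles this with run-time check code and versioning; the helper above only illustrates which lines such a check would have to exclude.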

6.

Granted patent
Optimal cache replacement scheme using a training operation
Legal status: no longer in force

Publication number: US08352684B2

Publication date: 2013-01-08

Application number: US12236188

Filing date: 2008-09-23

Applicants: Roch Georges Archambault, Shimin Cui, Chen Ding, Yaoqing Gao, Xiaoming Gu, Raul Esteban Silvera, Chengliang Zhang

Inventors: Roch Georges Archambault, Shimin Cui, Chen Ding, Yaoqing Gao, Xiaoming Gu, Raul Esteban Silvera, Chengliang Zhang

IPC classification: G06F12/00

CPC classification: G06F12/123, G06F2212/502

Abstract: Computer implemented method, system and computer usable program code for cache management. A cache is provided, wherein the cache is viewed as a sorted array of data elements, wherein a top position of the array is a most recently used position of the array and a bottom position of the array is a least recently used position of the array. A memory access sequence is provided, and a training operation is performed with respect to a memory access of the memory access sequence to determine a type of memory access operation to be performed with respect to the memory access. Responsive to a result of the training operation, a cache replacement operation is performed using the determined memory access operation with respect to the memory access.
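
To make the idea of a per-access "type" concrete, the sketch below models the cache as a list whose first element is the most recently used position and whose last element is the least recently used one. A training pass labels each access either `"normal"` (place at the top) or `"bypass"` (place at the bottom). The two labels and the training heuristic, which looks ahead in the trace for the next reuse, are assumptions for illustration; the abstract does not say how the patent's training operation decides.

```python
def train(trace, capacity):
    """Label each access using the rest of the trace (assumed heuristic):
    'normal' if the element is reused before `capacity` other distinct
    elements are touched, otherwise 'bypass'."""
    labels = []
    for i, item in enumerate(trace):
        distinct = set()
        label = "bypass"  # default: never reused, or reused too late
        for later in trace[i + 1:]:
            if later == item:
                if len(distinct) < capacity:
                    label = "normal"
                break
            distinct.add(later)
        labels.append(label)
    return labels


def simulate(trace, labels, capacity):
    """Count misses; cache[0] is the MRU (top), cache[-1] the LRU (bottom)."""
    cache, misses = [], 0
    for item, label in zip(trace, labels):
        if item in cache:
            cache.remove(item)
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop()  # evict from the bottom position
        if label == "normal":
            cache.insert(0, item)  # place at the top
        else:
            cache.append(item)     # place at the bottom
    return misses


trace = list("abcabcabc")
print(simulate(trace, ["normal"] * len(trace), 2))  # plain LRU
print(simulate(trace, train(trace, 2), 2))          # with trained labels
```

On this cyclic trace plain LRU misses on every access, while the trained labels keep one element resident long enough to be reused.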

7.

Granted patent
Procedure control descriptor-based code specialization for context sensitive memory disambiguation
Legal status: in force

Publication number: US08332833B2

Publication date: 2012-12-11

Application number: US11757941

Filing date: 2007-06-04

Applicants: Roch Georges Archambault, Shimin Cui, Yaoqing Gao, Raul Esteban Silvera, Peng Zhao

Inventors: Roch Georges Archambault, Shimin Cui, Yaoqing Gao, Raul Esteban Silvera, Peng Zhao

IPC classification: G06F9/45

CPC classification: G06F8/4441

Abstract: A computer implemented method for facilitating debugging of source code. The source code is scanned to identify a candidate region. A procedure control descriptor is generated, wherein the procedure control descriptor corresponds to the candidate region. The procedure control descriptor identifies, for the candidate region, a condition which, if true at runtime means that the candidate region can be specialized. Responsive to a determination during compile time that satisfaction of at least one condition will be known only at runtime, the procedure control descriptor is used to specialize the candidate region at compile time to create a first version of the candidate region for execution in a case where the condition is true and a second version of the candidate region for execution in a case where the condition is false, and further generate code to correctly select one of the first region and the second region at runtime.
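
The two-version scheme in the abstract can be sketched as a small record that bundles a runtime condition with a specialized and a general version of a region, plus a dispatcher that picks one. The class name `ProcedureControlDescriptor`, its fields, and the aliasing condition used here are illustrative assumptions, not the data structure defined in the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProcedureControlDescriptor:  # hypothetical layout
    condition: Callable[..., bool]    # true at runtime => specialization is safe
    specialized: Callable[..., None]  # first version (condition true)
    general: Callable[..., None]      # second version (condition false)

    def dispatch(self, *args):
        """Select one of the two versions at runtime."""
        version = self.specialized if self.condition(*args) else self.general
        version(*args)


def general_region(dst, src):
    # Element-by-element order is preserved, so this is correct even
    # when dst and src are the same list.
    for i in range(1, len(dst)):
        dst[i] = src[i - 1] + 1


def specialized_region(dst, src):
    # Bulk form: valid only when dst and src do not alias.
    dst[1:] = [value + 1 for value in src[:-1]]


pcd = ProcedureControlDescriptor(
    condition=lambda dst, src: dst is not src,
    specialized=specialized_region,
    general=general_region,
)

a = [0, 0, 0, 0]
pcd.dispatch(a, a)             # aliased: general version runs
b = [0, 0, 0, 0]
pcd.dispatch(b, [5, 6, 7, 8])  # distinct: specialized version runs
print(a, b)
```

Running the bulk version on aliased arguments would read stale values and give a different result, which is why the condition has to be tested before the specialized version is chosen.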

8.

Patent application
Optimal Cache Management Scheme
Legal status: no longer in force

Publication number: US20100077153A1

Publication date: 2010-03-25

Application number: US12236188

Filing date: 2008-09-23

Applicants: Roch Georges Archambault, Shimin Cui, Chen Ding, Yaoqing Gao, Xiaoming Gu, Raul Esteban Silvera, Chengliang Zhang

Inventors: Roch Georges Archambault, Shimin Cui, Chen Ding, Yaoqing Gao, Xiaoming Gu, Raul Esteban Silvera, Chengliang Zhang

IPC classification: G06F12/08

CPC classification: G06F12/123, G06F2212/502

Abstract: Computer implemented method, system and computer usable program code for cache management. A cache is provided, wherein the cache is viewed as a sorted array of data elements, wherein a top position of the array is a most recently used position of the array and a bottom position of the array is a least recently used position of the array. A memory access sequence is provided, and a training operation is performed with respect to a memory access of the memory access sequence to determine a type of memory access operation to be performed with respect to the memory access. Responsive to a result of the training operation, a cache replacement operation is performed using the determined memory access operation with respect to the memory access.

9.

Granted patent
Method and apparatus for improving data cache performance using inter-procedural strength reduction of global objects
Legal status: no longer in force

Publication number: US07555748B2

Publication date: 2009-06-30

Application number: US10930037

Filing date: 2004-08-30

Applicants: Roch Georges Archambault, Shimin Cui, Yaoqing Gao, Raul Esteban Silvera

Inventors: Roch Georges Archambault, Shimin Cui, Yaoqing Gao, Raul Esteban Silvera

IPC classification: G06F9/45

CPC classification: G06F8/4442

Abstract: Inter-procedural strength reduction is provided by a mechanism of the present invention to improve data cache performance. During a forward pass, the present invention collects information of global variables and analyzes the usage pattern of global objects to select candidate computations for optimization. During a backward pass, the present invention remaps global objects into smaller size new global objects and generates more cache efficient code by replacing candidate computations with indirect or indexed reference of smaller global objects and inserting store operations to the new global objects for each computation that references the candidate global objects.

10.

Granted patent
Optimizing source code for iterative execution
Legal status: in force

Publication number: US07340733B2

Publication date: 2008-03-04

Application number: US10314094

Filing date: 2002-12-05

Applicants: Roch Georges Archambault, Robert James Blainey, Charles Brian Hall, Yingwei Zhang

Inventors: Roch Georges Archambault, Robert James Blainey, Charles Brian Hall, Yingwei Zhang

IPC classification: G06F9/44, G06F9/45

CPC classification: G06F8/4441

Abstract: An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element. A computer programmed loop for computing the primary recurrence element and subsequent recurrence elements is an example of a case involving iteratively computing the primary recurrence element. The CPU is operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM). SOM stores the generated optimized source code. The optimized source code includes instructions for instructing said CPU to store a computed value of the primary recurrence element in a storage location of FOM. The instructions also include instructions to consign the computed value of the primary recurrence element from the storage location to another storage location of the FOM.
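
The transformation described in the abstract resembles carrying recurrence values across iterations in fast storage instead of re-reading them from memory. The sketch below uses local variables as a stand-in for FOM locations and a list as a stand-in for SOM; that mapping, the second-order recurrence chosen as an example, and the function names are all assumptions for illustration.

```python
def recurrence_naive(x, n):
    """x[i] = x[i-1] + x[i-2], re-reading earlier elements from the list."""
    for i in range(2, n):
        x[i] = x[i - 1] + x[i - 2]
    return x


def recurrence_carried(x, n):
    """Same recurrence, with the two most recent values kept in locals
    (standing in for fast storage locations). Assumes n >= 2."""
    prev2, prev1 = x[0], x[1]
    for i in range(2, n):
        current = prev1 + prev2
        x[i] = current
        # Consign each value to the location where the next iteration
        # expects it, so no earlier element is read back from the list.
        prev2, prev1 = prev1, current
    return x


print(recurrence_carried([1, 1, 0, 0, 0, 0], 6))
```

Both versions produce the same sequence; the second one performs one list write and no list reads per iteration inside the loop.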
