System and method for scheduling memory instructions to provide adequate prefetch latency
    1.
    Invention grant
    System and method for scheduling memory instructions to provide adequate prefetch latency (In force)

    Publication No.: US06678796B1

    Publication date: 2004-01-13

    Application No.: US09679431

    Filing date: 2000-10-03

    IPC classification: G06F 12/00

    Abstract: A method and apparatus are disclosed for scheduling instructions to provide adequate prefetch latency during compilation of program code into a program. The prefetch scheduler component of the present invention selects a memory operation within the program code as a “martyr load” and removes the prefetch associated with the martyr load, if any. The prefetch scheduler takes advantage of the latency associated with the martyr load to schedule prefetches for memory operations that follow the martyr load. The prefetches are scheduled “behind” (i.e., prior to) the martyr load so that they can complete before the associated memory operations are carried out. The prefetch scheduler component continues this process throughout the program code to optimize prefetch scheduling and overall program operation.
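
    As an illustration only, a minimal C sketch of the reordering (the memop_t type, the instruction list, and the register names are invented here, not taken from the patent): one load is chosen as the martyr and keeps its cache miss, while the prefetches for the loads that follow it are emitted ahead of it so they can complete under its stall.

        #include <stdio.h>

        /* One entry per memory operation in a straight-line block (invented layout). */
        typedef struct { const char *addr; int is_load; } memop_t;

        int main(void) {
            memop_t ops[] = { { "a", 1 }, { "b", 1 }, { "c", 1 } };
            int n = 3;
            int martyr = 0;   /* pick the first load: it will miss and stall anyway */

            /* schedule prefetches for the later loads "behind" (before) the martyr,
               so they overlap with the martyr's miss latency */
            for (int i = 0; i < n; i++)
                if (ops[i].is_load && i != martyr)
                    printf("prefetch [%s]\n", ops[i].addr);

            /* emit the loads in their original order; the martyr gets no prefetch */
            for (int i = 0; i < n; i++)
                printf("load r%d, [%s]%s\n", i + 1, ops[i].addr,
                       i == martyr ? "   ; martyr load, keeps its miss" : "");
            return 0;
        }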


    Method and apparatus for performing prefetching at the critical section level
    2.
    Invention grant
    Method and apparatus for performing prefetching at the critical section level (In force)

    Publication No.: US06427235B1

    Publication date: 2002-07-30

    Application No.: US09434714

    Filing date: 1999-11-05

    IPC classification: G06F 9/44

    Abstract: One embodiment of the present invention provides a system for compiling source code into executable code that performs prefetching for memory operations within critical sections of code that are subject to mutual exclusion. The system operates by compiling a source code module containing programming language instructions into an executable code module containing instructions suitable for execution by a processor. Next, the system identifies a critical section within the executable code module by identifying a region of code between a mutual exclusion lock operation and a mutual exclusion unlock operation. The system then schedules explicit prefetch instructions into the critical section in advance of the associated memory operations. In one embodiment, the system identifies the critical section of code by using a first macro to perform the mutual exclusion lock operation, wherein the first macro additionally activates prefetching. The system also uses a second macro to perform the mutual exclusion unlock operation, wherein the second macro additionally deactivates prefetching.
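
    A minimal C sketch of the macro idea, assuming GCC/Clang's __builtin_prefetch as the prefetch mechanism; the macro names MUTEX_LOCK_AND_PREFETCH and MUTEX_UNLOCK_END_PREFETCH and the shared_t type are invented. The patent describes the macros toggling prefetch generation in the compiler; that is approximated here by issuing an explicit prefetch for the protected data at lock time.

        #include <pthread.h>

        typedef struct { long counter; char pad[56]; } shared_t;

        /* first macro: take the lock and start prefetching data the critical
           section will touch (a stand-in for "activate prefetching") */
        #define MUTEX_LOCK_AND_PREFETCH(m, data)   \
            do {                                   \
                pthread_mutex_lock(m);             \
                __builtin_prefetch((data), 1);     \
            } while (0)

        /* second macro: release the lock ("deactivate prefetching") */
        #define MUTEX_UNLOCK_END_PREFETCH(m) pthread_mutex_unlock(m)

        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        static shared_t shared;

        void bump(void) {
            MUTEX_LOCK_AND_PREFETCH(&lock, &shared);   /* critical section begins */
            shared.counter++;                          /* data already on its way to cache */
            MUTEX_UNLOCK_END_PREFETCH(&lock);          /* critical section ends */
        }

        int main(void) { bump(); return 0; }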


    System and method for scheduling instructions to maximize outstanding prefetches and loads
    3.
    Invention grant
    System and method for scheduling instructions to maximize outstanding prefetches and loads (In force)

    Publication No.: US06918111B1

    Publication date: 2005-07-12

    Application No.: US09679434

    Filing date: 2000-10-03

    IPC classification: G06F 9/45

    CPC classification: G06F 8/4442

    Abstract: The present invention discloses a method and device for ordering memory operation instructions in an optimizing compiler, for a processor that can potentially enter a stall state if a memory queue is full. The method uses a dependency graph coupled with one or more memory queues. The dependency graph is used to show the dependency relationships between instructions in a program being compiled. After the dependency graph is created, the ready nodes are identified. Dependency graph nodes that correspond to memory operations may have the effect of adding an element to the memory queue or removing one or more elements from the memory queue. The ideal situation is to keep the memory queue as full as possible without exceeding the maximum desirable number of elements, by scheduling memory operations to maximize the parallelism of memory operations while avoiding stalls on the target processor.
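
    A toy queue-aware list scheduler in C, illustrating the stated goal; QUEUE_MAX, node_t, and the instruction list are all invented, and real dependency edges are omitted (every node is treated as ready). Memory operations are issued while the memory queue has room, and non-memory work is chosen instead when another memory operation would overfill the queue and stall the processor.

        #include <stdio.h>

        #define QUEUE_MAX 2   /* assumed capacity of the processor's memory queue */

        typedef struct { const char *name; int is_mem; int done; } node_t;

        int main(void) {
            node_t ready[] = {
                { "prefetch [a]",   1, 0 }, { "prefetch [b]",   1, 0 },
                { "load r1, [a]",   1, 0 }, { "add r2, r3, r4", 0, 0 },
                { "mul r5, r2, r2", 0, 0 }, { "load r6, [b]",   1, 0 },
            };
            int n = 6, queue = 0, issued = 0;

            while (issued < n) {
                int pick = -1;
                for (int i = 0; i < n; i++) {
                    if (ready[i].done) continue;
                    if (ready[i].is_mem && queue < QUEUE_MAX) { pick = i; break; }
                    if (!ready[i].is_mem && pick < 0) pick = i;   /* non-memory fallback */
                }
                if (pick < 0) {       /* only memory ops left and the queue is full:  */
                    queue = 0;        /* model the unavoidable stall as a full drain  */
                    continue;
                }
                if (ready[pick].is_mem) queue++;
                printf("issue %-16s (memory queue %d/%d)\n",
                       ready[pick].name, queue, QUEUE_MAX);
                ready[pick].done = 1;
                issued++;
            }
            return 0;
        }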


    System and method for insertion of prefetch instructions by a compiler
    4.
    Invention grant
    System and method for insertion of prefetch instructions by a compiler (In force)

    Publication No.: US06651245B1

    Publication date: 2003-11-18

    Application No.: US09679433

    Filing date: 2000-10-03

    IPC classification: G06F 9/45

    Abstract: The present invention discloses a method and device for placing prefetch instructions in a low-level or assembly code instruction stream. It involves the use of a new concept called a martyr memory operation. When prefetch instructions are inserted in a code stream, some instructions will still miss the cache, because in some circumstances a prefetch cannot be added at all, or cannot be added early enough to allow the needed reference to be in cache before it is referenced by an executing instruction. A subset of these instructions is identified using a new method and designated as martyr memory operations. Once a martyr is identified, other memory operations that would also have been cache misses can “hide” behind the martyr memory operation and complete their prefetches while the processor, of necessity, waits for the martyr memory operation to complete. This increases the number of cache hits.
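
    One way to sketch the identification step in C, under an assumption that is not from the patent: a load whose address becomes available fewer than MIN_DIST instructions before the load itself cannot be given a timely prefetch and is therefore marked as a martyr; the other loads then hide their prefetches behind it. The load_t layout and all numbers are invented.

        #include <stdio.h>

        #define MIN_DIST 10   /* assumed minimum useful prefetch distance, in instructions */

        typedef struct { const char *name; int addr_ready_at; int position; } load_t;

        int main(void) {
            load_t loads[] = {
                { "load r1, [r9]",  18, 20 },   /* address known only 2 instructions early */
                { "load r2, [r10]",  1, 25 },   /* address known long before the load */
                { "load r3, [r11]",  2, 30 },
            };
            for (int i = 0; i < 3; i++) {
                int dist = loads[i].position - loads[i].addr_ready_at;
                if (dist < MIN_DIST)
                    printf("%-15s -> martyr memory operation (prefetch distance %d too short)\n",
                           loads[i].name, dist);
                else
                    printf("%-15s -> prefetch hoisted behind the nearest earlier martyr\n",
                           loads[i].name);
            }
            return 0;
        }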


    Heuristic for identifying loads guaranteed to hit in processor cache
    5.
    Invention grant
    Heuristic for identifying loads guaranteed to hit in processor cache (In force)

    Publication No.: US06574713B1

    Publication date: 2003-06-03

    Application No.: US09685431

    Filing date: 2000-10-10

    IPC classification: G06F 12/00

    Abstract: A heuristic algorithm is disclosed that identifies loads guaranteed to hit the processor cache and that provides a “minimal” set of prefetches to be scheduled and inserted during compilation of a program. The heuristic algorithm of the present invention uses the concept of a “cache line” (i.e., the data chunks received during memory operations) in conjunction with the concept of “related” memory operations to determine which prefetches are unnecessary for related memory operations, thus generating a minimal number of prefetches for the related memory operations.
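
    A small C sketch of the cache-line reasoning, with an assumed 64-byte line and made-up addresses and field names: related memory operations whose addresses fall on the same cache line share a single prefetch, and the later accesses are treated as guaranteed hits.

        #include <stdio.h>

        #define LINE_SIZE 64   /* assumed cache-line size in bytes */

        typedef struct { const char *name; long base; long offset; } memop_t;

        int main(void) {
            /* "related" operations: same base address, compile-time-known offsets */
            memop_t ops[] = { { "load p->x", 0x1000,  0 },
                              { "load p->y", 0x1000,  8 },
                              { "load p->z", 0x1000, 72 } };
            long last_line = -1;
            for (int i = 0; i < 3; i++) {
                long line = (ops[i].base + ops[i].offset) / LINE_SIZE;
                if (line == last_line)
                    printf("%s: same cache line -> guaranteed hit, no prefetch needed\n",
                           ops[i].name);
                else
                    printf("%s: new cache line -> schedule one prefetch\n", ops[i].name);
                last_line = line;
            }
            return 0;
        }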


    Method and apparatus for performing prefetching at the function level
    6.
    Invention grant
    Method and apparatus for performing prefetching at the function level (In force)

    Publication No.: US06421826B1

    Publication date: 2002-07-16

    Application No.: US09434715

    Filing date: 1999-11-05

    IPC classification: G06F 9/44

    CPC classification: G06F 9/383

    Abstract: One embodiment of the present invention provides a system for compiling source code into executable code that performs prefetching for memory operations within regions of code that tend to generate cache misses. The system operates by compiling a source code module containing programming language instructions into an executable code module containing instructions suitable for execution by a processor. Next, the system runs the executable code module in a training mode on a representative workload and keeps statistics on cache miss rates for functions within the executable code module. These statistics are used to identify a set of “hot” functions that generate a large number of cache misses. Next, explicit prefetch instructions are scheduled in advance of memory operations within the set of hot functions. In one embodiment, explicit prefetch operations are scheduled into the executable code module by activating prefetch generation at the start of an identified function and deactivating prefetch generation at the return from the identified function. In another embodiment, the system further schedules prefetch operations for the memory operations by identifying a subset of memory operations of a particular type within the set of hot functions and scheduling explicit prefetch operations for memory operations belonging to that subset.
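
    A C sketch of the selection step; the function names, miss counts, and HOT_THRESHOLD below are invented stand-ins for what a training run on a representative workload would produce. Functions whose miss counts exceed the threshold are marked hot, and prefetch generation is switched on only while compiling those functions.

        #include <stdio.h>

        #define HOT_THRESHOLD 100000L   /* assumed cutoff for a "hot" function */

        typedef struct { const char *func; long misses; } profile_t;

        int main(void) {
            /* per-function miss counts as they might come out of a training run */
            profile_t prof[] = { { "walk_list",     480000 },
                                 { "hash_lookup",   230000 },
                                 { "format_output",   1200 } };
            for (int i = 0; i < 3; i++) {
                int hot = prof[i].misses > HOT_THRESHOLD;
                printf("%-14s misses=%-7ld prefetch generation: %s\n",
                       prof[i].func, prof[i].misses,
                       hot ? "activated at entry, deactivated at return" : "off");
            }
            return 0;
        }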


    Disambiguating memory references based upon user-specified programming constraints
    7.
    Invention grant
    Disambiguating memory references based upon user-specified programming constraints (In force)

    Publication No.: US06718542B1

    Publication date: 2004-04-06

    Application No.: US09549806

    Filing date: 2000-04-14

    IPC classification: G06F 9/45

    CPC classification: G06F 8/434; G06F 8/445

    Abstract: A system that allows a programmer to specify a set of constraints that the programmer has adhered to in writing code, so that a compiler is able to assume the set of constraints when disambiguating memory references within the code. The system operates by receiving an identifier for a set of constraints on memory references that the programmer has adhered to in writing the code. The system uses the identifier to select a disambiguation technique from a set of disambiguation techniques. Note that each disambiguation technique is associated with a different set of constraints on memory references. The system uses the selected disambiguation technique to identify memory references within the code that can alias with each other.
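
    A C sketch of the selection mechanism; the constraint names and the three may-alias tests are invented for illustration and are not the patent's actual techniques. The identifier supplied by the programmer (here a compile-time enum value) picks which test the compiler applies when deciding whether two memory references can alias.

        #include <stdio.h>
        #include <string.h>

        typedef enum { NO_CONSTRAINTS, TYPE_BASED, NO_POINTER_ALIASING } constraint_t;

        /* conservative: with no declared constraints, anything may alias anything */
        static int alias_conservative(const char *a, const char *b) { (void)a; (void)b; return 1; }

        /* type-based: references may alias only if their declared types match */
        static int alias_by_type(const char *ta, const char *tb) { return strcmp(ta, tb) == 0; }

        /* programmer promises that distinct named pointers never alias */
        static int alias_never(const char *a, const char *b) { return strcmp(a, b) == 0; }

        int main(void) {
            constraint_t declared = TYPE_BASED;   /* identifier supplied by the programmer */
            int (*may_alias)(const char *, const char *) =
                declared == NO_CONSTRAINTS ? alias_conservative :
                declared == TYPE_BASED     ? alias_by_type      : alias_never;

            printf("*(int *)p vs *(float *)q may alias? %d\n", may_alias("int", "float"));
            printf("*(int *)p vs *(int *)r   may alias? %d\n", may_alias("int", "int"));
            return 0;
        }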


    Aggressive prefetch of address chains
    8.
    Invention grant
    Aggressive prefetch of address chains (In force)

    Publication No.: US07137111B2

    Publication date: 2006-11-14

    Application No.: US09996088

    Filing date: 2001-11-28

    IPC classification: G06F 9/44; G06F 9/30

    CPC classification: G06F 9/30047; G06F 9/3842

    Abstract: Operations including inserted prefetch operations that correspond to addressing chains may be scheduled above memory access operations that are likely to miss, thereby exploiting the latency of the “martyred” likely-to-miss operations and improving execution performance of the resulting code. More generally, certain pre-executable counterparts of likely-to-stall operations that form dependency chains may be scheduled above operations that are themselves likely to stall.
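
    A hand-written before/after illustration in C, not compiler output: the first link of an addressing chain is pre-executed, and a prefetch for the next link is scheduled above a likely-to-miss table load, so the martyred load's stall hides part of the chain's latency. __builtin_prefetch is the GCC/Clang intrinsic; the node layout and the example functions are invented.

        #include <stddef.h>

        struct node { struct node *next; long payload; };

        /* original order: the pointer chain follows the likely-to-miss table load */
        long before(struct node *p, long *big_table, size_t i) {
            long v = big_table[i];                 /* likely-to-miss load */
            return v + p->next->next->payload;     /* addressing chain: serial misses */
        }

        /* transformed order: chain work is hoisted above the likely-to-miss load */
        long after(struct node *p, long *big_table, size_t i) {
            struct node *t = p->next;              /* pre-executed chain load */
            __builtin_prefetch(t);                 /* prefetch the next link early */
            long v = big_table[i];                 /* "martyred" likely-to-miss load */
            return v + t->next->payload;
        }

        int main(void) {
            struct node c = { NULL, 7 }, b = { &c, 0 }, a = { &b, 0 };
            long table[4] = { 1, 2, 3, 4 };
            /* both versions compute the same value (3 + 7); only the schedule differs */
            return (int)(after(&a, table, 2) - before(&a, table, 2));
        }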


    Method, apparatus and computer program product for processing stack related exception traps
    10.
    Invention grant
    Method, apparatus and computer program product for processing stack related exception traps (Expired)

    Publication No.: US6167504A

    Publication date: 2000-12-26

    Application No.: US122172

    Filing date: 1998-07-24

    Applicant: Peter C. Damron

    Inventor: Peter C. Damron

    Abstract: Apparatus, methods, and computer program products are disclosed that improve the operation of a computer that uses a top-of-stack cache by reducing the number of overflow and underflow traps generated during the execution of a program. The invention maintains a predictor value that controls the number of stack elements that are spilled from, or filled to, the top-of-stack cache in response to an overflow trap or an underflow trap, respectively. The predictor reflects the history of overflow traps and underflow traps.
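
    A toy C model of the predictor; the constants and the update rule are assumptions for illustration, not the patent's claimed policy. Repeated traps of the same kind grow the number of stack elements moved per trap, and a trap in the opposite direction shrinks it, so the spill/fill amount tracks the recent trap history.

        #include <stdio.h>

        #define PRED_MIN 1
        #define PRED_MAX 8

        static int predictor = 2;   /* stack elements to spill or fill per trap */
        static int last_trap = 0;   /* +1 = overflow, -1 = underflow, 0 = none yet */

        static int next_count(int kind) {
            if (last_trap == kind && predictor < PRED_MAX)
                predictor++;                        /* same kind again: move more at once */
            else if (last_trap == -kind && predictor > PRED_MIN)
                predictor--;                        /* direction reversed: move less */
            last_trap = kind;
            return predictor;
        }

        int main(void) {
            printf("overflow trap:  spill %d element(s)\n", next_count(+1));
            printf("overflow trap:  spill %d element(s)\n", next_count(+1));
            printf("overflow trap:  spill %d element(s)\n", next_count(+1));
            printf("underflow trap: fill  %d element(s)\n", next_count(-1));
            return 0;
        }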
