Method and system for identifying multi-block indirect memory access chains
    51.
    Granted patent
    Status: in force

    Publication No.: US07383401B2

    Publication date: 2008-06-03

    Application No.: US11446624

    Filing date: 2006-06-05

    IPC classification: G06F12/00

    CPC classification: G06F12/0862 G06F2212/6028

    Abstract: A method and system for identifying multi-block indirect memory access chains. A method may include identifying basic blocks between an entry point and an exit point of a procedure, where the procedure includes a control statement governing its execution. It may be determined whether a probability of execution of a given basic block relative to the control statement equals or exceeds a first threshold value. If so, a respective set of one or more chains of indirect memory accesses may be generated, where each chain includes at least a respective head memory access that does not depend for its memory address computation on another memory access within the given basic block. Chains may be joined across basic blocks depending on whether the relative execution probabilities of the blocks exceed a threshold value.
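
    The chain-building idea in this abstract can be illustrated with a small sketch. It is not the patented method: the `Access` and `Block` types, the two threshold parameters, and the rule that a chain is joined when the later block's probability exceeds `join_threshold` are all assumptions made for illustration, and accesses and blocks are assumed to be listed in program order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Access:
    name: str                         # e.g. "a[b[i]]" (hypothetical label)
    depends_on: Optional[str] = None  # access whose loaded value forms this address


@dataclass
class Block:
    name: str
    probability: float                # execution probability relative to the control statement
    accesses: List[Access]


def block_chains(block, threshold):
    """Chains (lists of access names) for one block; [] if the block is too cold."""
    if block.probability < threshold:
        return []
    local = {a.name for a in block.accesses}
    # Heads: the address does not depend on another access inside this block.
    chains = [[a.name] for a in block.accesses if a.depends_on not in local]
    for chain in chains:
        for a in block.accesses:      # program order, so dependencies come first
            if a.depends_on in chain:
                chain.append(a.name)
    return chains


def join_chains(blocks, block_threshold, join_threshold):
    """Build per-block chains, joining a chain onto the chain its head depends on."""
    chains, owner = [], {}            # owner: access name -> chain containing it
    for block in blocks:
        for chain in block_chains(block, block_threshold):
            head_dep = next(a.depends_on for a in block.accesses if a.name == chain[0])
            if head_dep in owner and block.probability > join_threshold:
                target = owner[head_dep]
                target.extend(chain)  # assumed joining rule, see lead-in
            else:
                target = chain
                chains.append(chain)
            for name in chain:
                owner[name] = target
    return chains


blocks = [
    Block("B0", 1.0, [Access("b[i]"), Access("a[b[i]]", "b[i]")]),
    Block("B1", 0.9, [Access("c[a[b[i]]]", "a[b[i]]")]),
    Block("B2", 0.1, [Access("d[b[i]]", "b[i]")]),
]
print(join_chains(blocks, block_threshold=0.5, join_threshold=0.8))
```

    Here block B2 is skipped because its probability is below the block threshold, while the chain found in B1 is appended to the chain from B0.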

    Method and system for generating prefetch information for multi-block indirect memory access chains
    52.
    Patent application
    Status: in force

    Publication No.: US20070283106A1

    Publication date: 2007-12-06

    Application No.: US11446643

    Filing date: 2006-06-05

    IPC classification: G06F13/00

    CPC classification: G06F8/4442

    Abstract: Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses of a procedure, the chain comprising a head access that does not depend for its address on another prefetch candidate memory access within the procedure and an indirect access that depends for its address on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address that is dependent upon the prefetch-ahead value and an address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation that is dependent for its address computation on results of preceding load operations in the same manner as its corresponding terminal indirect access depends upon preceding accesses in the chain.
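
    As a rough illustration of the operations this abstract describes, the sketch below lists what might be emitted for one loop iteration of a chain such as `a[c[b[i]]]`: loads for the head and intermediate accesses, shifted by the prefetch-ahead value, followed by a prefetch for the terminal access. The function name, the tuple encoding of operations, and the bounds guard are assumptions; how the prefetch-ahead value is chosen is not stated in the abstract, so it is simply a parameter here.

```python
def prefetch_ops(index_arrays, terminal_name, i, ahead):
    """Operations for iteration i of a chain like terminal[ ... head[i] ... ].

    index_arrays: list of (name, values) pairs, head access first. Their loaded
    values feed the next address computation. The terminal access is only
    prefetched, never loaded. Indices are assumed to be non-negative.
    """
    ops = []
    idx = i + ahead                    # head address shifted by the prefetch-ahead value
    for name, values in index_arrays:
        if idx >= len(values):         # guard: never read past the end of an index array
            return ops
        ops.append(("load", name, idx))
        idx = values[idx]              # loaded value becomes the next index
    ops.append(("prefetch", terminal_name, idx))
    return ops


b = [3, 0, 2, 1]
c = [1, 1, 0, 2]
print(prefetch_ops([("b", b), ("c", c)], "a", i=0, ahead=1))
```

    The prefetch address is computed from the loaded values in the same way the original terminal access computes its address from the preceding accesses, only `ahead` iterations early.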

    Facilitating communication and synchronization between main and scout threads
    53.
    Patent application
    Status: in force

    Publication No.: US20070022422A1

    Publication date: 2007-01-25

    Application No.: US11272178

    Filing date: 2005-11-09

    IPC classification: G06F9/46

    Abstract: One embodiment of the present invention provides a system for communicating and performing synchronization operations between a main thread and a helper-thread. The system starts by executing a program in a main thread. Upon encountering a loop which has associated helper-thread code, the system commences the execution of the code by the helper-thread separately and in parallel with the main thread. While executing the code by the helper-thread, the system periodically checks the progress of the main thread and deactivates the helper-thread if the code being executed by the helper-thread is no longer performing useful work. Hence, the helper-thread executes in advance of where the main thread is executing to prefetch data items for the main thread without unnecessarily consuming processor resources or hampering the execution of the main thread.

    Method and apparatus for software scouting regions of a program
    54.
    Patent application
    Status: in force

    Publication No.: US20070022412A1

    Publication date: 2007-01-25

    Application No.: US11272210

    Filing date: 2005-11-09

    IPC classification: G06F9/45

    Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime, the helper-thread is executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.

    Parallelization scheme for generic reduction
    58.
    Granted patent
    Status: in force

    Publication No.: US07620945B1

    Publication date: 2009-11-17

    Application No.: US11205822

    Filing date: 2005-08-16

    IPC classification: G06F9/45

    CPC classification: G06F8/45

    Abstract: One embodiment of the present invention provides a system that supports parallelized generic reduction operations in a parallel programming language, wherein a reduction operation is an associative operation that can be divided into a group of sub-operations that can execute in parallel. During operation, the system detects generic reduction operations in source code. In doing so, the system identifies a set of reduction variables upon which the generic reduction operation will operate, along with a set of initial values for the variables. The system additionally identifies a merge operation that merges partial results from the parallel generic reduction operations into a final result. The system then compiles the program's source code into a form which facilitates executing the generic reduction operations in parallel. By supporting the parallel execution of such generic reduction operations in this way, the present invention extends parallel execution for reduction operations beyond basic commutative and associative operations such as addition and multiplication.
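
    The three ingredients named in this abstract (reduction variables with initial values, a per-element reduction operation, and a merge operation for partial results) can be sketched as a library-level helper. This is only an analogy to the compiler support described: `parallel_reduce` and its parameters are hypothetical names, and a thread pool stands in for whatever parallel runtime the patent targets. Contiguous chunks and in-order merging are used so that the operation needs to be associative but not commutative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_reduce(items, initial, step, merge, workers=4):
    """Reduce `items` in parallel.

    initial: zero-argument factory for the reduction variable's initial value
    step:    (accumulator, item) -> accumulator, applied within one chunk
    merge:   (partial, partial) -> partial, combines chunk results in order
    """
    size = max(1, -(-len(items) // workers))          # ceiling division
    chunks = [items[k:k + size] for k in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so non-commutative merges stay correct.
        partials = list(pool.map(lambda chunk: reduce(step, chunk, initial()), chunks))
    return reduce(merge, partials, initial())


# String concatenation is associative but not commutative.
word = parallel_reduce(list("reduction"), str, lambda acc, ch: acc + ch, lambda a, b: a + b)

# A (min, max) pair as the reduction variable.
lo_hi = parallel_reduce(
    [5, 3, 9, 1, 7],
    lambda: (float("inf"), float("-inf")),
    lambda acc, x: (min(acc[0], x), max(acc[1], x)),
    lambda a, b: (min(a[0], b[0]), max(a[1], b[1])),
)
print(word, lo_hi)
```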

    Method and apparatus for optimizing computer program performance using steered execution
    59.
    Granted patent
    Status: in force

    Publication No.: US07458067B1

    Publication date: 2008-11-25

    Application No.: US11084656

    Filing date: 2005-03-18

    IPC classification: G06F9/44

    CPC classification: G06F8/443

    Abstract: One embodiment of the present invention provides a system that facilitates optimizing computer program performance by using steered execution. The system operates by first receiving source code for a computer program, and then compiling a portion of this source code with a first set of optimizations to generate a first compiled portion. The system also compiles the same portion of the source code with a second set of optimizations to generate a second compiled portion. Remaining source code is compiled to generate a third compiled portion. Additionally, a rule is generated for selecting between the first compiled portion and the second compiled portion. Finally, the first compiled portion, the second compiled portion, the third compiled portion, and the rule are combined into an executable output file.
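
    A hand-written analogy may help picture the result of steered execution: two variants of the same routine, plus a rule that picks one at run time. In the sketch below the variants are written by hand rather than produced by a compiler, and the selection rule (input length with a cutoff of 64) is purely an assumption, since the abstract does not say what the rule is based on.

```python
def dot_plain(xs, ys):
    """Variant 1: straightforward loop."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_unrolled(xs, ys):
    """Variant 2: same computation, manually unrolled four ways."""
    n = len(xs)
    total = 0
    i = 0
    while i + 4 <= n:
        total += (xs[i] * ys[i] + xs[i + 1] * ys[i + 1]
                  + xs[i + 2] * ys[i + 2] + xs[i + 3] * ys[i + 3])
        i += 4
    while i < n:                       # leftover elements
        total += xs[i] * ys[i]
        i += 1
    return total


def rule(xs, ys):
    """Hypothetical selection rule: prefer the unrolled variant for long inputs."""
    return dot_unrolled if len(xs) >= 64 else dot_plain


def dot(xs, ys):
    """Entry point used by the rest of the program (the 'third compiled portion')."""
    return rule(xs, ys)(xs, ys)


values = list(range(100))
print(dot(values, values))
```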

    Anticipatory helper thread based code execution
    60.
    Patent application
    Status: in force

    Publication No.: US20070271565A1

    Publication date: 2007-11-22

    Application No.: US11436948

    Filing date: 2006-05-18

    IPC classification: G06F9/46

    CPC classification: G06F9/4843 G06F9/52

    Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. Responsive to the first thread detecting a launch point for a function, the first thread is configured to provide an indication to the second thread that the second thread may begin execution of a given function. The launch point of the function precedes an actual call point of the function in an execution sequence. The second thread is configured to initiate execution of the function in response to the indication. The function includes one or more inputs and the second thread uses anticipated values for each of the one or more inputs. When the first thread reaches a call point for the function, the first thread is configured to use the results of the second thread's execution, in response to determining that the anticipated values used by the second thread were correct.
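
    The launch-point and call-point protocol in this abstract can be mimicked with a single-worker thread pool. The sketch below is a simplified software analogy, not the mechanism claimed: `slow_square`, `run_with_anticipation`, and the callable used to obtain the actual input are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for the function the helper thread runs speculatively."""
    return x * x


def run_with_anticipation(get_actual_input, anticipated):
    """Return (result, used_helper_result)."""
    with ThreadPoolExecutor(max_workers=1) as helper:
        # Launch point: start the function early on the anticipated input.
        future = helper.submit(slow_square, anticipated)
        # The first thread keeps running and eventually learns the real input.
        actual = get_actual_input()
        # Call point: reuse the helper's result only if the guess was correct.
        if actual == anticipated:
            return future.result(), True
        return slow_square(actual), False


print(run_with_anticipation(lambda: 7, anticipated=7))
print(run_with_anticipation(lambda: 8, anticipated=7))
```

    When the anticipated value is wrong, the first thread simply executes the function itself, so correctness never depends on the guess.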
