Dependency analysis system and method
    71.
    Granted patent
    Legal status: in force

    Publication No.: US07581215B1

    Publication date: 2009-08-25

    Application No.: US10876228

    Filing date: 2004-06-24

    IPC classification: G06F9/45

    CPC classification: G06F8/4452 G06F8/433

    Abstract: We present a technique to perform dependence analysis on array subscripts more complex than linear forms of the enclosing loop indices. For such complex array subscripts, we decouple the original iteration space from the dependence test iteration space and link them through index-association functions. The dependence analysis is performed in the dependence test iteration space to determine whether a dependence exists in the original iteration space. The dependence distance in the original iteration space is determined by the distance in the dependence test iteration space and the properties of the index-association functions. For certain non-linear expressions, we show how to transform them into an equivalent set of linear expressions, which can then be used in dependence tests with traditional techniques. We also show how our advanced dependence analysis technique can help parallelize some otherwise hard-to-parallelize loops.
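
    Once subscripts are in linear form, a traditional dependence test applies. The following is a minimal sketch of one classic test of that kind, the GCD test. It is not the patent's own method, and the function name `gcd_test` and the single-loop, two-subscript setup are illustrative assumptions.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """Return True if X[a*i + b] and X[c*j + d] MAY touch the same element.

    The equation a*i + b == c*j + d has an integer solution exactly when
    gcd(a, c) divides (d - b). Loop bounds are ignored here, so True only
    means "dependence possible", while False means "provably independent".
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they collide only if they are equal.
        return b == d
    return (d - b) % g == 0


# X[2*i] (even elements) and X[2*j + 1] (odd elements) never overlap.
print(gcd_test(2, 0, 2, 1))
# X[2*i] and X[4*j + 2] may overlap (for example i = 1, j = 0).
print(gcd_test(2, 0, 4, 2))
```

    Because the test ignores loop bounds, it is conservative: a compiler would typically follow a True result with a more precise test.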

    Method and system for identifying multi-block indirect memory access chains
    72.
    Granted patent
    Legal status: in force

    Publication No.: US07383401B2

    Publication date: 2008-06-03

    Application No.: US11446624

    Filing date: 2006-06-05

    IPC classification: G06F12/00

    CPC classification: G06F12/0862 G06F2212/6028

    Abstract: A method and system for identifying multi-block indirect memory access chains. A method may include identifying basic blocks between an entry point and an exit point of a procedure, where the procedure includes a control statement governing its execution. It may be determined whether the probability of execution of a given basic block relative to the control statement equals or exceeds a first threshold value. If so, a respective set of one or more chains of indirect memory accesses may be generated, where each chain includes at least a respective head memory access whose memory address computation does not depend on another memory access within the given basic block. Chains may be joined across basic blocks depending on whether the relative execution probabilities of the blocks exceed a threshold value.
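
    The per-block step can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the `Access` record, the `build_chains` function, the default threshold of 0.5, and the choice to discard chains that consist of a head only are all hypothetical, and joining chains across blocks is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Access:
    name: str
    # Name of the access in the same block whose loaded value feeds this
    # access's address, or None if the address needs no other access.
    addr_dep: Optional[str] = None


def build_chains(accesses, exec_prob, threshold=0.5):
    """Group one basic block's accesses (in program order) into chains."""
    if exec_prob < threshold:
        return []  # block is too unlikely to execute; generate no chains
    chains = {}   # head name -> ordered list of access names in the chain
    head_of = {}  # access name -> name of the head of its chain
    for acc in accesses:
        if acc.addr_dep is None or acc.addr_dep not in head_of:
            # No in-block address dependence seen so far: this is a head.
            chains[acc.name] = [acc.name]
            head_of[acc.name] = acc.name
        else:
            head = head_of[acc.addr_dep]
            chains[head].append(acc.name)
            head_of[acc.name] = head
    # Assumption: only chains with at least one indirect access are kept.
    return [chain for chain in chains.values() if len(chain) > 1]


block = [Access("b[i]"), Access("a[b[i]]", addr_dep="b[i]"), Access("c[i]")]
print(build_chains(block, exec_prob=0.9))
```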

    Method and system for generating prefetch information for multi-block indirect memory access chains
    73.
    Patent application
    Legal status: in force

    Publication No.: US20070283106A1

    Publication date: 2007-12-06

    Application No.: US11446643

    Filing date: 2006-06-05

    IPC classification: G06F13/00

    CPC classification: G06F8/4442

    Abstract: Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses of a procedure, the chain comprising a head access whose address does not depend on another prefetch candidate memory access within the procedure, and an indirect access whose address depends on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address that is dependent upon the prefetch-ahead value and an address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation whose address computation depends on the results of preceding load operations in the same manner as its corresponding terminal indirect access depends upon preceding accesses in the chain.
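
    For a two-access chain such as `a[b[i]]`, the generated operations can be modeled as below. This is only a sketch: Python has no prefetch instruction, so the prefetch is recorded rather than issued, and the function name `prefetch_schedule` and the bounds guard are assumptions made for illustration.

```python
def prefetch_schedule(b, ahead):
    """Model prefetching for the loop `for i in range(len(b)): use(a[b[i]])`.

    Returns (i, idx) pairs meaning: during iteration i, the element a[idx]
    would be prefetched, where idx comes from a real load of b[i + ahead].
    """
    schedule = []
    n = len(b)
    for i in range(n):
        j = i + ahead  # head address advanced by the prefetch-ahead value
        if j < n:      # assumed guard: never load past the end of b
            idx = b[j]                 # load corresponding to the head access
            schedule.append((i, idx))  # stands in for prefetch(&a[idx])
    return schedule


# With a prefetch-ahead value of 2, iteration 0 prefetches a[b[2]].
print(prefetch_schedule([3, 0, 2, 1], ahead=2))
```

    The head access becomes an ordinary load because its value is needed to form the address of the terminal access; only the terminal access is turned into a prefetch.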

    Facilitating communication and synchronization between main and scout threads
    74.
    Patent application
    Legal status: in force

    Publication No.: US20070022422A1

    Publication date: 2007-01-25

    Application No.: US11272178

    Filing date: 2005-11-09

    IPC classification: G06F9/46

    Abstract: One embodiment of the present invention provides a system for communicating and performing synchronization operations between a main thread and a helper-thread. The system starts by executing a program in a main thread. Upon encountering a loop which has associated helper-thread code, the system commences the execution of the code by the helper-thread separately and in parallel with the main thread. While executing the code by the helper-thread, the system periodically checks the progress of the main thread and deactivates the helper-thread if the code being executed by the helper-thread is no longer performing useful work. Hence, the helper-thread executes in advance of where the main thread is executing to prefetch data items for the main thread without unnecessarily consuming processor resources or hampering the execution of the main thread.

    Method and apparatus for software scouting regions of a program
    75.
    Patent application
    Legal status: in force

    Publication No.: US20070022412A1

    Publication date: 2007-01-25

    Application No.: US11272210

    Filing date: 2005-11-09

    IPC classification: G06F9/45

    Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime, the helper-thread is executed in parallel with the main thread, in advance of where the main thread is executing, to prefetch data items for the main thread.

    Parallelization scheme for generic reduction
    76.
    Granted patent
    Legal status: in force

    Publication No.: US07620945B1

    Publication date: 2009-11-17

    Application No.: US11205822

    Filing date: 2005-08-16

    IPC classification: G06F9/45

    CPC classification: G06F8/45

    Abstract: One embodiment of the present invention provides a system that supports parallelized generic reduction operations in a parallel programming language, wherein a reduction operation is an associative operation that can be divided into a group of sub-operations that can execute in parallel. During operation, the system detects generic reduction operations in source code. In doing so, the system identifies a set of reduction variables upon which the generic reduction operation will operate, along with a set of initial values for the variables. The system additionally identifies a merge operation that merges partial results from the parallel generic reduction operations into a final result. The system then compiles the program's source code into a form which facilitates executing the generic reduction operations in parallel. By supporting the parallel execution of such generic reduction operations in this way, the present invention extends parallel execution for reduction operations beyond basic commutative and associative operations such as addition and multiplication.
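
    The three ingredients named in the abstract (reduction variables, their initial values, and a merge operation) can be illustrated with a small runtime sketch. The abstract describes compiler support; the code below only mimics the resulting execution pattern, and `parallel_reduce`, its parameters, and the min/max example are assumptions chosen for illustration. The sketch also assumes `init` is neutral with respect to `merge`, since every chunk starts from it.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_reduce(data, op, init, merge, workers=4):
    """Reduce each chunk from `init` with `op`, then combine with `merge`."""
    if not data:
        return init
    size = -(-len(data) // workers)  # ceiling division: items per chunk
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker produces one partial result for its own chunk.
        partials = list(pool.map(lambda chunk: reduce(op, chunk, init), chunks))
    return reduce(merge, partials)


# Reduction variables: a (min, max) pair, which is not a single + or *.
def step(acc, x):
    return (min(acc[0], x), max(acc[1], x))


def merge_pairs(p, q):
    return (min(p[0], q[0]), max(p[1], q[1]))


start = (float("inf"), float("-inf"))
print(parallel_reduce([5, 1, 9, 3, 7], step, start, merge_pairs, workers=2))
```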

    Method and apparatus for optimizing computer program performance using steered execution
    77.
    Granted patent
    Legal status: in force

    Publication No.: US07458067B1

    Publication date: 2008-11-25

    Application No.: US11084656

    Filing date: 2005-03-18

    IPC classification: G06F9/44

    CPC classification: G06F8/443

    Abstract: One embodiment of the present invention provides a system that facilitates optimizing computer program performance by using steered execution. The system operates by first receiving source code for a computer program, and then compiling a portion of this source code with a first set of optimizations to generate a first compiled portion. The system also compiles the same portion of the source code with a second set of optimizations to generate a second compiled portion. The remaining source code is compiled to generate a third compiled portion. Additionally, a rule is generated for selecting between the first compiled portion and the second compiled portion. Finally, the first compiled portion, the second compiled portion, the third compiled portion, and the rule are combined into an executable output file.
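
    The idea of shipping two versions of the same code together with a selection rule can be mimicked at the source level. In the sketch below the two "compiled portions" are simply two hand-written functions with identical results, and the rule steers on input length; the abstract does not say what the rule inspects, so the threshold of 64 and all names are hypothetical.

```python
import operator


def dot_small(xs, ys):
    """Variant A: plain loop, standing in for the first compiled portion."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_large(xs, ys):
    """Variant B: built-in pipeline, standing in for the second portion."""
    return sum(map(operator.mul, xs, ys))


def make_steered(variant_a, variant_b, rule):
    """Bundle two variants with a rule that picks one on every call."""
    def steered(*args):
        chosen = variant_a if rule(*args) else variant_b
        return chosen(*args)
    return steered


# Assumed rule: short inputs go to variant A, everything else to variant B.
dot = make_steered(dot_small, dot_large, rule=lambda xs, ys: len(xs) < 64)
print(dot([1, 2, 3], [4, 5, 6]))
```

    Both variants must compute the same result for every input; the rule only affects which one runs.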

    Anticipatory helper thread based code execution
    78.
    Patent application
    Legal status: in force

    Publication No.: US20070271565A1

    Publication date: 2007-11-22

    Application No.: US11436948

    Filing date: 2006-05-18

    IPC classification: G06F9/46

    CPC classification: G06F9/4843 G06F9/52

    Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. Responsive to the first thread detecting a launch point for a function, the first thread is configured to provide an indication to the second thread that the second thread may begin execution of a given function. The launch point of the function precedes an actual call point of the function in an execution sequence. The second thread is configured to initiate execution of the function in response to the indication. The function includes one or more inputs, and the second thread uses anticipated values for each of the one or more inputs. When the first thread reaches a call point for the function, the first thread is configured to use the results of the second thread's execution, in response to determining that the anticipated values used by the second thread were correct.
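
    The launch-point and call-point protocol can be sketched with a thread pool. This is an illustrative model only, assuming a side-effect-free function: `run_with_anticipation` and its parameters are hypothetical names, and falling back to a normal call on a misprediction is an assumption about what happens when the anticipated values turn out to be wrong.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_anticipation(func, anticipated_args, compute_actual_args):
    """Start func early on guessed inputs; reuse the result if the guess holds.

    Returns (result, reused), where reused tells whether the helper's
    result was used at the call point.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Launch point: the helper thread starts on the anticipated inputs.
        future = pool.submit(func, *anticipated_args)
        # The main thread keeps working and eventually knows the real inputs.
        actual_args = compute_actual_args()
        # Call point: accept the helper's work only if the guess was right.
        if actual_args == anticipated_args:
            return future.result(), True
        return func(*actual_args), False


print(run_with_anticipation(pow, (2, 10), lambda: (2, 10)))
print(run_with_anticipation(pow, (2, 10), lambda: (3, 2)))
```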

    Specifically crosslinked hemoglobin with free functionality
    79.
    Granted patent
    Legal status: expired

    Publication No.: US5399671A

    Publication date: 1995-03-21

    Application No.: US978418

    Filing date: 1992-11-18

    Abstract: Hemoglobin is site-specifically crosslinked into its tetrameric form by reaction with a trifunctional reagent which combines electrostatic effects, steric effects and the presence of functional groups so that two of the functional groups react with specific sites on the hemoglobin whilst the third site is left free for reaction with endogenous nucleophilic compounds. A specific example of such a crosslinking reagent is trimesoyl tris(3,5-dibromosalicylate), TTDS, which effects specific crosslinking between the amino groups of lysine-82 on each respective β sub-unit. While the crosslinking reagent TTDS has three available carboxyl groups for the crosslinking reaction, only two so react, leaving one free carboxyl for reaction with exogenous nucleophiles, e.g. to render the hemoglobin product useful as a carrier for nucleophilic compounds through the body's circulatory system.

    Minimizing register spills by using register moves
    80.
    Granted patent
    Legal status: in force

    Publication No.: US09009692B2

    Publication date: 2015-04-14

    Application No.: US12647484

    Filing date: 2009-12-26

    IPC classification: G06F9/45

    CPC classification: G06F8/441

    Abstract: A system and method for minimizing register spills during compilation. A compiler reallocates spilled variables from stack memory to other available registers. Although the corresponding register file may not have registers available for storage, the compiler identifies available registers in other locations. The compiler identifies available registers in an alternate register file, wherein the alternate register file may be a floating-point register file which is then used for spilled integer variables. Other instruction type combinations between spilled variables and alternate register files are possible. When an available register within the alternate register file is identified, the compiler modifies the program instructions to allocate the corresponding spilled variable to the available register.
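
    The reallocation step can be sketched as a simple mapping pass. This is a toy model, not the compiler's actual algorithm: `reallocate_spills`, the first-come-first-served order, the register names, and the `"stack"` marker are all assumptions, and the instruction rewriting that a real compiler would perform is omitted.

```python
def reallocate_spills(spilled_vars, free_alt_regs):
    """Map spilled variables to free registers of an alternate register file.

    Variables are served in the given order; once the alternate file has no
    free registers left, the remaining variables stay in stack memory.
    """
    assignment = {}
    remaining = list(free_alt_regs)  # copy so the caller's list is untouched
    for var in spilled_vars:
        if remaining:
            assignment[var] = remaining.pop(0)  # e.g. an unused FP register
        else:
            assignment[var] = "stack"           # still spilled to memory
    return assignment


# Three spilled integer variables, two free floating-point registers.
print(reallocate_spills(["x", "y", "z"], ["f4", "f5"]))
```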
