Patent search: ap:("Arun Kejariwal" OR "Xinmin Tian" OR "Wei Li" OR "Milind B. Girkar") AND inv:"Wei Li" (page 1)

1.

Patent application
METHOD AND APPARATUS FOR EXPLOITING THREAD-LEVEL PARALLELISM [Legal status: in force]

Publication No.: US20080244549A1

Publication date: 2008-10-02

Application No.: US11695012

Filing date: 2007-03-31

Applicants: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar

Inventors: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar

IPC classification: G06F9/45

CPC classification: G06F8/456

Abstract: According to one example embodiment, there is disclosed herein uses partial recurrence relaxation for parallelizing DOACROSS loops on multi-core computer architectures. By one example definition, a DOACROSS may be a loop that allows successive iterations executing by overlapping; that is, all iterations must impose a partial execution order. According to one embodiment, the inventive subject matter may be used to transform the dependence structure of a given loop with recurrences for maximal degree of thread-level parallelism (TLP), where the threads can be mapped on to either different logical processors (in a hyperthreaded processor) or can be mapped onto different physical cores (or processors) in a multi-core processor.
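As a rough illustration of why the dependence structure of a loop with a recurrence limits thread-level parallelism, the sketch below runs a loop whose iteration i reads the result of iteration i - d. It is not the patented partial recurrence relaxation; the function names, the loop body, and the dependence distance `d` are assumptions chosen for the example. With distance d, the iterations fall into d independent chains, so at most d threads can make progress at once.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(a, b, d):
    """Reference loop: iteration i depends on iteration i - d."""
    out = list(a)
    for i in range(d, len(out)):
        out[i] = out[i - d] + b[i]
    return out


def run_chains(a, b, d, workers=4):
    """Run the same recurrence as d independent chains (hypothetical helper)."""
    out = list(a)

    def chain(start):
        # Indices start, start + d, start + 2d, ... only depend on each other,
        # so each chain keeps its own internal order and never touches another.
        for i in range(start + d, len(out), d):
            out[i] = out[i - d] + b[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(chain, range(d)))  # list() forces completion and surfaces errors
    return out


if __name__ == "__main__":
    a = list(range(10))
    b = [1] * 10
    print(run_chains(a, b, d=3) == run_sequential(a, b, d=3))
```

A larger dependence distance yields more independent chains, which is one way to read the abstract's goal of transforming the dependence structure for a maximal degree of TLP.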

2.

Granted patent
Method and apparatus for exploiting thread-level parallelism [Legal status: in force]

Publication No.: US07984431B2

Publication date: 2011-07-19

Application No.: US11695012

Filing date: 2007-03-31

Applicants: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar

Inventors: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar

IPC classification: G06F9/45

CPC classification: G06F8/456

Abstract: According to one example embodiment, there is disclosed herein uses partial recurrence relaxation for parallelizing DOACROSS loops on multi-core computer architectures. By one example definition, a DOACROSS may be a loop that allows successive iterations executing by overlapping; that is, all iterations must impose a partial execution order. According to one embodiment, the inventive subject matter may be used to transform the dependence structure of a given loop with recurrences for maximal degree of thread-level parallelism (TLP), where the threads can be mapped on to either different logical processors (in a hyperthreaded processor) or can be mapped onto different physical cores (or processors) in a multi-core processor.

3.

Granted patent
Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors [Legal status: lapsed]

Publication No.: US07571301B2

Publication date: 2009-08-04

Application No.: US11395841

Filing date: 2006-03-31

Applicants: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

IPC classification: G06F9/45, G06F9/52

CPC classification: G06F9/3009, G06F8/458, G06F9/30087, G06F9/3836, G06F9/3838, G06F9/3851, G06F9/3855, G06F9/3857, G06F9/3891

Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.
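The single-counter variant mentioned in the abstract can be sketched as follows. This is only an illustration of the post-wait ordering protocol, not the patented implementation: the class `PostWaitCounter`, the function `doacross_running_sum`, and the loop body are hypothetical, and the sketch uses `threading.Condition` (which takes a lock internally) because the Python standard library offers no portable lock-free atomics. Each iteration does its independent work freely, waits until the counter reaches its own index, performs the ordered part, and then posts the next index.

```python
import threading


class PostWaitCounter:
    """Single counter that admits iterations to the ordered region in index order."""

    def __init__(self):
        self._next = 0
        self._cond = threading.Condition()

    def wait(self, i):
        # Block until every iteration before i has posted.
        with self._cond:
            self._cond.wait_for(lambda: self._next == i)

    def post(self, i):
        # Signal that iteration i has finished its ordered part.
        with self._cond:
            self._next = i + 1
            self._cond.notify_all()


def doacross_running_sum(values, num_threads=4):
    """Running sum of squares: squaring overlaps across threads, the sum is ordered."""
    n = len(values)
    squares = [0] * n
    running = [0] * n
    sync = PostWaitCounter()

    def worker(tid):
        # Cyclic distribution: thread tid handles iterations tid, tid + T, ...
        for i in range(tid, n, num_threads):
            squares[i] = values[i] * values[i]  # independent part, may overlap
            sync.wait(i)                         # wait for iteration i - 1
            running[i] = (running[i - 1] if i > 0 else 0) + squares[i]
            sync.post(i)                         # release iteration i + 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return running


if __name__ == "__main__":
    print(doacross_running_sum([1, 2, 3, 4, 5]))
```

Because each thread visits its iterations in increasing order, the iteration that the counter is waiting for is always reachable, so the sketch cannot deadlock.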

4.

Patent application
Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors [Legal status: lapsed]

Publication No.: US20070234326A1

Publication date: 2007-10-04

Application No.: US11395841

Filing date: 2006-03-31

Applicants: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

IPC classification: G06F9/45

CPC classification: G06F9/3009, G06F8/458, G06F9/30087, G06F9/3836, G06F9/3838, G06F9/3851, G06F9/3855, G06F9/3857, G06F9/3891

Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.

5.

Granted patent
System, method and apparatus for dependency chain processing [Legal status: in force]

Publication No.: US07603546B2

Publication date: 2009-10-13

Application No.: US10950693

Filing date: 2004-09-28

Applicants: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang

Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang

IPC classification: G06F9/00, G06F9/24, G06F15/177

CPC classification: G06F8/443, G06F8/433, G06F8/451

Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.

6.

Patent application
METHODS AND APPARATUS TO PROVIDE PARAMETERIZED OFFLOADING ON MULTIPROCESSOR ARCHITECTURES [Legal status: under examination, published]

Publication No.: US20080163183A1

Publication date: 2008-07-03

Application No.: US11618143

Filing date: 2006-12-29

Applicants: Zhiyuan Li, Xinmin Tian, Wei Li, Hong Wang

Inventors: Zhiyuan Li, Xinmin Tian, Wei Li, Hong Wang

IPC classification: G06F9/45

CPC classification: G06F8/456, G06F2209/509

Abstract: Methods and apparatus to provide parameterized offloading in multiprocessor systems are disclosed. An example method includes partitioning source code into a first task and a second task, and compiling object code from the source code, such that the first task is compiled to execute on a first processor core and the second task is compiled to execute on a second processor core, the assignment of the first task to the first core being dependent on an input parameter.

7.

Patent application
Thread-data affinity optimization using compiler [Legal status: in force]

Publication No.: US20070079298A1

Publication date: 2007-04-05

Application No.: US11242489

Filing date: 2005-09-30

Applicants: Xinmin Tian, Milind Girkar, David Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen

Inventors: Xinmin Tian, Milind Girkar, David Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen

IPC classification: G06F9/45

CPC classification: G06F8/45

Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.

8.

Patent application
System, method and apparatus for dependency chain processing [Legal status: in force]

Publication No.: US20060070047A1

Publication date: 2006-03-30

Application No.: US10950693

Filing date: 2004-09-28

Applicants: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang

Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang

IPC classification: G06F9/45

CPC classification: G06F8/443, G06F8/433, G06F8/451

Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.

9.

Granted patent
Method, system, and program of a compiler to parallelize source code [Legal status: in force]

Publication No.: US07882498B2

Publication date: 2011-02-01

Application No.: US11278329

Filing date: 2006-03-31

Applicants: Guilherme D. Ottoni, Xinmin Tian, Hong Wang, Richard A. Hankins, Wei Li, John Shen

Inventors: Guilherme D. Ottoni, Xinmin Tian, Hong Wang, Richard A. Hankins, Wei Li, John Shen

IPC classification: G06F9/45

CPC classification: G06F8/456, G06F8/314

Abstract: Provided are a method, system, and program for parallelizing source code with a compiler. Source code including source code statements is received. The source code statements are processed to determine a dependency of the statements. Multiple groups of statements are determined from the determined dependency of the statements, wherein statements in one group are dependent on one another. At least one directive is inserted in the source code, wherein each directive is associated with one group of statements. Resulting threaded code is generated including the inserted at least one directive. The group of statements to which the directive in the resulting threaded code applies are processed as a separate task. Each group of statements designated by the directive to be processed as a separate task may be processed concurrently with respect to other groups of statements.
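The grouping step described in the abstract can be pictured with a small sketch. It is an assumption-laden illustration rather than the patented compiler pass: statements are modeled as hypothetical `(name, writes, reads)` tuples, two statements are treated as dependent when they share a flow, anti, or output dependence on a variable, and groups are formed as connected components with a union-find structure. Each resulting group is a candidate for its own task directive.

```python
def group_statements(statements):
    """Group statements that depend on one another.

    statements: list of (name, writes, reads), where writes and reads are sets
    of variable names. Returns a list of groups, each a list of statement names.
    """
    parent = list(range(len(statements)))

    def find(x):
        # Find the representative of x's group, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for j, (_, writes_j, reads_j) in enumerate(statements):
        for i in range(j):
            _, writes_i, reads_i = statements[i]
            # Flow (write then read), anti (read then write), or output (write, write).
            if writes_i & reads_j or reads_i & writes_j or writes_i & writes_j:
                union(i, j)

    groups = {}
    for idx, (name, _, _) in enumerate(statements):
        groups.setdefault(find(idx), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    program = [
        ("S1", {"a"}, set()),   # a = ...
        ("S2", {"b"}, {"a"}),   # b = f(a)
        ("S3", {"c"}, set()),   # c = ...
        ("S4", {"d"}, {"c"}),   # d = g(c)
    ]
    print(group_statements(program))
```

In this example S1 and S2 form one group and S3 and S4 another, so the two groups share no variables and could be marked as separate tasks that run concurrently.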

10.

Granted patent
Thread-data affinity optimization using compiler [Legal status: in force]

Publication No.: US08037465B2

Publication date: 2011-10-11

Application No.: US11242489

Filing date: 2005-09-30

Applicants: Xinmin Tian, Milind Girkar, David C. Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen

Inventors: Xinmin Tian, Milind Girkar, David C. Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen

IPC classification: G06F9/44, G06F9/45

CPC classification: G06F8/45

Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.
