Patent search: ap:("Xinmin Tian" OR "Shih-wei Liao" OR "Hong Wang" OR "Milind Girkar" OR "John Shen" OR "Perry Wang" OR "Grant Haab" OR "Gerolf Hoflehner" OR "Daniel Lavery" OR "Hideki Saito" OR "Sanjiv Shah" OR "Dongkeun Kim") AND inv:"Sanjiv Shah" (page 1)

1.

Patent application
Methods and apparatus for reducing memory latency in a software application (status: in force)

Publication number: US20050086652A1

Publication date: 2005-04-21

Application number: US10677414

Filing date: 2003-10-02

Applicants: Xinmin Tian, Shih-Wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim

Inventors: Xinmin Tian, Shih-Wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim

IPC classification: G06F9/38, G06F9/45, G06F9/46, G06F9/48

CPC classification: G06F9/3851, G06F8/4442, G06F9/383, G06F9/4843, G06F9/52

Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
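
The counter-based coordination described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the "cache" is a Python dict, the names `run_with_helper`, `max_lead`, `main_count` and `max_lead_seen` are hypothetical, and a `threading.Condition` stands in for whatever synchronization the compiler-generated code would really use. The helper stops prefetching once it is `max_lead` items ahead of the main thread, so that prefetched entries are not left unused for long, and it skips forward if it has fallen behind.

```python
import threading


def run_with_helper(data, max_lead=4):
    """Sum `data` while a helper thread prefetches at most `max_lead` items ahead.

    Returns (total, hits, max_lead_seen).
    """
    cache = {}                  # stand-in for the hardware cache
    cond = threading.Condition()
    main_count = 0              # counter advanced by the main thread
    max_lead_seen = 0           # largest lead the helper ever had
    hits = 0
    total = 0

    def helper():
        nonlocal max_lead_seen
        i = 0                   # counter advanced by the helper thread
        while i < len(data):
            with cond:
                # Throttle: wait until the helper is less than max_lead ahead.
                cond.wait_for(lambda: i - main_count < max_lead)
                if i < main_count:
                    i = main_count      # fell behind the main thread: catch up
                    continue
                cache[i] = data[i]      # simulated prefetch
                max_lead_seen = max(max_lead_seen, i + 1 - main_count)
                i += 1

    thread = threading.Thread(target=helper)
    thread.start()
    for i in range(len(data)):
        with cond:
            if i in cache:
                value = cache.pop(i)    # prefetched in time
                hits += 1
            else:
                value = data[i]         # simulated cache miss
            total += value
            main_count = i + 1
            cond.notify_all()           # let the helper re-check its lead
    thread.join()
    return total, hits, max_lead_seen
```

The number of hits varies from run to run with thread timing, but the sum is always correct and the helper's lead never exceeds `max_lead`.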

2.

Granted patent
Methods and apparatus for reducing memory latency in a software application (status: in force)

Publication number: US07328433B2

Publication date: 2008-02-05

Application number: US10677414

Filing date: 2003-10-02

Applicants: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim

Inventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim

IPC classification: G06F9/44

CPC classification: G06F9/3851, G06F8/4442, G06F9/383, G06F9/4843, G06F9/52

Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.

3.

Granted patent
Compiler-based scheduling optimization hints for user-level threads (status: in force)

Publication number: US08205200B2

Publication date: 2012-06-19

Application number: US11289803

Filing date: 2005-11-29

Applicants: Shih-wei Liao, Ryan N. Rakvic, Richard A. Hankins, Hong Wang, Gansha Wu, Guei-Yuan Lueh, Xinmin Tian, Paul M. Petersen, Sanjiv Shah, Trung Diep, John Shen, Gautham Chinya

Inventors: Shih-wei Liao, Ryan N. Rakvic, Richard A. Hankins, Hong Wang, Gansha Wu, Guei-Yuan Lueh, Xinmin Tian, Paul M. Petersen, Sanjiv Shah, Trung Diep, John Shen, Gautham Chinya

IPC classification: G06F9/44, G06F9/46

CPC classification: G06F9/485, G06F9/4881

Abstract: Method, apparatus and system embodiments to schedule user-level OS-independent “shreds” without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. The scheduler routine may receive compiler-generated hints from a compiler. The compiler hints may be generated by the compiler without user-provided pragmas, and may be passed to the scheduler routine via an API-like interface. The interface may include a scheduling hint data structure that is maintained by the compiler. Other embodiments are also described and claimed.
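
As a rough illustration of a scheduler routine that consumes a hint data structure, the sketch below orders user-level tasks by a hint value rather than leaving the order to the operating system. Everything here is hypothetical: the abstract does not say what the hints contain, so the `SchedulingHint.priority` field, the `ShredScheduler` class and its `submit`/`run` methods are assumptions made for the example, and plain Python callables stand in for shreds.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulingHint:
    """Hypothetical hint record; a real compiler could emit other fields."""
    priority: int = 0   # assumption: lower values run first


class ShredScheduler:
    """User-level scheduler: runs queued callables in hint order."""

    def __init__(self):
        self._queue = []
        self._seq = itertools.count()   # tie-breaker keeps submission order

    def submit(self, shred, hint=SchedulingHint()):
        # The hint arrives through an ordinary function call (API-like interface).
        heapq.heappush(self._queue, (hint.priority, next(self._seq), shred))

    def run(self):
        results = []
        while self._queue:
            _, _, shred = heapq.heappop(self._queue)
            results.append(shred())
        return results
```

For example, submitting three shreds with priorities 2, 0 and 1 runs them in the order of the second, third and first submission.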

4.

Granted patent
Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors (status: lapsed)

Publication number: US07571301B2

Publication date: 2009-08-04

Application number: US11395841

Filing date: 2006-03-31

Applicants: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

IPC classification: G06F9/45, G06F9/52

CPC classification: G06F9/3009, G06F8/458, G06F9/30087, G06F9/3836, G06F9/3838, G06F9/3851, G06F9/3855, G06F9/3857, G06F9/3891

Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.
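
The single-counter variant can be sketched for a loop with a loop-carried dependence, `a[i] = a[i-1] + i*i`. Note the substitution: this example is not lock-free. It uses a `threading.Condition` to keep the Python code simple and correct, whereas the patent title refers to a lock-free scheme. The function name `doacross` and the chosen loop body are assumptions for illustration only.

```python
import threading


def doacross(n, num_threads=4):
    """Run a[i] = a[i-1] + i*i for i = 1..n across threads, in iteration order."""
    a = [0] * (n + 1)
    a[0] = 1
    done = 0                        # single counter: last iteration completed
    cond = threading.Condition()

    def worker(tid):
        nonlocal done
        # Iterations are dealt out round-robin: tid+1, tid+1+num_threads, ...
        for i in range(tid + 1, n + 1, num_threads):
            local = i * i           # independent part, can overlap across threads
            with cond:
                cond.wait_for(lambda: done == i - 1)   # wait(i - 1)
                a[i] = a[i - 1] + local                # dependent part
                done = i                               # post(i)
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

Each iteration waits until the counter shows that its predecessor has finished, then posts its own completion, so the dependent statements execute in sequential order even though the iterations are spread over several threads.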

5.

Patent application
Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors (status: lapsed)

Publication number: US20070234326A1

Publication date: 2007-10-04

Application number: US11395841

Filing date: 2006-03-31

Applicants: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee

IPC classification: G06F9/45

CPC classification: G06F9/3009, G06F8/458, G06F9/30087, G06F9/3836, G06F9/3838, G06F9/3851, G06F9/3855, G06F9/3857, G06F9/3891

Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.

6.

Patent application
Compiler-based scheduling optimization hints for user-level threads (status: in force)

Publication number: US20070124732A1

Publication date: 2007-05-31

Application number: US11289803

Filing date: 2005-11-29

Applicants: Shih-wei Liao, Ryan Rakvic, Richard Hankins, Hong Wang, Gansha Wu, Guei-Yuan Lueh, Xinmin Tian, Paul Petersen, Sanjiv Shah, Trung Diep, John Shen, Gautham Chinya

Inventors: Shih-wei Liao, Ryan Rakvic, Richard Hankins, Hong Wang, Gansha Wu, Guei-Yuan Lueh, Xinmin Tian, Paul Petersen, Sanjiv Shah, Trung Diep, John Shen, Gautham Chinya

IPC classification: G06F9/46

CPC classification: G06F9/485, G06F9/4881

Abstract: Method, apparatus and system embodiments to schedule user-level OS-independent “shreds” without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. The scheduler routine may receive compiler-generated hints from a compiler. The compiler hints may be generated by the compiler without user-provided pragmas, and may be passed to the scheduler routine via an API-like interface. The interface may include a scheduling hint data structure that is maintained by the compiler. Other embodiments are also described and claimed.

7.

Patent application
Sequencer address management (status: in force)

Publication number: US20060224858A1

Publication date: 2006-10-05

Application number: US11100032

Filing date: 2005-04-05

Applicants: Hong Wang, Gautham Chinya, Richard Hankins, Shivnandan Kaushik, Bryant Bigbee, Per Hammarlund, Xiang Zou, Jason Brandt, Prashant Sethi, Douglas Carmean, Baiju Patel, John Shen, Scott Rodgers, Ryan Rakvic, John Reid, David Poulsen, Sanjiv Shah, James Held, James Abel

Inventors: Hong Wang, Gautham Chinya, Richard Hankins, Shivnandan Kaushik, Bryant Bigbee, Per Hammarlund, Xiang Zou, Jason Brandt, Prashant Sethi, Douglas Carmean, Baiju Patel, John Shen, Scott Rodgers, Ryan Rakvic, John Reid, David Poulsen, Sanjiv Shah, James Held, James Abel

IPC classification: G06F12/00

CPC classification: G06F9/485, G06F9/30043, G06F9/30076, G06F9/3851, G06F9/3885, G06F9/3891, G06F9/461, G06F9/4881

Abstract: Disclosed are embodiments of a system, methods and mechanism for management and translation of mapping between logical sequencer addresses and physical or logical sequencers in a multi-sequencer multithreading system. A mapping manager may manage assignment and mapping of logical sequencer addresses or pages to actual sequencers or frames of the system. Rationing logic associated with the mapping manager may take into account sequencer attributes when such mapping is performed. Relocation logic associated with the mapping manager may manage spill and fill of context information to/from a backing store when re-mapping actual sequencers. Sequencers may be allocated singly, or may be allocated as part of partitioned blocks. The mapping manager may also include translation logic that provides an identifier for the mapped sequencer each time a logical sequencer address is used in a user program. Other embodiments are also described and claimed.
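
The mapping, spill and fill steps named in this abstract resemble virtual-memory paging, and can be sketched as follows. This is an illustrative model only: the class name `MappingManager`, the `translate` method, the use of dicts for contexts, and the least-recently-used eviction policy are all assumptions. The abstract does not specify a replacement policy, and the rationing logic based on sequencer attributes is not modeled.

```python
from collections import OrderedDict


class MappingManager:
    """Maps logical sequencer addresses onto a fixed pool of physical sequencers."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))   # unassigned physical sequencers
        self.mapping = OrderedDict()            # logical -> physical, oldest first
        self.contexts = {}                      # physical -> live context
        self.backing_store = {}                 # logical -> spilled context

    def translate(self, logical):
        """Return the physical sequencer id for a logical address, remapping if needed."""
        if logical in self.mapping:
            self.mapping.move_to_end(logical)   # mark as most recently used
            return self.mapping[logical]
        if self.free:
            physical = self.free.pop(0)
        else:
            # Assumed policy: evict the least recently used mapping.
            victim, physical = self.mapping.popitem(last=False)
            self.backing_store[victim] = self.contexts.pop(physical)   # spill
        self.mapping[logical] = physical
        self.contexts[physical] = self.backing_store.pop(logical, {})  # fill
        return physical
```

With two physical sequencers, touching a third logical address spills the context of the least recently used one to the backing store, and touching the evicted address again fills its saved context back into whichever physical sequencer it is assigned.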

8.

Patent application
Load balancing for multi-threaded applications via asymmetric power throttling (status: in force)

Publication number: US20070157206A1

Publication date: 2007-07-05

Application number: US11322823

Filing date: 2005-12-30

Applicants: Ryan Rakvic, Richard Hankins, Ed Grochowski, Hong Wang, Murali Annavaram, David Poulsen, Sanjiv Shah, John Shen, Gautham Chinya

Inventors: Ryan Rakvic, Richard Hankins, Ed Grochowski, Hong Wang, Murali Annavaram, David Poulsen, Sanjiv Shah, John Shen, Gautham Chinya

IPC classification: G06F9/46

CPC classification: G06F9/4893, Y02D10/24

Abstract: A first execution time of a first thread executing on a first processing unit of a multiprocessor is determined. A second execution time of a second thread executing on a second processing unit of the multiprocessor is determined, the first and second threads executing in parallel. Power is set to the first and second processing units to effectuate the first and second threads to finish executing at approximately the same time in future executions of the first and second threads. Other embodiments are also described and claimed.
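
A minimal numeric sketch of the idea follows, under a strong simplifying assumption that is not taken from the patent: execution time is inversely proportional to clock frequency, and frequency is the only power knob. The function name `balance_frequencies` is hypothetical. Units that finished early are slowed down so that every thread is predicted to take as long as the slowest one.

```python
def balance_frequencies(times, freqs):
    """Return new per-unit frequencies so all threads are predicted to finish together.

    times: measured execution time of each thread on its processing unit
    freqs: frequency each unit ran at during the measurement
    Assumes time is inversely proportional to frequency (a simplification).
    """
    target = max(times)     # the slowest thread sets the common finish time
    # Scaling frequency by t / target stretches a thread's time from t to target.
    return [f * t / target for t, f in zip(times, freqs)]
```

For measured times of 2.0 and 4.0 seconds at 2.0 GHz each, the first unit would be throttled to 1.0 GHz while the second stays at 2.0 GHz, so both threads are predicted to take 4.0 seconds.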

9.

Patent application
Data structure and management techniques for local user-level thread data (status: in force)

Publication number: US20070150900A1

Publication date: 2007-06-28

Application number: US11321763

Filing date: 2005-12-27

Applicants: Richard Hankins, Gautham Chinya, Hong Wang, David Poulsen, Shirish Aundhe, John Shen, Sanjiv Shah, Baiju Patel

Inventors: Richard Hankins, Gautham Chinya, Hong Wang, David Poulsen, Shirish Aundhe, John Shen, Sanjiv Shah, Baiju Patel

IPC classification: G06F9/46

CPC classification: G06F9/462

Abstract: Data structure creation, organization and management techniques for data local to user-level threads are provided. Other embodiments are also described and claimed.

10.

Granted patent
Load balancing for multi-threaded applications via asymmetric power throttling (status: in force)

Publication number: US08839258B2

Publication date: 2014-09-16

Application number: US13354623

Filing date: 2012-01-20

Applicants: Ryan Rakvic, Richard A. Hankins, Ed Grochowski, Hong Wang, Murali Annavaram, David K. Poulsen, Sanjiv Shah, John Shen, Gautham Chinya

Inventors: Ryan Rakvic, Richard A. Hankins, Ed Grochowski, Hong Wang, Murali Annavaram, David K. Poulsen, Sanjiv Shah, John Shen, Gautham Chinya

IPC classification: G06F9/46, G06F1/00, G06F1/26, G06F9/48

CPC classification: G06F9/4893, Y02D10/24

Abstract: A first execution time of a first thread executing on a first processing unit of a multiprocessor is determined. A second execution time of a second thread executing on a second processing unit of the multiprocessor is determined, the first and second threads executing in parallel. Power is set to the first and second processing units to effectuate the first and second threads to finish executing at approximately the same time in future executions of the first and second threads. Other embodiments are also described and claimed.
