-
Publication No.: US10095548B2
Publication Date: 2018-10-09
Application No.: US13476848
Filing Date: 2012-05-21
Applicant: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
Inventor: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
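The following Python sketch models only the age-priority rule described in the abstract: requests sit in a total order queue in arrival order, and on each cycle the oldest request whose needed common resources are free is granted first. The class and method names (Request, ToqModel, grant_cycle) and the single-cycle granting loop are illustrative assumptions, not the patented hardware design.

# Minimal behavioral sketch (not the patented hardware) of age-priority
# scheduling with a total order queue. Names such as Request, ToqModel,
# and grant_cycle are hypothetical illustrations.
from collections import deque

class Request:
    def __init__(self, rid, needed):
        self.rid = rid          # unique id; lower id == older request
        self.needed = needed    # set of common-resource names still required

class ToqModel:
    def __init__(self):
        self.queue = deque()    # kept in arrival (age) order

    def enqueue(self, request):
        self.queue.append(request)

    def grant_cycle(self, free_resources):
        """One execution cycle: walk oldest-first and grant when resources are free."""
        completed = []
        for req in list(self.queue):           # oldest entries first
            if req.needed <= free_resources:   # everything it needs is available
                free_resources -= req.needed   # allocate to the older request
                completed.append(req)
                self.queue.remove(req)
        return completed

toq = ToqModel()
toq.enqueue(Request(0, {"cache_line_A"}))
toq.enqueue(Request(1, {"cache_line_A"}))      # younger, wants the same resource
done = toq.grant_cycle({"cache_line_A"})
print([r.rid for r in done])                   # oldest request (0) wins the conflict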
-
Publication No.: US09836325B2
Publication Date: 2017-12-05
Application No.: US13476791
Filing Date: 2012-05-21
Applicant: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
Inventor: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
CPC Classification: G06F9/5011, G06F2209/507
Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
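This filing shares the abstract above; as a complement to the age-priority sketch, the fragment below illustrates only the sleep-state idea: a request blocked on a busy common resource is parked on a wait list keyed by that resource and re-queued, oldest first, when the resource is released. The SleepList structure, the dictionary layout, and the request fields are assumptions made for illustration.

# Hedged sketch of the sleep/wake idea only: a request blocked on a busy
# common resource is parked and re-activated when that resource frees up.
class SleepList:
    def __init__(self):
        self.sleeping = {}                  # resource name -> requests waiting on it

    def put_to_sleep(self, request, resource):
        self.sleeping.setdefault(resource, []).append(request)

    def wake_on_release(self, resource):
        """Return the parked requests, oldest first, once the resource is released."""
        woken = self.sleeping.pop(resource, [])
        return sorted(woken, key=lambda r: r["age"])

waiters = SleepList()
waiters.put_to_sleep({"id": "st.global#7", "age": 2}, "miss_buffer_slot")
waiters.put_to_sleep({"id": "ld.shared#3", "age": 1}, "miss_buffer_slot")
for req in waiters.wake_on_release("miss_buffer_slot"):
    print(req["id"])                        # the older ld.shared#3 is re-queued first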
-
Publication No.: US09755994B2
Publication Date: 2017-09-05
Application No.: US13476825
Filing Date: 2012-05-21
Applicant: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
Inventor: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich
IPC Classification: G06F13/12, G06F9/46, H04L12/937
CPC Classification: H04L49/254, G06F9/46
Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
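For this third filing with the same abstract, the sketch below isolates the deadlock-avoidance rule: when an older and a younger request each hold part of what the other needs, the older one is allowed to steal, so age always breaks the cycle. The resolve_conflict helper and the resource names are hypothetical.

# Illustrative-only sketch of the deadlock-avoidance rule: an older request
# may steal resources held by a younger request so it can finish first.
def resolve_conflict(older_held, younger_held, older_needs):
    """Move any resource the older request still needs out of the younger
    request's allocation (hypothetical helper; real arbitration is in hardware)."""
    stolen = younger_held & older_needs
    younger_held -= stolen
    older_held |= stolen
    return older_held, younger_held, stolen

older, younger, stolen = resolve_conflict(
    older_held={"bank0"}, younger_held={"bank1"}, older_needs={"bank0", "bank1"})
print(stolen)   # {'bank1'} moves to the older request, which can now make progress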
-
Publication No.: US10152329B2
Publication Date: 2018-12-11
Application No.: US13370173
Filing Date: 2012-02-09
Applicant: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
Inventor: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
IPC Classification: G06F9/38
Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into the multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.
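A rough behavioral model of the decision the abstract describes, under the assumption of a 128-byte cache line: the pre-scheduled replay unit counts the distinct cache lines touched by the thread group's addresses and enqueues one replay pass per extra line behind the original instruction. The function name and line size are illustrative, not taken from the patent.

# Rough model (assumptions throughout) of how many pre-scheduled replay
# operations to insert behind a divergent load: one pipeline pass per
# distinct cache line touched by the thread group.
CACHE_LINE_BYTES = 128          # assumed line size for the example

def prescheduled_replays(thread_addresses):
    """Return how many extra replay passes to enqueue behind the instruction."""
    lines = {addr // CACHE_LINE_BYTES for addr in thread_addresses}
    return max(len(lines) - 1, 0)   # the first line is served by the original pass

# 32 threads striding 64 bytes touch 16 distinct 128-byte lines -> 15 replays.
addresses = [tid * 64 for tid in range(32)]
print(prescheduled_replays(addresses))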
-
Publication No.: US09626191B2
Publication Date: 2017-04-18
Application No.: US13335868
Filing Date: 2011-12-22
Applicant: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
Inventor: Jack Hilaire Choquette, Michael Fetterman, Shirish Gadre, Xiaogang Qiu, Omkar Paranjape, Anjana Rajendran, Stewart Glenn Carlton, Eric Lyell Hill, Rajeshwaran Selvanesan, Douglas J. Hahn
CPC Classification: G06F9/3851, G06F9/3012
Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.
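A small software analogy (not the hardware crossbar itself) of the shaped access: for a given thread lane, the same-sized slice is gathered from every register in a set of N registers in one operation. The register names, lane width, and contents below are invented for the example.

# Software analogy of a "shaped" register-file access: gather an equal-sized
# slice from each register in a set in a single operation.
def shaped_access(register_file, register_set, lane, width):
    """Gather `width` elements starting at `lane` from each register in the set."""
    return {name: register_file[name][lane:lane + width] for name in register_set}

register_file = {
    "R0": list(range(0, 32)),      # 32 lanes per register, toy values
    "R1": list(range(100, 132)),
}
print(shaped_access(register_file, ["R0", "R1"], lane=4, width=2))
# {'R0': [4, 5], 'R1': [104, 105]}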
-
Publication No.: US20130159684A1
Publication Date: 2013-06-20
Application No.: US13329066
Filing Date: 2011-12-16
Applicant: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
Inventor: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
CPC Classification: G06F9/3851, G06F9/3861
Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require the transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.
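The sketch below, with invented instruction and function names, shows the batching idea in isolation: when a divergent load needs more cache lines than one pass can serve, a batch of replay operations is appended back-to-back behind the instruction, and any remainder goes around the replay loop as a further batch.

# Behavioral sketch of batched replay operations: several replays are
# inserted back-to-back instead of one per trip around the replay loop.
def batch_replays(pipeline, instr, lines_needed, batch_size=2):
    """Append the instruction plus a back-to-back batch of replay operations."""
    pipeline.append(instr)
    for i in range(min(batch_size, max(lines_needed - 1, 0))):
        pipeline.append(f"{instr}.replay{i + 1}")
    return pipeline

print(batch_replays([], "LDS.128", lines_needed=4))
# ['LDS.128', 'LDS.128.replay1', 'LDS.128.replay2'] -- remaining lines go
# around the replay loop, again as a batch, until every thread is serviced.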
-
Publication No.: US09817668B2
Publication Date: 2017-11-14
Application No.: US13329066
Filing Date: 2011-12-16
Applicant: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
Inventor: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich
IPC Classification: G06F9/38
CPC Classification: G06F9/3851, G06F9/3861
Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.
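As a complement to the batching sketch above, this fragment models only the divergence check performed by the logic element: whether the active threads' shared-memory addresses fall in more than one cache line. The 128-byte line size and the helper name are assumptions.

# Sketch of the divergence check that decides whether replays are needed at
# all: do the active threads' addresses span more than one cache line?
CACHE_LINE_BYTES = 128   # assumed line size

def needs_replay(active_mask, addresses):
    lines = {addresses[t] // CACHE_LINE_BYTES
             for t, active in enumerate(active_mask) if active}
    return len(lines) > 1

mask = [True] * 8
print(needs_replay(mask, [0, 8, 16, 24, 512, 520, 528, 536]))   # True: two lines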
-
Publication No.: US09262174B2
Publication Date: 2016-02-16
Application No.: US13440945
Filing Date: 2012-04-05
Applicant: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
Inventor: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
CPC Classification: G06F9/3887, G06F9/3851
Abstract: One embodiment sets forth a technique for dynamically mapping addresses to banks of a multi-bank memory based on a bank mode. Application programs may be configured to read and write a memory using different numbers of bits per bank, e.g., 32 bits per bank, 64 bits per bank, or 128 bits per bank. On each clock cycle, an access request may be received from one of the application programs, and the per-processing-thread addresses of the access request are dynamically mapped based on the bank mode to produce a set of bank addresses. The bank addresses are then used to access the multi-bank memory. Allowing different bank mappings enables each application program to avoid bank conflicts when the memory is accessed, compared with using a single bank mapping for all accesses.
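An assumed-parameter model of the bank-mode mapping: the same per-thread byte addresses land on different banks depending on whether a bank is treated as 32, 64, or 128 bits wide, which is what lets an application pick the mode that avoids conflicts. The bank count of 32 and the helper names are illustrative.

# Assumed-parameter sketch of bank-mode address mapping: per-thread byte
# addresses map to one of NUM_BANKS banks, and the mapping changes with the
# configured bank width (32, 64, or 128 bits per bank).
NUM_BANKS = 32   # illustrative bank count

def bank_of(byte_address, bank_mode_bits):
    bank_width_bytes = bank_mode_bits // 8
    return (byte_address // bank_width_bytes) % NUM_BANKS

def max_conflict(addresses, bank_mode_bits):
    """Worst-case number of threads hitting the same bank in one access."""
    hits = {}
    for a in addresses:
        b = bank_of(a, bank_mode_bits)
        hits[b] = hits.get(b, 0) + 1
    return max(hits.values())

doubles = [tid * 8 for tid in range(32)]          # 32 threads reading 64-bit values
print(max_conflict(doubles, 32))   # 2 -> two-way conflicts in 32-bit mode
print(max_conflict(doubles, 64))   # 1 -> conflict-free when the bank mode matches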
-
Publication No.: US10007527B2
Publication Date: 2018-06-26
Application No.: US13412438
Filing Date: 2012-03-05
Applicant: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
Inventor: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
CPC Classification: G06F9/3887, G06F9/383, G06F9/3851
Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
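The sketch below models the simplest uniform pattern only (every thread in a sub-set requesting the same address), in which case one read request serves the whole sub-set; non-matching sub-sets fall back to per-thread reads. The sub-set size of 8 and the function name are assumptions.

# Sketch of the "uniform pattern" check: an all-equal sub-set of addresses
# is served by a single read request instead of one request per thread.
def coalesce_uniform(thread_addresses, subset_size=8):
    """Return one (address, thread_list) read request per uniform sub-set,
    or one request per thread for sub-sets that don't match the pattern."""
    requests = []
    for start in range(0, len(thread_addresses), subset_size):
        subset = thread_addresses[start:start + subset_size]
        if len(set(subset)) == 1:                       # uniform sub-set
            requests.append((subset[0], list(range(start, start + len(subset)))))
        else:                                           # fall back: per-thread reads
            requests.extend((a, [start + i]) for i, a in enumerate(subset))
    return requests

broadcast = [0x1000] * 8 + [0x2000] * 8                 # two uniform sub-sets
print(len(coalesce_uniform(broadcast)))                 # 2 read requests, not 16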
-
Publication No.: US07500049B2
Publication Date: 2009-03-03
Application No.: US11263628
Filing Date: 2005-10-31
IPC Classification: G06F12/00
CPC Classification: G06F9/463, G06F11/1438, G06F11/2033
Abstract: In one embodiment, the present invention includes a method for requesting an allocation of memory to be a backing store for architectural state information of a processor and storing the architectural state information in the backing store using an application. In this manner, the backing store and processor enhancements using information in the backing store may be transparent to an operating system. Other embodiments are described and claimed.
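A very loose user-space analogy (clearly not the patented processor mechanism) of the backing-store idea: an application asks for an anonymous memory region and spills architectural state into it, with the operating system involved only in the allocation. The state size and layout below are invented for illustration.

# Loose user-space analogy of an application-managed backing store for
# architectural state; the 64-slot layout is an invented example.
import mmap

STATE_WORDS = 64                                   # assumed size of the saved state

def allocate_backing_store():
    """Request an anonymous, private memory region to act as the backing store."""
    return mmap.mmap(-1, STATE_WORDS * 8)          # 64 x 8-byte slots

def save_state(store, registers):
    for i, value in enumerate(registers):
        store[i * 8:(i + 1) * 8] = value.to_bytes(8, "little")

store = allocate_backing_store()
save_state(store, registers=list(range(STATE_WORDS)))
print(int.from_bytes(store[8:16], "little"))       # slot 1 holds register value 1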
-