Patent search ap:"Michael FETTERMAN", page 1

1.

Granted invention patent
Pre-scheduled replays of divergent operations (In force)

Publication No.: US10152329B2

Publication date: 2018-12-11

Application No.: US13370173

Filing date: 2012-02-09

Applicants: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan

IPC classification: G06F9/38

Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into the pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.
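
The replay flow described in this abstract can be illustrated with a small software model. This is a hypothetical sketch, not the patented hardware: it assumes a 128-byte cache line, assumes one pipeline pass services every thread whose address falls in the same cache line, and caps pre-scheduled replays at a made-up `max_prescheduled` value. The function name `plan_replays` is illustrative only.

```python
from typing import Dict, List


def plan_replays(addresses: List[int], line_size: int = 128,
                 max_prescheduled: int = 2) -> Dict[str, int]:
    """Toy model of pre-scheduled replays for a divergent load.

    Assumption: one pipeline pass services every thread whose address falls
    in the same cache line, so the number of passes equals the number of
    distinct cache lines touched by the thread group.
    """
    # Distinct cache lines, in first-touch order.
    lines = list(dict.fromkeys(addr // line_size for addr in addresses))
    passes = len(lines)
    # The original instruction covers the first line; if the access diverges,
    # up to max_prescheduled replays are inserted directly behind it.
    replays_needed = max(passes - 1, 0)
    prescheduled = min(replays_needed, max_prescheduled)
    # Threads still unserviced go around the slower replay loop.
    loop_replays = replays_needed - prescheduled
    return {"passes": passes, "prescheduled": prescheduled,
            "loop_replays": loop_replays}


# Five threads hitting five different cache lines: 2 pre-scheduled replays
# plus 2 more through the replay loop.
print(plan_replays([0, 128, 256, 384, 512]))
```

In this model the benefit shows up as fewer trips through the replay loop: replays that can be predicted up front ride directly behind the instruction.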

2.

Granted invention patent
Uniform load processing for parallel thread sub-sets (In force)

Publication No.: US10007527B2

Publication date: 2018-06-26

Application No.: US13412438

Filing date: 2012-03-05

Applicants: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich

Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich

IPC classification: G06F15/00, G06F7/38, G06F9/00, G06F9/44, G06F9/38

CPC classification: G06F9/3887, G06F9/383, G06F9/3851

Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
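
The pattern-matching step can be sketched in software to show how matching a uniform pattern reduces the request count. The two patterns below (`all_same`, `pairs_same`), the sub-set size of 4, and the function `count_read_requests` are hypothetical; the abstract does not enumerate the actual patterns.

```python
from typing import Callable, List, Optional, Sequence

# A pattern returns the number of read requests a sub-set needs if its
# addresses match the pattern, or None if they do not. (Hypothetical
# patterns; the abstract does not list the real ones.)
Pattern = Callable[[Sequence[int]], Optional[int]]


def all_same(subset: Sequence[int]) -> Optional[int]:
    # Every thread in the sub-set requests one address: a single read.
    return 1 if len(set(subset)) == 1 else None


def pairs_same(subset: Sequence[int]) -> Optional[int]:
    # Adjacent thread pairs share an address: one read per pair.
    pairs = [subset[i:i + 2] for i in range(0, len(subset), 2)]
    return len(pairs) if all(len(set(p)) == 1 for p in pairs) else None


def count_read_requests(addresses: List[int], subset_size: int = 4,
                        patterns: Sequence[Pattern] = (all_same, pairs_same)) -> int:
    total = 0
    for start in range(0, len(addresses), subset_size):
        subset = addresses[start:start + subset_size]
        for pattern in patterns:           # cheapest pattern first
            reads = pattern(subset)
            if reads is not None:
                total += reads
                break
        else:
            total += len(subset)           # no match: one read per thread
    return total


# Eight threads: the first sub-set is fully uniform (1 read), the second
# matches the pairwise pattern (2 reads), for 3 reads instead of 8.
print(count_read_requests([0, 0, 0, 0, 0, 0, 8, 8]))
```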

3.

Granted invention patent
Batched replays of divergent operations (In force)

Publication No.: US09817668B2

Publication date: 2017-11-14

Application No.: US13329066

Filing date: 2011-12-16

Applicants: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich

Inventors: Michael Fetterman, Jack Hilaire Choquette, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Stewart Glenn Carlton, Rajeshwaran Selvanesan, Douglas J. Hahn, Steven James Heinrich

IPC classification: G06F9/38

CPC classification: G06F9/3851, G06F9/3861

Abstract: One embodiment of the present invention sets forth an approach for executing replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back.
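
Batching can be sketched as grouping the pending replays so that each trip through the replay loop carries several of them back-to-back. This is a toy model under stated assumptions: a 128-byte cache line, one cache line serviced per pass, and a hypothetical `batch_size` parameter; `plan_batched_replays` is an illustrative name.

```python
from typing import List


def plan_batched_replays(addresses: List[int], line_size: int = 128,
                         batch_size: int = 2) -> List[List[int]]:
    """Group the replays of a divergent load into back-to-back batches.

    Assumptions (hypothetical): the first pass services the first cache line
    touched, each replay services one further cache line, and batch_size
    replays are inserted back-to-back per trip through the replay loop.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    lines = list(dict.fromkeys(addr // line_size for addr in addresses))
    pending = lines[1:]  # cache lines that still need a replay
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]


batches = plan_batched_replays([0, 128, 256, 384])
# Three replays are needed; batching two at a time takes two replay-loop
# trips instead of three.
print(batches)
```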

4.

Invention patent application
MECHANISM FOR WAKING COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM (Pending, published)

Publication No.: US20130311996A1

Publication date: 2013-11-21

Application No.: US13476848

Filing date: 2012-05-21

Applicants: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH

Inventors: Michael FETTERMAN, Shirish GADRE, John H. EDMONDSON, Omkar PARANJAPE, Anjana RAJENDRAN, Eric Lyell HILL, Rajeshwaran SELVANESAN, Charles McCARVER, Kevin MITCHELL, Steven James HEINRICH

IPC classification: G06F9/46

CPC classification: G06F9/5016, G06F2209/503

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

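The age-ordered scheduling rules in this abstract (older requests win, blocked requests sleep, older requests may steal from younger ones) can be modeled with a single scheduling pass. This is a hypothetical software sketch, not the TOQ hardware: the `Request` record, the resource names, and the rule that resources are released at the end of a pass are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Request:
    age: int                      # smaller = older (entered the queue earlier)
    needs: Set[str]               # hypothetical common-resource names
    held: Set[str] = field(default_factory=set)
    asleep: bool = False


def schedule_pass(requests: List[Request], owner: Dict[str, int]) -> List[int]:
    """One oldest-first scheduling pass; returns ages of requests that completed.

    `owner` maps each allocated resource to the age of the request holding it.
    """
    by_age = {r.age: r for r in requests}
    completed = []
    for req in sorted(requests, key=lambda r: r.age):
        for res in sorted(req.needs - req.held):
            holder = owner.get(res)
            if holder is None:
                owner[res] = req.age              # free: allocate it
                req.held.add(res)
            elif holder > req.age and holder in by_age:
                by_age[holder].held.discard(res)  # steal from a younger request
                owner[res] = req.age
                req.held.add(res)
            # Otherwise an older request holds it, so this one must wait.
        req.asleep = req.held != req.needs        # sleep until resources free up
        if not req.asleep:
            completed.append(req.age)
    # Completed requests release their resources at the end of the pass,
    # which lets sleeping requests acquire them on the next pass.
    for age in completed:
        for res in by_age[age].held:
            del owner[res]
        by_age[age].held.clear()
    return completed


# Request 1 holds "B", but the older request 0 steals it and completes first;
# request 1 sleeps and finishes on the following pass.
owner = {"B": 1}
r0 = Request(0, {"A", "B"})
r1 = Request(1, {"B", "C"}, held={"B"})
print(schedule_pass([r0, r1], owner), schedule_pass([r1], owner))
```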

5.

Invention patent application
SHAPED REGISTER FILE READS (In force)

Publication No.: US20130166877A1

Publication date: 2013-06-27

Application No.: US13335868

Filing date: 2011-12-22

Applicants: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN

Inventors: Jack Hilaire CHOQUETTE, Michael FETTERMAN, Shirish GADRE, Xiaogang QIU, Omkar PARANJAPE, Anjana RAJENDRAN, Stewart Glenn CARLTON, Eric Lyell HILL, Rajeshwaran SELVANESAN, Douglas J. HAHN

IPC classification: G06F9/30

CPC classification: G06F9/3851, G06F9/3012

Abstract: One embodiment of the present invention sets forth a technique for performing a shaped access of a register file that includes a set of N registers, wherein N is greater than or equal to two. The technique involves, for at least one thread included in a group of threads, receiving a request to access a first amount of data from each register in the set of N registers, and configuring a crossbar to allow the at least one thread to access the first amount of data from each register in the set of N registers.

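A shaped read can be pictured as a crossbar that routes the same amount of data from each of N registers into one thread's result. The sketch below is a hypothetical model: registers are byte strings, the crossbar is a list of (register slot, byte offset) wires, and the assumption that the `amount` bytes come from the start of each register is ours, not the abstract's.

```python
from typing import List, Tuple


def configure_crossbar(n_regs: int, amount: int) -> List[Tuple[int, int]]:
    # Output byte k of the thread's read is wired to (register slot, byte offset).
    return [(r, b) for r in range(n_regs) for b in range(amount)]


def shaped_read(register_file: List[bytes], reg_indices: List[int],
                amount: int) -> bytes:
    """Read `amount` bytes from each of N >= 2 registers in one access.

    Assumption: the `amount` bytes are taken from the start of each register.
    """
    if len(reg_indices) < 2:
        raise ValueError("a shaped access spans N >= 2 registers")
    xbar = configure_crossbar(len(reg_indices), amount)
    return bytes(register_file[reg_indices[r]][b] for r, b in xbar)


# Four 8-byte registers; register i holds bytes 16*i .. 16*i + 7.
rf = [bytes(16 * i + j for j in range(8)) for i in range(4)]
print(shaped_read(rf, [1, 3], 2).hex())
```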

6.

Granted invention patent
Vector completion mask handling (In force)

Publication No.: US08239659B2

Publication date: 2012-08-07

Application No.: US11529850

Filing date: 2006-09-29

Applicants: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton

IPC classification: G06F15/00, G06F15/76

CPC classification: G06F9/3824, G06F9/30036, G06F15/8084

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.

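The mask-driven retry loop in this abstract can be sketched with one bit per operand: successful retrieval clears a bit, and each retry touches only operands whose bits are still set. In this hypothetical model, `fetch_ok` stands in for whatever determines whether an operand was retrieved, and `max_attempts` is an illustrative safety bound.

```python
from typing import Callable, List, Tuple


def execute_with_vcm(num_operands: int,
                     fetch_ok: Callable[[int, int], bool],
                     max_attempts: int = 8) -> Tuple[int, List[List[int]]]:
    """Retry an operation until every operand's mask bit has been cleared.

    fetch_ok(index, attempt) is a hypothetical stand-in for "operand `index`
    was retrieved successfully on this attempt".
    """
    mask = (1 << num_operands) - 1          # one set bit per unhandled operand
    history: List[List[int]] = []
    while mask and len(history) < max_attempts:
        attempt = len(history) + 1
        # Only operands whose bit is still set are (re)processed.
        handled = [i for i in range(num_operands)
                   if (mask >> i) & 1 and fetch_ok(i, attempt)]
        for i in handled:
            mask &= ~(1 << i)               # retrieved: clear its mask bit
        history.append(handled)
    return mask, history


# Odd-numbered operands miss on the first attempt and succeed on the retry.
mask, history = execute_with_vcm(4, lambda i, attempt: i % 2 == 0 or attempt > 1)
print(mask, history)
```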

7.

Granted invention patent
Staggered execution stack for vector processing (In force)

Publication No.: US07457938B2

Publication date: 2008-11-25

Application No.: US11240982

Filing date: 2005-09-30

Applicants: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton

Inventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton

IPC classification: G06F19/00, G06F15/00, G06F17/00

CPC classification: G06F9/3001, G06F9/3012, G06F9/3885

Abstract: In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.

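Splitting one operation across two staggered stacks can be sketched with a 128-bit add performed as two 64-bit halves. The choice of addition and the idea that the one-cycle stagger lets the high half consume the low half's carry are our illustrative assumptions; the abstract covers operations in general. Inputs are assumed to lie in [0, 2^128).

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1


def staggered_add128(a: int, b: int,
                     stagger: int = 1) -> Tuple[int, List[Tuple[int, str, str]]]:
    """Add two 128-bit values as low and high 64-bit halves on two stacks.

    Assumption for illustration: the stagger gives the high-half stack time
    to receive the low half's carry-out.
    """
    schedule = []
    low = (a & MASK64) + (b & MASK64)                     # stack 0, cycle 0
    schedule.append((0, "stack0", "add low halves"))
    carry = low >> 64
    high = ((a >> 64) + (b >> 64) + carry) & MASK64       # stack 1, later cycle
    schedule.append((stagger, "stack1", "add high halves + carry"))
    return (high << 64) | (low & MASK64), schedule


result, schedule = staggered_add128(MASK64, 1)
print(hex(result), schedule)
```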

8.

Granted invention patent
Load mechanism (In force)

Publication No.: US07457932B2

Publication date: 2008-11-25

Application No.: US11323000

Filing date: 2005-12-30

Applicants: Per Hammarlund, Stephan Jourdan, Michael Fetterman, Glenn Hinton, Sebastien Hily, Ronak Singhal

Inventors: Per Hammarlund, Stephan Jourdan, Michael Fetterman, Glenn Hinton, Sebastien Hily, Ronak Singhal

IPC classification: G06F12/00

CPC classification: G06F9/30043, G06F9/30032

Abstract: A method is disclosed. The method includes scheduling a load operation at least twice the size of a maximum access supported by a memory device, dividing the load operation into a plurality of separate load operation segments having a size equivalent to the maximum access supported by the memory device, and performing each of the plurality of load operation segments. A further method is disclosed where a temporary register is used to minimize the number of memory accesses to support unaligned accesses.

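The first method in this abstract, splitting a wide load into maximum-size segments, can be sketched as follows. The 8-byte maximum access and the requirement that the load size be an exact multiple of it are hypothetical parameters chosen for the example; the temporary-register method for unaligned accesses is not modeled.

```python
from typing import List, Tuple


def split_load(memory: bytes, addr: int, size: int,
               max_access: int = 8) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Perform a wide load as several loads of exactly max_access bytes.

    Assumption: size is at least 2 * max_access and an exact multiple of it.
    """
    if size < 2 * max_access or size % max_access:
        raise ValueError("size must be a multiple of max_access and >= 2x it")
    # Each segment is (start address, length) and is performed separately.
    segments = [(start, max_access)
                for start in range(addr, addr + size, max_access)]
    data = b"".join(memory[start:start + length] for start, length in segments)
    return data, segments


mem = bytes(range(64))
data, segments = split_load(mem, 4, 16)
print(segments, data.hex())
```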

9.

Granted invention patent
Flow optimization and prediction for VSSE memory operations (In force)

Publication No.: US07404065B2

Publication date: 2008-07-22

Application No.: US11315964

Filing date: 2005-12-21

Applicants: Stephan Jourdan, Per Hammarlund, Michael Fetterman, Michael P. Cornaby, Glenn Hinton, Avinash Sodani

Inventors: Stephan Jourdan, Per Hammarlund, Michael Fetterman, Michael P. Cornaby, Glenn Hinton, Avinash Sodani

IPC classification: G06F15/00, G06F15/76, G06F9/45

CPC classification: G06F9/345, G06F9/3017, G06F9/325, G06F9/3455, G06F9/3844

Abstract: In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (μop) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized μop flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.

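The prediction-guided choice between an optimized and a general flow can be sketched in software. Everything here is a hypothetical model: `FlowPredictor` is a simple last-outcome table keyed by instruction address, the optimized path copies `width` contiguous elements per step without checking mask bits, and a misprediction simply falls back to the per-element path rather than modeling a pipeline flush.

```python
from typing import Dict, List, Optional, Tuple


class FlowPredictor:
    """Hypothetical last-outcome predictor keyed by instruction address."""

    def __init__(self) -> None:
        self.table: Dict[int, bool] = {}

    def predict(self, ip: int) -> bool:
        return self.table.get(ip, False)  # default: assume the general case

    def update(self, ip: int, was_fast: bool) -> None:
        self.table[ip] = was_fast


def vector_load(memory: List[int], base: int, stride: int, mask: List[bool],
                predicted_fast: bool, width: int = 4) -> Tuple[List[Optional[int]], str]:
    n = len(mask)
    actually_fast = stride == 1 and all(mask)
    if predicted_fast and actually_fast:
        # Optimized flow: each step copies up to `width` contiguous elements
        # without examining individual mask bits.
        fast: List[Optional[int]] = []
        for start in range(0, n, width):
            fast.extend(memory[base + start:base + min(start + width, n)])
        return fast, "optimized"
    # General flow: one element per step, checking every mask bit. In this toy,
    # a misprediction simply falls back to this path.
    slow = [memory[base + i * stride] if mask[i] else None for i in range(n)]
    return slow, "mispredicted" if predicted_fast else "general"


mem = list(range(100))
pred = FlowPredictor()
ip = 0x40                       # hypothetical instruction address
mask = [True] * 8
for _ in range(2):
    values, flow = vector_load(mem, 10, 1, mask, pred.predict(ip))
    pred.update(ip, all(mask))  # stride is 1 here, so only the mask matters
    print(flow, values)
```

The first execution takes the general flow; once the predictor has seen an unmasked, unit-stride access at that address, the second execution takes the optimized flow.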

10.

Granted invention patent
Mechanism for waking common resource requests within a resource management subsystem (In force)

Publication No.: US10095548B2

Publication date: 2018-10-09

Application No.: US13476848

Filing date: 2012-05-21

Applicants: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich

Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich

IPC classification: G06F9/46, G06F9/50

Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.