Patent search: ap:("Alexander L. Minkin" OR "Steven J. Heinrich" OR "Rajeshwaran Selvanesan" OR "Charles McCarver" OR "Stewart Glenn Carlton" OR "Ming Y. Siu" OR "Yan Yan Tang" OR "Robert J. Stoll") AND inv:"Ming Y. Siu" (results page 1)

1.

Granted patent
Cache miss processing using a defer/replay mechanism [In force]

Publication No.: US08266383B1

Publication date: 2012-09-11

Application No.: US12650189

Filing date: 2009-12-30

Applicants: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Ming Y. Siu, Yan Yan Tang, Robert J. Stoll

Inventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Charles McCarver, Stewart Glenn Carlton, Ming Y. Siu, Yan Yan Tang, Robert J. Stoll

IPC classification: G06F13/00, G06F12/00, G06F3/00, G06F5/00

CPC classification: G06F12/0859, G06F12/084

Abstract: One embodiment of the present invention sets forth a technique for processing cache misses resulting from a request received from one of the multiple clients of an L1 cache. The L1 cache services multiple clients with diverse latency and bandwidth requirements, including at least one client whose requests cannot be stalled. The L1 cache includes storage to buffer pending requests for cache misses. When an entry is available to store a pending request, a request causing a cache miss is accepted. When the data for a read request becomes available, the cache instructs the client to resubmit the read request to receive the data. When an entry is not available to store a pending request, a request causing a cache miss is deferred and the cache provides the client with status information that is used to determine when the request should be resubmitted.

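The accept/defer/replay flow in this abstract can be sketched as a small Python model. This is a minimal illustration, not the patented design: the names `L1Cache`, `read`, `fill`, `Outcome`, and `replay_queue` are hypothetical, the "status information" is assumed to be a count of free pending-miss entries, and merging a second miss to an already-pending address is an added assumption.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Outcome(Enum):
    HIT = auto()       # data returned immediately
    ACCEPTED = auto()  # miss recorded; the client replays the read after the fill
    DEFERRED = auto()  # no free entry; the client retries later (never stalled)


@dataclass
class Response:
    outcome: Outcome
    data: object = None
    free_entries: int = 0  # hypothetical status info used to time a resubmission


@dataclass
class L1Cache:
    pending_capacity: int                             # storage for pending misses
    lines: dict = field(default_factory=dict)         # addr -> cached data
    pending: set = field(default_factory=set)         # addrs with outstanding misses
    replay_queue: list = field(default_factory=list)  # addrs clients should resubmit

    def free_entries(self) -> int:
        return self.pending_capacity - len(self.pending)

    def read(self, addr) -> Response:
        if addr in self.lines:
            return Response(Outcome.HIT, self.lines[addr], self.free_entries())
        if addr in self.pending:
            # Assumption: a repeat miss to an outstanding address needs no new entry.
            return Response(Outcome.ACCEPTED, None, self.free_entries())
        if self.free_entries() > 0:
            self.pending.add(addr)
            return Response(Outcome.ACCEPTED, None, self.free_entries())
        # No entry available: defer instead of stalling, and report status.
        return Response(Outcome.DEFERRED, None, 0)

    def fill(self, addr, data) -> None:
        """Memory returned data for an outstanding miss: free the entry and ask for a replay."""
        self.pending.discard(addr)
        self.lines[addr] = data
        self.replay_queue.append(addr)
```

With `pending_capacity=1`, a first miss is accepted, a second miss to another address is deferred, and after `fill` the replayed read hits while the deferred read can now be accepted.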

2.

Granted patent
Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict [In force]

Publication No.: US08533435B2

Publication date: 2013-09-10

Application No.: US12875843

Filing date: 2010-09-03

Applicants: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman

Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman

IPC classification: G06F9/34

CPC classification: G06F9/3012, G06F9/30098, G06F9/3824, G06F9/3851, G06F9/3885, G06F9/3887, G06F9/3889

Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed once the operands it specifies have been read from the multi-bank register file and collected over one or more clock cycles.

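The port assignment and bank-conflict-free scheduling described above can be sketched in Python. This is an illustrative model under stated assumptions, not the patented collector: the bank count, port count, `bank_of` mapping (register index modulo bank count), and the hypothetical `collect` function are all assumptions. Operand *k* of an instruction goes to port *k*; each cycle, every port contributes its oldest operand whose bank is still unclaimed, which reorders a port when its head would conflict.

```python
from collections import deque

NUM_BANKS = 4  # assumption: four register-file banks
NUM_PORTS = 3  # assumption: at most three source operands per instruction


def bank_of(reg):
    # Assumption: registers are interleaved across banks by register index.
    return reg % NUM_BANKS


def collect(instructions):
    """Schedule operand reads so that no cycle reads two operands from one bank.

    instructions: list of (name, [source registers]).
    Returns (schedule, ready): schedule[c] lists the (name, reg) reads in cycle c,
    and ready[name] is the cycle in which the instruction's last operand was read.
    """
    ports = [deque() for _ in range(NUM_PORTS)]
    remaining = {}
    for name, regs in instructions:
        if not 0 < len(regs) <= NUM_PORTS:
            raise ValueError(f"{name}: need 1..{NUM_PORTS} operands")
        remaining[name] = len(regs)
        # Operand k goes to port k, so one instruction's operands use different ports.
        for port, reg in enumerate(regs):
            ports[port].append((name, reg))

    schedule, ready = [], {}
    while any(ports):
        used_banks, reads = set(), []
        for port in ports:
            # Oldest operand in this port whose bank is still free this cycle.
            pick = next((i for i, (_, reg) in enumerate(port)
                         if bank_of(reg) not in used_banks), None)
            if pick is not None:
                name, reg = port[pick]
                del port[pick]
                used_banks.add(bank_of(reg))
                reads.append((name, reg))
        for name, _ in reads:
            remaining[name] -= 1
            if remaining[name] == 0:
                ready[name] = len(schedule)  # index of the current cycle
        schedule.append(reads)
    return schedule, ready
```

For `[("i0", [0, 4, 1]), ("i1", [2, 3, 5])]`, registers 0 and 4 share bank 0, so port 1 skips r4 in the first cycle and reads r3 instead; both instructions finish collecting in cycle 1.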

3.

Patent application
Unified Collector Structure for Multi-Bank Register File [In force]

Publication No.: US20110072243A1

Publication date: 2011-03-24

Application No.: US12875843

Filing date: 2010-09-03

Applicants: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman

Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman

IPC classification: G06F9/30

CPC classification: G06F9/3012, G06F9/30098, G06F9/3824, G06F9/3851, G06F9/3885, G06F9/3887, G06F9/3889

Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed once the operands it specifies have been read from the multi-bank register file and collected over one or more clock cycles.


4.

Granted patent
Trap handler architecture for a parallel processing unit [In force]

Publication No.: US08522000B2

Publication date: 2013-08-27

Application No.: US12569831

Filing date: 2009-09-29

Applicants: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

IPC classification: G06F9/00

CPC classification: G06F9/327, G06F9/3851, G06F9/3861

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing the property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or all executing within the trap handler code segment.

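The all-or-none property in this abstract can be illustrated with a toy Python model. This is a sketch of the invariant only, not of the patented hardware: the class names, the hypothetical `TRAP_HANDLER_PC` address, and the policy that a trap from any one thread group redirects every group (and that traps raised inside the handler are ignored) are assumptions made for illustration.

```python
TRAP_HANDLER_PC = 0x8000  # hypothetical entry address of the trap handler code segment


class ThreadGroup:
    def __init__(self, pc):
        self.pc = pc
        self.saved_pc = None  # non-None while the group executes the trap handler


class StreamingMultiprocessor:
    def __init__(self, pcs):
        self.groups = [ThreadGroup(pc) for pc in pcs]
        self.trap_source = None

    def in_trap(self):
        return self.groups[0].saved_pc is not None

    def _check_invariant(self):
        # Either every group runs its own code segment or every group runs the handler.
        states = {g.saved_pc is not None for g in self.groups}
        assert len(states) == 1, "thread groups split between user code and trap handler"

    def trap(self, group_index, reason):
        """A trap raised by any one group sends every group to the handler."""
        if self.in_trap():
            return  # assumption: traps raised inside the handler are ignored here
        for g in self.groups:
            g.saved_pc, g.pc = g.pc, TRAP_HANDLER_PC
        self.trap_source = (group_index, reason)
        self._check_invariant()

    def resume(self):
        """Return every group to the instruction it was about to execute."""
        if not self.in_trap():
            return
        for g in self.groups:
            g.pc, g.saved_pc = g.saved_pc, None
        self.trap_source = None
        self._check_invariant()
```

Because entry and exit always move every group at once, a check such as `_check_invariant` never has to reason about a mix of handler and user-code groups, which is the kind of state-space reduction the abstract attributes to the design.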

5.

Patent application
TRAP HANDLER ARCHITECTURE FOR A PARALLEL PROCESSING UNIT [In force]

Publication No.: US20110078427A1

Publication date: 2011-03-31

Application No.: US12569831

Filing date: 2009-09-29

Applicants: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

IPC classification: G06F9/38

CPC classification: G06F9/327, G06F9/3851, G06F9/3861

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing the property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or all executing within the trap handler code segment.


6.

Granted patent
Opcode-specified predicatable warp post-synchronization [In force]

Publication No.: US08850436B2

Publication date: 2014-09-30

Application No.: US12892887

Filing date: 2010-09-28

Applicants: Brian Fahs, Ming Y. Siu, Robert Steven Glanville

Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville

IPC classification: G06F9/46, G06F9/38, G06F9/30

CPC classification: G06F9/46, G06F9/30072, G06F9/30087, G06F9/30185, G06F9/3851, G06F9/3887

Abstract: One embodiment of the present invention sets forth a method for synchronizing divergently executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which are disabled. Each instruction in the plurality of instructions is transmitted to each of the active threads in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask, and the synchronization point are each pushed onto a stack. Otherwise, if the instruction is a predicated instruction that includes a synchronization command, then each active thread that executes the predicated instruction is monitored to determine when the active mask has been updated to indicate that every active thread has been disabled after executing the predicated instruction.

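The stack-based flow in this abstract can be sketched as a simplified warp model in Python. It is an illustration under assumptions, not the patented mechanism: the `Warp` class, `set_sync` and `predicated_sync` names, and the `"SYNC"` token string are hypothetical, the active mask is an integer bitmask (bit *i* = thread *i*), and executing a predicated instruction with a synchronization command is modeled as disabling the threads that execute it, with reconvergence once the mask reaches zero.

```python
from dataclasses import dataclass, field

SYNC_TOKEN = "SYNC"  # hypothetical token type marking a synchronization entry


@dataclass
class Warp:
    active_mask: int  # bit i set => thread i is active
    pc: int = 0
    stack: list = field(default_factory=list)

    def set_sync(self, sync_point: int) -> None:
        """Set-synchronization instruction: push token, active mask and sync point."""
        self.stack.append((SYNC_TOKEN, self.active_mask, sync_point))

    def predicated_sync(self, predicate: int) -> bool:
        """Predicated instruction carrying a sync command.

        Active threads whose predicate bit is set execute it and are disabled.
        Returns True when every thread has been disabled and the warp reconverges.
        """
        executing = self.active_mask & predicate
        self.active_mask &= ~executing
        if self.active_mask != 0:
            return False  # some threads still have work on another path
        token, mask, sync_point = self.stack.pop()
        assert token == SYNC_TOKEN, "unexpected stack entry"
        self.active_mask = mask  # restore the pre-divergence mask
        self.pc = sync_point     # all threads resume together at the sync point
        return True
```

In a four-thread warp, after `set_sync(100)`, threads 0 and 1 finishing one path (`predicated_sync(0b0011)`) leave the mask at `0b1100`; when threads 2 and 3 finish the other path, the stack entry is popped, the full mask `0b1111` is restored, and execution continues at 100.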

7.

Granted patent
Programmable graphics processor for multithreaded execution of programs [In force]

Publication No.: US08405665B2

Publication date: 2013-03-26

Application No.: US13466043

Filing date: 2012-05-07

Applicants: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach

Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach

IPC classification: G06F15/16, G06F15/80, G06F13/14, G06T1/20

CPC classification: G06T15/005

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing, a second input section for receiving input data for vertex processing, a first output section for storing processed pixel data, and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.


8.

Patent application
SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS [In force]

Publication No.: US20120221808A1

Publication date: 2012-08-30

Application No.: US13466057

Filing date: 2012-05-07

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

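The serialization logic in this abstract can be sketched as a loop in Python. This is a minimal illustration, not the patented circuit: the hypothetical `serialize` function takes one target address per thread, and choosing the address of the lowest-numbered pending thread each pass is an assumed selection policy. Each pass serves every request that targets the selected address and defers the rest, so the number of passes equals the number of distinct addresses.

```python
def serialize(addresses):
    """addresses: target address per thread (index = thread id).

    Returns a list of (address, [thread ids served]) in access order; every
    distinct address is accessed exactly once.
    """
    pending = list(enumerate(addresses))
    passes = []
    while pending:
        # Assumption: select the target address of the lowest-numbered pending thread.
        selected = pending[0][1]
        served = [tid for tid, addr in pending if addr == selected]
        passes.append((selected, served))
        # Requests for other addresses are deferred and regenerated next pass.
        pending = [(tid, addr) for tid, addr in pending if addr != selected]
    return passes
```

For example, five threads targeting addresses `[5, 7, 5, 9, 7]` are served in three passes: threads 0 and 2 at address 5, threads 1 and 4 at address 7, then thread 3 at address 9.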

9.

Granted patent
Scoreboard having size indicators for tracking sequential destination register usage in a multi-threaded processor [In force]

Publication No.: US08225076B1

Publication date: 2012-07-17

Application No.: US12233515

Filing date: 2008-09-18

Applicants: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu

Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu

IPC classification: G06F9/30

CPC classification: G06F9/3851, G06F9/3838, G06F9/3879, G06F9/3885

Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.

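The per-thread scoreboard and the multi-bit value in this abstract can be sketched in Python. This is a simplified illustration under assumptions, not the patented encoding: the `Scoreboard` class and its methods are hypothetical, each thread is assumed to have a fixed number of scoreboard slots, and bit *j* of an instruction's value is assumed to mean "slot *j* of my thread holds a register I use that has a pending write". Completing a write clears that slot's bit in every buffered instruction of the same thread.

```python
from dataclasses import dataclass


@dataclass
class BufferedInstr:
    thread: int
    regs: tuple        # registers specified by the instruction
    pending_bits: int  # bit j set => scoreboard slot j still blocks this instruction


class Scoreboard:
    def __init__(self, num_threads, slots_per_thread=4):
        # Separate region per thread; each slot holds a register id or None.
        self.slots = [[None] * slots_per_thread for _ in range(num_threads)]
        self.buffer = []  # instruction buffer

    def mark_pending_write(self, thread, reg):
        """Record a pending write; list.index raises ValueError if the region is full."""
        slot = self.slots[thread].index(None)
        self.slots[thread][slot] = reg
        return slot

    def add(self, thread, regs):
        """Compare the instruction's registers with its thread's region, store the bits."""
        bits = 0
        for slot, reg in enumerate(self.slots[thread]):
            if reg is not None and reg in regs:
                bits |= 1 << slot
        instr = BufferedInstr(thread, tuple(regs), bits)
        self.buffer.append(instr)
        return instr

    def complete_write(self, thread, slot):
        """A write finished: free the slot and update same-thread buffered instructions."""
        self.slots[thread][slot] = None
        for instr in self.buffer:
            if instr.thread == thread:
                instr.pending_bits &= ~(1 << slot)

    @staticmethod
    def can_issue(instr):
        return instr.pending_bits == 0
```

Because the comparison happens once, when the instruction enters the buffer, the issue check reduces to testing a small stored value for zero rather than searching the scoreboard again.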

10.

Granted patent
Shared single-access memory with management of multiple parallel requests [In force]

Publication No.: US08176265B2

Publication date: 2012-05-08

Application No.: US13165638

Filing date: 2011-06-21

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
