Architecture and Instructions for Accessing Multi-Dimensional Formatted Surface Memory
    Patent application (legal status: in force)

    Publication No.: US20110074802A1

    Publication date: 2011-03-31

    Application No.: US12890171

    Filing date: 2010-09-24

    IPC classification: G06F12/00

    CPC classification: G06T1/60

    Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces,” stored in a user-specified data or pixel format and arranged in a graphics-optimized layout, are accessed by programs using surface instructions. A set of memory access instructions, e.g., load, store, reduce, and atomic, referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format is supported for store, reduce, and atomic surface instructions. Data format conversion and unpacking from a specified storage format is supported for load and atomic surface instructions.
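
To make the abstract's terms concrete, here is a minimal Python sketch of a two-dimensional surface whose texels are stored packed in an RGBA8 format. Stores convert floats to 8-bit channels and pack them, loads unpack back to normalized floats, and out-of-range coordinates are handled by a selectable boundary mode. The class name `Surface2D`, the `"clamp"`/`"zero"`/`"trap"` mode names and the row-major layout are illustrative assumptions; the patent describes hardware instructions and a graphics-optimized layout, not this API.

```python
from typing import List, Optional, Tuple

RGBA = Tuple[float, float, float, float]


class Surface2D:
    """Toy 2-D surface holding RGBA8 texels packed into 32-bit words (hypothetical API)."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        # Row-major storage for simplicity; hardware would use a tiled, graphics-optimized layout.
        self.words: List[int] = [0] * (width * height)

    def _resolve(self, x: int, y: int, mode: str) -> Optional[int]:
        """Bounds-check (x, y) and return a linear word index, or None to suppress the access."""
        if 0 <= x < self.width and 0 <= y < self.height:
            return y * self.width + x
        if mode == "clamp":
            cx = min(max(x, 0), self.width - 1)
            cy = min(max(y, 0), self.height - 1)
            return cy * self.width + cx
        if mode == "zero":
            return None  # out-of-range loads read zero, stores are dropped
        raise IndexError(f"surface coordinate ({x}, {y}) out of bounds")  # "trap"

    @staticmethod
    def _pack(texel: RGBA) -> int:
        """Store-side conversion: floats in [0, 1] become UNORM8 channels, R in the low byte."""
        word = 0
        for i, channel in enumerate(texel):
            byte = int(round(min(max(channel, 0.0), 1.0) * 255))
            word |= byte << (8 * i)
        return word

    @staticmethod
    def _unpack(word: int) -> RGBA:
        """Load-side conversion: a 32-bit RGBA8 word becomes four normalized floats."""
        r, g, b, a = ((word >> (8 * i)) & 0xFF for i in range(4))
        return (r / 255, g / 255, b / 255, a / 255)

    def store(self, x: int, y: int, texel: RGBA, mode: str = "trap") -> None:
        idx = self._resolve(x, y, mode)
        if idx is not None:
            self.words[idx] = self._pack(texel)

    def load(self, x: int, y: int, mode: str = "trap") -> RGBA:
        idx = self._resolve(x, y, mode)
        if idx is None:
            return (0.0, 0.0, 0.0, 0.0)
        return self._unpack(self.words[idx])
```

On a 4×4 surface, `load(7, -1, mode="clamp")` reads texel (3, 0), `mode="zero"` returns a zero texel, and the default `"trap"` mode raises instead.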

    Synchronization of threads in a cooperative thread array
    Granted patent (legal status: in force)

    Publication No.: US07788468B1

    Publication date: 2010-08-31

    Application No.: US11303780

    Filing date: 2005-12-15

    IPC classification: G06F15/00 G06F15/76

    Abstract: A “cooperative thread array,” or “CTA,” is a group of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique thread identifier assigned at thread launch time that controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Different threads of the CTA are advantageously synchronized at appropriate points during CTA execution using a barrier synchronization technique in which barrier instructions in the CTA program are detected and used to suspend execution of some threads until a specified number of other threads also reaches the barrier point.
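
The barrier behavior described above can be imitated on a CPU with Python's `threading.Barrier`. In this sketch each thread uses its thread ID to pick its slice of the input, publishes a partial sum to a shared list (standing in for the CTA's shared storage), waits at the barrier, and only then reads a neighbor's partial result. The CTA size, the even data split and the neighbor-exchange pattern are assumptions chosen for illustration; on a GPU the barrier is a hardware instruction, not a library object.

```python
import threading
from typing import List


def run_cta(data: List[int], num_threads: int) -> List[int]:
    """Toy CTA: each thread sums its slice, then adds its right neighbor's partial sum."""
    chunk = len(data) // num_threads          # assumes len(data) divides evenly
    partial = [0] * num_threads               # intermediate results shared among threads
    output = [0] * num_threads
    barrier = threading.Barrier(num_threads)  # releases once num_threads threads arrive

    def thread_body(tid: int) -> None:
        # The thread ID selects which portion of the input this thread processes.
        partial[tid] = sum(data[tid * chunk:(tid + 1) * chunk])
        # Suspend until every thread has written its intermediate result.
        barrier.wait()
        # Reading another thread's result is safe only after the barrier.
        neighbor = (tid + 1) % num_threads
        output[tid] = partial[tid] + partial[neighbor]

    threads = [threading.Thread(target=thread_body, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output


if __name__ == "__main__":
    print(run_cta(list(range(8)), 4))
```

Without the `barrier.wait()` call, a fast thread could read `partial[neighbor]` before the neighbor has written it, which is exactly the race the barrier instruction prevents.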

    Register based queuing for texture requests

    Publication No.: US07027062B2

    Publication date: 2006-04-11

    Application No.: US10789735

    Filing date: 2004-02-27

    IPC classification: G06T11/40

    CPC classification: G06T11/60 G09G5/363

    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
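
A minimal Python model of this queuing scheme: the dedicated request buffer holds only a small record (an opcode and the destination register number), while the larger texture arguments are parked in the destination register itself. When the texture unit services a request it reads the arguments from that register and overwrites them with the result, so no extra register is consumed. The class and field names, the 4-entry buffer depth and the placeholder "sampling" arithmetic are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class TexCommand:
    opcode: str    # small command, e.g. "tex2d"
    dest_reg: int  # destination register; it also holds the queued arguments


class TextureUnit:
    def __init__(self, num_regs: int, buffer_depth: int = 4) -> None:
        self.regs: Dict[int, List[float]] = {r: [] for r in range(num_regs)}
        self.request_buffer: Deque[TexCommand] = deque()  # small, dedicated FIFO
        self.buffer_depth = buffer_depth

    def issue(self, opcode: str, dest_reg: int, args: List[float]) -> bool:
        """Queue a request; the bulky arguments go into the destination register."""
        if len(self.request_buffer) >= self.buffer_depth:
            return False  # buffer full: the issuing thread would have to stall
        self.regs[dest_reg] = list(args)
        self.request_buffer.append(TexCommand(opcode, dest_reg))
        return True

    def service_one(self) -> None:
        """Pop one command, fetch its arguments from the register, write the result back."""
        cmd = self.request_buffer.popleft()
        u, v = self.regs[cmd.dest_reg][:2]
        # Placeholder for real texture sampling/filtering at (u, v).
        result = [u * 0.5 + v * 0.5]
        self.regs[cmd.dest_reg] = result  # the result overwrites the arguments in place
```

The buffer entries stay small regardless of how many arguments a texture operation needs, which is the property the abstract relies on to queue many requests cheaply.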

    Pipelined multi-access memory apparatus and method
    Granted patent (legal status: in force)

    Publication No.: US06976141B2

    Publication date: 2005-12-13

    Application No.: US10002449

    Filing date: 2001-11-02

    IPC classification: G06F12/00 G06F13/16

    CPC classification: G06F13/1615

    Abstract: A memory management system provides the ability for multiple requesters to access blocks of memory in a pipelined manner. During a first clock, requests for one or more of the memory blocks are received by the system. A determination is made of whether one of the memory blocks is requested by one or more requests. If the same memory block is requested by two or more requests, the system performs a further determination of which of the requests will be provided to the memory block. The determined request is provided to the memory block on the first clock. During a second clock, the data of the determined request is latched to the memory block and a memory access is initiated. If the request is a write request, the data is written to the memory block. If the request is a read request, then the requested data is retrieved and, on a third clock, the data is driven onto a bus, routed to the determined requester, and available to be latched into the requester on the fourth clock.
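
The per-clock behavior can be sketched in Python as an arbitration step followed by a fixed schedule of pipeline steps. The abstract does not say how competing requests for the same block are resolved, so this sketch assumes a simple fixed priority (lowest requester ID wins); the request tuple format and the function names are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# (requester_id, block_id, is_write)
Request = Tuple[int, int, bool]


def arbitrate(requests: List[Request]) -> Dict[int, Request]:
    """Clock 1: grant at most one request per memory block (lowest requester ID wins)."""
    winners: Dict[int, Request] = {}
    for req in sorted(requests):  # tuples sort by requester_id first
        block = req[1]
        if block not in winners:
            winners[block] = req
    return winners


def schedule(start_clock: int, req: Request) -> Dict[str, int]:
    """Clock on which each pipeline step happens for a granted request."""
    is_write = req[2]
    steps = {"grant": start_clock, "access": start_clock + 1}
    if not is_write:
        # Reads also drive the bus on clock 3 and are latched by the requester on clock 4.
        steps["drive_bus"] = start_clock + 2
        steps["latch_at_requester"] = start_clock + 3
    return steps
```

Because each step occupies a different stage, a fresh `arbitrate` call can run on every clock while earlier grants are still accessing memory or driving the bus; a requester that loses arbitration would, under this sketch's assumptions, simply retry on a later clock.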

    Defect tolerant redundancy
    Granted patent (legal status: in force)

    Publication No.: US06879207B1

    Publication date: 2005-04-12

    Application No.: US10741243

    Filing date: 2003-12-18

    Applicant: John R. Nickolls

    Inventor: John R. Nickolls

    IPC classification: G06F11/20 G11C29/00 G06F1/04

    CPC classification: G11C29/848

    Abstract: Circuits, methods, and apparatus for using redundant circuitry on integrated circuits in order to increase manufacturing yields. One exemplary embodiment of the present invention provides a circuit configuration wherein functional circuit blocks in a group of circuit blocks are selected by multiplexers. Multiplexers at the input and output of the group of circuit blocks steer input and output signals to and from functional circuit blocks, avoiding circuit blocks found to be defective or nonfunctional. Multiple groups of these circuit blocks may be arranged in series and in parallel. Alternate multiplexer configurations may be used in order to provide a higher level of redundancy. Other embodiments use all functional circuit blocks and sort integrated circuits based on the level of functionality or performance. Other embodiments provide methods of testing integrated circuits having one or more of these circuit configurations.
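
The multiplexer steering can be modeled as a mapping from logical block positions to the physical blocks that passed test: each logical slot is routed to the next working block, skipping defective ones. The sketch assumes a group with more physical blocks than the design needs, and also shows the alternative of enabling every working block and binning the part by capability. The function names and bin labels are hypothetical.

```python
from typing import List, Optional, Set


def steer(num_physical: int, num_logical: int, defective: Set[int]) -> Optional[List[int]]:
    """Return mux settings: logical block i is served by physical block mapping[i].

    Returns None if the group has too few working blocks to supply num_logical.
    """
    working = [p for p in range(num_physical) if p not in defective]
    if len(working) < num_logical:
        return None
    # Input and output muxes select the same physical block for each logical slot,
    # so the surrounding circuitry never sees the defective blocks.
    return working[:num_logical]


def bin_part(num_physical: int, defective: Set[int]) -> str:
    """Alternative use of redundancy: enable all working blocks and sort by capability."""
    working = num_physical - len(defective)
    if working == num_physical:
        return "full"
    if working >= num_physical // 2:
        return "reduced"
    return "reject"
```

With five physical blocks and four required, `steer(5, 4, {2})` routes around block 2 and yields `[0, 1, 3, 4]`; two defects in the same group exceed the single spare and the group cannot be repaired.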

    Scalable processor to processor and processor to I/O interconnection network and method for parallel processing arrays
    Granted patent (legal status: lapsed)

    Publication No.: US5598408A

    Publication date: 1997-01-28

    Application No.: US182250

    Filing date: 1994-01-14

    CPC classification: G06F15/17393

    Abstract: A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.
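
To illustrate what "steering bits" in a route-opening packet do, here is a Python sketch of generic destination-tag routing through a multistage network: each router stage strips a fixed-width field from the front of the header and uses it to select an output port, forwarding the remaining bits unchanged. Because every stage handles its field the same way, bits carrying protocol information can ride along like any other field. The header encoding, the field width and the `route` function are assumptions for illustration, not the patent's packet format.

```python
from typing import List, Tuple


def route(header: str, bits_per_stage: int, num_stages: int) -> Tuple[List[int], str]:
    """Consume steering bits stage by stage.

    header: route-opening bits as a '0'/'1' string, first stage's field first.
    Returns the output port chosen at each stage and the bits left after the last stage.
    """
    ports: List[int] = []
    remaining = header
    for _ in range(num_stages):
        field, remaining = remaining[:bits_per_stage], remaining[bits_per_stage:]
        if len(field) < bits_per_stage:
            raise ValueError("header too short for the number of stages")
        # Each stage uses its field to pick one of 2**bits_per_stage output ports.
        ports.append(int(field, 2))
    return ports, remaining
```

For a three-stage network of 4-port crossbars, `route("1001110", 2, 3)` selects ports 2, 1 and 3 and leaves the final bit `"0"` for the destination.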

    Broadcasting headers to configure physical devices interfacing a data bus with a logical assignment and to effect block data transfers between the configured logical devices
    Granted patent (legal status: lapsed)

    Publication No.: US5488694A

    Publication date: 1996-01-30

    Application No.: US937639

    Filing date: 1992-08-28

    IPC classification: G06F13/42 G06F13/00 G06F13/38

    CPC classification: G06F13/423

    Abstract: To effect a block data transfer between a plurality of physical I/O devices coupled through interfaces to an I/O channel ("IOC") bus, a source logical device is established by programmably assigning to each of the physical device interfaces a logical device identifier, a leaf identifier determining when the physical device participates relative to the first data transfer in the block data transfer, a burst count specifying the number of consecutive transfers for which the physical device is responsible when its interleave period arrives, and an interleave factor identifying how often the physical device participates in the block data transfer. A destination logical device is similarly established. The source and destination logical devices are then activated to accomplish a block transfer of data between them. To permit different I/O processors to operate independently in making I/O requests, requests from each I/O processor are communicated to an IOC controller over another bus, which need not be a high performance bus, and are serviced to construct header packets in a transaction buffer identifying IOC transactions, including source and destination logical devices. When each packet is finished, the responsible I/O processor puts a pointer into a transaction queue, which is a FIFO register. Each IOC transaction is initiated as its corresponding pointer is popped from the transaction queue. Apparatus embodiments are disclosed as well.
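
One way to read the per-interface parameters is as a striping schedule: the block transfer is divided into burst slots of `burst` consecutive transfers, and an interface whose leaf identifier is `leaf` and whose interleave factor is `interleave` handles every slot where `slot % interleave == leaf`. That interpretation, the `DeviceInterface` fields and the `owner_sequence` function are assumptions made for this sketch; the abstract does not give the exact arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceInterface:
    name: str
    logical_id: int  # logical device this physical interface is assigned to
    leaf: int        # burst slot, counted from the first transfer, where it starts
    burst: int       # consecutive transfers it handles each time its turn arrives
    interleave: int  # it takes part once every `interleave` burst slots


def owner_sequence(devices: List[DeviceInterface], logical_id: int,
                   total_transfers: int) -> List[str]:
    """Return the physical interface responsible for each transfer of one logical device.

    Assumes all members of the logical device share one burst count and interleave factor.
    """
    members = [d for d in devices if d.logical_id == logical_id]
    burst = members[0].burst
    owners: List[str] = []
    for t in range(total_transfers):
        slot = t // burst
        owner = next((d for d in members if slot % d.interleave == d.leaf), None)
        if owner is None:
            raise ValueError(f"no interface is programmed for burst slot {slot}")
        owners.append(owner.name)
    return owners
```

Two interfaces with leaf identifiers 0 and 1, a burst count of 2 and an interleave factor of 2 would then alternate in pairs: `disk0, disk0, disk1, disk1, ...`.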

    Architecture and instructions for accessing multi-dimensional formatted surface memory
    Granted patent (legal status: in force)

    Publication No.: US09519947B2

    Publication date: 2016-12-13

    Application No.: US12890171

    Filing date: 2010-09-24

    IPC classification: G06F12/00 G06T1/60

    CPC classification: G06T1/60

    Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces,” stored in a user-specified data or pixel format and arranged in a graphics-optimized layout, are accessed by programs using surface instructions. A set of memory access instructions, e.g., load, store, reduce, and atomic, referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format is supported for store, reduce, and atomic surface instructions. Data format conversion and unpacking from a specified storage format is supported for load and atomic surface instructions.
