METHOD AND APPARATUS FOR INCREASING EFFICIENCY OF TRANSMISSION AND/OR STORAGE OF RAYS FOR PARALLELIZED RAY INTERSECTION TESTING
    31.
    Patent application
    Status: in force

    Publication number: US20090096788A1

    Publication date: 2009-04-16

    Application number: US11871758

    Filing date: 2007-10-12

    IPC classification: G06T15/50

    CPC classification: G06T15/06

    Abstract: For ray tracing, methods, apparatus, and computer readable media provide efficient transmission and/or storage of rays between ray emitters and an intersection testing resource. Ray emitters, during emission of a plurality of rays, identify a shared attribute of each ray of the plurality, and represent that attribute as shared ray data. The shared ray data, and other ray data sufficient to determine both an origin and a direction for each ray of the plurality, are transmitted. Functionality in the intersection testing resource receives the shared ray data and the other ray data, interprets them to determine an origin and direction for each ray of the plurality, and provides those rays for intersection testing. Rays can be stored in the shared attribute format in the intersection testing resource, and data elements representing the rays can be constructed later. Programmable receiving functionality of the intersection testing resource can accommodate many ray types and other situations.

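The shared-attribute idea in this abstract can be pictured as a packet that carries a common attribute once and the remaining per-ray data separately. The following is a minimal sketch, assuming the shared attribute is a common origin (only one of many possible shared attributes); the names `Ray`, `SharedOriginPacket`, `pack_shared_origin`, `unpack`, and `float_count` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Ray:
    origin: Vec3
    direction: Vec3


@dataclass(frozen=True)
class SharedOriginPacket:
    """Hypothetical transfer format: one shared origin plus per-ray directions."""
    shared_origin: Vec3
    directions: List[Vec3]


def pack_shared_origin(rays: List[Ray]) -> SharedOriginPacket:
    """Emitter side: factor out the origin that every ray has in common."""
    if not rays:
        raise ValueError("need at least one ray")
    origin = rays[0].origin
    if any(r.origin != origin for r in rays):
        raise ValueError("rays do not share an origin")
    return SharedOriginPacket(origin, [r.direction for r in rays])


def unpack(packet: SharedOriginPacket) -> List[Ray]:
    """Receiver side: rebuild a full origin and direction for each ray."""
    return [Ray(packet.shared_origin, d) for d in packet.directions]


def float_count(packet: SharedOriginPacket) -> int:
    """Number of floats in the packet (3 for the origin, 3 per direction)."""
    return 3 + 3 * len(packet.directions)
```

For N rays this sketch moves 3 + 3N floats instead of 6N, and the receiver can defer calling `unpack` until the rays are actually needed for testing.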

    Apparatus and method for ray tracing with block floating point data
    32.
    Granted patent
    Status: in force

    Publication number: US08217935B2

    Publication date: 2012-07-10

    Application number: US12059559

    Filing date: 2008-03-31

    IPC classification: G06T15/30

    CPC classification: G06T15/06

    Abstract: Systems and methods include high throughput and/or parallelized ray/geometric shape intersection testing using intersection testing resources accepting and operating with block floating point data. Block floating point data sacrifices precision of scene location in ways that maintain precision where it is more beneficial, and allow reduced precision where that is beneficial. In particular, rays, acceleration structures, and primitives can be represented in a variety of block floating point formats, such that storage requirements for storing such data can be reduced. Hardware accelerated intersection testing can be provided with reduced-size math units, with reduced routing requirements. A driver for hardware accelerators can maintain full-precision versions of rays and primitives to allow reduced communication requirements for high throughput intersection testing in loosely coupled systems. Embodiments also can include using BFP-formatted data in programmable test cells or more general-purpose processing elements.

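Block floating point, as a general technique, stores one exponent for a whole block of values and a short integer mantissa per value. The sketch below illustrates that general idea only; it is not the patent's specific formats, and the function names and the 8-bit default mantissa width are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def bfp_encode(values: List[float], mantissa_bits: int = 8) -> Tuple[int, List[int]]:
    """Return (shared_exponent, signed integer mantissas) for one block."""
    largest = max((abs(v) for v in values), default=0.0)
    if largest == 0.0:
        return 0, [0] * len(values)
    # frexp gives largest == m * 2**exponent with 0.5 <= m < 1.
    _, exponent = math.frexp(largest)
    scale = 2.0 ** (mantissa_bits - 1 - exponent)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Clamp so that rounding never overflows the signed mantissa range.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return exponent, mantissas


def bfp_decode(exponent: int, mantissas: List[int], mantissa_bits: int = 8) -> List[float]:
    """Rebuild approximate values from the shared exponent and mantissas."""
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    return [m * scale for m in mantissas]
```

Because the exponent is set by the largest magnitude in the block, small values that share a block with large ones lose precision, which is the trade-off the abstract describes.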

    APPARATUS AND METHOD FOR RAY TRACING WITH BLOCK FLOATING POINT DATA
    33.
    Patent application
    Status: in force

    Publication number: US20090244058A1

    Publication date: 2009-10-01

    Application number: US12059559

    Filing date: 2008-03-31

    IPC classification: G06F17/00

    CPC classification: G06T15/06

    Abstract: Systems and methods include high throughput and/or parallelized ray/geometric shape intersection testing using intersection testing resources accepting and operating with block floating point data. Block floating point data sacrifices precision of scene location in ways that maintain precision where it is more beneficial, and allow reduced precision where that is beneficial. In particular, rays, acceleration structures, and primitives can be represented in a variety of block floating point formats, such that storage requirements for storing such data can be reduced. Hardware accelerated intersection testing can be provided with reduced-size math units, with reduced routing requirements. A driver for hardware accelerators can maintain full-precision versions of rays and primitives to allow reduced communication requirements for high throughput intersection testing in loosely coupled systems. Embodiments also can include using BFP-formatted data in programmable test cells or more general-purpose processing elements.

    Multistage collector for outputs in multiprocessor systems
    34.
    Granted patent
    Status: in force

    Publication number: US09595074B2

    Publication date: 2017-03-14

    Application number: US13611325

    Filing date: 2012-09-12

    IPC classification: G06F15/80 G06T1/20 G06T15/06

    Abstract: Aspects include a multistage collector to receive outputs from plural processing elements. Processing elements may comprise (each or collectively) a plurality of clusters, with one or more ALUs that may perform SIMD operations on a data vector and produce outputs according to the instruction stream being used to configure the ALU(s). The multistage collector includes substituent components each with at least one input queue, a memory, a packing unit, and an output queue; these components can be sized to process groups of input elements of a given size, and can have multiple input queues and a single output queue. Some components couple to receive outputs from the ALUs and others receive outputs from other components. Ultimately, the multistage collector can output groupings of input elements. Each grouping of elements (e.g., at input queues, or stored in the memories of components) can be formed based on matching of index elements.

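The collector described in this abstract can be modeled in software as a tree of stages that merge small groups into larger ones by matching index. This is a rough behavioral sketch under several assumptions: `CollectorStage`, `connect`, and `demo` are hypothetical names, each stage drains its inputs fully on every step, and the group sizes (2, then 4) are arbitrary.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, List, Tuple

Group = Tuple[Hashable, List[object]]


class CollectorStage:
    """One hypothetical component: input queues, memory, packer, output queue."""

    def __init__(self, num_inputs: int, group_size: int) -> None:
        self.inputs: List[Deque[Group]] = [deque() for _ in range(num_inputs)]
        self.memory: Dict[Hashable, List[object]] = defaultdict(list)
        self.group_size = group_size
        self.output: Deque[Group] = deque()

    def step(self) -> None:
        """Drain the input queues, merge by matching index, emit full groups."""
        for queue in self.inputs:
            while queue:
                index, items = queue.popleft()
                self.memory[index].extend(items)
                while len(self.memory[index]) >= self.group_size:
                    bucket = self.memory[index]
                    self.output.append((index, bucket[: self.group_size]))
                    self.memory[index] = bucket[self.group_size:]


def connect(upstream: CollectorStage, downstream: CollectorStage, port: int) -> None:
    """Move everything from one stage's output queue into a downstream input."""
    while upstream.output:
        downstream.inputs[port].append(upstream.output.popleft())


def demo() -> List[Group]:
    """Two first-level stages (groups of 2) feed one second-level stage (groups of 4)."""
    a, b = CollectorStage(1, 2), CollectorStage(1, 2)
    top = CollectorStage(2, 4)
    a.inputs[0].extend([("k", ["a0"]), ("k", ["a1"]), ("j", ["x"])])
    b.inputs[0].extend([("k", ["b0"]), ("k", ["b1"])])
    for stage, port in ((a, 0), (b, 1)):
        stage.step()
        connect(stage, top, port)
    top.step()
    return list(top.output)
```

In `demo`, the lone element with index `"j"` stays in the first stage's memory because no matching element arrives to complete its group.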

    Memory allocation in distributed memories for multiprocessing
    35.
    Granted patent
    Status: in force

    Publication number: US09478062B2

    Publication date: 2016-10-25

    Application number: US13368616

    Filing date: 2012-02-08

    IPC classification: G06T15/06 G06T15/00

    Abstract: In some aspects, finer grained parallelism is achieved by segmenting programmatic workloads into smaller discretized portions, where a first element can be indicative both of a configuration or program to be executed, and a first data set to be used in such execution, while a second element can be indicative of a second data element or group. The discretized portions can cause programs to execute on distributed processors. Approaches to selecting processors and allocating local memory associated with those processors are disclosed. In one example, discretized portions that share a program have an anti-affinity to cause dispersion, for initial execution assignment. Flags, such as programmer- and compiler-generated flags, can be used in determining such allocations. Workloads can be grouped according to compatibility of memory usage requirements.

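The anti-affinity example in this abstract can be illustrated with a small placement routine. This is a sketch under assumed rules, not the patent's method: `Processor`, `Portion`, and `assign` are hypothetical names, and the tie-break (prefer the processor with the most free local memory) is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Processor:
    name: str
    free_memory: int
    programs: Dict[str, int] = field(default_factory=dict)  # program -> copies placed


@dataclass(frozen=True)
class Portion:
    program: str
    memory_needed: int


def assign(portion: Portion, processors: List[Processor]) -> Optional[Processor]:
    """Place a portion on a processor with room, dispersing portions that share a program."""
    candidates = [p for p in processors if p.free_memory >= portion.memory_needed]
    if not candidates:
        return None  # no local memory available anywhere; caller must defer
    # Anti-affinity: fewest copies of the same program first, then most free memory.
    best = min(
        candidates,
        key=lambda p: (p.programs.get(portion.program, 0), -p.free_memory),
    )
    best.free_memory -= portion.memory_needed
    best.programs[portion.program] = best.programs.get(portion.program, 0) + 1
    return best
```

Placing three portions of the same program on two equally sized processors alternates between them, so the first assignments spread out rather than piling onto one processor.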

    MULTISTAGE COLLECTOR FOR OUTPUTS IN MULTIPROCESSOR SYSTEMS
    36.
    Patent application
    Status: in force

    Publication number: US20130069960A1

    Publication date: 2013-03-21

    Application number: US13611325

    Filing date: 2012-09-12

    IPC classification: G06T1/20

    Abstract: Aspects include a multistage collector to receive outputs from plural processing elements. Processing elements may comprise (each or collectively) a plurality of clusters, with one or more ALUs that may perform SIMD operations on a data vector and produce outputs according to the instruction stream being used to configure the ALU(s). The multistage collector includes substituent components each with at least one input queue, a memory, a packing unit, and an output queue; these components can be sized to process groups of input elements of a given size, and can have multiple input queues and a single output queue. Some components couple to receive outputs from the ALUs and others receive outputs from other components. Ultimately, the multistage collector can output groupings of input elements. Each grouping of elements (e.g., at input queues, or stored in the memories of components) can be formed based on matching of index elements.

    SYSTEMS AND METHODS FOR PHOTON MAP QUERYING
    37.
    Patent application
    Status: in force

    Publication number: US20100332523A1

    Publication date: 2010-12-30

    Application number: US12825728

    Filing date: 2010-06-29

    IPC classification: G06F17/30

    CPC classification: G06F17/30533 G06F17/30592

    Abstract: In one aspect, photon queries are answered using systems and methods of traversal of collections of photon queries through an acceleration structure, to identify photons meeting a specification of a given query. Such systems and methods can be extended to satisfying similarity queries in an n-dimensional parameter space. Queries can be associated with code (or pointers to code) that is run to achieve closure of that query. Queries can cause further queries to be emitted. Arbitrary data can be passed from one query to another; for example, parameters defined internally to the code modules themselves (e.g., the parameters do not need to have a definition or meaning to the systems or within the methods).

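As an illustration of answering a collection of queries together, the sketch below bins radius queries by the cells of a uniform grid, tests each cell's photons against the queries collected there, and then runs each query's closure code. The uniform grid stands in for the acceleration structure purely as a simplifying assumption, and `cell_of`, `build_grid`, and `answer_queries` are hypothetical names.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]
Query = Tuple[Point, float, Callable[[List[Point]], object]]  # center, radius, closure


def cell_of(p: Point, size: float) -> Cell:
    """Grid cell containing point p for the given cell size."""
    return tuple(math.floor(c / size) for c in p)


def build_grid(photons: List[Point], size: float) -> Dict[Cell, List[Point]]:
    """Bin photons into grid cells (the assumed acceleration structure)."""
    grid: Dict[Cell, List[Point]] = defaultdict(list)
    for p in photons:
        grid[cell_of(p, size)].append(p)
    return grid


def answer_queries(grid: Dict[Cell, List[Point]], size: float,
                   queries: List[Query]) -> List[object]:
    """Collect queries per cell, test each cell once, then run each closure."""
    pending: Dict[Cell, List[int]] = defaultdict(list)
    for qid, (center, radius, _) in enumerate(queries):
        lo = cell_of(tuple(c - radius for c in center), size)
        hi = cell_of(tuple(c + radius for c in center), size)
        for cell in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            pending[cell].append(qid)
    hits: List[List[Point]] = [[] for _ in queries]
    for cell, qids in pending.items():
        for photon in grid.get(cell, ()):
            for qid in qids:
                center, radius, _ = queries[qid]
                if math.dist(photon, center) <= radius:
                    hits[qid].append(photon)
    # The closure is arbitrary caller code; here it simply receives the matches.
    return [closure(found) for (_, _, closure), found in zip(queries, hits)]
```

Passing `len` as the closure returns a photon count per query; a closure could just as well compute a density estimate or build follow-up queries.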

    Systems and methods for 3-D scene acceleration structure creation and updating
    38.
    Granted patent
    Status: in force

    Publication number: US08717357B2

    Publication date: 2014-05-06

    Application number: US13567033

    Filing date: 2012-08-04

    IPC classification: G06T15/08

    Abstract: Systems and methods for producing an acceleration structure provide for subdividing a 3-D scene into a plurality of volumetric portions, which have different sizes, each being addressable using a multipart address indicating a location and a relative size of each volumetric portion. A stream of primitives is processed by characterizing each according to one or more criteria, selecting a relative size of volumetric portions for use in bounding the primitive, and finding a set of volumetric portions of that relative size which bound the primitive. A primitive ID is stored in each location of a cache associated with each volumetric portion of the set of volumetric portions. A cache location is selected for eviction, responsive to each cache eviction decision made during the processing. An element of an acceleration structure according to the contents of the evicted cache location is generated, responsive to the evicted cache location.

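The streaming build described in this abstract can be sketched with a small cache keyed by a multipart address. The sketch rests on several assumptions not stated in the abstract: a cubic scene starting at the origin, an address of the form `(level, x, y, z)` where `level` halves the cell size each step, a least-recently-used eviction policy, and hypothetical names `choose_level`, `covering_cells`, and `build`.

```python
from collections import OrderedDict
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]                # (min corner, max corner)
Address = Tuple[int, int, int, int]    # (level, x, y, z): relative size + location


def choose_level(box: Box, scene_size: float, max_level: int) -> int:
    """Pick the finest level whose cell edge is still at least the box extent."""
    extent = max(hi - lo for lo, hi in zip(*box))
    level = 0
    while level < max_level and scene_size / 2 ** (level + 1) >= extent:
        level += 1
    return level


def covering_cells(box: Box, scene_size: float, level: int) -> List[Address]:
    """All cells of the chosen level that overlap the box."""
    n = 2 ** level
    cell = scene_size / n
    lo, hi = box
    ranges = [range(max(0, int(l // cell)), min(n - 1, int(h // cell)) + 1)
              for l, h in zip(lo, hi)]
    return [(level, x, y, z) for x, y, z in product(*ranges)]


def build(primitives: List[Tuple[int, Box]], scene_size: float,
          capacity: int, max_level: int = 8) -> List[Tuple[Address, List[int]]]:
    """Stream primitives through a small cache; each eviction emits an element."""
    cache: "OrderedDict[Address, List[int]]" = OrderedDict()
    elements: List[Tuple[Address, List[int]]] = []
    for prim_id, box in primitives:
        level = choose_level(box, scene_size, max_level)
        for addr in covering_cells(box, scene_size, level):
            if addr in cache:
                cache.move_to_end(addr)
            else:
                if len(cache) >= capacity:
                    # Assumed policy: evict the least recently used location.
                    elements.append(cache.popitem(last=False))
                cache[addr] = []
            cache[addr].append(prim_id)
    while cache:  # flush what is left at the end of the stream
        elements.append(cache.popitem(last=False))
    return elements
```

In this sketch an address that is evicted and later touched again is emitted more than once; a real builder would need to merge or link such elements.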

    Graphics processor with non-blocking concurrent architecture
    39.
    Granted patent
    Status: in force

    Publication number: US08692834B2

    Publication date: 2014-04-08

    Application number: US13567091

    Filing date: 2012-08-06

    IPC classification: G06F15/80 G06F15/00 G06T1/00

    Abstract: In some aspects, systems and methods provide for forming groupings of a plurality of independently-specified computation workloads, such as graphics processing workloads, and in a specific example, ray tracing workloads. The workloads include a scheduling key, which is one basis on which the groupings can be formed. Workloads grouped together can all execute from the same source of instructions, on one or more different private data elements. Such workloads can recursively instantiate other workloads that reference the same private data elements. In some examples, the scheduling key can be used to identify a data element to be used by all the workloads of a grouping. Memory conflicts to private data elements are handled through scheduling of non-conflicted workloads or specific instructions and deferring conflicted workloads instead of locking memory locations.

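Grouping by scheduling key while deferring conflicts, rather than locking, might look like the following in software. It is a sketch under assumed rules: `Workload` and `schedule` are hypothetical names, a "pass" stands for one group issued together, and two workloads conflict when they name the same private data element.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, NamedTuple, Tuple


class Workload(NamedTuple):
    name: str
    key: str         # scheduling key: workloads with the same key can be grouped
    private_id: str  # private data element the workload reads and writes


def schedule(workloads: List[Workload], width: int) -> List[Tuple[str, List[str]]]:
    """Form passes per scheduling key; defer workloads whose private data is taken."""
    if width < 1:
        raise ValueError("width must be at least 1")
    by_key: Dict[str, Deque[Workload]] = defaultdict(deque)
    for w in workloads:
        by_key[w.key].append(w)
    passes: List[Tuple[str, List[str]]] = []
    for key, pending in by_key.items():
        while pending:
            claimed = set()
            batch: List[str] = []
            deferred: Deque[Workload] = deque()
            while pending:
                w = pending.popleft()
                if w.private_id in claimed or len(batch) >= width:
                    deferred.append(w)  # conflict or full pass: try again later
                else:
                    claimed.add(w.private_id)
                    batch.append(w.name)
            passes.append((key, batch))
            pending = deferred
    return passes
```

No lock is ever taken on a private data element: a conflicting workload simply runs in a later pass, after the workload that claimed the element has been issued.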

    SCHEDULING HETEROGENOUS COMPUTATION ON MULTITHREADED PROCESSORS
    40.
    Patent application
    Status: under examination (published)

    Publication number: US20120324458A1

    Publication date: 2012-12-20

    Application number: US13368682

    Filing date: 2012-02-08

    IPC classification: G06F9/46

    Abstract: Aspects include computation systems that can identify computation instances that are not capable of being reentrant, or are not reentrant capable on a target architecture, or are non-reentrant as a result of having a memory conflict in a particular execution situation. A system can have a plurality of computation units, each with an independently schedulable SIMD vector. Computation instances can be defined by a program module, and a data element(s) that may be stored in a local cache for a particular computation unit. Each local cache does not maintain coherency controls for such data elements. During scheduling, a scheduler can maintain a list of running (or runnable) instances, and attempt to schedule new computation instances by determining whether any new computation instance conflicts with a running instance and responsively defer scheduling. Memory conflict checks can be conditioned on a flag or other indication of the potential for non-reentrancy.

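The flag-conditioned conflict check in this abstract can be sketched as a single admission test against the list of running instances. The names `Instance` and `try_schedule`, and the rule that a conflict means "same data element as a running instance", are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    name: str
    data_id: str        # data element the instance uses in a local cache
    may_conflict: bool  # flag: instance is potentially non-reentrant


def try_schedule(new: Instance, running: List[Instance]) -> bool:
    """Admit `new` unless it is flagged and its data element is already in use."""
    if new.may_conflict and any(r.data_id == new.data_id for r in running):
        return False  # defer: the caller retries after the running instance ends
    running.append(new)
    return True
```

Instances without the flag skip the comparison entirely, which is the cost saving that conditioning the check on a flag is meant to provide.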