摘要:
A method includes receiving a plurality of packets at an integrated processor block of a network on a chip device. The plurality of packets includes a first packet that includes an indication of a start of data associated with a pixel shader application. The method includes recovering the data from the plurality of packets. The method also includes storing the recovered data in a dedicated packet collection memory within the network on the chip device. The method further includes retaining the data stored in the dedicated packet collection memory during an interruption event. Upon completion of the interruption event, the method includes copying packets stored in the dedicated packet collection memory prior to the interruption event to an inbox of the network on the chip device for processing.
摘要:
Collecting and analyzing trace data while in a software debug mode through direct interthread communication (‘DITC’) on a network on chip (‘NOC’), the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, including enabling the collection of software debug information in a selected set of IP blocks distributed through the NOC, each IP block within the selected set of IP blocks having a set of trace data; collecting software debugging information via the set of trace data; communicating the set of trace data to a destination repository; and analyzing the set of trace data at the destination repository.
摘要:
A breakpoint packet is dispatched to a Network On A Chip (NOC). The breakpoint packet instructs one or more specified nodes on the NOC to place the specified nodes, or a core or hardware thread within a specified node, to execute in “single step” mode, in order to enable a debugging of a work packet that is dispatched to the specific node.
摘要:
A circuit arrangement, program product and circuit arrangement utilize a textured bounding volume to reduce the overhead associated with generating and using an Accelerated Data Structure (ADS) in connection with physical rendering. In particular, a subset of the primitives in a scene may be mapped to surfaces of a bounding volume to generate textures on such surfaces that can be used during physical rendering. By doing so, the primitives that are mapped to the bounding volume surfaces may be omitted from the ADS to reduce the processing overhead associated with both generating the ADS and using the ADS during physical rendering, and furthermore, in many instances the size of the ADS may be reduced, thus reducing the memory footprint of the ADS, and often improving cache hit rates and reducing memory bandwidth.
摘要:
A hardware thread is selectively forced to single step the execution of software instructions from a work packet granule. A “single step” packet is associated with a work packet granule. The work packet granule, with the associated “single step” packet, is dispatched as an appended work packet granule to a preselected hardware thread in a processor core, which, in one embodiment, is located at a node in a Network On a Chip (NOC). The work packet granule then executes in a single step mode until completion.
摘要:
A method, system and computer program product for managing secondary rays during ray-tracing are presented. A non-visible unidirectional ray tracing object logically surrounds a user-selected virtual object in a computer generated illustration. This unidirectional ray tracing object prevents secondary tracing rays from emanating from the user-selected virtual object during ray tracing.
摘要:
Data processing on a network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, with each IP block also adapted to the network by a low latency, high bandwidth application messaging interconnect comprising an inbox and an outbox, each IP block also including a stack normally used for context switching, the stack access slower than the outbox access, and each IP block further including a processor supporting a plurality of threads of execution, the processor configured to save, upon a context switch, a context of a current thread of execution in memory locations in a memory array in the outbox instead of the stack and lock the memory locations in which the context was saved.
摘要:
Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
摘要:
A multithreaded rendering software pipeline architecture dynamically reallocates regions of an image space to raster threads based upon performance data collected by the raster threads. The reallocation of the regions typically includes resizing the regions assigned to particular raster threads and/or reassigning regions to different raster threads to better balance the relative workloads of the raster threads.
摘要:
A circuit arrangement and method implement impulse propagation in a multithreaded physics engine by assigning ownership of objects in a scene to individual threads and propagating impulses between objects that are in contact with one another by passing inter-thread impulse messages between the threads that own the contacting objects, while locally propagating impulses through objects using the threads to which such objects are assigned.