摘要:
A computer-implemented method, system and computer program product for controlling an algorithm that is performed on a unit of work in a subsequent software pipeline stage in a Network On a Chip (NOC) is presented. In one embodiment, the method executes a first operation in a first node of the NOC. The first node generates payload, and then loads that payload into a message. The message with the payload is transmitted to a nanokernel that controls a second node in the NOC. The nanokernel calls an algorithm that is needed by a second operation in a second node in the NOC, which uses the algorithm to execute the second operation.
摘要:
A multithreaded rendering software pipeline architecture utilizes a rolling texture context data structure to store multiple texture contexts that are associated with different textures that are being processed in the software pipeline. Each texture context stores state data for a particular texture, and facilitates the access to texture data by multiple, parallel stages in a software pipeline. In addition, texture contexts are capable of being “rolled”, or copied to enable different stages of a rendering pipeline that require different state data for a particular texture to separately access the texture data independently from one another, and without the necessity for stalling the pipeline to ensure synchronization of shared texture data among the stages of the pipeline.
摘要:
Memory sharing in a software pipeline on a network on chip (‘NOC’), the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, including segmenting a computer software application into stages of a software pipeline, the software pipeline comprising one or more paths of execution; allocating memory to be shared among at least two stages including creating a smart pointer, the smart pointer including data elements for determining when the shared memory can be deallocated; determining, in dependence upon the data elements for determining when the shared memory can be deallocated, that the shared memory can be deallocated; and deallocating the shared memory.
摘要:
A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer.
摘要:
Direct Memory Access (DMA) is used in connection with passing commands between a host device and a target device coupled via a push buffer. Commands passed to a push buffer by a host device may be accumulated by the host device prior to forwarding the commands to the push buffer, such that DMA may be used to collectively pass a block of commands to the push buffer. In addition, a host device may utilize DMA to pass command parameters for commands to a command buffer that is accessible by the target device but is separate from the push buffer, with the commands that are passed to the push buffer including pointers to the associated command parameters in the command buffer.
摘要:
A method includes receiving a plurality of packets at an integrated processor block of a network on a chip device. The plurality of packets includes a first packet that includes an indication of a start of data associated with a pixel shader application. The method includes recovering the data from the plurality of packets. The method also includes storing the recovered data in a dedicated packet collection memory within the network on the chip device. The method further includes retaining the data stored in the dedicated packet collection memory during an interruption event. Upon completion of the interruption event, the method includes copying packets stored in the dedicated packet collection memory prior to the interruption event to an inbox of the network on the chip device for processing.
摘要:
Collecting and analyzing trace data while in a software debug mode through direct interthread communication (‘DITC’) on a network on chip (‘NOC’), the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, including enabling the collection of software debug information in a selected set of IP blocks distributed through the NOC, each IP block within the selected set of IP blocks having a set of trace data; collecting software debugging information via the set of trace data; communicating the set of trace data to a destination repository; and analyzing the set of trace data at the destination repository.
摘要:
A breakpoint packet is dispatched to a Network On A Chip (NOC). The breakpoint packet instructs one or more specified nodes on the NOC to place the specified nodes, or a core or hardware thread within a specified node, to execute in “single step” mode, in order to enable a debugging of a work packet that is dispatched to the specific node.
摘要:
A circuit arrangement, program product and circuit arrangement utilize a textured bounding volume to reduce the overhead associated with generating and using an Accelerated Data Structure (ADS) in connection with physical rendering. In particular, a subset of the primitives in a scene may be mapped to surfaces of a bounding volume to generate textures on such surfaces that can be used during physical rendering. By doing so, the primitives that are mapped to the bounding volume surfaces may be omitted from the ADS to reduce the processing overhead associated with both generating the ADS and using the ADS during physical rendering, and furthermore, in many instances the size of the ADS may be reduced, thus reducing the memory footprint of the ADS, and often improving cache hit rates and reducing memory bandwidth.
摘要:
A hardware thread is selectively forced to single step the execution of software instructions from a work packet granule. A “single step” packet is associated with a work packet granule. The work packet granule, with the associated “single step” packet, is dispatched as an appended work packet granule to a preselected hardware thread in a processor core, which, in one embodiment, is located at a node in a Network On a Chip (NOC). The work packet granule then executes in a single step mode until completion.