摘要:
A system and method for rasterizing and rendering graphics data is disclosed. Vertices may be grouped to form primitives such as triangles, which are rasterized using two-dimensional arrays of samples bins. To overcome fragmentation problems, the system's sample evaluation hardware may be configured to over-evaluate samples each clock cycle. Since a number of the samples will typically not survive evaluation because they will be outside the primitive being rendered, the remaining surviving samples may be combined into sets, with one set being forwarded to subsequent pipeline stages each clock cycle in order to attempt to keep the pipeline utilization high.
摘要:
A graphics system may include a frame buffer that includes several sets of one or more memory banks and a cache. The frame buffer may load data from one of the memory banks into the cache in response to receiving a cache fill request. Each set of memory banks is accessible independently of each other set of memory banks. A frame buffer interface coupled to the frame buffer includes a plurality of cache fill request queues. Each cache fill request queue is configured to store one or more cache fill requests targeting a corresponding one of the sets of memory banks. The frame buffer interface is configured to select a cache fill request from one of the cache fill request queues that stores cache fill requests targeting a set of memory banks that is not currently being accessed and to provide the selected cache fill request to the frame buffer.
摘要:
A memory interface controls read and write accesses to a memory device. The memory device includes a level-one cache, level-two cache and storage cell array. The memory interface includes a data request processor (DRP), a memory control processor (MCP) and a block cleansing unit (BCU). The MCP controls transfers between the storage cell array, the level-two cache and the level-one cache. In response to a read request with associated read clear indication, the DRP controls a read from a level-one cache block, updates bits in a corresponding dirty tag, and sets a mode indicator of the dirty tag to a the read clear mode. The modified dirty tag bits and mode indicator are signals to the BCU that the level-one cache block requires a source clear operation. The BCU commands the transfer of data from a color fill block in the level-one cache to the level-two cache.
摘要:
A method and a system for stalling large pipelined designs. A computational pipeline may comprise a first module and a second module coupled together. The first module may propagate one or more signals to the second module. A stall-signal may be asserted in order to stall the computational pipeline if the second module is not ready to receive the one or more signals from the first module. The one or more signals propagated from the first module and the asserted stall-signal may be buffered in a stall-buffer. The asserted stall-signal may be propagated to the first module in a next cycle. The first module may be stalled in response to the first module receiving the propagated asserted stall-signal. Next, the asserted stall-signal may be propagated up the computational pipeline.
摘要:
A system and method for reading register contents from a computational pipeline having a plurality of computational units. The system includes a readback bus and a read control unit. The readback bus has a plurality of logic units coupled in a series. Each logic unit couples to a corresponding one of the computational units. The read control unit couples to each of the computational units through a corresponding load line, and is configured to assert a load signal on one of the load lines in response to a register read request. Each of the computational units is configured to transmit a data value from a selected register onto the readback bus in response to detecting an assertion of the load signal on its corresponding load line.
摘要:
An integrated circuit may include several components, one or more interfaces, an interconnect (e.g., a bus), and a controller. The components may each be configured to assert a read request to read data stored externally to the integrated circuit. The interfaces may be configured to output the read request asserted by one of the components and to receive data in response to outputting the request. The interconnect may be coupled to perform one or more data transactions to transmit the data from one of the interfaces to one or more of the components. In response to the read request asserted by one of the components, the controller may inhibit performance of a read transaction initiated by the read request dependent upon a comparison of a total number of outstanding data transactions to a maximum allowable number of outstanding data transactions.
摘要:
A multi-chip system and method are disclosed for incorporating a primitive assembler in each of one or more geometry chips and one or more rasterization chips. This system may allow per-primitive operations to be performed in the geometry chips, and also allow use of a vertex data interface for sending vertex data to the rasterization chips. The primitive assemblers in the geometry chips may assemble vertices into primitives for clipping tests. The geometry chips may also test an assembled primitive against the projected boundaries of a set of screen space regions, where each region is assigned to one of the rasterization chips. Those primitives residing in more than one region may be sub-divided into two or more new primitives so that each new primitive resides in only one screen space region. The geometry chip may then send the vertex data for each primitive to the corresponding rasterization chip.
摘要:
An integrated circuit including logic for testing internal operation of the integrated circuit. The integrated circuit may comprise a plurality of internal functional blocks coupled by a plurality of internal buses. The integrated circuit may also comprise a set of test control input pins and a set of test output pins comprised on the integrated circuit. The integrated circuit may comprise selection logic. The selection logic comprises inputs coupled to various ones of the internal buses, an output coupled to the set of test output pins, and a select input coupled to receive select signals from the set of test control input pins. The selection logic is operable to select internal bus signals from an internal bus based on the select signals from the test control input pins, and the selection logic is configured to output the selected internal bus signals to the set of test output pins. The integrated circuit thus allows multiplexing of different critical internal buses so that the signals on the critical buses may be output for observation via selected test pins on the integrated circuit. The observability logic may be configured to switch slowly relative to the internal busses, and the generation of the observability logic and testing may be automated.
摘要:
A method for loading and executing an indeterminate length shader program. The method includes accessing a first portion of a shader program in graphics memory of a GPU and loading instructions from the first portion into a plurality of stages of the GPU to configure the GPU for program execution. A group of pixels is then processed in accordance with the instructions from the first portion. A second portion of the shader program is accessed in graphics memory of the GPU and instructions from the second portion are loaded into the plurality of stages of the GPU to configure the GPU for program execution. The group of pixels are then processed in accordance with the instructions from the second portion.
摘要:
A multi-chip system and method are disclosed that utilizes a plurality of graphics pipelines to perform large kernel convolution. Each graphics pipeline includes a standard rendering unit and a video data convolve unit. Each video data convolve unit receives video pixel data from the video output of the standard rendering unit. The video data convolve units are connected in a chain. Each group of one or more video data convolve units in the chain convolves the video pixel data received by the group. The last video data convolve unit in the chain outputs a stream of convolved pixels.