Abstract:
A computer system processes instructions that include an instruction code, source type, source address, destination type, and destination address. The source and destination types may indicate a memory device, in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. A transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
Abstract:
A circuit is disclosed that uses a four-element dot product (DP4) circuit to approximate an argument t = x/pi for an input x. The argument is then input to a trigonometric function such as SinPi() or CosPi(). The DP4 circuit calculates x times a representation of the reciprocal of pi. The bits of the reciprocal of pi that are used are selected based on the magnitude of the exponent of x. The DP4 circuit includes four multipliers, two intermediate adders, and a final adder. The outputs of the multipliers, intermediate adders, and final adder are adjusted such that the output of the final adder is a value of the argument t that will provide an accurate output when input to the trigonometric function.
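The dataflow of the DP4 stage (four multipliers, two intermediate adders, one final adder) can be sketched in software. This is only an illustration under stated assumptions: the names `dp4` and `split_inv_pi` and the 14-bit chunk width are hypothetical, the reciprocal of pi is taken from a double-precision float, and the exponent-based bit selection and output adjustments described in the abstract are not modeled.

```python
import math

def dp4(a, b):
    """Four-element dot product arranged as the adder tree named in the abstract."""
    products = [a[i] * b[i] for i in range(4)]  # four multipliers
    low = products[0] + products[1]             # first intermediate adder
    high = products[2] + products[3]            # second intermediate adder
    return low + high                           # final adder

def split_inv_pi(chunk_bits=14):
    """Split the float 1/pi into four chunks, most significant first.

    Assumption: a fixed chunk width; the disclosed circuit instead selects
    bits of 1/pi according to the exponent of x.
    """
    remaining = 1.0 / math.pi
    chunks = []
    scale = 1.0
    for _ in range(3):
        scale *= 2.0 ** chunk_bits
        chunk = math.floor(remaining * scale) / scale
        chunks.append(chunk)
        remaining -= chunk
    chunks.append(remaining)  # whatever bits are left
    return chunks

def sin_via_dp4(x):
    """Approximate sin(x) as sin(pi * t) with t = x / pi computed by dp4."""
    t = dp4([x, x, x, x], split_inv_pi())
    return math.sin(math.pi * t)
```

Calling `sin_via_dp4(2.5)` should agree closely with `math.sin(2.5)`; the sketch shows only the structure of the computation, not the accuracy guarantees of the hardware.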
Abstract:
An apparatus and method for rasterizing a primitive in a graphics system are disclosed. In one example of the invention, the method includes scanning a first row of tiles, one tile at a time, starting from a first point and scanning in a first direction. Immediately after scanning the first row of tiles, the method includes moving from the first point to a second point in an orthogonal direction relative to the first row. Immediately after moving from the first point to the second point, the method includes scanning a second row of tiles, one tile at a time, starting from the second point and scanning in the first direction. By scanning rows in the same direction immediately prior to and after moving from one row to another, cache utilization is improved.
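The traversal order described above can be sketched as a small generator. The function name `scan_tiles` and the use of column 0 as the starting point of every row are assumptions made for illustration; the abstract does not fix where the first point lies.

```python
def scan_tiles(num_rows, num_cols):
    """Yield (row, col) tile coordinates, scanning every row in the same direction."""
    for row in range(num_rows):
        # Each row starts directly below the previous row's starting point
        # (a move in the orthogonal direction), then scans in the first direction.
        for col in range(num_cols):
            yield (row, col)

order = list(scan_tiles(2, 3))
# order == [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

This differs from a serpentine traversal, where alternate rows would be scanned in opposite directions.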
Abstract:
A system to apply a smoothing filter during anti-aliasing at a post-rendering stage. An embodiment of the system includes a three-dimensional renderer, an edge detector, and a smoothing filter. The three-dimensional renderer is configured to render a three-dimensional scene. The edge detector is coupled to the three-dimensional renderer. The edge detector is configured to read values of a depth buffer and to apply edge detection criteria to the values of the depth buffer in order to detect an object edge within the three-dimensional scene. The smoothing filter is coupled to the edge detector. The smoothing filter is configured to read values of a color buffer and to apply a smoothing coefficient to the values of the color buffer. The values of the color buffer include a pixel sample at the detected object edge.
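The two post-rendering stages can be sketched on a single scanline. Everything specific here is an assumption for illustration: the function name `smooth_edges`, the edge criterion (a depth jump larger than `threshold` between neighbours), and the blend with the average of the two neighbouring colors weighted by `coeff`. The abstract does not specify the edge detection criteria or the form of the filter.

```python
def smooth_edges(depth, color, threshold=0.1, coeff=0.5):
    """Blend color samples at depth discontinuities on one scanline."""
    out = list(color)
    for i in range(1, len(depth) - 1):
        # Edge detector: reads only the depth buffer.
        is_edge = (abs(depth[i] - depth[i - 1]) > threshold
                   or abs(depth[i] - depth[i + 1]) > threshold)
        if is_edge:
            # Smoothing filter: reads only the color buffer.
            neighbours = (color[i - 1] + color[i + 1]) / 2
            out[i] = (1 - coeff) * color[i] + coeff * neighbours
    return out

depth = [1.0, 1.0, 0.2, 0.2]   # an object edge lies between samples 1 and 2
color = [0.0, 0.0, 1.0, 1.0]
smoothed = smooth_edges(depth, color)
# smoothed == [0.0, 0.25, 0.75, 1.0]
```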
Abstract:
An apparatus and method are provided for data processing in which power is automatically controlled through a feedback loop with the host processor, based on the internal workload as characterized by performance counters. The host automatically adjusts internal frequencies or voltage levels to match the workload. The feedback loop allows tuning of the frequency or voltage that controls power dissipation.
Abstract:
An apparatus and method for detecting and handling thin lines in a raster image include reading depth values for each pixel of an n×m block of pixels surrounding a substantially central pixel. Differences are then calculated for selected depth values of the n×m block of pixels to yield multiple difference values. These difference values may then be compared with multiple pre-computed difference values associated with thin lines pre-determined to pass through the n×m block of pixels. If the difference values of the pixel block substantially match the difference values of one of the pre-determined thin lines, the pixel block may be deemed to describe a thin line. The apparatus and method may preclude application of an anti-aliasing filter to the substantially central pixel of the pixel block in the event it describes a thin line.
Abstract:
A convolution engine, such as a convolutional neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner, with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
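A minimal 2D sketch of zero skipping follows, assuming plain nested lists and "valid" output size (no padding). The function name `sparse_conv2d` is hypothetical, and the coefficient is applied without flipping the kernel, which is the usual convention in neural networks; the abstract does not state either detail.

```python
def sparse_conv2d(tile, kernel):
    """Accumulate one shifted, scaled copy of the tile per non-zero coefficient."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(tile) - kh + 1
    out_w = len(tile[0]) - kw + 1
    acc = [[0] * out_w for _ in range(out_h)]  # accumulation buffer
    for r in range(kh):
        for c in range(kw):
            coeff = kernel[r][c]
            if coeff == 0:
                continue  # zero skipping: no work for zero coefficients
            # Shift the tile by the coefficient's (row, column) index and
            # add the scaled result to the accumulation buffer.
            for y in range(out_h):
                for x in range(out_w):
                    acc[y][x] += coeff * tile[y + r][x + c]
    return acc

tile = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
kernel = [[1, 0],
          [0, 2]]   # only two of four coefficients are non-zero
result = sparse_conv2d(tile, kernel)
# result == [[11, 14], [20, 23]]
```

With this kernel the inner accumulation loops run twice rather than four times, which is the saving the technique targets.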
Abstract:
A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle, and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state that are received during the same clock cycle are aggregated, and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If the condition is met, the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g., blocking, branching, or unblocking of the thread).
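The aggregate-then-evaluate behaviour can be modeled in a few lines. This is a software sketch under assumptions: the class name `SyncCounter`, the method `tick`, a state that only counts upward, and a threshold condition of "state is at least the threshold" are all hypothetical choices; the abstract leaves the state encoding and the condition open.

```python
class SyncCounter:
    """Toy model of a counter that many threads may update in one cycle."""

    def __init__(self, threshold):
        self.state = 0
        self.threshold = threshold

    def tick(self, thread_ids):
        """Apply all updates issued in one clock cycle and report any event.

        thread_ids lists the threads that issued an update this cycle. No
        thread acquires the counter first; the updates are aggregated and
        applied as a single addition.
        """
        self.state += len(thread_ids)
        return self.state >= self.threshold  # True models the event output

barrier = SyncCounter(threshold=3)
first = barrier.tick(["t0", "t1"])   # two updates in the same cycle
second = barrier.tick(["t2"])        # third update reaches the threshold
# first is False, second is True
```

In the disclosed system the processor would respond to the event by, for example, unblocking the threads waiting on it; that step is outside this sketch.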
Abstract:
Systems and methods for tile-based compression are disclosed. Image data, such as a frame, may be divided into tiles. The tiles may be sized based on a size of a line buffer. Tiles are compressed and decompressed individually. As portions of the image frame are updated, corresponding updated tiles may be compressed and stored. Likewise, as tiles are accessed they may be decompressed and streamed to a requesting device. In some embodiments, a decoder operable to decompress tiles may be interposed between a memory device and a requesting device. Data encoding one or more compressed tiles may be grouped to enable decompression at a rate of four pixels per clock cycle. Methods for compressing image data including both RGB and RGBα components are disclosed.
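The benefit of compressing tiles individually, namely that one tile can be updated or read without touching the rest of the frame, can be sketched as follows. Note the substitutions: `zlib` stands in for the disclosed codec, the frame is treated as a flat byte string rather than a 2D image, and the function names are hypothetical.

```python
import zlib

def compress_frame(frame, tile_size):
    """Split a frame into fixed-size tiles and compress each one separately."""
    tiles = [frame[i:i + tile_size] for i in range(0, len(frame), tile_size)]
    return [zlib.compress(tile) for tile in tiles]

def read_tile(compressed_tiles, index):
    """Decompress only the requested tile."""
    return zlib.decompress(compressed_tiles[index])

def update_tile(compressed_tiles, index, new_data):
    """Recompress only the tile whose pixels changed."""
    compressed_tiles[index] = zlib.compress(new_data)

frame = bytes(range(16))
stored = compress_frame(frame, tile_size=4)   # four independent tiles
update_tile(stored, 2, bytes([0, 0, 0, 0]))   # tiles 0, 1, and 3 are untouched
```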
Abstract:
Disclosed are new approaches to multi-dimensional filtering with a reduced number of memory reads and writes. In one embodiment, a filter includes first and second coefficients. A block of data having width and height each equal to the number of one of the first or second coefficients is read from a memory device. Arrays of values from the block are filtered using the first coefficients, and the results are filtered using the second coefficients. The final result may optionally be blended with another data value and written to a memory device. Registers store the results of filtering with the first coefficients. The block of data may be read from a location including a source coordinate. The final result of filtering may be written to a destination coordinate obtained by rotating and/or mirroring the source coordinate. The orientation of the arrays filtered using the first coefficients varies according to a rotation mode.
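The two-pass structure can be sketched for a single output value. The assumptions here are that the arrays filtered in the first pass are the rows of the block (the abstract says this orientation depends on the rotation mode), that the block has one row per second coefficient and one column per first coefficient, and that blending and coordinate rotation are omitted. The name `filter_block` is hypothetical.

```python
def filter_block(block, first, second):
    """Filter one block down to a single value in two passes."""
    # First pass: one weighted sum per row, held in "registers" rather than
    # written back to memory.
    registers = [sum(c * v for c, v in zip(first, row)) for row in block]
    # Second pass: combine the register values using the second coefficients.
    return sum(c * r for c, r in zip(second, registers))

block = [[1, 2],
         [3, 4]]
value = filter_block(block, first=[1, 1], second=[1, -1])
# registers are [3, 7], so value == -4
```

Because the intermediate row sums never leave the registers, the block is read from memory once and only the final value is written.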