Abstract:
This disclosure describes techniques for automatically selecting a rendering mode for use by a graphics processing unit (GPU) to render graphics data for display. More specifically, the techniques include evaluating at least two metrics associated with rendering graphics data of one or more rendering units, and automatically selecting either an immediate rendering mode or a deferred rendering mode for a current rendering unit based on the evaluated metrics. The selected rendering mode may be whichever of the two modes is predicted to use less power and/or system bandwidth to render the graphics data of the current rendering unit. A rendering unit may comprise a set of frames, a frame, a portion of a frame, multiple render targets associated with a frame, a single render target associated with a frame, or a portion of a single render target.
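As a rough illustration, such a selection heuristic might be modeled in C as below. The specific metrics (primitive count and shaded-pixel count), the bandwidth cost model, and all names are assumptions made for this sketch, not details from the disclosure.

    #include <stdint.h>

    typedef enum { RENDER_IMMEDIATE, RENDER_DEFERRED } render_mode_t;

    /* Hypothetical per-rendering-unit metrics; the disclosure only
       requires "at least two metrics", so these particular ones are
       assumptions. */
    typedef struct {
        uint64_t primitive_count;  /* geometry submitted for the unit */
        uint64_t pixels_shaded;    /* fragment count, a proxy for overdraw */
    } unit_metrics_t;

    /* Estimate off-chip traffic for each mode and pick the cheaper one.
       Immediate mode pays per shaded pixel against the render target;
       deferred (binned) mode pays a bounded per-pixel cost plus an
       extra pass over the geometry. */
    static render_mode_t select_render_mode(const unit_metrics_t *m,
                                            uint64_t target_pixels,
                                            uint32_t bytes_per_pixel,
                                            uint32_t bytes_per_prim)
    {
        uint64_t immediate_cost = m->pixels_shaded * bytes_per_pixel;
        uint64_t deferred_cost  = target_pixels * bytes_per_pixel
                                + m->primitive_count * bytes_per_prim;
        return (deferred_cost < immediate_cost) ? RENDER_DEFERRED
                                                : RENDER_IMMEDIATE;
    }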
Abstract:
A method and system for providing surface texture in a graphics image rendered by a graphics processing system. The color value of a pixel having a normal vector normal to the surface in which the pixel is located is calculated based on a perturbed normal vector. The perturbed normal vector is displaced from the normal vector by a displacement equal to the sum of two components: a first vector tangent to the surface at the location of the pixel, scaled by a first scale factor and a first displacement value, and a second vector tangent to the surface at the location of the pixel and perpendicular to the first vector, scaled by a second scale factor and a second displacement value. The displacement values are representative of partial derivatives of a function defining a texture applied to the surface, and the scale factors scale the magnitude of the resulting perturbed normal. The color value for the pixel being rendered is thus based on the perturbed normal vector instead of the original normal vector.
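A minimal C sketch of the perturbation described above, assuming conventional 3-component vectors; the function and parameter names (perturb_normal, s1, du, and so on) are illustrative rather than taken from the disclosure.

    #include <math.h>

    typedef struct { float x, y, z; } vec3;

    static vec3 vadd(vec3 a, vec3 b)    { return (vec3){ a.x + b.x, a.y + b.y, a.z + b.z }; }
    static vec3 vscale(vec3 v, float s) { return (vec3){ v.x * s, v.y * s, v.z * s }; }
    static vec3 vnormalize(vec3 v)
    {
        float len = sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
        return vscale(v, 1.0f / len);
    }

    /* Displace the normal n by two tangent vectors t and b (with b
       perpendicular to t), each scaled by a scale factor (s1, s2) and a
       displacement value (du, dv); du and dv represent the partial
       derivatives of the texture function at the pixel. The pixel is
       then shaded with the returned vector instead of n. */
    static vec3 perturb_normal(vec3 n, vec3 t, vec3 b,
                               float s1, float du, float s2, float dv)
    {
        vec3 disp = vadd(vscale(t, s1 * du), vscale(b, s2 * dv));
        return vnormalize(vadd(n, disp));
    }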
Abstract:
In general, aspects of this disclosure describe example techniques for efficient storage of data of various data types for graphics processing. In some examples, a processing unit may assign first and second contiguous ranges of addresses to first and second data types, respectively. The processing unit may store graphics data of the first or second data type, or addresses of that graphics data, within blocks whose addresses are within the first or second contiguous range of addresses, respectively. The processing unit may store, in cache lines of a cache, the graphics data of the first data type and the graphics data of the second data type.
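One way to picture the per-type contiguous ranges is a simple bump allocator with one region per data type, as in the C sketch below; the type names, region layout, and allocation policy are assumptions for illustration, not the disclosed design.

    #include <stddef.h>
    #include <stdint.h>

    enum data_type { TYPE_FIRST = 0, TYPE_SECOND = 1, NUM_TYPES = 2 };

    typedef struct {
        uintptr_t base;  /* start of this type's contiguous range */
        size_t    size;  /* length of the range */
        size_t    used;  /* bump-allocation offset within the range */
    } type_region_t;

    /* regions[] would be initialized at startup with the base address
       and length of each type's contiguous range. */
    static type_region_t regions[NUM_TYPES];

    /* Allocate a block for `type`; the returned address is guaranteed
       to fall within that type's contiguous range, so data of one type
       (or addresses of that data) never lands in the other range. */
    static void *alloc_typed(enum data_type type, size_t bytes)
    {
        type_region_t *r = &regions[type];
        if (r->used + bytes > r->size)
            return NULL;  /* region exhausted */
        void *block = (void *)(r->base + r->used);
        r->used += bytes;
        return block;
    }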
Abstract:
A TCP/IP offload network interface device (NID) receives packets from a plurality of clients and generates, from the socket address of each such packet, a hash value. Each hash value identifies one of a plurality of hash buckets maintained on the NID. In a file server, certain socket address bits of the packets are low entropy bits in that they tend to be the same regardless of which client sent the packet, while others of the socket address bits are high entropy bits. The hash function employed is such that the hash values resulting from the changing values of the high entropy bits are substantially evenly distributed among the plurality of hash buckets. On the fast path, the NID uses a first hash function to identify TCBs (TCP control blocks) on the NID. On the slow path, the NID generates a second hash value using a second hash function, and the host stack uses that second hash value.
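A toy C version of the idea, assuming an IPv4 socket address in which the server-side fields are the low entropy bits: the hash draws only on the remote (client) fields, whose changing bits spread the resulting values evenly over the buckets. The mixing constants and bucket count are arbitrary choices for this sketch.

    #include <stdint.h>

    #define NUM_BUCKETS 256u  /* illustrative bucket count */

    /* On a file server the local (server) IP and port are nearly
       constant across packets, so this toy hash uses only the remote
       (client) fields, whose high entropy bits distribute connections
       substantially evenly among the buckets. */
    static uint32_t socket_hash(uint32_t remote_ip, uint16_t remote_port)
    {
        uint32_t h = remote_ip ^ (uint32_t)remote_port;
        h ^= h >> 16;        /* fold high bits into low bits */
        h *= 0x9e3779b1u;    /* multiplicative scramble */
        h ^= h >> 16;
        return h % NUM_BUCKETS;
    }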
Abstract:
This disclosure describes techniques for extending the architecture of a general purpose graphics processing unit (GPGPU) with parallel processing units to allow efficient processing of pipeline-based applications. The techniques include configuring local memory buffers connected to parallel processing units operating as stages of a processing pipeline to hold data for transfer between the parallel processing units. The local memory buffers allow on-chip, low-power, direct data transfer between the parallel processing units. The local memory buffers may include hardware-based data flow control mechanisms to enable transfer of data between the parallel processing units. In this way, data may be passed directly from one parallel processing unit to the next parallel processing unit in the processing pipeline via the local memory buffers, in effect transforming the parallel processing units into a series of pipeline stages.
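The flow-controlled local buffer between two stages can be modeled in software as a bounded ring whose full and empty conditions stand in for the hardware back-pressure signals. The C sketch below is a software analogy of that mechanism, with all names and the buffer depth assumed.

    #include <stdatomic.h>
    #include <stdint.h>

    #define BUF_SLOTS 16u  /* illustrative on-chip buffer depth */

    typedef struct {
        uint32_t slots[BUF_SLOTS];
        _Atomic uint32_t head;  /* advanced by the producing stage */
        _Atomic uint32_t tail;  /* advanced by the consuming stage */
    } stage_fifo_t;

    /* Producer side: returns 0 when the buffer is full, which models
       the back-pressure that stalls the upstream pipeline stage. */
    static int fifo_push(stage_fifo_t *f, uint32_t item)
    {
        uint32_t h = atomic_load(&f->head), t = atomic_load(&f->tail);
        if (h - t == BUF_SLOTS)
            return 0;  /* full: upstream stage must wait */
        f->slots[h % BUF_SLOTS] = item;
        atomic_store(&f->head, h + 1);
        return 1;
    }

    /* Consumer side: returns 0 when no data is ready for the
       downstream pipeline stage. */
    static int fifo_pop(stage_fifo_t *f, uint32_t *item)
    {
        uint32_t h = atomic_load(&f->head), t = atomic_load(&f->tail);
        if (h == t)
            return 0;  /* empty: downstream stage must wait */
        *item = f->slots[t % BUF_SLOTS];
        atomic_store(&f->tail, t + 1);
        return 1;
    }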
Abstract:
A 10 Gb/s network interface device offloads TCP/IP datapath functions. Frames without IP datagrams are processed as with a non-offload NIC. Receive frames are filtered, then transferred to preallocated receive buffers within host memory. Outbound frames are retrieved from host memory, then transmitted. Frames with IP datagrams but without TCP segments are transmitted without any protocol offload, but received frames are parsed and checked for protocol errors, including checksum accumulation for UDP segments. Receive frames without datagram errors are passed to the host and error frames are dumped. Frames with TCP segments are parsed and error-checked. Hardware checking is performed for ownership of the socket state. TCP/IP frames that fail the ownership test are passed to the host system with a parsing summary. TCP/IP frames that pass the ownership test are processed by a finite state machine implemented by the CPU. TCP/IP frames for non-owned sockets are supported with checksum accumulation/insertion.
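The receive-path branching described above might be summarized, purely as an assumed software model of the hardware behavior, by the following C sketch; the frame fields and handler names are invented for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        bool has_ip;        /* frame carries an IP datagram */
        bool has_tcp;       /* datagram carries a TCP segment */
        bool checksum_ok;   /* parser found no datagram errors */
        bool socket_owned;  /* device owns the socket state */
    } rx_frame_t;

    /* Stub handlers standing in for the real datapaths. */
    static void deliver_to_host(const rx_frame_t *f)       { (void)f; puts("to host"); }
    static void drop_frame(const rx_frame_t *f)            { (void)f; puts("dropped"); }
    static void deliver_with_summary(const rx_frame_t *f)  { (void)f; puts("slow path"); }
    static void run_tcp_state_machine(const rx_frame_t *f) { (void)f; puts("fast path"); }

    static void handle_rx(const rx_frame_t *f)
    {
        if (!f->has_ip) {
            deliver_to_host(f);        /* non-IP: behave like a plain NIC */
        } else if (!f->has_tcp) {
            if (f->checksum_ok)
                deliver_to_host(f);    /* e.g. UDP with checksum verified */
            else
                drop_frame(f);         /* error frames are dumped */
        } else if (!f->socket_owned) {
            deliver_with_summary(f);   /* failed ownership test: pass to
                                          host with a parsing summary */
        } else {
            run_tcp_state_machine(f);  /* owned socket: on-device FSM */
        }
    }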