摘要:
A processor performance profiler is enabled to for identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, by identifying and tracking the worst offenders within a random sample of events without having to hash all events results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.
摘要:
A processor performance profiler is enabled to for identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, by identifying and tracking the worst offenders within a random sample of events without having to hash all events results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.
摘要:
A processor performance profiler is enabled to for identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, by identifying and tracking the worst offenders within a random sample of events without having to hash all events results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.
摘要:
Primitives are divided into span groups of 2N spans, and then processed in M×N blocks of pixels, with the pixel blocks preferably being as close to square as possible and therefore optimized for small spans and texture mapping. Each span group is rendered block-by-block in a serpentine manner from an initial or entry block, first in a direction away from the long edge of the primitive and then in a direction towards the long edge. The interpolators include a one-deep stack onto which pixel and texel information for the initial or entry block are pushed before rendering any other blocks within the span group. Blocks or pairs of blocks within different span subgroups of the span group are then alternately rendered, such that rendering zig-zags between the span subgroups as it proceeds to the end of the span group. Once the first end of a span group is reached, the values for the initial or entry block are popped from the stack and rendering resumes from the initial or entry block in the opposite direction, but in the same serpentine or zig-zag manner, until the other end of the span group is reached. The next span group, if any, is rendered starting with a block adjacent to the last block rendered in the previous span group. Memory bandwidth utilization between the pixel and texel cache and the frame buffer is thus improved, together with texel reuse during texture mapping, to reduce the total number of pixel and texel fetches required to render the primitive.
摘要:
Graphics information is efficiently transferred from a host computer to a graphics subsystem in which rendering and pixel data is generated by the host system. A masked span operation provides an assist for 3D rendering performed by the system processor of the host and other system resources. Storage of depth, alpha, stencil, and other pixel data is in system memory including one or more ancillary graphics buffers. The main processor of the host system generates pixel data associated with an image. This data is checked against the buffers. As a result of such checking, a mask is generated by the host system. The mask is transferred in burst mode across the host-graphic subsystem PCI bus to the graphics subsystem in combination with span width, and in the case of interpolated color, color base and color increment data, and X,Y coordinate of the first pixel. In the graphics subsystem the mask is employed with the other data to load the frame buffer with the portion of pixel data defined by the mask.
摘要:
Moving a pointer in a graphical user interface environment is provided. An input comprising an initial delta value determined by a device driver is received from the device driver. The initial delta value is located in a data structure. A new delta value associated with the initial delta value is selected from the data structure. A new position of a pointer in the graphical user interface environment is calculated based on the new delta value. The new position of the pointer is sent to the graphical user interface environment for rendering.
摘要:
A pixel merge apparatus and method has been implemented. Included is a configurable graphics device, which may serve as a standalone graphics engine, or as a master or slave in a master/slave configuration. In stand alone mode, the mechanism drives a display device with native pixel data. A device configured in master mode is operable for receiving pixel data from a corresponding slave device, and merging the slave pixel data with native pixel data generated by a rasterizer within the ASIC. Data is communicated between slave and master using a digital data link which may also serve to drive a flat panel display in standalone mode. A FIFO, active in the master, mediates the transfer of the slave pixel data and permits switching between native and slave pixel data with signal pixel resolution. Pixel data may be merged on a frame-by-frame basis, or in split frame mode wherein a first portion of the graphic shown on a display device constitutes native pixels generated in the rasterizer corresponding to the master device, and a second portion of the displayed graphic includes pixels generated by the rasterizer in the slave device.
摘要:
Moving a pointer in a graphical user interface environment is provided. An input comprising an initial delta value determined by a device driver is received from the device driver. The initial delta value is located in a data structure. A new delta value associated with the initial delta value is selected from the data structure. A new position of a pointer in the graphical user interface environment is calculated based on the new delta value. The new position of the pointer is sent to the graphical user interface environment for rendering.
摘要:
A system and method for identifying and manipulating logic analyzer data from multiple clock domains is presented. A logic analyzer receives debug data and determines whether the debug data is a full frequency data type, a half frequency data type, or a crossed data type. Once determined, the logic analyzer reconstructs the debug data such that debug condition-matching logic may process the reconstructed data in a full frequency domain. For half frequency data types, the logic analyzer adds masked data values to the data in order to reconstruct the data into to the full frequency domain before processing the data. For crossed data types, the logic analyzer reconstructs the data into its original format before processing the data in a full frequency domain.
摘要:
A system and method for identifying and manipulating logic analyzer data from multiple clock domains is presented. A logic analyzer receives debug data and determines whether the debug data is a full frequency data type, a half frequency data type, or a crossed data type. Once determined, the logic analyzer reconstructs the debug data such that debug condition-matching logic may process the reconstructed data in a full frequency domain. For half frequency data types, the logic analyzer adds masked data values to the data in order to reconstruct the data into to the full frequency domain before processing the data. For crossed data types, the logic analyzer reconstructs the data into its original format before processing the data in a full frequency domain.