Abstract:
In contrast to a conventional computing system in which the graphics processor (graphics processing unit or GPU) is treated as a slave to one or more central processing units (CPUs), systems and methods are provided that allow the GPU to be treated as a CPU from the perspective of the operating system. The GPU can access a memory space shared with the other CPUs in the computing system. Caches used by the GPU may be coherent with caches used by the other CPUs, and the GPU may share execution of general-purpose computations with those CPUs.
Abstract:
A system and method for converting low-jitter, interleaved frame traffic, such as that generated in an IP network, to high-jitter traffic in order to improve bandwidth utilization on arbitrated loops such as Fibre Channel Arbitrated Loops. Embodiments of a high-jitter scheduling algorithm may be used in devices, such as network switches, that interface an arbitrated loop with an IP network carrying low-jitter traffic. The high-jitter algorithm may use a separate queue for each device on the arbitrated loop or, alternatively, one queue for two or more devices. Incoming frames are distributed among the queues based on each frame's destination device. The scheduling algorithm may then service the queues and forward queued frames from the queues to the devices. In one embodiment, the queues are serviced in round-robin fashion. In one embodiment, each queue may be serviced up to a programmed limit.
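The queuing and round-robin servicing described above can be sketched as follows. This is a minimal illustration, not the patented implementation. Frames are modelled as hypothetical `(destination, payload)` tuples, and one queue is kept per device. `HighJitterScheduler`, `service_round`, and the per-visit `limit` are illustrative names standing in for the scheduler and its programmed limit.

```python
from collections import OrderedDict, deque


class HighJitterScheduler:
    """Hypothetical sketch: one FIFO per loop device, serviced round-robin.

    Each visit to a queue forwards at most `limit` frames (standing in for
    the "programmed limit"), so frames for the same device leave in bursts
    instead of staying interleaved.
    """

    def __init__(self, devices, limit):
        self.limit = limit
        # Preserve device order so the round-robin sequence is deterministic.
        self.queues = OrderedDict((d, deque()) for d in devices)

    def enqueue(self, frame):
        """Distribute an incoming frame to its destination device's queue."""
        dest, _payload = frame
        self.queues[dest].append(frame)

    def service_round(self):
        """Visit every queue once and return the forwarded frames in order."""
        sent = []
        for queue in self.queues.values():
            for _ in range(min(self.limit, len(queue))):
                sent.append(queue.popleft())
        return sent


# Low-jitter input: frames for devices A and B arrive interleaved.
sched = HighJitterScheduler(devices=["A", "B"], limit=3)
for i in range(4):
    sched.enqueue(("A", i))
    sched.enqueue(("B", i))

print([d for d, _ in sched.service_round()])  # ['A', 'A', 'A', 'B', 'B', 'B']
print([d for d, _ in sched.service_round()])  # ['A', 'B']
```

Frames that arrived interleaved (A, B, A, B, ...) leave in per-device bursts. On an arbitrated loop this presumably means fewer loop arbitrations per frame delivered, which would be the source of the bandwidth gain.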
Abstract:
At least two different processing sections in a graphics processor compute Z coordinates for a sample location from a compressed Z representation. The processor is designed to ensure that Z coordinates computed in any of its units are identical. In one embodiment, the arithmetic circuits in each processing section that computes Z coordinates are “bit-identical,” meaning that, for any input planar Z representation and coordinates, the circuits produce identical output Z coordinates.
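One way to obtain bit-identical results is to evaluate the planar Z representation in exact integer (fixed-point) arithmetic. Two units then produce the same bits even if they order the operations differently. The following is a sketch under stated assumptions. The plane form `z = z0 + dzdx*x + dzdy*y`, the 16-bit fraction format, and the names `unit_a_z`/`unit_b_z` are assumptions for illustration, because the abstract does not say how the circuits achieve bit-identity. Python integers are unbounded, so the fixed register widths of real hardware are not modelled.

```python
FRAC_BITS = 16  # hypothetical fixed-point format: 16 fractional bits


def to_fixed(value):
    """Quantize a real coefficient to the hypothetical fixed-point format."""
    return round(value * (1 << FRAC_BITS))


def unit_a_z(plane, x, y):
    """Processing section A: evaluates z = z0 + dzdx*x + dzdy*y left to right."""
    z0, dzdx, dzdy = plane
    return (z0 + dzdx * x) + dzdy * y


def unit_b_z(plane, x, y):
    """Processing section B: same plane equation, different operation order."""
    z0, dzdx, dzdy = plane
    return z0 + (dzdx * x + dzdy * y)


# Compressed (planar) Z for one tile, quantized once to integers.
plane = (to_fixed(0.1), to_fixed(0.2), to_fixed(0.3))
for x in range(4):
    for y in range(4):
        # Integer add/multiply is exact, so the two units agree bit for bit.
        assert unit_a_z(plane, x, y) == unit_b_z(plane, x, y)

# The same reordering in binary floating point is not guaranteed to agree.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```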
Abstract:
A system and method for input thresholding of packets in a network switch. A network switch may include multiple input ports, multiple output ports, and a shared random access memory coupled to the input and output ports by data transport logic. Each packet entering the network switch may be assigned to one of a plurality of threshold groups and to one of a plurality of flows within that threshold group. In one embodiment, each threshold group may be divided into a plurality of levels of operation. As the threshold group allocates or frees resources, it may dynamically move up or down through the levels of operation. Within each level, one or more values may be used as level boundaries and as resource limits for flows within the threshold group. In one embodiment, these values may be stored in programmable registers.
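The level mechanism can be illustrated with a small admission-control model. This is a hypothetical sketch. Resources are counted in abstract buffer units, and the `boundaries` and `flow_limits` lists stand in for the programmable registers. Tightening the per-flow limits at higher levels is an assumed policy that the abstract does not specify, and `ThresholdGroup` is an illustrative name.

```python
from bisect import bisect_right
from collections import defaultdict


class ThresholdGroup:
    """Hypothetical sketch of a threshold group with levels of operation.

    `boundaries[i]` is the group-wide buffer count at which level i+1 begins;
    `flow_limits[i]` is the per-flow buffer limit while at level i. Both
    lists stand in for programmable register values.
    """

    def __init__(self, boundaries, flow_limits):
        if len(flow_limits) != len(boundaries) + 1:
            raise ValueError("need one flow limit per level")
        self.boundaries = boundaries
        self.flow_limits = flow_limits
        self.used = 0                      # buffers held by the whole group
        self.per_flow = defaultdict(int)   # buffers held by each flow

    @property
    def level(self):
        # The level moves up or down as `used` crosses boundary values.
        return bisect_right(self.boundaries, self.used)

    def admit(self, flow, buffers):
        """Allocate buffers for a packet if its flow is within the level's limit."""
        limit = self.flow_limits[self.level]
        if self.per_flow[flow] + buffers > limit:
            return False  # packet is not admitted
        self.per_flow[flow] += buffers
        self.used += buffers
        return True

    def free(self, flow, buffers):
        """Release buffers when a packet leaves the switch."""
        self.per_flow[flow] -= buffers
        self.used -= buffers


group = ThresholdGroup(boundaries=[8, 16], flow_limits=[8, 4, 2])
print(group.admit("f1", 4), group.level)  # True 0
print(group.admit("f2", 4), group.level)  # True 1
print(group.admit("f1", 2))               # False
group.free("f2", 4)
print(group.level, group.admit("f1", 2))  # 0 True
```

When group-wide usage rises past a boundary, every flow in the group is held to the smaller limit of the new level. Freeing buffers lets the group drop back down.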
Abstract:
A system and method for enabling a network switch to transmit queued packets to a device when the device opens the switch, and thus to use the Fibre Channel Arbitrated Loop (FC-AL) in full-duplex mode whenever possible. The switch may include a plurality of queues, each associated with a device on the FC-AL, for queuing incoming packets destined for that device. The switch may determine the next non-empty queue, open the device associated with that queue, and send packets to the device. The device may send packets to the switch while it is receiving packets from the switch, thus using the FC-AL in full-duplex mode. Likewise, when a device opens the switch to transmit packets to it, the switch may determine whether the queue holds packets for that device and, if so, send them to the device while receiving packets from it, again using the FC-AL in full-duplex mode.
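A connection can start in two ways, switch-initiated or device-initiated, and both can be sketched as below. `LoopSwitchPort` and its methods are hypothetical names. Full-duplex transfer is modelled only as sending and receiving within the same method call. Loop arbitration and the actual open/close signalling are not modelled.

```python
from collections import deque


class LoopSwitchPort:
    """Hypothetical sketch of a switch port attached to an FC-AL.

    One queue per loop device holds packets destined for that device.
    Full duplex is modelled loosely: in a device-initiated connection the
    switch both records what the device sends and returns what it has queued.
    """

    def __init__(self, devices):
        self.devices = list(devices)
        self.queues = {d: deque() for d in self.devices}
        self.received = []   # packets the switch got from loop devices
        self._next = 0       # round-robin start point for next_nonempty()

    def enqueue(self, device, packet):
        self.queues[device].append(packet)

    def next_nonempty(self):
        """Return the next device, round-robin, whose queue is non-empty."""
        n = len(self.devices)
        for step in range(n):
            device = self.devices[(self._next + step) % n]
            if self.queues[device]:
                self._next = (self._next + step + 1) % n
                return device
        return None

    def _drain(self, device):
        sent = list(self.queues[device])
        self.queues[device].clear()
        return sent

    def switch_opens(self):
        """Switch-initiated: open the next device with queued packets and send
        them. The opened device may transmit back (not modelled here)."""
        device = self.next_nonempty()
        if device is None:
            return None
        return device, self._drain(device)

    def device_opens(self, device, incoming):
        """Device-initiated: receive `incoming` and, in the same connection,
        send any packets already queued for that device."""
        self.received.extend(incoming)
        return self._drain(device)


port = LoopSwitchPort(["D1", "D2", "D3"])
port.enqueue("D2", "p1")
port.enqueue("D3", "p2")
print(port.switch_opens())              # ('D2', ['p1'])
print(port.device_opens("D3", ["q1"]))  # ['p2']
print(port.received)                    # ['q1']
print(port.switch_opens())              # None
```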
Abstract:
A microprocessor configured to dynamically lengthen its floating point load pipeline from one stage to more than one stage is disclosed. The microprocessor may perform normal loads and detect denormal loads in a single clock cycle. It temporarily stores each scheduled floating point instruction in a reissue buffer for at least one clock cycle. When a denormal load instruction is detected, the microprocessor is configured to add one or more stages to the floating point load pipeline so that the denormal value can complete its conversion to an internal format. The longer pipeline is then used for all loads that follow the denormal load until an idle clock cycle or an abort occurs, at which point the pipeline reverts to its original, shorter length. In addition, the microprocessor may be configured to cancel instructions that were scheduled on the assumption that the denormal load would take only one clock cycle to complete. The canceled instructions are then “replayed” from the reissue buffer during a later clock cycle. A method for performing denormal loads and a computer system are also disclosed.
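The cycle-by-cycle behaviour described above can be modelled as a small state machine. The model rests on several assumptions. The short pipeline has one stage and the long one has two (`SHORT`, `LONG`), and each cycle carries at most one load. A replay is recorded once for the cycle in which a denormal is detected while the pipeline is short. The reissue buffer itself is not modelled, and `simulate_fp_loads` is an illustrative name.

```python
SHORT, LONG = 1, 2  # hypothetical stage counts for the FP load pipeline


def simulate_fp_loads(cycles):
    """Cycle-by-cycle sketch of the dynamically lengthened FP load pipeline.

    Each entry in `cycles` is "N" (normal load), "D" (denormal load),
    None (idle cycle) or "ABORT". Returns (latencies, replays): the
    pipeline length used for each load, and the cycle indices where
    dependents scheduled for a 1-stage load had to be canceled and replayed.
    """
    length = SHORT
    latencies, replays = [], []
    for i, event in enumerate(cycles):
        if event is None or event == "ABORT":
            length = SHORT  # idle cycle or abort: revert to the short pipeline
            continue
        if event == "D" and length == SHORT:
            length = LONG       # add a stage so the denormal can be converted
            replays.append(i)   # dependents assumed 1 cycle: cancel and replay
        latencies.append(length)
    return latencies, replays


lat, rep = simulate_fp_loads(["N", "D", "N", "N", None, "N", "D", "ABORT", "N"])
print(lat)  # [1, 2, 2, 2, 1, 2, 1]
print(rep)  # [1, 6]
```

Once lengthened, the pipeline stays long for the normal loads that follow, so only the first denormal in a busy stretch costs a replay. The idle cycle at index 4 and the abort at index 7 each restore the one-stage latency.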
Abstract:
A multiplier configured to execute division and square root operations by performing iterative multiplication is disclosed. The multiplier is configured to complete divide-by-two and zero-dividend instructions in fewer clock cycles by detecting them before or during the first iteration, then performing an exponent adjustment and rounding the result to the desired precision. A system and method for rapidly executing divide-by-two and zero-dividend instructions within a multiplier that executes division and square root instructions by iterative multiplication are also disclosed.
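A software analogue of the early-out checks might look like this. The abstract says only that division uses iterative multiplication. Newton-Raphson reciprocal refinement is used here as a stand-in, and Goldschmidt iteration would be another common choice. `divide` and `nr_divide` are hypothetical names. Zero, infinite, and NaN divisors are rejected rather than modelled. The iterative path is accurate to within a few units in the last place rather than correctly rounded.

```python
import math


def nr_divide(a, b, iterations=4):
    """Slow path: a / b via Newton-Raphson reciprocal refinement.

    x_{k+1} = x_k * (2 - m * x_k) converges quadratically to 1/m using only
    multiplies and a subtract, which is what lets a multiplier do division.
    """
    mant, exp = math.frexp(abs(b))      # |b| = mant * 2**exp, 0.5 <= mant < 1
    x = 48 / 17 - 32 / 17 * mant        # assumed linear initial estimate of 1/mant
    for _ in range(iterations):
        x = x * (2.0 - mant * x)
    recip = math.copysign(math.ldexp(x, -exp), b)  # 1/b = (1/mant) * 2**-exp
    return a * recip


def divide(a, b):
    """Return (quotient, path) for a finite, nonzero divisor."""
    if b == 0 or not math.isfinite(b):
        raise ValueError("zero/inf/NaN divisors are not modelled in this sketch")
    if a == 0.0:
        # Zero dividend: the quotient is a signed zero; skip all iterations.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(0.0, sign), "zero-dividend"
    if b == 2.0:
        # Divide-by-two: only the exponent changes; no iterations needed.
        return math.ldexp(a, -1), "divide-by-two"
    return nr_divide(a, b), "iterative"


print(divide(7.5, 2.0))      # (3.75, 'divide-by-two')
print(divide(-0.0, 3.0))     # (-0.0, 'zero-dividend')
print(divide(1.0, 3.0)[1])   # iterative
```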
Abstract:
A system and method for computing anisotropic texture mapping parameters by using approximation techniques reduces the complexity of the calculations needed to perform high-quality anisotropic texture filtering. The approximated parameters may be computed by dedicated processing units within a graphics processor, thereby improving anisotropic texture mapping performance. Specifically, the major and minor axes of anisotropy are determined, and their respective lengths are calculated using approximations. Other anisotropic texture mapping parameters, such as a level of detail used to select a particular mipmap level, are computed from the calculated lengths of the major and minor axes.
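A possible sketch of the parameter computation follows. The abstract does not name the approximation, so the "alpha max plus beta min" length estimate (alpha = 1, beta = 1/2) is an assumed choice. The sample-count and level-of-detail formulas follow the form given in the OpenGL `EXT_texture_filter_anisotropic` specification: N = min(ceil(Pmax/Pmin), maxAniso) and LOD = log2(Pmax/N). The derivatives are assumed to be in texel units, and `approx_length`/`aniso_params` are illustrative names.

```python
import math


def approx_length(x, y):
    """'Alpha max plus beta min' estimate of sqrt(x*x + y*y), alpha=1, beta=1/2.

    Needs only abs, compare, halve and add; it never underestimates and
    overestimates by at most about 11.8%.
    """
    ax, ay = abs(x), abs(y)
    return max(ax, ay) + 0.5 * min(ax, ay)


def aniso_params(dudx, dvdx, dudy, dvdy, max_aniso=16):
    """Return (major_len, minor_len, num_samples, lod) for one pixel.

    The pixel footprint's axes in texture space are (dudx, dvdx) and
    (dudy, dvdy); the longer one is taken as the major axis.
    """
    px = approx_length(dudx, dvdx)
    py = approx_length(dudy, dvdy)
    major, minor = max(px, py), min(px, py)
    if minor == 0.0:
        n = max_aniso if major > 0.0 else 1
    else:
        n = min(math.ceil(major / minor), max_aniso)
    # LOD from the major axis divided among the n samples taken along it.
    lod = math.log2(major / n) if major > 0.0 else 0.0
    return major, minor, n, lod


print(aniso_params(4.0, 0.0, 0.0, 1.0))               # (4.0, 1.0, 4, 0.0)
print(aniso_params(8.0, 0.0, 0.0, 1.0, max_aniso=4))  # (8.0, 1.0, 4, 1.0)
```

When the anisotropy ratio exceeds `max_aniso`, the LOD rises to a coarser mipmap level, trading sharpness for a bounded number of samples.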
Abstract:
A multimedia execution unit configured to perform vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far path is configured to handle all effective addition operations, as well as effective subtraction operations whose operands have an absolute exponent difference greater than one. The close path is configured to handle effective subtraction operations whose operands have an absolute exponent difference less than or equal to one. The close path generates two output values: the first is the first input operand plus an inverted version of the second input operand, and the second is the first output value plus one. Selecting the first or second output value in the close path effectuates the round-to-nearest operation for the adder's output.
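The path selection and the close path's two candidate outputs can be checked with plain integers. The sketch makes several assumptions. Significands are 8-bit unsigned values (`WIDTH`), and both operands are already aligned. Normalization is left out, as is the rounding decision that picks between the two outputs. `choose_path` and `close_path_outputs` are illustrative names. The point shown is the two's-complement identity: `a + ~b` equals `a - b - 1`, so the second output (`a + ~b + 1`) is the exact difference.

```python
WIDTH = 8                    # hypothetical significand width in bits
MASK = (1 << WIDTH) - 1


def choose_path(effective_subtract, exp_a, exp_b):
    """Far path: every effective add, or a subtract with |exp diff| > 1."""
    if not effective_subtract or abs(exp_a - exp_b) > 1:
        return "far"
    return "close"


def close_path_outputs(a, b):
    """Return both candidate results the close path produces.

    out0 = a + ~b     (equals a - b - 1 modulo 2**WIDTH)
    out1 = out0 + 1   (equals a - b modulo 2**WIDTH, i.e. two's complement)
    """
    out0 = (a + (~b & MASK)) & MASK
    out1 = (out0 + 1) & MASK
    return out0, out1


print(choose_path(True, 5, 5), choose_path(True, 7, 5), choose_path(False, 5, 5))
# close far far
print(close_path_outputs(0b10110000, 0b10010000))  # (31, 32)
```

Because both candidates come from the same subtraction, choosing between them can fold a +1 rounding correction into the result. This presumably avoids a separate rounding increment after the adder.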
Abstract:
A microprocessor with a floating point unit configured to rapidly execute floating point compare (FCOMI-type) instructions that are followed by floating point conditional move (FCMOV-type) instructions is disclosed. FCOMI-type instructions, which normally store their results in integer status flag registers, are modified to also store a copy of their results in a temporary register located within the floating point unit. If an FCMOV-type instruction is detected following an FCOMI-type instruction, the FCMOV-type instruction's source for flag information is changed from the integer flag register to the temporary register. FCMOV-type instructions can therefore execute earlier, because they need not wait for the integer flags to be read from the integer portion of the microprocessor. A computer system and method for rapidly executing FCOMI-type instructions followed by FCMOV-type instructions are also disclosed.
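The source-rewrite step can be sketched as a pass over decoded instructions. Several details are assumptions made for illustration: the opcode sets, the `Uop` record, and the rule that an intervening integer flag write cancels the forwarding. The abstract only states that an FCMOV-type instruction following an FCOMI-type instruction reads its flags from the floating point unit's temporary register.

```python
from dataclasses import dataclass

# Opcode groupings; INT_FLAG_WRITERS is only an illustrative subset.
FCOMI_TYPE = {"FCOMI", "FCOMIP", "FUCOMI", "FUCOMIP"}
FCMOV_TYPE = {"FCMOVB", "FCMOVE", "FCMOVBE", "FCMOVU",
              "FCMOVNB", "FCMOVNE", "FCMOVNBE", "FCMOVNU"}
INT_FLAG_WRITERS = {"ADD", "SUB", "CMP", "TEST"}


@dataclass
class Uop:
    op: str
    flag_source: str = ""  # where an FCMOV-type uop reads its condition flags


def assign_flag_sources(program):
    """Point FCMOV-type uops at the FPU temp register when an FCOMI-type
    instruction is the most recent flag producer; otherwise use the
    integer flags."""
    fcomi_pending = False
    for uop in program:
        if uop.op in FCOMI_TYPE:
            fcomi_pending = True   # result is also copied to the FP temp register
        elif uop.op in INT_FLAG_WRITERS:
            fcomi_pending = False  # assumption: a newer integer flag write wins
        elif uop.op in FCMOV_TYPE:
            uop.flag_source = "fp_temp" if fcomi_pending else "int_flags"
    return program


prog = [Uop("FCOMI"), Uop("FCMOVB"), Uop("CMP"), Uop("FCMOVE")]
print([u.flag_source for u in assign_flag_sources(prog)])
# ['', 'fp_temp', '', 'int_flags']
```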