Graphics processor with memory management unit and cache coherent link
    41.
    Granted invention patent
    Status: In force

    Publication number: US08860741B1

    Publication date: 2014-10-14

    Application number: US11608436

    Filing date: 2006-12-08

    IPC classification: G09G5/36

    Abstract: In contrast to a conventional computing system in which the graphics processor (graphics processing unit or GPU) is treated as a slave to one or several CPUs, systems and methods are provided that allow the GPU to be treated as a central processing unit (CPU) from the perspective of the operating system. The GPU can access a memory space shared by other CPUs in the computing system. Caches utilized by the GPU may be coherent with caches utilized by other CPUs in the computing system. The GPU may share execution of general-purpose computations with other CPUs in the computing system.

    High jitter scheduling of interleaved frames in an arbitrated loop
    42.
    Granted invention patent
    Status: Lapsed

    Publication number: US07809852B2

    Publication date: 2010-10-05

    Application number: US10152763

    Filing date: 2002-05-22

    IPC classification: G06F13/00

    Abstract: A system and method for converting low-jitter, interleaved frame traffic, such as that generated in an IP network, to high-jitter traffic to improve the utilization of bandwidth on arbitrated loops such as Fibre Channel Arbitrated Loops. Embodiments of a high jitter scheduling algorithm may be used in devices such as network switches that interface an arbitrated loop with an IP network that carries low-jitter traffic. The high jitter algorithm may use a separate queue for each device on the arbitrated loop, or alternatively may use one queue for two or more devices. Incoming frames are distributed among the queues based upon each frame's destination device. The scheduling algorithm may then service the queues and forward queued frames from the queues to the devices. In one embodiment, the queues are serviced in a round-robin fashion. In one embodiment, each queue may be serviced up to a programmed limit.
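
The queueing scheme in this abstract can be illustrated with a minimal Python sketch. It assumes one queue per destination device, round-robin servicing, and a per-visit frame count standing in for the "programmed limit"; the function name `high_jitter_schedule` and the `limit` parameter are hypothetical and do not come from the patent.

```python
from collections import OrderedDict, deque


def high_jitter_schedule(frames, limit):
    """Regroup interleaved frames into per-destination bursts.

    frames: iterable of (destination, payload) pairs in arrival order.
    limit:  assumed service limit, i.e. frames forwarded per queue per visit.
    """
    # Distribute incoming frames among queues keyed by destination device.
    queues = OrderedDict()
    for dest, payload in frames:
        queues.setdefault(dest, deque()).append(payload)

    # Service the queues round-robin, up to `limit` frames per visit.
    out = []
    while any(queues.values()):
        for dest, queue in queues.items():
            for _ in range(min(limit, len(queue))):
                out.append((dest, queue.popleft()))
    return out


interleaved = [("A", 1), ("B", 1), ("A", 2), ("B", 2), ("A", 3), ("B", 3)]
bursts = high_jitter_schedule(interleaved, limit=2)
```

With `limit=2`, the alternating A/B arrival order is forwarded as runs of up to two frames per device, so under this model the loop needs to be opened to each device less often.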

    Method and apparatus to ensure consistency of depth values computed in different sections of a graphics processor
    43.
    Granted invention patent
    Status: Lapsed

    Publication number: US07659893B1

    Publication date: 2010-02-09

    Application number: US11538002

    Filing date: 2006-10-02

    IPC classification: G06T15/40

    Abstract: At least two different processing sections in a graphics processor compute Z coordinates for a sample location from a compressed Z representation. The processor is designed to ensure that Z coordinates computed in any unit in the processor are identical. In one embodiment, the respective arithmetic circuits included in each processing section that computes Z coordinates are "bit-identical," meaning that, for any input planar Z representation and coordinates, the output Z coordinates produced by the circuits are identical to each other.
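
Why bit-identical circuits matter can be seen in a small Python sketch. It assumes the planar Z representation is a plane equation z = A*x + B*y + C, which the abstract does not spell out, and the two function names are hypothetical. Both functions are mathematically equivalent, yet they round differently in floating point because the additions are ordered differently.

```python
def z_order_a(A, B, C, x, y):
    # Evaluates (A*x + B*y) + C.
    return (A * x + B * y) + C


def z_order_b(A, B, C, x, y):
    # Evaluates A*x + (B*y + C): same formula, different operation order.
    return A * x + (B * y + C)


# A deliberately extreme plane that exposes the rounding difference.
plane = (1e16, -1e16, 1.0)
z_a = z_order_a(*plane, 1.0, 1.0)
z_b = z_order_b(*plane, 1.0, 1.0)
```

Here `z_a` is 1.0 while `z_b` is 0.0, so a depth test that compares values from two such units could disagree. Using the same operation order in every unit, as `z_order_a` does when called twice, avoids the mismatch.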

    Packet input thresholding for resource distribution in a network switch
    44.
    Granted invention patent
    Status: Lapsed

    Publication number: US07227841B2

    Publication date: 2007-06-05

    Application number: US10144081

    Filing date: 2002-05-13

    IPC classification: G01R31/08

    Abstract: A system and method for input thresholding packets in a network switch. A network switch may include multiple input ports, multiple output ports, and a shared random access memory coupled to the input ports and output ports by data transport logic. Packets entering the network switch may be assigned to one of a plurality of threshold groups and to one of a plurality of flows within the threshold group. In one embodiment, each threshold group may be divided into a plurality of levels of operation. As resources are allocated or freed by the threshold group, the threshold group may dynamically move up or down in the levels of operation. Within each level, one or more different values may be used as level boundaries and resource limits for flows within the threshold group. In one embodiment, programmable registers may be used to store these values.
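
The level mechanism described above can be modeled roughly as follows. This is a sketch under stated assumptions: the class `ThresholdGroup`, its fields, and the rule that a packet is refused once its flow would exceed the current level's limit are all illustrative choices, not details taken from the patent. Plain Python lists stand in for the programmable registers.

```python
import bisect


class ThresholdGroup:
    def __init__(self, level_bounds, flow_limits):
        # level_bounds[i]: group usage at which level i + 1 begins (assumed).
        # flow_limits[i]:  most resources one flow may hold at level i (assumed).
        assert len(flow_limits) == len(level_bounds) + 1
        self.level_bounds = level_bounds
        self.flow_limits = flow_limits
        self.group_used = 0
        self.flow_used = {}

    @property
    def level(self):
        # The level follows group usage up and down automatically.
        return bisect.bisect_right(self.level_bounds, self.group_used)

    def admit(self, flow, cost=1):
        used = self.flow_used.get(flow, 0)
        if used + cost > self.flow_limits[self.level]:
            return False  # flow is over its limit at the current level
        self.flow_used[flow] = used + cost
        self.group_used += cost
        return True

    def release(self, flow, cost=1):
        self.flow_used[flow] -= cost
        self.group_used -= cost


group = ThresholdGroup(level_bounds=[4, 8], flow_limits=[4, 2, 0])
```

In this configuration a single flow may hold four buffers while the group is lightly loaded, two once the group reaches four buffers in use, and none once it reaches eight.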

    Method and apparatus for scheduling packet flow on a fibre channel arbitrated loop
    45.
    Granted invention patent
    Status: Lapsed

    Publication number: US07215680B2

    Publication date: 2007-05-08

    Application number: US10144187

    Filing date: 2002-05-13

    IPC classification: H04L12/56

    Abstract: A system and method for enabling a network switch to transmit queued packets to a device when opened by the device, and thus to utilize the Fibre Channel Arbitrated Loop (FC-AL) in full-duplex mode when possible. The switch may include a plurality of queues each associated with a device on the FC-AL for queuing incoming packets for the device. The switch may determine a next non-empty queue, open the device associated with the queue, and send packets to the device. The device may send packets to the switch concurrently with receiving packets from the switch, thus utilizing the FC-AL in full-duplex mode. When a device opens the switch to transmit packets to the switch, the switch may determine if there are packets for the device in the queue and, if so, send packets to the device concurrently with receiving packets from the device, thus utilizing the FC-AL in full-duplex mode.

    Method and apparatus for denormal load handling
    46.
    Granted invention patent
    Status: In force

    Publication number: US06487653B1

    Publication date: 2002-11-26

    Application number: US09383138

    Filing date: 1999-08-25

    IPC classification: G06F7/38

    Abstract: A microprocessor configured to dynamically switch its floating point load pipeline length from one stage in length to more than one stage in length is disclosed. The microprocessor may perform normal loads and detect denormal loads in a single clock cycle. The microprocessor temporarily stores each scheduled floating point instruction in a reissue buffer for at least one clock cycle. When a denormal load instruction is detected, the microprocessor is configured to add one or more stages to the floating point load pipeline to allow the denormal value to complete the conversion to an internal format. The longer pipeline is then used for all loads that follow the denormal load until there is an idle clock cycle or an abort occurs. At that point, the pipeline reverts to its original shorter state. In addition, the microprocessor may be configured to cancel instructions scheduled assuming the denormal load would take only one clock cycle to complete. The canceled instruction is then "replayed" during a later clock cycle from the reissue buffer. A method for performing denormal loads and a computer system are also disclosed.
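
The pipeline-length switching can be approximated with a toy Python model. It assumes IEEE 754 single-precision operands, where a denormal has an all-zero exponent field and a nonzero fraction, and it represents an idle clock cycle as `None`. The function names and the `long_depth` parameter are hypothetical; aborts, the reissue buffer, and instruction replay are not modeled.

```python
def is_denormal_f32(bits):
    """True if a 32-bit IEEE 754 pattern encodes a denormal (subnormal)."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0 and fraction != 0


def load_latencies(stream, long_depth=2):
    """Return the pipeline depth used for each load in `stream`.

    stream: list of 32-bit patterns, with None marking an idle cycle.
    """
    depth = 1
    latencies = []
    for item in stream:
        if item is None:
            depth = 1  # an idle cycle lets the pipeline shrink back
            latencies.append(None)
            continue
        if is_denormal_f32(item):
            depth = long_depth  # extra stage(s) for format conversion
        latencies.append(depth)
    return latencies


ONE = 0x3F800000       # 1.0f
TINY = 0x00000001      # smallest positive denormal
depths = load_latencies([ONE, TINY, ONE, None, ONE])
```

The load that follows the denormal still sees the longer pipeline, and only the idle cycle restores the single-stage path.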

    Early completion of iterative division
    47.
    Granted invention patent
    Status: In force

    Publication number: US06487575B1

    Publication date: 2002-11-26

    Application number: US09385188

    Filing date: 1999-08-30

    Applicant: Stuart F. Oberman

    Inventor: Stuart F. Oberman

    IPC classification: G06F7/38

    CPC classification: G06F7/4873 G06F7/49926

    Abstract: A multiplier configured to execute division and square root operations by executing iterative multiplication operations is disclosed. The multiplier is configured to complete divide-by-two and zero dividend instructions in fewer clock cycles by detecting them before or during the first iteration and then performing an exponent adjustment and rounding the result to the desired precision. A system and method for rapidly executing divide-by-two and zero dividend instructions within the context of a multiplier that executes division and square root instructions using iterative multiplication are also disclosed.
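
The early-exit idea can be sketched in Python. The abstract only says the slow path uses iterative multiplication; the sketch below assumes Newton-Raphson reciprocal refinement for that path, which is a standard technique and not necessarily the one in the patent. The function `divide` and its iteration count are hypothetical, and the sketch handles only finite operands and a nonzero divisor.

```python
import math


def divide(dividend, divisor, iterations=5):
    """Return (quotient, iterations_used) for finite operands."""
    if divisor == 0.0 or not (math.isfinite(dividend) and math.isfinite(divisor)):
        raise ValueError("sketch handles finite operands and a nonzero divisor")

    # Early exit 1: zero dividend, the result is a correctly signed zero.
    if dividend == 0.0:
        negative = math.copysign(1.0, dividend) * math.copysign(1.0, divisor) < 0
        return (-0.0 if negative else 0.0), 0

    # Early exit 2: divide by two is only an exponent adjustment.
    if abs(divisor) == 2.0:
        half = math.ldexp(dividend, -1)
        return (-half if divisor < 0 else half), 0

    # Slow path (assumed): Newton-Raphson reciprocal using multiplications.
    mantissa, exponent = math.frexp(abs(divisor))   # mantissa in [0.5, 1)
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa      # initial estimate of 1/mantissa
    for _ in range(iterations):
        x = x * (2.0 - mantissa * x)
    reciprocal = math.ldexp(x, -exponent)
    if divisor < 0:
        reciprocal = -reciprocal
    return dividend * reciprocal, iterations


fast = divide(10.0, 2.0)
slow = divide(1.0, 3.0)
```

`fast` completes with zero iterations, while `slow` runs the full iteration count. The slow path here is not guaranteed to be correctly rounded.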

    Computing anisotropic texture mapping parameters
    48.
    Granted invention patent
    Status: In force

    Publication number: US07369136B1

    Publication date: 2008-05-06

    Application number: US11016485

    Filing date: 2004-12-17

    IPC classification: G09G5/00

    CPC classification: G06T15/04

    Abstract: A system and method for computing anisotropic texture mapping parameters by using approximation techniques reduces the complexity of the calculations needed to perform high quality anisotropic texture filtering. Anisotropic texture mapping parameters that are approximated may be computed using dedicated processing units within a graphics processor, thereby improving anisotropic texture mapping performance. Specifically, the major axis and minor axis of anisotropy are determined and their respective lengths are calculated using approximations. Other anisotropic texture mapping parameters, such as a level of detail for selecting a particular level, are computed based on the calculated lengths of the major and minor axes.
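
The abstract does not say which approximations are used, so the following Python sketch substitutes two well-known ones purely for illustration. Vector length is estimated with the alpha-max-plus-beta-min formula, which avoids a square root, and the axes are taken from the screen-space texture derivatives in the manner of common anisotropic filtering formulations. All names (`approx_length`, `aniso_params`, `max_aniso`) are assumptions.

```python
import math

# Alpha-max-plus-beta-min coefficients; worst-case length error is about 4%.
ALPHA, BETA = 0.96043387, 0.39782473


def approx_length(x, y):
    """Approximate sqrt(x*x + y*y) without a square root."""
    hi, lo = max(abs(x), abs(y)), min(abs(x), abs(y))
    return ALPHA * hi + BETA * lo


def aniso_params(dudx, dvdx, dudy, dvdy, max_aniso=16):
    """Return (major, minor, ratio, lod) from texture-coordinate derivatives."""
    len_x = approx_length(dudx, dvdx)
    len_y = approx_length(dudy, dvdy)
    major, minor = max(len_x, len_y), min(len_x, len_y)
    # Anisotropy ratio, clamped to an assumed hardware maximum.
    ratio = min(major / max(minor, 1e-12), max_aniso)
    # Level of detail derived from the axis lengths.
    lod = math.log2(max(major / ratio, 1e-12))
    return major, minor, ratio, lod


major, minor, ratio, lod = aniso_params(8.0, 0.0, 0.0, 2.0)
```

For this footprint, which is stretched four to one, the ratio comes out near 4 and the level of detail tracks the shorter axis, so the filter samples a sharper mip level several times along the longer axis.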

    Floating point addition pipeline including extreme value, comparison and accumulate functions
    49.
    Granted invention patent
    Status: In force

    Publication number: US06397239B2

    Publication date: 2002-05-28

    Application number: US09778352

    Filing date: 2001-02-06

    IPC classification: G06F7/42

    Abstract: A multimedia execution unit configured to perform vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far path is configured to handle effective addition operations and effective subtraction operations for operands having an absolute exponent difference greater than one. The close path is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close path is configured to generate two output values, wherein one output value is the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one. Selection of the first or second output value in the close path effectuates the round-to-nearest operation for the output of the adder.
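
The two close-path outputs can be reproduced on plain integers. This Python sketch assumes fixed-width unsigned significands that are already aligned; the function name and the 8-bit width are illustrative. It shows only how the two candidates are formed, not the selection logic, which depends on guard and rounding bits that the abstract does not describe.

```python
def close_path_candidates(a, b, width=8):
    """Return (a + NOT b, a + NOT b + 1), both truncated to `width` bits."""
    mask = (1 << width) - 1
    first = (a + (~b & mask)) & mask   # first operand plus inverted second
    second = (first + 1) & mask        # first output plus one
    return first, second


a, b = 0b10110000, 0b10011000          # 176 and 152
first, second = close_path_candidates(a, b)
```

In two's complement arithmetic, `second` equals `a - b` and `first` is one less, so both candidates are available at once and a late select can choose between them without a separate increment step.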

    Rapid execution of FCMOV following FCOMI by storing comparison result in temporary register in floating point unit
    50.
    Granted invention patent
    Status: In force

    Publication number: US06393555B1

    Publication date: 2002-05-21

    Application number: US09370787

    Filing date: 1999-08-05

    IPC classification: G06F9/30

    Abstract: A microprocessor with a floating point unit configured to rapidly execute floating point compare (FCOMI) type instructions that are followed by floating point conditional move (FCMOV) type instructions is disclosed. FCOMI-type instructions, which normally store their results to integer status flag registers, are modified to store a copy of their results to a temporary register located within the floating point unit. If an FCMOV-type instruction is detected following an FCOMI-type instruction, then the FCMOV-type instruction's source for flag information is changed from the integer flag register to the temporary register. FCMOV-type instructions are thereby able to execute earlier because they need not wait for the integer flags to be read from the integer portion of the microprocessor. A computer system and method for rapidly executing FCOMI-type instructions followed by FCMOV-type instructions are also disclosed.
