1. Converting an arbitrary fixed point value to a floating point value
    Granted invention patent (legal status: in force)

    Publication number: US06671796B1

    Publication date: 2003-12-30

    Application number: US09513494

    Filing date: 2000-02-25

    IPC classification: G06F9/302

    Abstract: A method and apparatus are provided for performing efficient conversion operations between floating point and fixed point values on a general purpose processor. This is achieved by providing an instruction for converting a fixed point value fx into a floating point value fl in a general purpose processor. Accordingly, the invention advantageously provides a general purpose processor with the ability to execute conversion operations between fixed-point and floating-point values with a single instruction compared with prior art general purpose processors that require multiple instructions to perform the same function. Thus, the general purpose processor of the present invention allows for more efficient and faster conversion operations between fixed-point and floating-point values.
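
    A minimal Python sketch of the arithmetic such a conversion performs. It assumes that fx is a two's-complement integer and that the caller supplies the number of fraction bits. That parameter stands in for the "arbitrary" binary-point position. The function names are hypothetical and do not describe the patent's instruction encoding.

```python
import math

def sign_extend(raw: int, width: int) -> int:
    """Reinterpret the low `width` bits of raw as a two's-complement integer."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw >> (width - 1) else raw

def fixed_to_float(fx: int, frac_bits: int) -> float:
    """Convert fixed-point fx with `frac_bits` fraction bits to a float.

    Hypothetical model: the binary point sits frac_bits places from the right,
    so the value is fx * 2**-frac_bits. A negative frac_bits places the point
    to the right of the integer. ldexp scales by a power of two exactly when fx
    fits in a double's 53-bit significand.
    """
    return math.ldexp(fx, -frac_bits)
```

    For example, a 32-bit register holding 0xFFFF8000 in a format with 16 fraction bits represents -0.5. This is `fixed_to_float(sign_extend(0xFFFF8000, 32), 16)`.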

2. Sending both a load instruction and retrieved data from a load buffer to an annex prior to forwarding the load data to register file
    Granted invention patent (legal status: in force)

    Publication number: US06542988B1

    Publication date: 2003-04-01

    Application number: US09410842

    Filing date: 1999-10-01

    IPC classification: G06F9/38

    Abstract: A processor performs precise trap handling for out-of-order and speculative load instructions. It keeps track of the age of load instructions in a shared scheme that includes a load buffer and a load annex. All precise exceptions are detected in a T phase of a load pipeline. Data and control information concerning load operations that hit in the data cache are staged in a load annex during the A1, A2, A3, and T pipeline stages until all exceptions in the same or earlier instruction packet are detected. Data and control information from all other load instructions is staged in the load annex after the load data is retrieved. Before the load data is retrieved, the load instruction is kept in a load buffer. If an exception occurs, any load in the same instruction packet as the instruction causing the exception is canceled. Any load instructions that are “younger” than the instruction that caused the exception are also canceled. The age of load instructions is determined by tracking the pipe stages of the instruction. When a trap occurs, any load instruction with a non-zero age indicator is canceled.
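
    A toy Python model of the cancellation rule in this abstract. The patent derives a load's age from its pipe stage. This sketch substitutes a program-order packet number, which is a hypothetical simplification. A load gets a non-zero age indicator when it is in the trapping packet or in a younger one, and it is canceled when the trap is taken.

```python
from dataclasses import dataclass

@dataclass
class InFlightLoad:
    name: str
    packet: int  # hypothetical: program-order sequence number of its instruction packet

def age_indicator(load: InFlightLoad, trap_packet: int) -> int:
    # Hypothetical encoding: 0 for loads strictly older than the trapping packet,
    # non-zero for loads in the same packet or a younger one.
    return 0 if load.packet < trap_packet else 1

def on_trap(loads, trap_packet):
    """Split in-flight loads into (kept, canceled) names when a trap is taken."""
    kept = [l.name for l in loads if age_indicator(l, trap_packet) == 0]
    canceled = [l.name for l in loads if age_indicator(l, trap_packet) != 0]
    return kept, canceled
```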

3. Decompression bit processing with a general purpose alignment tool
    Granted invention patent (legal status: in force)

    Publication number: US06757820B2

    Publication date: 2004-06-29

    Application number: US10356437

    Filing date: 2003-01-31

    IPC classification: G06F9/308

    Abstract: A method and apparatus for performing single-instruction bit field extraction and for counting a number of leading zeros in a sequence of bits on a general purpose processor are provided. The fast bit extraction operations are accomplished by executing a first instruction for extracting an arbitrary number of bits of a sequence of bits stored in two or more source registers of the processor starting at an arbitrary offset and then storing the extracted bits in a destination register. Both the source and the destination registers are specified by the instruction. In addition, a second instruction is provided for counting the number of leading zeros in a sequence of bits stored in two or more source registers of the processor and then storing a binary value representing the number of leading zeros in a destination register. Again the source and the destination registers are specified by the second instruction. Both the first and the second instructions are pipelined to obtain an effective throughput of one instruction every cycle, respectively. As a result, bit extraction operations are performed very efficiently by the processor, thereby reducing the overall processing time required to compress and decompress multimedia data.
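
    A minimal Python sketch of what the two instructions compute over a pair of source registers. It assumes 32-bit registers, with the first register holding the more significant half, and it numbers the offset from the most significant bit. The abstract states none of these details, and the function names are hypothetical.

```python
REG_BITS = 32  # assumed register width; the abstract does not state one

def _concat(hi: int, lo: int) -> int:
    """Join two source registers into one 2*REG_BITS-bit value (hi on the left)."""
    mask = (1 << REG_BITS) - 1
    return ((hi & mask) << REG_BITS) | (lo & mask)

def extract_bits(hi: int, lo: int, offset: int, count: int) -> int:
    """Return `count` bits starting `offset` bits below the MSB of hi:lo."""
    width = 2 * REG_BITS
    if offset < 0 or count <= 0 or offset + count > width:
        raise ValueError("field does not fit in the register pair")
    shift = width - offset - count
    return (_concat(hi, lo) >> shift) & ((1 << count) - 1)

def count_leading_zeros(hi: int, lo: int) -> int:
    """Count the leading zero bits of the 2*REG_BITS-bit value hi:lo."""
    return 2 * REG_BITS - _concat(hi, lo).bit_length()
```

    Spanning two registers lets a field cross a word boundary in a bitstream. For example, `extract_bits(0x0000000F, 0xF0000000, 28, 8)` reads eight one-bits split across both words. In a decoder for variable-length codes such as Exp-Golomb codes, the leading-zero count gives the prefix length, and the extraction then pulls out the suffix.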

4. Efficient clip-testing in graphics acceleration
    Granted invention patent (legal status: in force)

    Publication number: US07042466B1

    Publication date: 2006-05-09

    Application number: US09589039

    Filing date: 2000-06-06

    IPC classification: G09G5/30

    Abstract: A method and apparatus for performing fast clip-testing operations in a general purpose processor are provided. This is accomplished by executing a single instruction for comparing a first value x to a second value y and, as a result of the comparison, determining whether x is less than y and whether x is less than negative y. The values x and y are stored in respective source registers of the processor specified by the instruction. Finally, as a result of the determination, one or more binary values representing the results of the determination are inserted into a destination register of the processor also specified by the instruction. Accordingly, the invention advantageously provides a general purpose processor with the ability to execute a clip-testing function with a single instruction compared with prior art general purpose processors that require multiple instructions to perform the same function. Thus, the general purpose processor of the present invention allows for more efficient and faster clip-testing operations.
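
    A minimal Python sketch of the two comparisons this abstract describes. The 2-bit code layout is a hypothetical choice, and so is the way `accumulate` shifts earlier results before inserting a new code. In the usual clip-testing case, y is a clip bound such as w and x is a vertex coordinate.

```python
def clip_test(x: float, y: float) -> int:
    """Return a 2-bit code: bit 1 = (x < y), bit 0 = (x < -y). Layout is hypothetical."""
    return (int(x < y) << 1) | int(x < -y)

def inside(x: float, y: float) -> bool:
    # For y >= 0, -y <= x < y holds exactly when x < y is true and x < -y is false.
    return clip_test(x, y) == 0b10

def accumulate(dest: int, x: float, y: float) -> int:
    # Hypothetical: shift earlier results left and insert the new code,
    # mimicking results being inserted into a destination register.
    return (dest << 2) | clip_test(x, y)
```

    Both comparisons share the operand y. A single instruction can therefore classify a coordinate against both bounds of a symmetric interval, where a sequence of separate compares and branches would otherwise be needed.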

5. Addressable output buffer architecture
    Granted invention patent (legal status: in force)

    Publication number: US06407740B1

    Publication date: 2002-06-18

    Application number: US09164074

    Filing date: 1998-09-30

    IPC classification: G06F15/16

    CPC classification: G06T15/005

    Abstract: Incoming geometry data are buffered in one or more buffers. The data are written to the buffers in an order which is not necessarily the order in which a processor or processors that construct images from the data need the data for fast processing. The data are provided to the processors in the order needed for fast processing. In some embodiments, fast processing involves starting critical path computations early. Examples of critical path computations are lighting computations, which take more time than position computations. At least one processor has a pipelined instruction execution unit. The processor executes critical path computation instructions as long as a critical path instruction can be started without causing a pipeline stall. When no critical path instructions can be started without causing a stall, the processor starts a non-critical path instruction.
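
    A minimal Python simulation of the issue policy in the last two sentences of this abstract. It assumes single issue per cycle, and it reduces each instruction to a fixed, precomputed "ready cycle", meaning the first cycle at which it can start without a stall. Real readiness depends on operand dependencies. The function name is hypothetical.

```python
from collections import deque

def schedule(critical, noncritical):
    """Single-issue scheduling that prefers critical-path instructions.

    Each argument is a list of (name, ready_cycle) pairs in program order.
    Returns a list of (cycle, name) issue events.
    """
    crit, non = deque(critical), deque(noncritical)
    cycle, issued = 0, []
    while crit or non:
        if crit and crit[0][1] <= cycle:
            # A critical-path (e.g. lighting) instruction can start: prefer it.
            issued.append((cycle, crit.popleft()[0]))
        elif non and non[0][1] <= cycle:
            # Otherwise fill the slot with a non-critical (e.g. position) instruction.
            issued.append((cycle, non.popleft()[0]))
        # If neither queue head is ready, this cycle is a bubble.
        cycle += 1
    return issued
```

    With lighting instructions ready at cycles 0 and 2 and position instructions ready at cycle 0, the schedule interleaves them as light0, pos0, light1, pos1. The position work fills the slot that would otherwise stall.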

6. Partially executing a pending atomic instruction to unlock resources when cancellation of the instruction occurs
    Granted invention patent (legal status: in force)

    Publication number: US06282637B1

    Publication date: 2001-08-28

    Application number: US09204760

    Filing date: 1998-12-02

    IPC classification: G06F13/00

    Abstract: When an atomic instruction executed by a computer processor locks a memory location, the locking is performed before the processor has determined whether the instruction is to be executed to completion or canceled. The memory location is unlocked whether or not the instruction will be canceled. Since the locking operation can occur before it is known whether the instruction will be canceled, the reading of the memory location can also occur early, before it is known whether the instruction will be canceled.
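
    A toy Python model of the ordering this abstract describes. It uses a swap as the example atomic instruction, which is an illustrative choice. The `Memory` class and the `is_canceled` callback are hypothetical. The callback stands in for the pipeline resolving cancellation only after the lock and the read have already happened.

```python
class Memory:
    """Toy memory with per-address locks (hypothetical model)."""
    def __init__(self):
        self.data = {}
        self.locked = set()

def atomic_swap(mem, addr, new_value, is_canceled):
    """Lock and read early, then either complete or abandon; always unlock.

    Returns the old value if the swap completes, or None if it was canceled.
    """
    if addr in mem.locked:
        raise RuntimeError("address already locked")
    mem.locked.add(addr)            # lock before cancellation is known
    old = mem.data.get(addr, 0)     # the read can also happen early
    try:
        if is_canceled():
            return None             # canceled: nothing is written
        mem.data[addr] = new_value  # completed: write the new value
        return old
    finally:
        mem.locked.discard(addr)    # unlock whether or not canceled
```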

8. Writing of instruction results produced by instruction execution circuits to result destinations
    Granted invention patent (legal status: in force)

    Publication number: US6163837A

    Publication date: 2000-12-19

    Application number: US193487

    Filing date: 1998-11-17

    IPC classification: G06F9/30 G06F9/38 G06F15/00

    Abstract: Two instruction execution circuits C1 and C2, possibly pipelined, share a write port to write instruction results to their destinations. When both circuits have results available for writing in the same clock cycle, the write port is given to circuit C1. Circuit C2 gets the write port only when there is a bubble in the write back stage of circuit C1. Circuit C2 executes instructions that occur infrequently in an average program. Examples are division, reciprocal square root, and power computation instructions. Circuit C1 executes instructions that occur more frequently. Circuits C1 and C2 are part of a functional unit of a VLIW processor.
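
    A minimal cycle-by-cycle Python sketch of the write-port priority in this abstract. It assumes that ready C2 results wait in a hypothetical holding queue until C1 has a write-back bubble. The patent does not describe how C2 waits, and the function name is hypothetical.

```python
from collections import deque

def arbitrate(c1_results, c2_results):
    """Give the shared write port to C1 first; C2 writes only in C1 bubbles.

    Each list holds, per clock cycle, the result that becomes ready for
    write-back, or None (for C1, None is a bubble). Returns, per cycle,
    (circuit, result) for the write performed, or None if the port is idle.
    """
    pending = deque()  # hypothetical holding queue for C2 results
    log = []
    for c1, c2 in zip(c1_results, c2_results):
        if c2 is not None:
            pending.append(c2)
        if c1 is not None:
            log.append(("C1", c1))
        elif pending:
            log.append(("C2", pending.popleft()))
        else:
            log.append(None)
    return log
```

    Because C2 handles rare, long-latency operations such as division, deferring its writes costs little in an average program. Giving C1 priority keeps the frequent instructions from ever waiting on the port.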