Rapid execution of floating point load control word instructions
    1.
    Granted invention patent
    Legal status: in force

    Publication No.: US06405305B1

    Publication date: 2002-06-11

    Application No.: US09394024

    Filing date: 1999-09-10

    IPC classification: G06F9302

    Abstract: A microprocessor with a floating point unit configured to rapidly execute floating point load control word (FLDCW) type instructions in an out-of-program-order context is disclosed. The floating point unit is configured to schedule instructions older than the FLDCW-type instruction before the FLDCW-type instruction is scheduled. The FLDCW-type instruction acts as a barrier that prevents instructions occurring after it in program order from executing before it. Indicator bits may be used to simplify instruction scheduling, and copies of the floating point control word may be stored for instructions that have long execution cycles. A method and a computer configured to rapidly execute FLDCW-type instructions in an out-of-program-order context are also disclosed.
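
    The barrier rule can be pictured with a small cycle-by-cycle simulation. This is a minimal Python sketch, not the patented circuit. Several parts of it are illustrative assumptions: the hypothetical `Op` record and its `ready_cycle` field, the issue `width`, and the rule that an FLDCW may issue only after every older op has issued in an earlier cycle. The indicator bits and saved control-word copies are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    ready_cycle: int          # hypothetical: first cycle its operands are available
    is_fldcw: bool = False


def schedule(ops, width=2):
    """Return (cycle, name) pairs for ops given in program order."""
    scheduled = [False] * len(ops)
    result = []
    cycle = 0
    while not all(scheduled):
        before = scheduled[:]             # scheduling state at the start of this cycle
        issued = 0
        for i, op in enumerate(ops):
            if issued == width:
                break
            if before[i]:
                continue
            if op.is_fldcw:
                # The FLDCW waits until every older op issued in an earlier cycle.
                if all(before[:i]) and op.ready_cycle <= cycle:
                    scheduled[i] = True
                    result.append((cycle, op.name))
                    issued += 1
                # Younger ops wait at least until the cycle after the FLDCW issues.
                break
            if op.ready_cycle <= cycle:   # older ops may still issue out of order
                scheduled[i] = True
                result.append((cycle, op.name))
                issued += 1
        cycle += 1
    return result
```

    With `[Op("fmul", 3), Op("fadd", 0), Op("fldcw", 0, True), Op("fsub", 0)]` the result is `[(0, "fadd"), (3, "fmul"), (4, "fldcw"), (5, "fsub")]`. The `fadd` issues ahead of the older `fmul`, but `fsub` waits behind the FLDCW even though its operands are ready in cycle 0.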

    Apparatus and method for superforwarding load operands in a microprocessor
    2.
    Granted invention patent
    Legal status: in force

    Publication No.: US06442677B1

    Publication date: 2002-08-27

    Application No.: US09329497

    Filing date: 1999-06-10

    IPC classification: G06F9312

    CPC classification: G06F9/30043 G06F9/3826

    Abstract: An apparatus and method for superforwarding load operands in a microprocessor are provided. An execution unit in a microprocessor is configured to receive a load instruction and a subsequent instruction. If the load instruction is a simple load instruction, and the subsequent instruction specifies a source operand that depends on the destination operand of the load instruction, then the load's destination operand can be superforwarded to the subsequent instruction. The subsequent instruction therefore does not have to wait until the load instruction executes or completes. It can be scheduled and/or executed before or at the same time as the load instruction. Consequently, latencies associated with operand dependencies may be reduced.
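
    One way to picture superforwarding is as rewriting the consumer's source operand. Instead of naming the load's destination register, the source names the load's memory operand. The sketch assumes a hypothetical `Insn` record with a `simple_load` flag and string operand names. It does not check for intervening writes to the load's destination register, so it assumes the consumer is the next reader of that register.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insn:
    op: str
    dest: str
    srcs: tuple                 # register names, or "mem:..." for a memory operand
    simple_load: bool = False   # hypothetical: the load needs no data conversion


def superforward(load, consumer):
    """Point consumer sources that name load.dest at the load's memory operand."""
    if load.op != "load" or not load.simple_load:
        return consumer                      # only simple loads are superforwarded
    memory_operand = load.srcs[0]
    new_srcs = tuple(memory_operand if s == load.dest else s for s in consumer.srcs)
    return replace(consumer, srcs=new_srcs)
```

    After the rewrite, `Insn("fadd", "f2", ("f1", "f3"))` reads `("mem:0x100", "f3")`. A scheduler that tracks register dependencies would therefore not hold it until the load completes.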

    Method and apparatus for denormal load handling
    3.
    Granted invention patent
    Legal status: in force

    Publication No.: US06487653B1

    Publication date: 2002-11-26

    Application No.: US09383138

    Filing date: 1999-08-25

    IPC classification: G06F738

    Abstract: A microprocessor configured to dynamically switch its floating point load pipeline from one stage in length to more than one stage in length is disclosed. The microprocessor may perform normal loads and detect denormal loads in a single clock cycle. It temporarily stores each scheduled floating point instruction in a reissue buffer for at least one clock cycle. When a denormal load instruction is detected, the microprocessor adds one or more stages to the floating point load pipeline so that the denormal value can complete its conversion to an internal format. The longer pipeline is then used for all loads that follow the denormal load until an idle clock cycle or an abort occurs. At that point, the pipeline reverts to its original, shorter length. In addition, the microprocessor may be configured to cancel instructions that were scheduled on the assumption that the denormal load would complete in one clock cycle. The canceled instructions are then “replayed” from the reissue buffer in a later clock cycle. A method for performing denormal loads and a computer system are also disclosed.
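
    The switching between pipeline lengths can be modeled per clock cycle. In this Python sketch the load stream is a list in which each entry is `"normal"`, `"denormal"`, or `None` for an idle cycle. The one- and two-stage lengths are assumed values, and aborts are not modeled. The returned `replays` list marks the cycles in which the switch to the longer pipeline happens. Under the sketch's assumption, this is where instructions scheduled for a one-cycle load would be canceled and replayed from the reissue buffer.

```python
def run_load_pipeline(stream, short_stages=1, long_stages=2):
    """Return (per-cycle load latency or None, cycles that trigger a replay)."""
    length = short_stages
    latencies, replays = [], []
    for cycle, item in enumerate(stream):
        if item is None:                      # idle cycle: revert to the short pipeline
            length = short_stages
            latencies.append(None)
            continue
        if item == "denormal" and length == short_stages:
            length = long_stages              # extra stage for the format conversion
            replays.append(cycle)             # ops scheduled for a 1-cycle load replay
        latencies.append(length)
    return latencies, replays
```

    For example, `["normal", "denormal", "normal", None, "normal", "denormal", "denormal"]` yields latencies `[1, 2, 2, None, 1, 2, 2]` and replays at cycles `[1, 5]`. The second consecutive denormal load does not trigger a replay because the pipeline is already long.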

    Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria
    4.
    Granted invention patent
    Legal status: in force

    Publication No.: US06370637B1

    Publication date: 2002-04-09

    Application No.: US09370789

    Filing date: 1999-08-05

    IPC classification: G06F938

    Abstract: A microprocessor with a floating point unit configured to efficiently allocate multi-pipeline executable instructions is disclosed. Multi-pipeline executable instructions are instructions that are not forced to execute in a particular type of execution pipe. Junk ops, for example, are multi-pipeline executable. A junk op is an instruction that is executed at an early stage of the floating point unit's pipeline (e.g., during register rename) but still passes through an execution pipeline for exception checking. Junk ops are not limited to a particular execution pipeline and may instead pass through any of the execution pipelines in the floating point unit. Multi-pipeline executable instructions are allocated on a per-clock-cycle basis using a number of different criteria. For example, the allocation may vary depending on the number of multi-pipeline executable instructions the floating point unit receives in a single clock cycle.
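
    One possible per-cycle allocation criterion is sketched below. Instructions bound to a specific pipeline are placed first. Each multi-pipeline executable op (such as a junk op) then goes to whichever pipeline has the fewest ops assigned in that cycle. The pipeline names and the least-busy rule are assumptions for illustration, not the criteria claimed in the patent.

```python
def allocate(ops, pipes=("fadd", "fmul", "fstore")):
    """Assign one cycle's ops to pipelines.

    ops: list of pipeline names, or "any" for multi-pipeline executable ops.
    Returns a dict mapping each pipeline name to the indices of its ops.
    """
    assigned = {p: [] for p in pipes}
    flexible = []
    for i, target in enumerate(ops):
        if target == "any":
            flexible.append(i)
        else:
            assigned[target].append(i)        # pipeline-specific ops placed first
    for i in flexible:
        # Least-busy pipeline this cycle; ties go to the first pipeline listed.
        best = min(pipes, key=lambda p: len(assigned[p]))
        assigned[best].append(i)
    return assigned
```

    For `["fadd", "any", "fmul", "any"]` the result is `{"fadd": [0, 3], "fmul": [2], "fstore": [1]}`. The first junk op fills the idle store pipeline, and the second breaks a three-way tie in favor of the first pipeline. The outcome changes with how many flexible ops arrive in the cycle.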

    Method and apparatus for rapid execution of FCOM and FSTSW
    5.
    Granted invention patent
    Legal status: in force

    Publication No.: US06425074B1

    Publication date: 2002-07-23

    Application No.: US09393524

    Filing date: 1999-09-10

    IPC classification: G06F9302

    Abstract: A microprocessor configured to rapidly execute floating point store status word (FSTSW) type instructions that are immediately preceded by floating point compare (FCOM) type instructions is disclosed. FCOM-type instructions are modified to store their results both to the architectural floating point status word and to a temporary destination register. If an FSTSW-type instruction is detected immediately following an FCOM-type instruction, the FSTSW-type instruction is transformed into a special fast floating point store status word (FSTSWEF) instruction. The FSTSW-type instruction is serializing and negatively impacts performance. The FSTSWEF instruction is not serializing, so execution can continue without being held up. A computer system and method for rapidly executing FSTSW-type instructions immediately preceded by FCOM-type instructions are also disclosed.
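
    The decode-time transformation amounts to a peephole rewrite over the instruction stream. This Python sketch operates on mnemonics only. The exact membership of the FCOM-type and FSTSW-type sets is an assumption. `FSTSWEF` stands for the internal non-serializing operation named in the abstract; it is not an x86 mnemonic.

```python
# Assumed membership of the instruction classes; not taken from the patent text.
FCOM_TYPES = {"FCOM", "FCOMP", "FCOMPP", "FUCOM", "FUCOMP", "FUCOMPP"}
FSTSW_TYPES = {"FSTSW", "FNSTSW"}


def rewrite(stream):
    """Replace an FSTSW-type op that directly follows an FCOM-type op with FSTSWEF."""
    out = []
    for i, mnemonic in enumerate(stream):
        if mnemonic in FSTSW_TYPES and i > 0 and stream[i - 1] in FCOM_TYPES:
            # Reads the FCOM's temporary destination register instead of
            # serializing on the architectural status word.
            out.append("FSTSWEF")
        else:
            out.append(mnemonic)
    return out
```

    `rewrite(["FCOM", "FNSTSW", "FADD", "FNSTSW"])` returns `["FCOM", "FSTSWEF", "FADD", "FNSTSW"]`. Only the first `FNSTSW` immediately follows a compare.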

Converting register data from a first format type to a second format type if a second type instruction consumes data produced by a first type instruction
    6.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US6105129A

    Publication date: 2000-08-15

    Application No.: US25233

    Filing date: 1998-02-18

    Abstract: A microprocessor includes one or more registers that are architecturally defined to be used for at least two data formats. In one embodiment, the registers are the floating point registers defined in the x86 architecture, and the data formats are the floating point data format and the multimedia data format. The physical registers that the microprocessor implements for the floating point registers use an internal format for floating point data. Part of the internal format is a classification field, which classifies the floating point data in the extended precision defined by the x86 microprocessor architecture. Additionally, one classification field encoding is reserved for multimedia data. As the microprocessor begins executing each multimedia instruction, it examines the classification information of the source operands. It checks whether the data is either in the multimedia class or in a floating point class in which the significand portion of the register is the same as the corresponding extended-precision significand. If so, the multimedia instruction executes normally; if not, the multimedia instruction is faulted. Similarly, as the microprocessor begins executing each floating point instruction, it examines the classification information of the source operands. If the data is classified as multimedia, the floating point instruction is faulted. A microcode routine reformats the data stored in at least the source registers of the faulting instruction into a format usable by that instruction. The faulting instruction is then re-executed.
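
    The operand check performed as each instruction begins can be sketched as a classification test. The class names are hypothetical. So is the assumption that only the `normal` and `zero` classes share the extended-precision significand. A non-empty result means the instruction faults, and a microcode routine would reformat the listed registers before re-execution.

```python
MULTIMEDIA = "multimedia"
# Assumption: FP classes whose internal significand equals the extended-precision one.
SAME_SIGNIFICAND = {"normal", "zero"}


def usable(kind, cls):
    """Can an instruction of this kind ("mmx" or "fp") consume a register of class cls?"""
    if kind == "mmx":
        return cls == MULTIMEDIA or cls in SAME_SIGNIFICAND
    return cls != MULTIMEDIA


def registers_to_reformat(kind, reg_classes, srcs):
    """Source registers that make the instruction fault (empty: execute normally)."""
    return [r for r in srcs if not usable(kind, reg_classes[r])]
```

    With `classes = {0: "normal", 1: "denormal", 2: "multimedia"}`, a multimedia instruction reading registers 0 and 2 executes normally. One reading register 1 faults. A floating point instruction reading register 2 also faults.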

    Rapid execution of FCMOV following FCOMI by storing comparison result in temporary register in floating point unit
    7.
    Granted invention patent
    Legal status: in force

    Publication No.: US06393555B1

    Publication date: 2002-05-21

    Application No.: US09370787

    Filing date: 1999-08-05

    IPC classification: G06F930

    Abstract: A microprocessor with a floating point unit configured to rapidly execute floating point compare (FCOMI) type instructions that are followed by floating point conditional move (FCMOV) type instructions is disclosed. FCOMI-type instructions normally store their results to integer status flag registers. They are modified to also store a copy of their results to a temporary register located within the floating point unit. If an FCMOV-type instruction is detected following an FCOMI-type instruction, the FCMOV-type instruction's source for flag information is changed from the integer flag register to the temporary register. FCMOV-type instructions can thereby execute earlier because they need not wait for the integer flags to be read from the integer portion of the microprocessor. A computer system and method for rapidly executing FCOMI-type instructions followed by FCMOV-type instructions are also disclosed.
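
    The choice of flag source for each FCMOV can be sketched as a scan over the instruction stream. The sketch assumes each instruction is given as a `(mnemonic, writes_eflags)` pair. It conservatively treats any intervening EFLAGS writer as making the FPU-local copy stale. That invalidation rule is an assumption and is not stated in the abstract.

```python
FCOMI_TYPES = {"FCOMI", "FCOMIP", "FUCOMI", "FUCOMIP"}


def flag_sources(stream):
    """For each FCMOV in stream, report where it reads its condition flags from."""
    temp_valid = False
    out = []
    for mnemonic, writes_eflags in stream:
        if mnemonic in FCOMI_TYPES:
            temp_valid = True      # result also copied to the FPU temporary register
        elif writes_eflags:
            temp_valid = False     # assumed: a later flag writer makes the copy stale
        elif mnemonic.startswith("FCMOV"):
            out.append("fp_temp" if temp_valid else "int_eflags")
    return out
```

    For `[("FCOMI", True), ("FCMOVB", False), ("ADD", True), ("FCMOVE", False)]` the result is `["fp_temp", "int_eflags"]`. The second conditional move falls back to the integer flags because `ADD` rewrote them.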

    Store queue multimatch detection
    8.
    Granted invention patent
    Legal status: in force

    Publication No.: US06523109B1

    Publication date: 2003-02-18

    Application No.: US09433189

    Filing date: 1999-10-25

    Applicant: Stephan G. Meier

    Inventor: Stephan G. Meier

    IPC classification: G06F944

    Abstract: A processor includes a store queue configured to detect a hit on a store queue entry for a load being executed by the processor, and to forward data from that entry to provide a result for the load. The store queue data is provided to the data cache together with an indication of how much data is being provided (e.g. byte enables). The data cache may then fill in, from cache data, any additional data accessed by the load, and provide the load result. Additionally, the store queue is configured to detect whether more than one store queue entry is hit, a condition referred to as a multimatch. In other words, more than one store within the store queue updates at least one byte accessed by the load. If a multimatch is detected, the store queue retries the load. The load may subsequently be re-executed and may no longer multimatch, since entries are deleted as the corresponding stores complete. The load can complete once it does not multimatch. In one embodiment, the store queue independently detects hits on the upper and lower portions of each store queue entry (e.g. doubleword portions) and forwards from the upper and lower portions independently. Thus, a load may hit one store queue entry for the lower portion of the data it accesses and a different store queue entry for the upper portion without a multimatch being detected.
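
    The per-half hit detection and multimatch retry can be sketched as follows. The model assumes 8-byte store queue entries addressed by quadword, with the lower and upper doublewords checked independently. `StoreEntry`, `forward`, and the byte-mask encoding are hypothetical names. The data cache is represented simply by the 8 bytes it would supply for any position not forwarded from the queue.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    qword: int      # quadword-aligned address of the store
    mask: int       # 8-bit byte-enable mask within that quadword
    data: bytes     # 8 bytes; only positions set in mask are meaningful


HALVES = (0x0F, 0xF0)   # lower and upper doubleword byte enables


def forward(queue, qword, load_mask, cache_data):
    """Return ("retry", None) on a multimatch, else ("ok", 8 result bytes)."""
    result = bytearray(cache_data)            # cache supplies bytes not forwarded
    for half in HALVES:
        want = load_mask & half
        if not want:
            continue
        hits = [e for e in queue if e.qword == qword and e.mask & want]
        if len(hits) > 1:
            return "retry", None              # multimatch in this half: retry the load
        if hits:
            entry = hits[0]
            for b in range(8):
                if entry.mask & want & (1 << b):
                    result[b] = entry.data[b]
    return "ok", bytes(result)
```

    Suppose one store writes the lower doubleword and another writes the upper doubleword of the same quadword. A full 8-byte load then forwards from both without a multimatch. Adding a third store that overlaps the lower half makes a load of that byte return `"retry"`.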

    Store queue number assignment and tracking
    9.
    Granted invention patent
    Legal status: in force

    Publication No.: US06481251B1

    Publication date: 2002-11-19

    Application No.: US09433184

    Filing date: 1999-10-25

    IPC classification: G06F300

    Abstract: A processor includes a store queue and a store queue number assignment circuit. The assignment circuit assigns store queue numbers to stores. It operates on instruction operations before they reach the point in the processor's pipeline at which out-of-order instruction processing begins. Thus, store queue entries may be reserved for stores according to the program order of the stores. Additionally, in one embodiment, the store queue number identifying the youngest store represented in the store queue may be assigned to loads. In this manner, a load may determine which stores in the store queue are older or younger than itself based on their relative position within the store queue. Checking for store queue hits may be limited to the entries between the head of the store queue and the entry indicated by the load's store queue number. In one particular embodiment, the store queue number may include an additional “toggle” bit. This bit is toggled each time the assignment of store queue numbers reaches the maximum store queue entry and wraps to zero. Suppose the toggle bit of the store in the entry identified by the load's store queue number differs from the toggle bit of the load's store queue number. Then the store queue entry has been reassigned to a store younger than the load.
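
    The store queue numbering with a toggle bit can be modeled with a wrapping counter. `StoreQueueNumbers` and its methods are hypothetical. The model ignores queue capacity; a real queue could not reassign an entry before its store completes. It only shows how comparing toggle bits reveals that the entry named by a load has been reassigned to a younger store.

```python
class StoreQueueNumbers:
    """Hypothetical model of in-order store queue number assignment with a toggle bit."""

    def __init__(self, size):
        self.size = size
        self.next_index = 0
        self.toggle = 0
        self.entry_toggle = [0] * size   # toggle bit recorded when each entry was assigned
        self.youngest = None             # (index, toggle) of the youngest assigned store

    def assign_store(self):
        tag = (self.next_index, self.toggle)
        self.entry_toggle[self.next_index] = self.toggle
        self.youngest = tag
        self.next_index += 1
        if self.next_index == self.size:  # wrap to zero and flip the toggle bit
            self.next_index = 0
            self.toggle ^= 1
        return tag

    def assign_load(self):
        # A load carries the number of the youngest store older than it.
        return self.youngest

    def reassigned(self, load_tag):
        """True if the load's entry now holds a store younger than the load."""
        index, toggle = load_tag
        return self.entry_toggle[index] != toggle
```

    With a four-entry queue, a load issued after three stores carries `(2, 0)`. Three more stores wrap the counter without touching entry 2. The next store receives `(2, 1)`, and the mismatched toggle bit then marks entry 2 as reassigned.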

    Dynamic memory allocation suitable for stride-based prefetching
    10.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US6076151A

    Publication date: 2000-06-13

    Application No.: US948947

    Filing date: 1997-10-10

    Applicant: Stephan G. Meier

    Inventor: Stephan G. Meier

    IPC classification: G06F9/38 G06F12/08 G06F17/30

    Abstract: A dynamic memory allocation routine maintains an allocation size cache. For each different size of memory block that has been allocated, the cache records the address of the most recently allocated block of that size. Upon receiving a dynamic memory allocation request, the routine determines whether the requested size equals one of the sizes recorded in the allocation size cache. If a matching size is found, the routine attempts to allocate a memory block contiguous to the most recently allocated memory block of that size. If the contiguous memory block has already been allocated, the routine attempts to reserve a memory block whose size is a predetermined multiple of the requested size. The requested memory block is then allocated at the beginning of the reserved block. By reserving this block, the routine may increase the likelihood that subsequent requests for blocks of the requested size can be allocated in contiguous memory locations.
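
    A toy allocator shows how the allocation size cache and the reserved region interact. Several details are assumptions made to keep the sketch short: the class name, the `reserve_multiple` default of 4, the bump-pointer placement of new blocks, and the absence of `free`. The sketch manages abstract integer addresses rather than real memory.

```python
class SizeCacheAllocator:
    """Toy model of an allocation size cache with reserved regions (no free())."""

    def __init__(self, reserve_multiple=4):
        self.reserve_multiple = reserve_multiple  # the "predetermined multiple"
        self.size_cache = {}   # size -> start of the most recent block of that size
        self.blocks = []       # allocated (start, end) ranges
        self.reserved = []     # (start, end, size) regions set aside for one size
        self.top = 0           # first address above everything used or reserved

    def _available(self, start, size):
        end = start + size
        if any(start < e and s < end for s, e in self.blocks):
            return False
        # Reserved regions may only be used by requests of the size they were reserved for.
        return not any(owner != size and start < e and s < end
                       for s, e, owner in self.reserved)

    def _take(self, start, size):
        self.blocks.append((start, start + size))
        self.size_cache[size] = start
        self.top = max(self.top, start + size)
        return start

    def malloc(self, size):
        last = self.size_cache.get(size)
        if last is None:                          # size not cached: place at the top
            return self._take(self.top, size)
        if self._available(last + size, size):    # extend the run contiguously
            return self._take(last + size, size)
        # Contiguous block is taken: reserve reserve_multiple * size, start a new run.
        start = self.top
        self.reserved.append((start, start + self.reserve_multiple * size, size))
        self.top = start + self.reserve_multiple * size
        return self._take(start, size)
```

    Requesting sizes 16, 16, 32, 16, 16, 8 returns addresses 0, 16, 32, 64, 80, 128. The third 16-byte request cannot extend its run because the 32-byte block occupies address 32, so a 64-byte region is reserved at 64. The next 16-byte request lands contiguously at 80, restoring a constant stride.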
