Analyzing instruction completion delays in a processor
    42.
    Granted invention patent
    Status: lapsed

    Publication number: US07047398B2

    Publication date: 2006-05-16

    Application number: US10210358

    Filing date: 2002-07-31

    IPC classification: G06F11/34

    Abstract: A method and system for identifying instruction completion delays for a group of instructions in a computer processor. Each instruction in the group of instructions has a status indicator that identifies what is preventing that instruction from completing execution. Examples of completion delays are cache misses, data dependencies, or simply the time required for an execution unit in the computer processor to process the instruction. As each instruction finishes executing, its associated status indicator is cleared to indicate that the instruction is no longer waiting to execute. The last instruction to execute is the instruction that is holding up completion of the entire group, and thus the cause for the completion delay of the last instruction is recorded as the cause of completion delay for the entire group.
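
    The attribution rule in this abstract can be illustrated with a minimal Python sketch. It is not the patented hardware: the class name `GroupDelayTracker`, the instruction ids, and the delay labels are all hypothetical, chosen only to show that the reason attached to the last instruction to finish becomes the reason recorded for the whole group.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GroupDelayTracker:
    # Hypothetical model: instruction id -> what is keeping it from finishing.
    pending: Dict[str, str]
    group_delay_cause: Optional[str] = None

    def finish(self, instr_id: str) -> None:
        """Clear the status indicator of an instruction that has finished."""
        reason = self.pending.pop(instr_id)
        if not self.pending:
            # This was the last instruction holding up the group, so its
            # delay reason is recorded as the reason for the whole group.
            self.group_delay_cause = reason


tracker = GroupDelayTracker({
    "i0": "execution_latency",
    "i1": "cache_miss",
    "i2": "data_dependency",
})
for instr_id in ("i0", "i2", "i1"):  # i1 finishes last
    tracker.finish(instr_id)
print(tracker.group_delay_cause)  # cache_miss
```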

    Completion monitoring in a processor having multiple execution units with various latencies
    44.
    Granted invention patent
    Status: lapsed

    Publication number: US06826678B2

    Publication date: 2004-11-30

    Application number: US10122034

    Filing date: 2002-04-11

    IPC classification: G06F9/38

    Abstract: A method, processor architecture, computer program product, and data processing system for determining when an instruction in a pipelined processor should be completed is provided. As each instruction is issued to an execution unit, an entry for that instruction is placed within a “finish pipe,” which consists of a series of consecutively numbered stages. Each clock cycle, the entries in the finish pipe advance one stage. When an entry has reached the stage corresponding to the latency of its associated execution unit, it becomes mature. Each clock cycle, the finish pipe is scanned to find the entry having the highest-numbered stage of any entry in the finish pipe. If that entry is mature, it is removed from the finish pipe and the instruction associated with that entry is allowed to complete. If not, the entry simply advances along with the other entries and the pipe is rescanned in the next cycle.
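
    The finish-pipe behaviour described above can be sketched in a few lines of Python. This is a software illustration under stated assumptions, not the patented circuit: the names `Entry` and `FinishPipe` are hypothetical, new entries are assumed to start at stage 0, and each call to `tick` first advances every entry and then scans for the highest-numbered stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str
    latency: int    # latency of the execution unit the instruction went to
    stage: int = 0  # assumption: entries enter the pipe at stage 0


class FinishPipe:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def issue(self, name: str, latency: int) -> None:
        """Place an entry in the finish pipe when an instruction is issued."""
        self.entries.append(Entry(name, latency))

    def tick(self) -> Optional[str]:
        """Advance one clock cycle; return the completed instruction, if any."""
        for entry in self.entries:
            entry.stage += 1
        if not self.entries:
            return None
        # Scan for the entry in the highest-numbered stage.
        oldest = max(self.entries, key=lambda e: e.stage)
        if oldest.stage >= oldest.latency:  # the entry is mature
            self.entries.remove(oldest)
            return oldest.name
        return None  # not mature yet: rescan in the next cycle


pipe = FinishPipe()
pipe.issue("mul", latency=3)
completed = [pipe.tick()]
pipe.issue("add", latency=1)
completed += [pipe.tick() for _ in range(3)]
print(completed)  # [None, None, 'mul', 'add']
```

    In this run the short-latency `add` matures before `mul`, but it is only allowed to complete after `mul`, because the scan always looks at the entry in the highest-numbered stage first.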

    System for rejecting and reissuing instructions after a variable delay time period
    45.
    Granted invention patent
    Status: lapsed

    Publication number: US06654876B1

    Publication date: 2003-11-25

    Application number: US09434875

    Filing date: 1999-11-04

    IPC classification: G06F9/30

    Abstract: A method, processor, and data processing system implementing a delayed reject mechanism are disclosed. The processor includes an issue unit suitable for issuing an instruction in a first cycle and a load store unit (LSU). The LSU includes an extend reject calculator circuit configured to receive a set of completion information signals and generate a delay value based thereon. The LSU is adapted to determine whether to reject the instruction in a determination cycle. The number of cycles between the first cycle and the determination cycle is a function of the delay value such that reject timing is variable with respect to the first cycle. In one embodiment, the processor is further configured to reissue the instruction after the determination cycle if the instruction was rejected in the determination cycle. The delay value is conveyed via a 2-bit bus in one embodiment. The 2-bit bus permits delaying the determination cycle from 0 to 3 cycles after a finish cycle. In one embodiment, the number of cycles between the first cycle and the determination cycle includes the number of cycles required to travel a pipeline of the microprocessor plus the number of cycles indicated by the delay value.
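
    The timing relationship in the last embodiment can be written out as simple arithmetic. The Python sketch below is only an illustration: the function name `determination_cycle` and the pipeline depth of 5 cycles used in the example are assumptions, while the 0 to 3 range follows from the 2-bit bus mentioned in the abstract.

```python
def determination_cycle(issue_cycle: int, pipeline_cycles: int, delay_value: int) -> int:
    """Cycle in which the LSU decides whether to reject the instruction.

    issue_cycle     -- the "first cycle" in which the instruction is issued
    pipeline_cycles -- cycles needed to travel the pipeline (assumed input)
    delay_value     -- extra delay carried on the 2-bit bus, so 0..3
    """
    if not 0 <= delay_value <= 3:
        raise ValueError("a 2-bit delay value must be between 0 and 3")
    return issue_cycle + pipeline_cycles + delay_value


# Hypothetical numbers: issued in cycle 10, 5-cycle pipeline, delay value 2.
print(determination_cycle(10, 5, 2))  # 17
```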

    Content addressable storage apparatus and register mapper architecture
    46.
    Granted invention patent
    Status: in force

    Publication number: US06480931B1

    Publication date: 2002-11-12

    Application number: US09434802

    Filing date: 1999-11-05

    IPC classification: G06F12/02

    Abstract: A non-conventional CAM (content addressable memory) and register mapper organization and circuit implementation is provided which allows simultaneous execution of a large number of CAM searches. All compare circuits are placed outside of the CAM in separate match arrays where the actual comparisons occur. The CAM cell contains only latches to hold the CAM stored bit of data and a multi-port MUX to update the CAM content. The CAM bits are driven to the match arrays for match generation. The structure of the CAM and search engine facilitates implementation of the register mapper as a group of custom arrays. Each array is dedicated to execute a specific function. All of the arrays are aligned and each row of an array is devoted to one register to keep current state, shadow state and controls for that register. In an exemplary embodiment, eight custom arrays are used to execute various functions of the register mapper.
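
    The separation between storage and comparison can be pictured with a small Python sketch. It is a loose software analogy, not the circuit: the function name `match_array` and the 4-bit example words are hypothetical. The stored words play the role of the CAM latches, and the comparison of every stored word against several search keys at once happens outside of them.

```python
from typing import List


def match_array(cam_words: List[int], search_keys: List[int]) -> List[List[bool]]:
    """Compare every stored CAM word against every search key.

    Row i, column j of the result is True when CAM entry i equals key j,
    so several searches are answered from one pass over the stored bits.
    """
    return [[word == key for key in search_keys] for word in cam_words]


cam = [0b0101, 0b0011, 0b0101]           # contents held by the CAM cells
hits = match_array(cam, [0b0101, 0b1111])  # two searches evaluated together
print(hits)  # [[True, False], [False, False], [True, False]]
```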

    Compressed string and multiple generation engine
    47.
    Granted invention patent
    Status: lapsed

    Publication number: US06442675B1

    Publication date: 2002-08-27

    Application number: US09363464

    Filing date: 1999-07-29

    IPC classification: G06F9/38

    CPC classification: G06F9/30043 G06F9/3017

    Abstract: A generalized, programmable dataflow state-machine is provided to receive information about a particular string instruction. The string instruction is parsed into all the operations contained in the string instruction. The operations that make up the string instruction are routed to parallel functional units and executed. The state-machine manipulates the size of the operations in the string instruction and whether or not the instructions need to be generated.

    Efficient firm consistency support mechanisms in an out-of-order execution superscaler multiprocessor
    48.
    Granted invention patent
    Status: lapsed

    Publication number: US5699538A

    Publication date: 1997-12-16

    Application number: US352467

    Filing date: 1994-12-09

    IPC classification: G06F9/38 G06F12/08 G06F9/312

    CPC classification: G06F9/383 G06F9/3834

    Abstract: Two processor controls for supporting efficient Firm Consistency while allowing out-of-order execution of Load instructions are provided. The Touch control operates when the processor stores a subsequent Store in a pending Store buffer while awaiting any outstanding Loads or Stores. The efficiency of the pending Store is improved by issuing a Touch of the data which pre-loads the line of data in the cache that is the subject of the store. The processor can complete out-of-order execution of a subsequently issued Load relative to a prior Load, but only to its finished state. The subsequently issued Load is not allowed to complete until the prior Load is completed. The Finished Load Cancellation control ensures that Firm Consistency is maintained by canceling any finished Loads, and subsequent instructions, when the subject of the Load is the same as an invalidation request from a multiprocessor.
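
    The Finished Load Cancellation control can be illustrated with a short Python sketch. It is a simplified model under assumptions, not the patented logic: the names `Instr` and `cancel_on_invalidate`, the cache line addresses, and the representation of the in-flight window as a list in program order are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    name: str
    line: Optional[int] = None  # cache line read by a Load; None otherwise
    finished: bool = False      # finished, but not yet completed


def cancel_on_invalidate(
    window: List[Instr], invalidated_line: int
) -> Tuple[List[Instr], List[Instr]]:
    """Split the not-yet-completed window into (kept, cancelled).

    The first finished Load whose line matches the invalidation request is
    cancelled together with every instruction that follows it.
    """
    for i, instr in enumerate(window):
        if instr.finished and instr.line == invalidated_line:
            return window[:i], window[i:]
    return window, []


window = [
    Instr("load A", line=0x40, finished=False),  # prior Load, still pending
    Instr("load B", line=0x80, finished=True),   # finished out of order
    Instr("add", finished=True),
]
kept, cancelled = cancel_on_invalidate(window, invalidated_line=0x80)
print([i.name for i in kept], [i.name for i in cancelled])
```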

    Transactional memory system which employs thread assists using address history tables
    49.
    Granted invention patent
    Status: lapsed

    Publication number: US08117403B2

    Publication date: 2012-02-14

    Application number: US11928758

    Filing date: 2007-10-30

    IPC classification: G06F12/00 G06F13/00

    CPC classification: G06F12/0842 G06F12/0817

    Abstract: A computing system uses specialized “Set Associative Transaction Tables” and additional “Summary Transaction Tables” to speed the processing of common transactional memory conflict cases and those which employ assist threads using an Address History Table and processes memory transactions with a Transaction Table in memory for parallel processing of multiple threads of execution by support of which an application need not be aware. Special instructions may mark the boundaries of a transaction and identify memory locations applicable to a transaction. A ‘private to transaction’ (PTRAN) tag, directly addressable as part of the main data storage memory location, enables a quick detection of potential conflicts with other transactions that are concurrently executing on another thread of said computing system. The tag indicates whether (or not) a data entry in memory is part of a speculative memory state of an uncommitted transaction that is currently active in the system.
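
    The role of the PTRAN tag in conflict detection can be sketched in Python. This is a toy model, not the patented system: the names `Cell` and `Memory` are hypothetical, and the `owner` field is an assumption that stands in for the lookup the abstract attributes to the transaction tables.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: int = 0
    ptran: bool = False          # 'private to transaction' tag
    owner: Optional[int] = None  # assumption: stands in for a table lookup


class Memory:
    def __init__(self) -> None:
        self.cells: Dict[int, Cell] = {}

    def tx_access(self, tx_id: int, addr: int) -> bool:
        """Return True if the access may proceed, False on a potential conflict."""
        cell = self.cells.setdefault(addr, Cell())
        if cell.ptran and cell.owner != tx_id:
            return False  # location belongs to another uncommitted transaction
        cell.ptran, cell.owner = True, tx_id
        return True

    def commit(self, tx_id: int) -> None:
        """Clear the tag on every location held by the committing transaction."""
        for cell in self.cells.values():
            if cell.owner == tx_id:
                cell.ptran, cell.owner = False, None


mem = Memory()
first = mem.tx_access(1, 0x100)   # transaction 1 tags the location
second = mem.tx_access(2, 0x100)  # transaction 2 sees the tag: conflict
mem.commit(1)
third = mem.tx_access(2, 0x100)   # tag cleared, access proceeds
print(first, second, third)  # True False True
```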