Processor and method for managing execution of an instruction which determine subsequent to dispatch if an instruction is subject to serialization
    31.
    Granted patent (status: expired)

    Publication No.: US5678016A

    Publication date: 1997-10-14

    Application No.: US512741

    Filing date: 1995-08-08

    IPC classification: G06F9/312 G06F9/38

    Abstract: A method and apparatus are disclosed for managing the execution of a floating-point store instruction within a data processing system including a memory and a superscalar processor having a number of floating-point registers (FPRs). According to the present invention, multiple instructions are dispatched for execution by the processor, including a floating-point store instruction having as an operand the content of a particular FPR. A determination is made whether the particular FPR is a destination register for results of a second instruction which precedes the store instruction in program order. If so, a determination is made whether the second instruction must complete before subsequent instructions can be successfully dispatched. In response to a determination that the second instruction must be completed prior to successfully dispatching subsequent instructions, the floating-point instruction is cancelled and redispatched after the completion of the second instruction. In response to a determination that the second instruction need not be completed prior to successfully dispatching subsequent instructions, execution of the floating-point store instruction is initiated by computing the destination address within memory into which the operand of the floating-point store instruction is to be stored, thereby minimizing the delay in executing a floating-point store instruction.
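    The decision described in this abstract can be sketched roughly as follows. This is an illustrative model only, not the patented implementation: the `Instr` record, its `serializing` flag, the action strings, and the behaviour when no older instruction writes the source FPR are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Hypothetical record for an older, already-dispatched instruction."""
    name: str
    dest_fpr: Optional[int] = None  # FPR this instruction writes, if any
    serializing: bool = False       # assumed flag: must complete before later dispatches succeed
    completed: bool = False


def handle_fp_store(source_fpr: int, older: List[Instr]) -> str:
    """Choose an action for a dispatched floating-point store.

    `older` lists the instructions preceding the store, in program order.
    """
    # Find the youngest uncompleted older instruction that targets the store's source FPR.
    producer = next(
        (i for i in reversed(older) if i.dest_fpr == source_fpr and not i.completed),
        None,
    )
    if producer is None:
        # Assumption: with no pending producer, the store simply executes.
        return "execute"
    if producer.serializing:
        # The producer must complete first: cancel now, redispatch afterwards.
        return "cancel_and_redispatch"
    # Otherwise start early by computing the destination address in memory.
    return "compute_address"
```

    For example, a store reading FPR 3 behind a non-serializing producer of FPR 3 yields `"compute_address"`, while the same store behind a serializing producer yields `"cancel_and_redispatch"`.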


    Optimal deallocation of instructions from a unified pick queue
    32.
    Granted patent (status: in force)

    Publication No.: US09286075B2

    Publication date: 2016-03-15

    Application No.: US12571200

    Filing date: 2009-09-30

    IPC classification: G06F9/38

    Abstract: Systems and methods for efficient out-of-order dynamic deallocation of entries within a shared storage resource in a processor. A processor comprises a unified pick queue that includes an array configured to dynamically allocate any entry of a plurality of entries for a decoded and renamed instruction. This instruction may correspond to any available active threads supported by the processor. The processor includes circuitry configured to determine whether an instruction corresponding to an allocated entry of the plurality of entries is dependent on a speculative instruction and whether the instruction has a fixed instruction execution latency. In response to determining the instruction is not dependent on a speculative instruction, the instruction has a fixed instruction execution latency, and said latency has transpired, the circuitry may deallocate the instruction from the allocated entry.
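    The deallocation condition in this abstract is a conjunction of three checks, which a minimal sketch can make explicit. The `PickQueueEntry` fields and the cycle-counting convention below are assumptions for illustration; the abstract does not specify how the circuitry represents them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PickQueueEntry:
    """Hypothetical view of one allocated pick-queue entry."""
    depends_on_speculative: bool   # waits on a speculative instruction
    fixed_latency: Optional[int]   # cycles; None for variable-latency instructions
    cycles_since_pick: int         # cycles elapsed since the instruction was picked


def can_deallocate(entry: PickQueueEntry) -> bool:
    """True only when all three conditions from the abstract hold."""
    return (
        not entry.depends_on_speculative
        and entry.fixed_latency is not None
        and entry.cycles_since_pick >= entry.fixed_latency
    )
```

    An entry that fails any one check (speculative dependence, variable latency, or latency not yet elapsed) stays allocated in this model.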


    Processor operating mode for mitigating dependency conditions between instructions having different operand sizes
    33.
    Granted patent (status: in force)

    Publication No.: US08504805B2

    Publication date: 2013-08-06

    Application No.: US12428464

    Filing date: 2009-04-22

    IPC classification: G06F7/483

    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes). This operating mode may be employed on a per-thread basis.


    APPARATUS AND METHOD FOR LOCAL OPERAND BYPASSING FOR CRYPTOGRAPHIC INSTRUCTIONS
    34.
    Patent application (status: in force)

    Publication No.: US20110087895A1

    Publication date: 2011-04-14

    Application No.: US12575832

    Filing date: 2009-10-08

    IPC classification: G06F21/00 G06F9/30 G06F9/312

    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency. The functional unit may further include a local bypass network configured to bypass results produced by the cryptographic execution pipeline to dependent cryptographic instructions executing within the cryptographic execution pipeline, such that each instruction within a sequence of dependent cryptographic instructions is executable with the cryptographic execution latency, and where the results of the cryptographic execution pipeline are not bypassed to any other functional unit within the processor.


    MULTIPORTED REGISTER FILE FOR MULTITHREADED PROCESSORS AND PROCESSORS EMPLOYING REGISTER WINDOWS
    35.
    Patent application (status: in force)

    Publication No.: US20110078414A1

    Publication date: 2011-03-31

    Application No.: US12570682

    Filing date: 2009-09-30

    IPC classification: G06F9/30

    Abstract: A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifier and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.
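    The two-step read described in this abstract (decode the thread identifier first, then select by register identifier) can be modelled in software as below. This is a behavioural sketch under assumed names (`BankedRegisterFile`, `read`, `write`); it says nothing about the actual port or circuit structure.

```python
from typing import List


class BankedRegisterFile:
    """One bank per register identifier, one entry per thread in each bank."""

    def __init__(self, num_registers: int, num_threads: int) -> None:
        # banks[reg_id][thread_id] holds the data value for that register and thread.
        self.banks: List[List[int]] = [
            [0] * num_threads for _ in range(num_registers)
        ]

    def write(self, reg_id: int, thread_id: int, value: int) -> None:
        self.banks[reg_id][thread_id] = value

    def read(self, reg_id: int, thread_id: int) -> int:
        # Step 1: decode the thread identifier and pull that thread's entry from every bank.
        retrieved = [bank[thread_id] for bank in self.banks]
        # Step 2: select, among the retrieved entries, the one for the requested register.
        return retrieved[reg_id]
```

    Usage: after `rf = BankedRegisterFile(4, 2)` and `rf.write(2, 1, 99)`, calling `rf.read(2, 1)` returns the value written for thread 1, while `rf.read(2, 0)` still returns thread 0's independent entry.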


    DYNAMIC TAG ALLOCATION IN A MULTITHREADED OUT-OF-ORDER PROCESSOR
    36.
    Patent application (status: in force)

    Publication No.: US20100333098A1

    Publication date: 2010-12-30

    Application No.: US12494532

    Filing date: 2009-06-30

    IPC classification: G06F9/46 G06F12/08

    Abstract: Various techniques for dynamically allocating instruction tags and using those tags are disclosed. These techniques may apply to processors supporting out-of-order execution and to architectures that support multiple threads. A group of instructions may be assigned a tag value from a pool of available tag values. A tag value may be usable to determine the program order of a group of instructions relative to other instructions in a thread. After the group of instructions has been (or is about to be) committed, the tag value may be freed so that it can be re-used on a second group of instructions. Tag values are dynamically allocated between threads; accordingly, a particular tag value or range of tag values is not dedicated to a particular thread.
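    A shared tag pool of the kind this abstract describes might look like the following sketch. The class and method names are hypothetical, and the sketch covers only allocation and release across threads; it does not model how tags encode program order.

```python
from collections import deque
from typing import Deque, Dict, Optional


class TagPool:
    """A single pool of tag values shared by all threads (no per-thread ranges)."""

    def __init__(self, num_tags: int) -> None:
        self.free: Deque[int] = deque(range(num_tags))
        self.owner: Dict[int, int] = {}  # tag value -> thread currently using it

    def allocate(self, thread_id: int) -> Optional[int]:
        """Assign a free tag to an instruction group, or None if the pool is empty."""
        if not self.free:
            return None
        tag = self.free.popleft()
        self.owner[tag] = thread_id
        return tag

    def release(self, tag: int) -> None:
        """Free a tag once its instruction group has committed, making it reusable."""
        del self.owner[tag]
        self.free.append(tag)
```

    Because any thread may receive any free tag, a tag released by one thread can be handed to a different thread on the next allocation.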


    METHODS AND MECHANISMS TO SUPPORT MULTIPLE FEATURES FOR A NUMBER OF OPCODES
    37.
    Patent application (status: in force)

    Publication No.: US20100257338A1

    Publication date: 2010-10-07

    Application No.: US12420054

    Filing date: 2009-04-07

    IPC classification: G06F9/30 G06F9/00

    Abstract: Systems and methods for efficient instruction support of multiple features for opcodes of an instruction set. A processor detects that a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines that the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.


    MEMORY WITH WRITE PORT CONFIGURED FOR DOUBLE PUMP WRITE
    38.
    Patent application (status: in force)

    Publication No.: US20090231935A1

    Publication date: 2009-09-17

    Application No.: US12049798

    Filing date: 2008-03-17

    IPC classification: G11C7/10 G11C7/22

    Abstract: A memory with a write port configured for double-pump writes. The memory includes first and second memory locations each having one or more bit cells, and one or more bit lines each coupled to corresponding ones of the bit cells. A write port is coupled to each of the bit lines. Selection circuitry, responsive to a first clock edge, latches first data from a first data path through the write port, and responsive to a second clock edge, latches second data from a second data path through the write port. A first pulse is generated during a first phase of the clock signal to cause writing of the first data into the first memory location. A second pulse is generated during a second phase of the clock signal to cause writing of the second data into the second memory location.


    Efficient utilization of a store buffer using counters
    39.
    Granted patent (status: in force)

    Publication No.: US07519796B1

    Publication date: 2009-04-14

    Application No.: US10881935

    Filing date: 2004-06-30

    IPC classification: G06F9/00

    Abstract: An apparatus and method for efficiently managing store buffer operations is described in connection with a multithreaded multiprocessor chip. A CMT processor keeps track of stores by maintaining two store counters in the instruction fetch unit (IFU). A speculative store counter in the IFU tracks stores in flight to the store buffer as well as stores already in the store buffer. A committed store counter in the IFU tracks the number of stores actually in the store buffer. The store buffer provides allocate and deallocate signals to accurately maintain the committed store counter. The IFU stops issuing stores to the store buffer once the speculative counter has reached a threshold value. Upon a flush, the IFU sets the speculative counter equal to the committed store counter. In this way, an efficient feedback mechanism is provided for preventing store buffer overflow that minimizes the store buffer size, operations time and power usage.
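    The two-counter scheme in this abstract can be illustrated with a small model. The class and method names are hypothetical, and one detail is an assumption rather than something the abstract states: the sketch also decrements the speculative counter on a deallocate signal, since a store leaving the buffer is no longer in flight or in the buffer.

```python
class StoreCounters:
    """IFU-side counters that throttle store issue to avoid store buffer overflow."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.speculative = 0  # stores in flight to the buffer plus stores in the buffer
        self.committed = 0    # stores actually in the buffer

    def can_issue_store(self) -> bool:
        # Issue stops once the speculative counter reaches the threshold.
        return self.speculative < self.threshold

    def issue_store(self) -> bool:
        if not self.can_issue_store():
            return False
        self.speculative += 1
        return True

    def on_allocate(self) -> None:
        # Allocate signal from the store buffer: a store has entered the buffer.
        self.committed += 1

    def on_deallocate(self) -> None:
        # Deallocate signal: a store has left the buffer.
        self.committed -= 1
        self.speculative -= 1  # assumption, see the note above

    def on_flush(self) -> None:
        # In-flight stores are discarded, so only stores already in the buffer remain.
        self.speculative = self.committed
```

    With a threshold of 2, two stores can issue and a third is refused. If only one of the two reached the buffer before a flush, `on_flush` resets the speculative counter to 1 and issue can resume.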


    Single cycle data movement between general purpose and floating-point registers
    40.
    Granted patent (status: in force)

    Publication No.: US09304767B2

    Publication date: 2016-04-05

    Application No.: US12476636

    Filing date: 2009-06-02

    IPC classification: G06F9/30 G06F15/00 G06F9/38

    Abstract: Systems and methods for providing single cycle movement of data between a floating-point register file (FRF) and a general purpose or integer register file (IRF) of a microprocessor system are provided. The system may include an integer execution unit operative to execute instructions with single cycle latency, a floating-point execution unit, a working register file (WRF), an FRF, and an IRF. To achieve the single cycle movement functionality, the integer execution unit may physically own the WRF, IRF, and FRF, and may monitor and control any dependencies between them. Thus, since the integer execution unit has direct read access to both the IRF and the FRF, data may be moved between the two register files using the single cycle operation of the integer execution unit, without the need to store and load the data from memory.
