Messaging scheme to maintain cache coherency and conserve system memory bandwidth during a memory read operation in a multiprocessing computer system
    71.
    Granted Patent
    Messaging scheme to maintain cache coherency and conserve system memory bandwidth during a memory read operation in a multiprocessing computer system (In force)

    Publication No.: US06275905B1

    Publication Date: 2001-08-14

    Application No.: US09217649

    Filing Date: 1998-12-21

    IPC Classification: G06F 12/00

    CPC Classification: G06F12/0815 G06F12/0813

    Abstract: In a multiprocessing computer system, a cache-coherent data transfer scheme that also conserves system memory bandwidth during a memory read operation is described. A source processing node sends a read command to a target processing node to read data from a designated memory location in a system memory associated with the target processing node. In response to the read command, the target processing node transmits a probe command to all the remaining processing nodes in the computer system, regardless of whether one or more of those nodes have a copy of the data cached in their respective cache memories. The probe command causes each node to maintain cache coherency by appropriately changing the state of the cache block containing the requested data and sending a probe response to the source node. The probe command also causes the node having an updated copy of the cache block to send that cache block to the source node through a read response. The target node, concurrently with the probe command, initiates a read response transmission to send the requested data to the source node. The node having the modified cached copy containing the requested data transmits a memory cancel response to the target node concurrently with sending the updated copy of the cache block to the source node. The memory cancel response attempts to prevent the target node from sending the stale data from the system memory to the source node. The memory cancel response also causes the target node to send a target done response to the source node. The source node waits for the probe responses, read responses, and the target done response before sending a source done message to the target node.

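    The message flow above is easier to follow as pseudocode. Below is a minimal, single-threaded Python sketch of that flow, not the patented implementation; the node layout, MSI-style cache states, message names, and the set of probed nodes are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    memory: dict = field(default_factory=dict)   # home system memory at this node
    cache: dict = field(default_factory=dict)    # addr -> (state, data), MSI-like

def read_transaction(nodes, source, target, addr):
    """Source node reads `addr`, whose home memory lives at the target node."""
    # The target probes the other nodes whether or not they cache the block
    # (the exact probed set is a simplification here).
    probed = [n for n in nodes if n is not source and n is not target]
    dirty_owner = None
    probe_responses = 0
    for node in probed:
        state, data = node.cache.get(addr, ("I", None))
        if state == "M":                          # modified copy found
            dirty_owner = node
            node.cache[addr] = ("S", data)        # downgrade to keep coherence
        probe_responses += 1                      # each probed node answers the source

    if dirty_owner is not None:
        # The dirty owner sends a read response to the source and, concurrently,
        # a memory cancel to the target so the stale memory copy is suppressed.
        data = dirty_owner.cache[addr][1]
        target_done, used_memory = True, False    # memory cancel -> target done
    else:
        data = target.memory[addr]                # read response from home memory
        target_done, used_memory = False, True

    # The source completes only after probe responses, the read response, and
    # target done; it would then issue a source done message back to the target.
    assert probe_responses == len(probed)
    source.cache[addr] = ("S", data)
    return {"data": data, "used_memory": used_memory, "target_done": target_done}

if __name__ == "__main__":
    n0, n1, n2 = Node(0), Node(1, memory={0x100: "stale"}), Node(2)
    n2.cache[0x100] = ("M", "fresh")              # node 2 holds a modified copy
    print(read_transaction([n0, n1, n2], source=n0, target=n1, addr=0x100))
```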

    Method and apparatus for minimizing pincount needed by external memory control chip for multiprocessors with limited memory size requirements
    72.
    Granted Patent
    Method and apparatus for minimizing pincount needed by external memory control chip for multiprocessors with limited memory size requirements (Expired)

    Publication No.: US06199153B1

    Publication Date: 2001-03-06

    Application No.: US09099383

    Filing Date: 1998-06-18

    IPC Classification: G06F 12/00

    CPC Classification: G11C5/066 G11C8/00

    Abstract: A computing apparatus has a mode selector configured to select one of a long-bus mode corresponding to a first memory size and a short-bus mode corresponding to a second memory size that is less than the first memory size. An address bus of the computing apparatus is configured to transmit an address consisting of address bits defining the first memory size, a subset of which defines the second memory size. The address bus has N communication lines, each configured to transmit one of a first number of address bits defining the first memory size in the long-bus mode; M of the N communication lines are each configured to transmit one of a second number of address bits defining the second memory size in the short-bus mode, where M is less than N.

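    As a rough illustration of the long-bus/short-bus idea, the following Python sketch drives an N-line address bus in either mode; the line counts, mode names, and bit-to-line mapping are assumptions, not the patent's actual pinout.

```python
# Toy sketch: in long-bus mode all N address lines are used; in short-bus mode
# only M < N lines carry the smaller address, so fewer pins are needed.

def drive_address_lines(address: int, mode: str, n_lines: int = 32, m_lines: int = 24):
    """Return the bit value driven on each address line for the chosen mode."""
    if mode == "long":
        width = n_lines                    # full address for the larger memory size
    elif mode == "short":
        width = m_lines                    # subset of bits for the smaller memory size
    else:
        raise ValueError("mode must be 'long' or 'short'")
    if address >= 1 << width:
        raise ValueError("address does not fit the selected memory size")
    # Line i carries address bit i; unused lines stay undriven (None) in short mode.
    return [(address >> i) & 1 if i < width else None for i in range(n_lines)]

if __name__ == "__main__":
    lines = drive_address_lines(0x00ABCDEF, mode="short")
    print(sum(bit is not None for bit in lines), "of", len(lines), "lines driven")
```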

    Apparatus and method for providing a cache memory unit with a write operation utilizing two system clock cycles
    73.
    Granted Patent
    Apparatus and method for providing a cache memory unit with a write operation utilizing two system clock cycles (Expired)

    Publication No.: US4755936A

    Publication Date: 1988-07-05

    Application No.: US823805

    Filing Date: 1986-01-29

    IPC Classification: G06F12/08 G06F13/00

    CPC Classification: G06F12/0855

    Abstract: A cache memory unit is disclosed in which, in response to the application of a write command, the write operation is performed in two system clock cycles. During the first clock cycle, the data signal group is stored in a temporary storage unit while a determination is made whether the address signal group associated with the data signal group is present in the cache memory unit. When the address signal group is present, the data signal group is stored in the cache memory unit during the next application of a write command to the cache memory unit. If a read command involving the data signal group stored in the temporary storage unit is applied to the cache memory unit, then this data signal group is transferred to the central processing unit in response to the read command. Instead of performing the storage into the cache memory unit as a result of the next write command, the storage of the data signal group in the cache memory unit can occur during any free cycle.

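    A minimal Python sketch of the two-cycle write behavior follows; the one-entry temporary store, tag bookkeeping, and method names are illustrative assumptions rather than the patented circuit.

```python
# Sketch: a write is latched in a temporary store during cycle 1 while the tag
# check runs; the data array is updated on a later write command or a free
# cycle. A read matching the pending address is serviced from the latch.

class TwoCycleWriteCache:
    def __init__(self):
        self.tags = set()          # addresses currently present in the cache
        self.data = {}             # addr -> value held in the data array
        self.pending = None        # (addr, value) latched but not yet written

    def _drain_pending(self):
        if self.pending is not None:
            addr, value = self.pending
            if addr in self.tags:              # tag check passed in the first cycle
                self.data[addr] = value
            self.pending = None

    def write(self, addr, value):
        self._drain_pending()                  # second cycle of the previous write
        self.pending = (addr, value)           # first cycle: latch data, check tags

    def read(self, addr):
        if self.pending and self.pending[0] == addr:
            return self.pending[1]             # forward pending data to the CPU
        return self.data.get(addr)

    def free_cycle(self):
        self._drain_pending()                  # storage may also complete here

if __name__ == "__main__":
    c = TwoCycleWriteCache()
    c.tags.add(0x40)
    c.write(0x40, "A")        # latched, not yet in the array
    print(c.read(0x40))       # "A", forwarded from the temporary store
    c.free_cycle()
    print(c.data[0x40])       # "A", now stored in the cache array
```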

    Zero cycle move
    74.
    Granted Patent
    Zero cycle move (In force)

    Publication No.: US09575754B2

    Publication Date: 2017-02-21

    Application No.: US13447651

    Filing Date: 2012-04-16

    CPC Classification: G06F9/30032 G06F9/384

    Abstract: A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns the physical register identifier associated with the source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction.

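    The rename-time move elimination can be sketched as follows in Python; the data structures, eligibility test, and duplicate-count bookkeeping are assumptions made for illustration, not the actual hardware design.

```python
# Sketch: an eligible register-to-register move is completed at rename by
# aliasing the destination to the source's physical register, so the move
# itself never occupies an execution pipeline slot.

from collections import Counter

class RenameUnit:
    def __init__(self, num_physical=8):
        self.free_list = list(range(num_physical))
        self.map = {}                  # architectural reg -> physical reg id
        self.dup_count = Counter()     # physical ids shared by several arch regs

    def rename_move(self, dst, src):
        """Return True if the move was eliminated (zero-cycle move)."""
        if src not in self.map:
            return False               # not eligible: source has no mapping yet
        phys = self.map[src]
        self._release(dst)
        self.map[dst] = phys           # destination aliases the same physical reg
        self.dup_count[phys] += 1      # tracked so the register is freed correctly
        return True                    # mark the move: it never issues or executes

    def rename_producer(self, dst):
        self._release(dst)
        phys = self.free_list.pop()
        self.map[dst] = phys
        self.dup_count[phys] += 1
        return phys

    def _release(self, arch_reg):
        phys = self.map.pop(arch_reg, None)
        if phys is not None:
            self.dup_count[phys] -= 1
            if self.dup_count[phys] == 0:
                self.free_list.append(phys)

if __name__ == "__main__":
    r = RenameUnit()
    r.rename_producer("rax")
    print(r.rename_move("rbx", "rax"), r.map["rax"] == r.map["rbx"])  # True True
```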

    Optimizing register initialization operations
    75.
    Granted Patent
    Optimizing register initialization operations (In force)

    Publication No.: US09430243B2

    Publication Date: 2016-08-30

    Application No.: US13460268

    Filing Date: 2012-04-30

    IPC Classification: G06F9/38

    Abstract: A system and method for efficiently reducing the latency of initializing registers. A register rename unit within a processor determines whether it is known, prior to an execution pipeline stage, that a decoded instruction writes a particular numerical value to its destination operand. An example is a move immediate instruction that writes a value of 0 to its destination operand. Other instructions may also qualify. If the determination is made, a given physical register identifier is assigned to the destination operand, wherein the given physical register identifier is associated with the particular numerical value but is not associated with an actual physical register in a physical register file. The given instruction is marked to prevent it from proceeding to an execution pipeline stage. When the given physical register identifier is used to read the physical register file, no actual physical register is accessed.

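    A small Python sketch of the idea follows, with an assumed special identifier standing in for the constant value 0; the eligibility check and register-file layout are illustrative only.

```python
# Sketch: a known constant result (e.g. "mov r, #0") is renamed to a special
# physical register identifier that has no backing entry in the register file,
# so the instruction is marked complete and never reaches execution.

ZERO_ID = -1                     # special identifier associated with the value 0

class RenameWithConstants:
    def __init__(self, num_physical=8):
        self.free_list = list(range(num_physical))
        self.map = {}                                  # arch reg -> physical id
        self.register_file = [None] * num_physical     # actual physical registers

    def rename(self, dst, known_value=None):
        """Return True if the register write was eliminated at rename time."""
        if known_value == 0:
            self.map[dst] = ZERO_ID    # no physical register is allocated
            return True
        phys = self.free_list.pop()
        self.map[dst] = phys
        return False

    def read(self, arch_reg):
        phys = self.map[arch_reg]
        if phys == ZERO_ID:
            return 0                   # the register file is not accessed at all
        return self.register_file[phys]

if __name__ == "__main__":
    r = RenameWithConstants()
    print(r.rename("rcx", known_value=0), r.read("rcx"))   # True 0
```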

    Branch predictor for wide issue, arbitrarily aligned fetch that can cross cache line boundaries
    76.
    Granted Patent
    Branch predictor for wide issue, arbitrarily aligned fetch that can cross cache line boundaries (In force)

    Publication No.: US09201658B2

    Publication Date: 2015-12-01

    Application No.: US13625382

    Filing Date: 2012-09-24

    IPC Classification: G06F9/38 G06F9/30

    Abstract: In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, branch prediction values from multiple entries in each table may be read, and the respective branch prediction values may be combined to form branch predictions for up to M branches in the fetch group.

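    The following Python sketch illustrates one way a fetch group that straddles a cache line could gather and combine prediction entries; the table organization, fetch width N, branch limit M, and the simple take-first-M combination are assumptions standing in for the patent's combining logic.

```python
FETCH_BYTES = 16      # N: bytes fetched per cycle
LINE_BYTES = 64       # cache line size
MAX_BRANCHES = 4      # M: maximum predicted branches per fetch group

def predict_fetch_group(table, fetch_pc):
    """table maps a line-aligned address to a list of 2-bit counters."""
    first_line = fetch_pc & ~(LINE_BYTES - 1)
    last_line = (fetch_pc + FETCH_BYTES - 1) & ~(LINE_BYTES - 1)
    lines = [first_line] if first_line == last_line else [first_line, last_line]

    counters = []
    for line in lines:                       # read an entry per covered line
        counters.extend(table.get(line, []))

    # Combine values from the (possibly multiple) entries: here, simply take the
    # first M counters that apply to the group and map them to taken/not-taken.
    return [c >= 2 for c in counters[:MAX_BRANCHES]]

if __name__ == "__main__":
    table = {0x1000: [3, 0], 0x1040: [2]}
    # Fetch group 0x1038..0x1047 crosses the 0x1040 line boundary.
    print(predict_fetch_group(table, 0x1038))   # [True, False, True]
```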

    Bandwidth management
    77.
    Granted Patent
    Bandwidth management (In force)

    Publication No.: US08848577B2

    Publication Date: 2014-09-30

    Application No.: US13625416

    Filing Date: 2012-09-24

    IPC Classification: H04L12/28

    CPC Classification: G06F13/1605 Y02D10/14

    Abstract: In some embodiments, a system includes a shared, high bandwidth resource (e.g. a memory system), multiple agents configured to communicate with the shared resource, and a communication fabric coupling the multiple agents to the shared resource. The communication fabric may be equipped with limiters configured to limit bandwidth from the various agents based on one or more performance metrics measured with respect to the shared, high bandwidth resource. For example, the performance metrics may include one or more of latency, number of outstanding transactions, resource utilization, etc. The limiters may dynamically modify their limit configurations based on the performance metrics. In an embodiment, the system may include multiple thresholds for the performance metrics, and exceeding a given threshold may trigger modification of the limiters in the communication fabric. In some embodiments, hysteresis may also be implemented in the system to reduce the frequency of transitions between configurations.

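    A toy Python sketch of a threshold-based limiter with hysteresis follows; the latency metric, threshold values, and limit levels are illustrative assumptions, not the patented configuration.

```python
# Sketch: a fabric limiter tightens an agent's bandwidth limit when a latency
# metric for the shared memory system exceeds a threshold, and uses a lower
# release threshold (hysteresis) to avoid oscillating between settings.

class BandwidthLimiter:
    # (enter_threshold_ns, release_threshold_ns, allowed_requests_per_window)
    LEVELS = [
        (float("inf"), float("inf"), 16),   # low observed latency: generous limit
        (400, 300, 8),                      # moderate latency: halve the limit
        (800, 600, 2),                      # high latency: throttle hard
    ]

    def __init__(self):
        self.level = 0

    def update(self, measured_latency_ns: float) -> int:
        """Return the request limit for the next window given the new metric."""
        # Escalate while the metric exceeds the next level's entry threshold.
        while (self.level + 1 < len(self.LEVELS)
               and measured_latency_ns > self.LEVELS[self.level + 1][0]):
            self.level += 1
        # Relax only when the metric drops below the current level's release
        # threshold; hysteresis reduces back-and-forth transitions.
        while self.level > 0 and measured_latency_ns < self.LEVELS[self.level][1]:
            self.level -= 1
        return self.LEVELS[self.level][2]

if __name__ == "__main__":
    limiter = BandwidthLimiter()
    for latency in (100, 500, 550, 850, 700, 250):
        print(latency, "->", limiter.update(latency))
```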

    Retry mechanism
    78.
    Granted Patent
    Retry mechanism (In force)

    Publication No.: US08359414B2

    Publication Date: 2013-01-22

    Application No.: US13165235

    Filing Date: 2011-06-21

    IPC Classification: G06F3/00 G06F15/167

    Abstract: An interface unit may comprise a buffer configured to store requests that are to be transmitted on an interconnect and a control unit coupled to the buffer. In one embodiment, the control unit is coupled to receive a retry response from the interconnect during a response phase of a first transaction for a first request stored in the buffer. The control unit is configured to record an identifier, supplied on the interconnect with the retry response, that identifies a second transaction that is in progress on the interconnect. The control unit is configured to inhibit reinitiation of the first transaction at least until detecting a second transmission of the identifier. In another embodiment, the control unit is configured to assert a retry response during a response phase of a first transaction, responsive to a snoop hit of the first transaction on a first request stored in the buffer for which a second transaction is in progress on the interconnect. The control unit is further configured to provide an identifier of the second transaction with the retry response.

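    The retry bookkeeping can be sketched as follows in Python; the message shapes, identifier handling, and method names are assumptions for illustration only.

```python
# Sketch: when a transaction gets a retry response, the interface unit records
# the identifier of the conflicting in-progress transaction supplied with the
# retry, and holds off reinitiating until that identifier is observed on the
# interconnect a second time.

class InterfaceUnit:
    def __init__(self):
        self.buffer = {}          # request id -> request payload
        self.waiting_on = {}      # request id -> blocking transaction id

    def enqueue(self, req_id, payload):
        self.buffer[req_id] = payload

    def on_retry_response(self, req_id, conflicting_txn_id):
        # Record which in-progress transaction caused the retry.
        self.waiting_on[req_id] = conflicting_txn_id

    def on_interconnect_transaction(self, txn_id):
        # A second transmission of the recorded identifier unblocks the request.
        unblocked = [r for r, t in self.waiting_on.items() if t == txn_id]
        for req_id in unblocked:
            del self.waiting_on[req_id]
        return unblocked          # these requests may now be reinitiated

    def can_initiate(self, req_id):
        return req_id in self.buffer and req_id not in self.waiting_on

if __name__ == "__main__":
    iface = InterfaceUnit()
    iface.enqueue("read_A", {"addr": 0x80})
    iface.on_retry_response("read_A", conflicting_txn_id=42)
    print(iface.can_initiate("read_A"))                 # False: held back
    print(iface.on_interconnect_transaction(42))        # ['read_A'] unblocked
    print(iface.can_initiate("read_A"))                 # True
```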

    Block-based non-transparent cache
    79.
    Granted Patent
    Block-based non-transparent cache (In force)

    Publication No.: US08219758B2

    Publication Date: 2012-07-10

    Application No.: US12500810

    Filing Date: 2009-07-10

    IPC Classification: G06F12/00 G06F13/00 G06F13/28

    Abstract: In an embodiment, a non-transparent memory unit is provided which includes a non-transparent memory and a control circuit. The control circuit may manage the non-transparent memory as a set of non-transparent memory blocks. Software executing on one or more processors may request a non-transparent memory block in which to process data. The control circuit may allocate a first block, and may return an address (or other indication) of the allocated block so that the software can access the block. The control circuit may also provide automatic data movement between the non-transparent memory and a main memory system to which the non-transparent memory unit is coupled. For example, the automatic data movement may include filling data from the main memory system into the allocated block, or flushing the data in the allocated block to the main memory system after the processing of the allocated block is complete.

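    A brief Python sketch of the allocate/fill/flush behavior follows; the block size, API names, and fill-on-allocate policy are illustrative assumptions rather than the patented control circuit.

```python
# Sketch: a control layer hands out fixed-size blocks of a non-transparent
# memory to software and moves data to and from main memory automatically on
# allocation and release.

BLOCK_SIZE = 4096

class NonTransparentMemory:
    def __init__(self, num_blocks, main_memory):
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))
        self.owner_addr = {}              # block index -> main-memory base address
        self.main_memory = main_memory    # dict: base address -> bytes

    def allocate(self, main_addr=None):
        """Allocate a block; optionally fill it from main memory first."""
        idx = self.free.pop()
        if main_addr is not None:
            src = self.main_memory.get(main_addr, b"\x00" * BLOCK_SIZE)
            self.blocks[idx][:] = src.ljust(BLOCK_SIZE, b"\x00")
            self.owner_addr[idx] = main_addr
        return idx                        # stands in for the block's address

    def release(self, idx, flush=True):
        """Release a block; optionally flush its contents back to main memory."""
        if flush and idx in self.owner_addr:
            self.main_memory[self.owner_addr[idx]] = bytes(self.blocks[idx])
        self.owner_addr.pop(idx, None)
        self.free.append(idx)

if __name__ == "__main__":
    main_mem = {0x10000: b"input data"}
    ntm = NonTransparentMemory(num_blocks=4, main_memory=main_mem)
    blk = ntm.allocate(main_addr=0x10000)       # filled from main memory
    ntm.blocks[blk][:6] = b"OUTPUT"             # software processes in place
    ntm.release(blk)                            # flushed back automatically
    print(main_mem[0x10000][:6])                # b'OUTPUT'
```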