High performance multiprocessor system with modified-unsolicited cache state
    91.
    发明授权
    High performance multiprocessor system with modified-unsolicited cache state 失效
    具有修改的主动缓存状态的高性能多处理器系统

    公开(公告)号:US06321306B1

    公开(公告)日:2001-11-20

    申请号:US09437179

    申请日:1999-11-09

    IPC分类号: G06F1212

    CPC分类号: G06F12/0831

    摘要: A novel cache coherency protocol provides a modified-unsolicited (MU) cache state to indicate that a value held in a cache line has been modified (i.e., is not currently consistent with system memory), but was modified by another processing unit, not by the processing unit associated with the cache that currently contains the value in the MU state, and that the value is held exclusive of any other horizontally adjacent caches. Because the value is exclusively held, it may be modified in that cache without the necessity of issuing a bus transaction to other horizontal caches in the memory hierarchy. The MU state may be applied as a result of a snoop response to a read request. The read request can include a flag to indicate that the requesting cache is capable of utilizing the MU state. Alternatively, a flag may be provided with intervention data to indicate that the requesting cache should utilize the modified-unsolicited state.

    摘要翻译: 一种新颖的高速缓存一致性协议提供修改的非请求(MU)高速缓存状态,以指示保持在高速缓存行中的值已被修改(即,当前不符合系统存储器),但是被另一个处理单元修改,而不是由 与当前包含MU状态的值的高速缓存相关联的处理单元,并且该值被保持为任何其他水平相邻的高速缓存。 因为该值是唯一保留的,所以可以在该高速缓存中修改该值,而不需要向存储器层级中的其他水平高速缓存发出总线事务。 作为对读取请求的窥探响应的结果,可以应用MU状态。 读取请求可以包括用于指示请求的高速缓存能够利用MU状态的标志。 或者,可以向标记提供干预数据,以指示请求的高速缓存应该利用修改的未经请求的状态。

    Multiprocessor system bus with combined snoop responses explicitly cancelling master allocation of read data
    92.
    发明授权
    Multiprocessor system bus with combined snoop responses explicitly cancelling master allocation of read data 失效
    具有组合侦听响应的多处理器系统总线显式地取消读取数据的主分配

    公开(公告)号:US06321305B1

    公开(公告)日:2001-11-20

    申请号:US09368230

    申请日:1999-08-04

    IPC分类号: G06F1200

    摘要: In cancelling the cast out portion of a combined operation including a data access related to the cast out, the combined response logic explicitly directs the storage device initiating the combined operation not to allocate storage for the target of the data access. Instead, the target of the data access may be passed directly to an in-line processor core without storage, may be stored in a horizontal storage device, or may be stored in an in-line, noninclusive, lower level storage device. Cancellation of the cast out thus defers any latency associated with writing the cast out victim to system memory while maximizing utilization of available storage with acceptable tradeoffs in data access latency.

    摘要翻译: 组合响应逻辑在取消组合操作包括与丢弃相关的数据访问的部署时,明确地指示存储设备启动组合操作,而不为数据访问的目标分配存储。 相反,数据访问的目标可以直接传递到没有存储的在线处理器核心,可以被存储在水平存储设备中,或者可以被存储在一个在线的,独立的,低级的存储设备中。 取消投票,从而延迟与将丢弃的受害者写入系统内存相关的任何延迟,同时最大限度地利用可用存储在数据访问延迟中具有可接受的折中。

    Data processing system having demand based write through cache with
enforced ordering
    93.
    发明授权
    Data processing system having demand based write through cache with enforced ordering 失效
    数据处理系统具有基于需求的写入通过缓存执行排序

    公开(公告)号:US5796979A

    公开(公告)日:1998-08-18

    申请号:US730994

    申请日:1996-10-16

    IPC分类号: G06F12/08 G06F13/12

    摘要: A data processing system includes a processor, a system memory, one or more input/output channel controllers (IOCC), and a system bus connecting the processor, the memory and the IOCCs together for communicating instructions, address and data between the various elements of a system. The IOCC includes a paged cache storage having a number of lines wherein each line of the page may be, for example, 32 bytes. Each page in the cache also has several attribute bits for that page including the so called WIM and attribute bits. The W bit is for controlling write through operations; the I bit controls cache inhibit; and the M bit controls memory coherency. Since the IOCC is unaware of these page table attribute bits for the cache lines being DMAed to system memory, IOCC must maintain memory consistency and cache coherency without sacrificing performance. For DMA write data to system memory, new cache attributes called global, cachable and demand based write through are created. Individual writes within a cache line are gathered by the IOCC and only written to system memory when the I/O bus master accesses a different cache line or relinquishes the I/O bus.

    摘要翻译: 数据处理系统包括处理器,系统存储器,一个或多个输入/输出通道控制器(IOCC)以及将处理器,存储器和IOCC连接在一起的系统总线,用于在各种元件之间传送指令,地址和数据 一个系统。 IOCC包括具有多行的分页缓存存储器,其中页面的每行可以是例如32字节。 缓存中的每个页面还具有该页面的几个属性位,包括所谓的WIM和属性位。 W位用于控制写操作; I位控制缓存抑制; M位控制存储器一致性。 由于IOCC不知道将这些页表属性位用于高速缓存行被DMA映射到系统内存,因此IOCC必须保持内存一致性和高速缓存一致性,而不会牺牲性能。 对于将DMA写入数据到系统内存,创建了称为全局,可高速缓存和基于需求的写入的新缓存属性。 高速缓存行中的单独写入由IOCC收集,只有当I / O总线主机访问不同的高速缓存行或放弃I / O总线时才写入系统存储器。

    Speculative execution of instructions and processes before completion of preceding barrier operations
    94.
    发明授权
    Speculative execution of instructions and processes before completion of preceding barrier operations 失效
    完成前面的障碍操作之前,对指令和过程的推测执行

    公开(公告)号:US06880073B2

    公开(公告)日:2005-04-12

    申请号:US09753053

    申请日:2000-12-28

    IPC分类号: G06F9/30 G06F9/38 G06F9/00

    摘要: Described is a data processing system and processor that provides full multiprocessor speculation by which all instructions subsequent to barrier operations in a instruction sequence are speculatively executed before the barrier operation completes on the system bus. The processor comprises a load/store unit (LSU) with a barrier operation (BOP) controller that permits load instructions subsequent to syncs in an instruction sequence to be speculatively issued prior to the return of the sync acknowledgment. Data returned is immediately forwarded to the processor's execution units. The returned data and results of subsequent operations are held temporarily in rename registers. A multiprocessor speculation flag is set in the corresponding rename registers to indicate that the value is “barrier” speculative. When a barrier acknowledge is received by the BOP controller, the flag(s) of the corresponding rename register(s) are reset.

    摘要翻译: 描述了提供完整的多处理器推测的数据处理系统和处理器,在系统总线上的屏障操作完成之前,推测性地执行指令序列中的屏障操作之后的所有指令。 处理器包括具有屏障操作(BOP)控制器的加载/存储单元(LSU),其允许在指令序列中的同步之后的加载指令在返回同步确认之前被推测地发出。 返回的数据立即转发到处理器的执行单元。 返回的数据和后续操作的结果暂时保存在重命名寄存器中。 在相应的重命名寄存器中设置多处理器推测标志,以指示该值为“屏障”推测。 当BOP控制器接收到屏障确认时,相应的重命名寄存器的标志被重置。

    Imprecise snooping based invalidation mechanism
    95.
    发明授权
    Imprecise snooping based invalidation mechanism 失效
    不精确的基于窥探的无效机制

    公开(公告)号:US06801984B2

    公开(公告)日:2004-10-05

    申请号:US09895119

    申请日:2001-06-29

    IPC分类号: G06F1208

    CPC分类号: G06F12/0831

    摘要: A method, system, and processor cache configuration that enables efficient retrieval of valid data in response to an invalidate cache miss at a local processor cache. A cache directory is provided a set of directional bits in addition to the coherency state bits and the address tag. The directional bits provide information that includes a processor cache identification (ID) and routing method. The processor cache ID indicates which processor's operation resulted in the cache line of the local processor changing to the invalidate (I) coherency state. The routing method indicates what transmission method to utilize to forward the cache line, from among a local system bus or a switch or broadcast mechanism. Processor/Cache directory logic provide responses to requests depending on the values of the directional bits.

    摘要翻译: 一种方法,系统和处理器高速缓存配置,其能够响应于在本地处理器高速缓存处的无效高速缓存未命中而有效地检索有效数据。 除了一致性状态位和地址标签之外,向缓存目录提供一组方向位。 方向位提供包括处理器缓存标识(ID)和路由方法的信息。 处理器缓存ID指示哪个处理器的操作导致本地处理器的高速缓存行变为无效(I)一致性状态。 该路由方法指示用于从本地系统总线或交换机或广播机制中转发高速缓存行的什么传输方法。 处理器/缓存目录逻辑根据定向位的值提供对请求的响应。

    System and method for providing multiprocessor speculation within a speculative branch path
    96.
    发明授权
    System and method for providing multiprocessor speculation within a speculative branch path 失效
    在推测性分支路径中提供多处理器推测的系统和方法

    公开(公告)号:US06728873B1

    公开(公告)日:2004-04-27

    申请号:US09588507

    申请日:2000-06-06

    IPC分类号: G06F9312

    摘要: Disclosed is a method of operation within a processor, that enhances speculative branch processing. A speculative execution path contains an instruction sequence that includes a barrier instruction followed by a load instruction. While a barrier operation associated with the barrier instruction is pending, a load request associated with the load instruction is speculatively issued to memory. A flag is set for the load request when it is speculatively issued and reset when an acknowledgment is received for the barrier operation. Data which is returned by the speculatively issued load request is temporarily held and forwarded to a register or execution unit of the data processing system after the acknowledgment is received. All process results, including data returned by the speculatively issued load instructions are discarded when the speculative execution path is determined to be incorrect.

    摘要翻译: 公开了一种处理器内的操作方法,其增强了推测性分支处理。 推测执行路径包含指令序列,其中包含跟随加载指令的障碍指令。 当与障碍指令相关联的障碍操作正在等待时,与加载指令相关联的加载请求被推测地发布到存储器。 当推测性地发出加载请求时设置标志,并且当接收到用于屏障操作的确认时,重置该标志。 在接收到确认之后,由推测发出的加载请求返回的数据被暂时保存并转发到数据处理系统的寄存器或执行单元。 当推测性执行路径被确定为不正确时,所有处理结果(包括由推测发出的加载指令返回的数据)被丢弃。

    Mechanism for folding storage barrier operations in a multiprocessor system
    97.
    发明授权
    Mechanism for folding storage barrier operations in a multiprocessor system 失效
    在多处理器系统中折叠存储屏障操作的机制

    公开(公告)号:US06725340B1

    公开(公告)日:2004-04-20

    申请号:US09588509

    申请日:2000-06-06

    IPC分类号: G06F9312

    摘要: Disclosed is a processor that reduces barrier operations during instruction processing. An instruction sequence includes a first barrier instruction and a second barrier instruction with a store instruction in between the first and second barrier instructions. A store request associated with the store instruction is issued prior to a barrier operation associated with the first barrier instruction. A determination is made of when the store request completes before the first barrier instruction has issued. In response, only a single barrier operation is issued for both the first and second barrier instructions. The single barrier operation is issued after the store request has been issued and at the time the second barrier operation is scheduled to be issued.

    摘要翻译: 公开了一种在指令处理期间减少屏障操作的处理器。 指令序列包括在第一和第二屏障指令之间具有存储指令的第一屏障指令和第二屏障指令。 在与第一屏障指令相关联的屏障操作之前发出与存储指令相关联的存储请求。 确定存储请求何时在第一个屏障指令发出之前完成。 作为响应,仅为第一和第二屏障指令发出单个屏障操作。 单个屏障操作在存储请求已经被发出之后并且在第二屏障操作被安排发布的时候发出。

    Multiprocessor computer system with sectored cache line system bus protocol mechanism
    98.
    发明授权
    Multiprocessor computer system with sectored cache line system bus protocol mechanism 失效
    多处理器计算机系统采用高速缓存线路系统总线协议机制

    公开(公告)号:US06484241B2

    公开(公告)日:2002-11-19

    申请号:US09752862

    申请日:2000-12-28

    IPC分类号: G06F1200

    CPC分类号: G06F12/0831

    摘要: A method of maintaining coherency in a multiprocessor computer system wherein each processing unit's cache has sectored cache lines. A first cache coherency state is assigned to one of the sectors of a particular cache line, and a second cache coherency state, different from the first cache coherency state, is assigned to the overall cache line while maintaining the first cache coherency state for the first sector. The first cache coherency state may provide an indication that the first sector contains a valid value which is not shared with any other cache (i.e., an exclusive or modified state), and the second cache coherency state may provide an indication that at least one of the sectors in the cache line contains a valid value which is shared with at least one other cache (a shared, recently-read, or tagged state). Other coherency states may be applied to other sectors in the same cache line. Partial intervention may be achieved by issuing a request to retrieve an entire cache line, and sourcing only a first sector of the cache line in response to the request. A second sector of the same cache line may be sourced from a third cache. Other sectors may also be sourced from a system memory device of the computer system as well. Appropriate system bus codes are utilized to transmit cache operations to the system bus and indicate which sectors of the cache line are targets of the cache operation.

    摘要翻译: 一种在多处理器计算机系统中维持一致性的方法,其中每个处理单元的高速缓冲存储器具有高速缓存行。 第一高速缓存一致性状态被分配给特定高速缓存行的一个扇区,并且与第一高速缓存一致性状态不同的第二高速缓存一致性状态被分配给总高速缓存行,同时保持第一高速缓存一致性状态 部门。 第一高速缓存一致性状态可以提供第一扇区包含不与任何其它高速缓存共享的有效值(即,排他或修改状态)的指示,并且第二高速缓存一致性状态可以提供以下指示: 高速缓存行中的扇区包含与至少一个其他高速缓存(共享,最近读取或标记状态)共享的有效值。 其他一致性状态可以应用于同一高速缓存行中的其他扇区。 部分干预可以通过发出检索整个高速缓存线的请求来实现,并且仅响应于该请求仅提供高速缓存行的第一扇区。 相同高速缓存行的第二扇区可以来自第三高速缓存。 其他扇区也可以来自计算机系统的系统存储器设备。 利用适当的系统总线代码将高速缓存操作发送到系统总线,并指示高速缓存行的哪些扇区是高速缓存操作的目标。

    Method for upper level cache victim selection management by a lower level cache
    99.
    发明授权
    Method for upper level cache victim selection management by a lower level cache 失效
    低级缓存的上级缓存受害者选择管理方法

    公开(公告)号:US06446166B1

    公开(公告)日:2002-09-03

    申请号:US09340073

    申请日:1999-06-25

    IPC分类号: G06F1214

    摘要: A method of improving memory access for a computer system, by sending load requests to a lower level storage subsystem along with associated information pertaining to intended use of the requested information by the requesting processor, without using a high level load queue. Returning the requested information to the processor along with the associated use information allows the information to be placed immediately without using reload buffers. A register load bus separate from the cache load bus (and having a smaller granularity) is used to return the information. An upper level (L1) cache may then be imprecisely reloaded (the upper level cache can also be imprecisely reloaded with store instructions). The lower level (L2) cache can monitor L1 and L2 cache activity, which can be used to select a victim cache block in the L1 cache (based on the additional L2 information), or to select a victim cache block in the L2 cache (based on the additional L1 information). L2 control of the L1 directory also allows certain snoop requests to be resolved without waiting for L1 acknowledgement. The invention can be applied to, e.g., instruction, operand data and translation caches.

    摘要翻译: 一种改进计算机系统的存储器访问的方法,通过将请求发送到较低级别的存储子系统以及由请求处理器对与请求的信息的预期用途有关的关联信息而不使用高级别的负载队列来进行发送。 将所请求的信息与相关联的使用信息一起返回到处理器允许立即放置信息而不使用重新加载缓冲器。 使用与缓存负载总线分离(并具有较小粒度)的寄存器负载总线返回信息。 然后可能不精确地重新加载上级(L1)高速缓存(高级缓存也可以不精确地用存储指令重新加载)。 低级(L​​2)缓存可以监视L1和L2高速缓存活动,其可用于在L1高速缓存中选择受害者缓存块(基于附加的L2信息),或者选择L2缓存中的受害缓存块( 基于附加的L1信息)。 L1目录的L2控制也允许解决某些侦听请求,而无需等待L1确认。 本发明可以应用于例如指令,操作数数据和翻译高速缓存。

    Layered local cache mechanism with split register load bus and cache load bus
    100.
    发明授权
    Layered local cache mechanism with split register load bus and cache load bus 有权
    分层本地缓存机制,具有分裂寄存器负载总线和缓存负载总线

    公开(公告)号:US06405285B1

    公开(公告)日:2002-06-11

    申请号:US09340076

    申请日:1999-06-25

    IPC分类号: G06F1200

    摘要: A method of improving memory access for a computer system, by sending load requests to a lower level storage subsystem along with associated information pertaining to intended use of the requested information by the requesting processor, without using a high level load queue. Returning the requested information to the processor along with the associated use information allows the information to be placed immediately without using reload buffers. A register load bus separate from the cache load bus (and having a smaller granularity) is used to return the information. An upper level (L1) cache may then be imprecisely reloaded (the upper level cache can also be imprecisely reloaded with store instructions). The lower level (L2) cache can monitor L1 and L2 cache activity, which can be used to select a victim cache block in the L1 cache (based on the additional L2 information), or to select a victim cache block in the L2 cache (based on the additional L1 information). L2 control of the L1 directory also allows certain snoop requests to be resolved without waiting for L1 acknowledgement. The invention can be applied to, e.g., instruction, operand data and translation caches.

    摘要翻译: 一种改进计算机系统的存储器访问的方法,通过将请求发送到较低级别的存储子系统以及由请求处理器对与请求的信息的预期用途有关的关联信息而不使用高级别的负载队列来进行发送。 将所请求的信息与相关联的使用信息一起返回到处理器允许立即放置信息而不使用重新加载缓冲器。 使用与缓存负载总线分离(并具有较小粒度)的寄存器负载总线返回信息。 然后可能不精确地重新加载上级(L1)高速缓存(高级缓存也可以不精确地用存储指令重新加载)。 低级(L​​2)缓存可以监视L1和L2高速缓存活动,其可用于在L1高速缓存中选择受害者缓存块(基于附加的L2信息),或者选择L2缓存中的受害缓存块( 基于附加的L1信息)。 L1目录的L2控制也允许解决某些侦听请求,而无需等待L1确认。 本发明可以应用于例如指令,操作数数据和翻译高速缓存。