Multiprocessor system in which a cache serving as a highest point of coherency is indicated by a snoop response
    1.
    发明授权
    Multiprocessor system in which a cache serving as a highest point of coherency is indicated by a snoop response 失效
    多处理器系统,其中作为最高点的一致性的缓存由窥探响应指示

    公开(公告)号:US06405289B1

    公开(公告)日:2002-06-11

    申请号:US09437196

    申请日:1999-11-09

    IPC分类号: G06F1200

    摘要: A method of maintaining cache coherency, by designating one cache that owns a line as a highest point of coherency (HPC) for a particular memory block, and sending a snoop response from the cache indicating that it is currently the HPC for the memory block and can service a request. The designation may be performed in response to a particular coherency state assigned to the cache line, or based on the setting of a coherency token bit for the cache line. The processing units may be grouped into clusters, while the memory is distributed using memory arrays associated with respective clusters. One memory array is designated as the lowest point of coherency (LPC) for the memory block (i.e., a fixed assignment) while the cache designated as the HPC is dynamic (i.e., changes as different caches gain ownership of the line). An acknowledgement snoop response is sent from the LPC memory array, and a combined response is returned to the requesting device which gives priority to the HPC snoop response over the LPC snoop response.

    摘要翻译: 通过将一个具有一行的高速缓存指定为特定存储器块的最高一致性(HPC),以及从高速缓存指示其当前是存储器块的HPC的高速缓存发送侦听响应的方法来维持高速缓存一致性的方法,以及 可以服务请求。 可以响应于分配给高速缓存行的特定一致性状态,或者基于高速缓存行的相关性令牌位的设置来执行指定。 处理单元可以被分组成群集,而存储器是使用与相应簇相关联的存储器阵列分布的。 一个存储器阵列被指定为存储器块的一致性(LPC)的最低点(即,固定分配),而指定为HPC的缓存是动态的(即,随着不同的高速缓存获得线的所有权而改变)。 从LPC存储器阵列发送确认窥探响应,并且将组合的响应返回给请求设备,该请求设备通过LPC窥探响应优先考虑HPC侦听响应。

    Optimized cache allocation algorithm for multiple speculative requests
    2.
    发明授权
    Optimized cache allocation algorithm for multiple speculative requests 失效
    针对多个推测请求的优化缓存分配算法

    公开(公告)号:US06393528B1

    公开(公告)日:2002-05-21

    申请号:US09345714

    申请日:1999-06-30

    IPC分类号: G06F1200

    CPC分类号: G06F12/0862 G06F12/127

    摘要: A method of operating a computer system is disclosed in which an instruction having an explicit prefetch request is issued directly from an instruction sequence unit to a prefetch unit of a processing unit. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hiearchy and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value.

    摘要翻译: 公开了一种操作计算机系统的方法,其中具有显式预取请求的指令直接从指令序列单元发送到处理单元的预取单元。 在优选实施例中,使用两个预取单元,第一预取单元是硬件独立的,并且动态地监视与由处理单元的核心执行的操作相关联的一个或多个活动流,并且第二预取单元知道较低级别 存储子系统,并用预取请求发送将预取值加载到处理单元的较低级缓存中的指示。 本发明可以有利地将每个预取请求与相关联的处理器流的流ID或请求处理单元的处理器ID相关联(后一特征对于由处理单元簇共享的高速缓存特别有用)。 如果从存储器hiearchy请求另一个预取值,并且确定高速缓存的高速缓存使用的预取限制已被满足,则包含先前预取值中的一个的高速缓存行中的高速缓存行被分配用于接收另一个预取值 。

    Time based mechanism for cached speculative data deallocation
    3.
    发明授权
    Time based mechanism for cached speculative data deallocation 失效
    缓存的推测数据释放的基于时间的机制

    公开(公告)号:US06510494B1

    公开(公告)日:2003-01-21

    申请号:US09345716

    申请日:1999-06-30

    IPC分类号: G06F1208

    摘要: A method of operating a processing unit of a computer system, by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the pref etch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit.

    摘要翻译: 一种操作计算机系统的处理单元的方法,通过从指令序列单元向处理单元的预取单元发出具有显式预取请求的指令。 本发明适用于作为操作数数据或指令的值。 在优选实施例中,使用两个预取单元,第一预取单元是硬件独立的,并且动态地监视与由处理单元的核心执行的操作相关联的一个或多个活动流,并且第二预取单元知道较低级别 存储子系统,并用pref蚀刻请求发送将预取值加载到处理单元的较低级缓存中的指示。 本发明可以有利地将每个预取请求与相关联的处理器流的流ID或请求处理单元的处理器ID相关联。

    Method for instruction extensions for a tightly coupled speculative request unit
    4.
    发明授权
    Method for instruction extensions for a tightly coupled speculative request unit 有权
    紧耦合推测请求单元的指令扩展方法

    公开(公告)号:US06421763B1

    公开(公告)日:2002-07-16

    申请号:US09345642

    申请日:1999-06-30

    IPC分类号: G06F1208

    摘要: A method of operating a processing unit of a computer system, by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value. The prefetch limit of cache usage may be established with a maximum number of sets in a congruence class usable by the requesting processing unit. A flag in a directory of the cache may be set to indicate that the prefetch value was retrieved as the result of a prefetch operation. In the implementation wherein the cache is a multi-level cache, a second flag in the cache directory may be set to indicate that the prefetch value has been sourced to an upstream cache. A cache line containing prefetch data can be automatically invalidated after a preset amount of time has passed since the prefetch value was requested.

    摘要翻译: 一种操作计算机系统的处理单元的方法,通过从指令序列单元向处理单元的预取单元发出具有显式预取请求的指令。 本发明适用于作为操作数数据或指令的值。 在优选实施例中,使用两个预取单元,第一预取单元是硬件独立的,并且动态地监视与由处理单元的核心执行的操作相关联的一个或多个活动流,并且第二预取单元知道较低级别 存储子系统,并用预取请求发送将预取值加载到处理单元的较低级缓存中的指示。 本发明可以有利地将每个预取请求与相关联的处理器流的流ID或请求处理单元的处理器ID相关联(后一特征对于由处理单元簇共享的高速缓存特别有用)。 如果从存储器层次结构请求另一个预取值,并且确定高速缓存的高速缓存使用的预取限制已经被高速缓存满足,则分配包含较早预取值之一的高速缓存行中的高速缓存行用于接收另一个预取 值。 高速缓存使用的预取限制可以由请求处理单元可用的同余类中的最大数量的集合来建立。 高速缓存目录中的标志可以被设置为指示作为预取操作的结果检索预取值。 在其中缓存是多级高速缓存的实现中,高速缓存目录中的第二标志可以被设置为指示预取值已经被提供给上游高速缓存。 包含预取数据的缓存行可以在从请求预取值开始经过预设的时间后自动失效。

    Extended cache state with prefetched stream ID information
    5.
    发明授权
    Extended cache state with prefetched stream ID information 失效
    扩展缓存状态与预取流ID信息

    公开(公告)号:US06360299B1

    公开(公告)日:2002-03-19

    申请号:US09345644

    申请日:1999-06-30

    IPC分类号: G06F1200

    摘要: A method of operating a computer system is disclosed in which an instruction having an explicit prefetch request is issued directly from an instruction sequence unit to a prefetch unit of a processing unit. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value.

    摘要翻译: 公开了一种操作计算机系统的方法,其中具有显式预取请求的指令直接从指令序列单元发送到处理单元的预取单元。 在优选实施例中,使用两个预取单元,第一预取单元是硬件独立的,并且动态地监视与由处理单元的核心执行的操作相关联的一个或多个活动流,并且第二预取单元知道较低级别 存储子系统,并用预取请求发送将预取值加载到处理单元的较低级缓存中的指示。 本发明可以有利地将每个预取请求与相关联的处理器流的流ID或请求处理单元的处理器ID相关联(后一特征对于由处理单元簇共享的高速缓存特别有用)。 如果从存储器层次结构请求另一个预取值,并且确定高速缓存的高速缓存使用的预取限制已经被高速缓存满足,则分配包含较早预取值之一的高速缓存行中的高速缓存行用于接收另一个预取 值。

    Mechanism for high performance transfer of speculative request data between levels of cache hierarchy
    6.
    发明授权
    Mechanism for high performance transfer of speculative request data between levels of cache hierarchy 失效
    在高速缓存层级之间高速传输推测请求数据的机制

    公开(公告)号:US06532521B1

    公开(公告)日:2003-03-11

    申请号:US09345715

    申请日:1999-06-30

    IPC分类号: G06F1200

    摘要: A method of operating a processing unit of a computer system, by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value. The prefetch limit of cache usage may be established with a maximum number of sets in a congruence class usable by the requesting processing unit. A flag in a directory of the cache may be set to indicate that the prefetch value was retrieved as the result of a prefetch operation. In the implementation wherein the cache is a multi-level cache, a second flag in the cache directory may be set to indicate that prefetch value has been sourced to an upstream cache. A cache line containing prefetch data can be automatically invalidated after a preset amount of time has passed since the prefetch value was requested.

    摘要翻译: 一种操作计算机系统的处理单元的方法,通过从指令序列单元向处理单元的预取单元发出具有显式预取请求的指令。 本发明适用于作为操作数数据或指令的值。 在优选实施例中,使用两个预取单元,第一预取单元是硬件独立的,并且动态地监视与由处理单元的核心执行的操作相关联的一个或多个活动流,并且第二预取单元知道较低级别 存储子系统,并用预取请求发送将预取值加载到处理单元的较低级缓存中的指示。 本发明可以有利地将每个预取请求与相关联的处理器流的流ID或请求处理单元的处理器ID相关联(后一特征对于由处理单元簇共享的高速缓存特别有用)。 如果从存储器层次结构请求另一个预取值,并且确定高速缓存的高速缓存使用的预取限制已经被高速缓存满足,则分配包含较早预取值之一的高速缓存行中的高速缓存行用于接收另一个预取 值。 高速缓存使用的预取限制可以由请求处理单元可用的同余类中的最大数量的集合来建立。 高速缓存目录中的标志可以被设置为指示作为预取操作的结果检索预取值。 在其中高速缓存是多级高速缓存的实现中,高速缓存目录中的第二标志可以被设置为指示预取值已经被提供给上游高速缓存。 包含预取数据的缓存行可以在从请求预取值开始经过预设的时间后自动失效。

    Layered speculative request unit with instruction optimized and storage hierarchy optimized partitions
    7.
    发明授权
    Layered speculative request unit with instruction optimized and storage hierarchy optimized partitions 失效
    分层推测请求单元,具有指令优化和存储层次结构优化分区

    公开(公告)号:US06496921B1

    公开(公告)日:2002-12-17

    申请号:US09345643

    申请日:1999-06-30

    IPC分类号: G06F930

    摘要: A method of operating a processing unit of a computer system, by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster).

    摘要翻译: 一种操作计算机系统的处理单元的方法,通过从指令序列单元向处理单元的预取单元发出具有显式预取请求的指令。 本发明适用于作为操作数数据或指令的值。 在优选实施例中,使用两个预取单元,第一预取单元是硬件独立的,并且动态地监视与由处理单元的核心执行的操作相关联的一个或多个活动流,并且第二预取单元知道较低级别 存储子系统,并用预取请求发送将预取值加载到处理单元的较低级缓存中的指示。 本发明可以有利地将每个预取请求与相关联的处理器流的流ID或请求处理单元的处理器ID相关联(后一特征对于由处理单元簇共享的高速缓存特别有用)。

    Cache allocation policy based on speculative request history
    8.
    发明授权
    Cache allocation policy based on speculative request history 有权
    基于推测请求历史记录的缓存分配策略

    公开(公告)号:US06421762B1

    公开(公告)日:2002-07-16

    申请号:US09345713

    申请日:1999-06-30

    IPC分类号: G06F1200

    摘要: A method of operating a processing unit of a computer system, by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value. The prefetch limit of cache usage may be established with a maximum number of sets in a congruence class usable by the requesting processing unit. A flag in a directory of the cache may be set to indicate that the prefetch value was retrieved as the result of a prefetch operation. In the implementation wherein the cache is a multi-level cache, a second flag in the cache directory may be set to indicate that the prefetch value has been sourced to an upstream cache. A cache line containing prefetch data can be automatically invalidated after a preset amount of time has passed since the prefetch value was requested.

    摘要翻译: 一种操作计算机系统的处理单元的方法,通过从指令序列单元向处理单元的预取单元发出具有显式预取请求的指令。 本发明适用于作为操作数数据或指令的值。 在优选实施例中,使用两个预取单元,第一预取单元是硬件独立的,并且动态地监视与由处理单元的核心执行的操作相关联的一个或多个活动流,并且第二预取单元知道较低级别 存储子系统,并用预取请求发送将预取值加载到处理单元的较低级缓存中的指示。 本发明可以有利地将每个预取请求与相关联的处理器流的流ID或请求处理单元的处理器ID相关联(后一特征对于由处理单元簇共享的高速缓存特别有用)。 如果从存储器层次结构请求另一个预取值,并且确定高速缓存的高速缓存使用的预取限制已经被高速缓存满足,则分配包含较早预取值之一的高速缓存行中的高速缓存行用于接收另一个预取 值。 高速缓存使用的预取限制可以由请求处理单元可用的同余类中的最大数量的集合来建立。 高速缓存目录中的标志可以被设置为指示作为预取操作的结果检索预取值。 在其中缓存是多级高速缓存的实现中,高速缓存目录中的第二标志可以被设置为指示预取值已经被提供给上游高速缓存。 包含预取数据的缓存行可以在从请求预取值开始经过预设的时间后自动失效。

    Multi-node data processing system having a non-hierarchical interconnect architecture
    9.
    发明授权
    Multi-node data processing system having a non-hierarchical interconnect architecture 有权
    具有非分层互连架构的多节点数据处理系统

    公开(公告)号:US06671712B1

    公开(公告)日:2003-12-30

    申请号:US09436898

    申请日:1999-11-09

    IPC分类号: G06F1516

    CPC分类号: G06F13/4217

    摘要: A data processing system includes a plurality of nodes, which each contain at least one agent, and data storage accessible to agents within the nodes. The plurality of nodes are coupled by a non-hierarchical interconnect including multiple non-blocking uni-directional address channels and at least one uni-directional data channel. The agents, which are each coupled to and snoop transactions on all of the plurality of address channels, can only issue transactions on an associated address channel. The uni-directional channels employed by the present non-hierarchical interconnect architecture permit high frequency pumped operation not possible with conventional bi-directional shared system buses. In addition, access latencies to remote (cache or main) memory incurred following local cache misses are greatly reduced as compared with conventional hierarchical systems because of the absence of inter-level (e.g., bus acquisition) communication latency. The non-hierarchical interconnect architecture also permits design flexibility in that the segment of the interconnect within each node can be independently implemented by a set of buses or as a switch, depending upon cost and performance considerations.

    摘要翻译: 数据处理系统包括多个节点,每个节点包含至少一个代理,以及节点内的代理可访问的数据存储。 多个节点通过包括多个非阻塞单向地址信道和至少一个单向数据信道的非分层互连来耦合。 在所有多个地址信道上分别耦合到并且窥探事务的代理只能在相关联的地址信道上发布事务。 当前的非分层互连架构采用的单向信道允许高频抽运操作对于传统的双向共享系统总线是不可能的。 另外,与传统分层系统相比,由于没有层间(例如,总线采集)通信延迟,与本地高速缓存未命中所产生的远程(高速缓存或主)存储器的访问延迟大大降低。 非分层互连架构还允许设计灵活性,因为根据成本和性能考虑,每个节点内的互连部分可以由一组总线或开关单独地实现。

    Multiprocessor system bus protocol with group addresses, responses, and priorities
    10.
    发明授权
    Multiprocessor system bus protocol with group addresses, responses, and priorities 有权
    具有组地址,响应和优先级的多处理器系统总线协议

    公开(公告)号:US06591321B1

    公开(公告)日:2003-07-08

    申请号:US09437200

    申请日:1999-11-09

    IPC分类号: G06F1200

    CPC分类号: G06F12/0831

    摘要: A multiprocessor system bus protocol system and method for processing and handling a processor request within a multiprocessor system having a number of bus accessible memory devices that are snooping on. at least one bus line. Snoop response groups which are groups of different types of snoop responses from the bus accessible memory devices are provided. Different transfer types are provided within each of the snoop response groups. A bus master device that provides a bus master signal is designated. The bus master device receives the processor request. One of the snoop response groups and one of the transfer types are appropriately designated based on the processor request. The bus master signal is formulated from a snoop response group, a transfer type, a valid request signal, and a cache line address. The bus master signal is sent to all of the bus accessible memory devices on the cache bus line and to a combined response logic system. All of the bus accessible memory devices on the cache bus line send snoop responses in response to the bus master signal based on the designated snoop response group. The snoop responses are sent to the combined response logic system. A combined response by the combined response logic system is determined based on the appropriate combined response encoding logic determined by the designated and latched snoop response group. The combined response is sent to all of the bus accessible memory devices on the cache bus line.

    摘要翻译: 一种用于处理和处理处理器请求的多处理器系统总线协议系统和方法,所述多处理器系统具有被窥探的多个总线可访问存储器件。 至少有一条总线。 提供了来自总线可访问存储器设备的不同类型的窥探响应的侦听响应组。 在每个窥探响应组中提供不同的传输类型。 指定提供总线主机信号的总线主设备。 总线主设备接收处理器请求。 根据处理器请求适当地指定其中一个侦听响应组和传输类型之一。 总线主机信号由侦听响应组,传输类型,有效请求信号和高速缓存线地址来制定。 总线主机信号被发送到高速缓存总线上的所有总线可访问存储器件和组合响应逻辑系统。 基于指定的窥探响应组,高速缓存总线上的所有总线可访问存储器件响应于总线主机信号发送窥探响应。 侦听响应被发送到组合的响应逻辑系统。 基于由指定和锁存的窥探响应组确定的适当的组合响应编码逻辑来确定组合响应逻辑系统的组合响应。 组合的响应被发送到高速缓存总线上的所有总线可访问存储器件。