Computer system with processor cache that stores remote cache presence information
    1.
    Granted Patent
    Computer system with processor cache that stores remote cache presence information (In force)

    Publication No.: US07096323B1

    Publication Date: 2006-08-22

    Application No.: US10256970

    Filing Date: 2002-09-27

    IPC Class: G06F13/14

    Abstract: A computer system with a processor cache that stores remote cache presence information. In one embodiment, a plurality of presence vectors are stored to indicate whether particular blocks of data mapped to another node are being remotely cached. Rather than storing the presence vectors in a dedicated storage, the remote cache presence vectors may be stored in designated locations of a cache memory subsystem, such as an L2 cache, associated with a processor core. For example, a designated way of the cache memory subsystem may be allocated for storing remote cache presence vectors, while the remaining ways of the cache are used to store normal processor data. New data blocks may be remotely cached in response to evictions from the cache memory subsystem. In yet a further embodiment, additional entries of the cache memory subsystem may be used for storing directory entries to filter probe command and response traffic.

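    The abstract's core idea, reserving one way of a set-associative L2 for presence vectors while the remaining ways hold normal data, can be illustrated with a small Python model. All class names, parameters, and the trivial eviction policy below are illustrative assumptions, not details from the patent claims.

```python
# Toy model: an N-way cache in which one designated way per set is reserved
# for remote-cache presence vectors; other ways hold normal processor data.

class L2WithPresenceVectors:
    NUM_SETS = 4
    NUM_WAYS = 4
    PRESENCE_WAY = 0          # designated way for presence vectors

    def __init__(self):
        # sets[s][w] holds (tag, payload) or None
        self.sets = [[None] * self.NUM_WAYS for _ in range(self.NUM_SETS)]

    def store_presence_vector(self, set_idx, tag, bitvector):
        """Presence vectors always go into the designated way."""
        self.sets[set_idx][self.PRESENCE_WAY] = (tag, bitvector)

    def store_data(self, set_idx, tag, data):
        """Normal data may use any way except the designated one."""
        for w in range(self.NUM_WAYS):
            if w == self.PRESENCE_WAY:
                continue
            if self.sets[set_idx][w] is None:
                self.sets[set_idx][w] = (tag, data)
                return w
        # No free way: evict a fixed data way (trivial policy for the sketch)
        victim = 1
        self.sets[set_idx][victim] = (tag, data)
        return victim

    def lookup_presence(self, set_idx, tag):
        entry = self.sets[set_idx][self.PRESENCE_WAY]
        if entry and entry[0] == tag:
            return entry[1]
        return None
```

    The point of the layout is that presence-vector lookups and normal data fills never compete for the same way, so no extra dedicated storage structure is needed.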

    METHOD AND APPARATUS FOR REDUCING PROCESSOR CACHE POLLUTION CAUSED BY AGGRESSIVE PREFETCHING
    2.
    Patent Application
    METHOD AND APPARATUS FOR REDUCING PROCESSOR CACHE POLLUTION CAUSED BY AGGRESSIVE PREFETCHING (In force)

    Publication No.: US20120079205A1

    Publication Date: 2012-03-29

    Application No.: US12891027

    Filing Date: 2010-09-27

    Applicant: Patrick Conway

    Inventor: Patrick Conway

    IPC Class: G06F12/12

    Abstract: A method and apparatus for controlling a first and second cache is provided. A cache entry is received in the first cache, and the entry is identified as having an untouched status. Thereafter, the status of the cache entry is updated to accessed in response to receiving a request for at least a portion of the cache entry, and the cache entry is subsequently cast out according to a preselected cache line replacement algorithm. The cast out cache entry is stored in the second cache according to the status of the cast out cache entry.

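    A minimal sketch of the filtering described above: prefetched lines enter the first cache marked "untouched", a demand access upgrades them to "accessed", and on cast-out only accessed lines are kept in the second cache, so useless prefetches do not pollute it. The class name, LRU policy, and capacity are illustrative assumptions.

```python
from collections import OrderedDict

class PollutionFilteringCache:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.l1 = OrderedDict()   # addr -> status ("untouched" | "accessed")
        self.l2 = set()           # second cache: keeps proven-useful victims

    def prefetch(self, addr):
        self._insert(addr, "untouched")

    def demand_access(self, addr):
        if addr in self.l1:
            self.l1[addr] = "accessed"     # upgrade on first real use
            self.l1.move_to_end(addr)
        else:
            self._insert(addr, "accessed")

    def _insert(self, addr, status):
        if addr not in self.l1 and len(self.l1) >= self.capacity:
            victim, victim_status = self.l1.popitem(last=False)  # LRU cast-out
            if victim_status == "accessed":
                self.l2.add(victim)        # store in second cache per status
        self.l1[addr] = status
```

    In this sketch an aggressively prefetched line that is never touched simply falls out of the hierarchy instead of displacing useful lines in the second cache.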

    Speculative memory prefetch
    3.
    Granted Patent
    Speculative memory prefetch (In force)

    Publication No.: US07930485B2

    Publication Date: 2011-04-19

    Application No.: US11780283

    Filing Date: 2007-07-19

    IPC Class: G06F13/00

    Abstract: A system and method for pre-fetching data from system memory. A multi-core processor accesses a cache hit predictor concurrently with sending a memory request to a cache subsystem. The predictor has two tables. The first table is indexed by a portion of a memory address and provides a hit prediction based on a first counter value. The second table is indexed by a core number and provides a hit prediction based on a second counter value. If neither table predicts a hit, a pre-fetch request is sent to memory. In response to detecting said hit prediction is incorrect, the pre-fetch is cancelled.

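    The two-table predictor can be sketched directly from the abstract: one counter table indexed by address bits, one indexed by core number, with a speculative prefetch issued only when neither predicts a hit. Table sizes, the threshold, and the 2-bit counter width are illustrative assumptions.

```python
class HitPredictor:
    THRESHOLD = 2          # counter >= THRESHOLD means "predict hit"
    MAX = 3                # 2-bit saturating counters

    def __init__(self, addr_entries=16, num_cores=4):
        self.addr_table = [0] * addr_entries   # indexed by address bits
        self.core_table = [0] * num_cores      # indexed by core number

    def predicts_hit(self, addr, core):
        return (self.addr_table[addr % len(self.addr_table)] >= self.THRESHOLD
                or self.core_table[core] >= self.THRESHOLD)

    def update(self, addr, core, was_hit):
        """Train both counters with the actual cache outcome."""
        for table, idx in ((self.addr_table, addr % len(self.addr_table)),
                           (self.core_table, core)):
            if was_hit:
                table[idx] = min(table[idx] + 1, self.MAX)
            else:
                table[idx] = max(table[idx] - 1, 0)

def handle_request(predictor, addr, core):
    """Issue a speculative memory prefetch only when no hit is predicted."""
    return "prefetch" if not predictor.predicts_hit(addr, core) else "no-prefetch"
```

    A mispredicted hit would, per the abstract, cancel the in-flight prefetch; that cancellation path is omitted from this sketch.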

    SNOOP FILTERING MECHANISM
    4.
    Patent Application
    SNOOP FILTERING MECHANISM (In force)

    Publication No.: US20090327616A1

    Publication Date: 2009-12-31

    Application No.: US12164871

    Filing Date: 2008-06-30

    IPC Class: G06F12/08

    Abstract: A system and method for selectively transmitting probe commands and reducing network traffic. Directory entries are maintained to filter probe command and response traffic for certain coherent transactions. Rather than storing directory entries in a dedicated directory storage, directory entries may be stored in designated locations of a shared cache memory subsystem, such as an L3 cache. Directory entries are stored within the shared cache memory subsystem to provide indications of lines (or blocks) that may be cached in exclusive-modified, owned, shared, shared-one, or invalid coherency states. The absence of a directory entry for a particular line may imply that the line is not cached anywhere in a computing system.

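    The key filtering property in the abstract, that a missing directory entry implies the line is cached nowhere and so no probes are needed, can be modeled in a few lines. State names follow the abstract; the probe-targeting policy below is a deliberately conservative assumption.

```python
class SnoopFilter:
    def __init__(self):
        # addr -> (state, owner_node); absence means "not cached anywhere"
        self.directory = {}

    def record(self, addr, state, owner):
        self.directory[addr] = (state, owner)

    def evict(self, addr):
        self.directory.pop(addr, None)

    def probes_needed(self, addr):
        """Return the probe targets for a coherent access to addr."""
        entry = self.directory.get(addr)
        if entry is None:
            return []                     # no entry: skip probe traffic entirely
        state, owner = entry
        if state in ("exclusive-modified", "owned"):
            return [owner]                # a single node holds the line
        return ["broadcast"]              # shared states: conservative sketch
```

    The traffic reduction comes from the empty-list case: most addresses in a large memory are uncached, so most coherent accesses need no probe at all.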

    Apparatus and method for balanced spinlock support in NUMA systems
    6.
    Granted Patent
    Apparatus and method for balanced spinlock support in NUMA systems (In force)

    Publication No.: US07334102B1

    Publication Date: 2008-02-19

    Application No.: US10434692

    Filing Date: 2003-05-09

    Applicant: Patrick Conway

    Inventor: Patrick Conway

    IPC Class: G06F12/00

    CPC Class: G06F9/526

    Abstract: A data processor (300) is adapted for use in a non uniform memory access (NUMA) data processing system (10) having a local memory (320) and a remote memory. The data processor (300) includes a central processing unit (302) and a communication link controller (310). The central processing unit (302) executes a plurality of instructions including an atomic instruction on a lock variable, and generates an access request that includes a lock acquire attribute in response to executing the atomic instruction on the lock variable. The communication link controller (310) is coupled to the central processing unit (302) and has an output adapted to be coupled to the remote memory, and selectively provides the access request with the lock acquire attribute to the remote memory if an address of the access request corresponds to the remote memory.

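    The request flow in the abstract can be sketched as two stages: the CPU tags an atomic access to a lock variable with a lock-acquire attribute, and the link controller forwards that attribute only when the address maps to remote memory. The local/remote address split and all names are illustrative assumptions.

```python
LOCAL_LIMIT = 0x8000   # assumption: addresses below this map to local memory

def cpu_atomic_access(addr, is_lock_variable):
    """CPU side: tag the request when the atomic op targets a lock variable."""
    return {"addr": addr, "lock_acquire": is_lock_variable}

def link_controller_forward(request):
    """Link controller: pass the attribute on only for remote-memory requests."""
    if request["addr"] >= LOCAL_LIMIT:
        return {"dest": "remote", "lock_acquire": request["lock_acquire"]}
    return {"dest": "local", "lock_acquire": False}
```

    Carrying the attribute to the remote memory's home node lets that node arbitrate lock acquisition fairly between local and remote spinners, which is the "balanced" support the title refers to.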

    Communicating between Partitions in a Statically Partitioned Multiprocessing System
    7.
    Patent Application
    Communicating between Partitions in a Statically Partitioned Multiprocessing System (In force)

    Publication No.: US20090037688A1

    Publication Date: 2009-02-05

    Application No.: US11831102

    Filing Date: 2007-07-31

    IPC Class: G06F12/06

    CPC Class: G06F15/17

    Abstract: In one embodiment, a method comprises assigning a unique node number to each of a first plurality of nodes in a first partition of a system and a second plurality of nodes in a second partition of the system. A first memory address space spans first memory included in the first partition and a second memory address space spans second memory included in the second partition. The first memory address space and the second memory address space are generally logically distinct. The method further comprises programming a first address map in the first partition to map the first memory address space to node numbers, wherein the programming comprises mapping a first memory address range within the first memory address space to a first node number assigned to a first node of the second plurality of nodes in the second partition, whereby the first memory address range is mapped to the second partition.

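    The mechanism reduces to address-map programming: each partition's map routes memory ranges to node numbers, and one range in the first partition's map is deliberately pointed at a node belonging to the second partition, opening a communication window between otherwise disjoint address spaces. Node numbers and ranges below are illustrative.

```python
class AddressMap:
    def __init__(self):
        self.ranges = []   # list of (base, limit, node_number)

    def map_range(self, base, limit, node):
        self.ranges.append((base, limit, node))

    def route(self, addr):
        for base, limit, node in self.ranges:
            if base <= addr < limit:
                return node
        return None

# Partition 1 owns nodes 0-1, partition 2 owns nodes 2-3 (node numbers are
# unique across the whole system, as the abstract requires).
part1_map = AddressMap()
part1_map.map_range(0x0000, 0x8000, node=0)   # partition 1's own memory
part1_map.map_range(0x8000, 0x9000, node=2)   # window into partition 2
```

    Because node numbers are unique system-wide, a routed request is unambiguous even though the two partitions' address spaces are otherwise logically distinct.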

    Method and apparatus for injecting write data into a cache
    8.
    Granted Patent
    Method and apparatus for injecting write data into a cache (In force)

    Publication No.: US07155572B2

    Publication Date: 2006-12-26

    Application No.: US10353216

    Filing Date: 2003-01-27

    IPC Class: G06F12/08

    CPC Class: G06F12/0817 G06F12/0835

    Abstract: A data processing system (100, 600) has a memory hierarchy including a cache (124, 624) and a lower-level memory system (170, 650). A data element having a special write with inject attribute is received from a data producer (160, 640), such as an Ethernet controller. The data element is forwarded to the cache (124, 624) without accessing the lower-level memory system (170, 650). Subsequently at least one cache line containing the data element is updated in the cache (124, 624).

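    A minimal sketch of the injection path, assuming a simple two-level hierarchy: a write tagged with an inject attribute from an I/O producer is placed directly into the cache, while an ordinary write takes the normal path through the lower-level memory. The attribute and class names are illustrative.

```python
class MemoryHierarchy:
    def __init__(self):
        self.cache = {}
        self.main_memory = {}
        self.memory_writes = 0    # counts lower-level memory accesses

    def producer_write(self, addr, data, inject=False):
        if inject:
            self.cache[addr] = data          # forwarded straight into the cache
        else:
            self.main_memory[addr] = data    # normal path through memory
            self.memory_writes += 1
```

    The benefit is that data a CPU will consume soon (e.g. a received network packet) is already cache-resident, and the lower-level memory sees neither the write nor the subsequent read miss.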

    Method and apparatus for reducing overhead in a data processing system with a cache
    9.
    Granted Patent
    Method and apparatus for reducing overhead in a data processing system with a cache (Expired)

    Publication No.: US07062610B2

    Publication Date: 2006-06-13

    Application No.: US10261642

    Filing Date: 2002-09-30

    Applicant: Patrick Conway

    Inventor: Patrick Conway

    IPC Class: G06F12/00

    CPC Class: G06F12/0804 G06F12/126

    Abstract: A data processor (120) recognizes a special data processing operation in which data will be stored in a cache (124) for one use only. The data processor (120) allocates a memory location to at least one cache line of the cache (124). A data producer such as a data communication driver program running on a central processing unit (122) then writes a data element to the allocated memory location. A data consumer (160) reads the data element by sending a READ ONCE request to a host bridge (130). The host bridge (130) provides the READ ONCE request to a memory controller (126), which reads the data from the cache (124) and de-allocates the at least one cache line without performing a writeback from the cache to a main memory (170). In one form the memory controller (126) de-allocates the at least one cache line by issuing a probe marking the next state of the associated cache line as invalid.

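    The READ ONCE flow can be sketched as a read that simultaneously invalidates the cache line, so one-use data is never written back to main memory. The controller class and counters below are illustrative assumptions for the sketch.

```python
class ReadOnceController:
    def __init__(self):
        self.cache = {}        # addr -> data produced by the CPU (dirty)
        self.main_memory = {}
        self.writebacks = 0

    def cpu_write(self, addr, data):
        self.cache[addr] = data               # allocated in the cache only

    def read_once(self, addr):
        """Consumer read: return data and invalidate the line, no writeback."""
        return self.cache.pop(addr)

    def normal_evict(self, addr):
        """Ordinary dirty eviction, shown for contrast: costs a writeback."""
        self.main_memory[addr] = self.cache.pop(addr)
        self.writebacks += 1
```

    For producer-consumer traffic like communication buffers, every avoided writeback is memory bandwidth saved, which is the overhead reduction the title refers to.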

    Mechanism to improve performance in a multi-node computer system
    10.
    Granted Patent
    Mechanism to improve performance in a multi-node computer system (In force)

    Publication No.: US06862634B2

    Publication Date: 2005-03-01

    Application No.: US10150276

    Filing Date: 2002-05-17

    CPC Class: H04L69/12

    Abstract: In a distributed multi-node computer system each switch provides routing of data packets between CPU nodes, I/O nodes, and memory nodes. Each switch is connected through a corresponding I/O node to a network interface controller (NIC) for transferring data packets on a network. Each NIC is memory-mapped. Part of the system address space forms a send window for each NIC connected to a corresponding switch. A mechanism for controlling data packet transmission is defined such that each CPU write to a NIC send window is atomic and self-defining, i.e., it does not rely on an immediately preceding write to determine where the data packet should be sent. Using "address aliasing", CPU writes to the aliased part of the NIC send window are always directed to the NIC connected to the same switch as the CPU which did the write.

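    The address-aliasing idea above can be sketched as a routing function: each NIC has its own memory-mapped send window, plus one aliased window that always resolves to the NIC attached to the same switch as the writing CPU. The addresses, window size, and topology below are illustrative assumptions.

```python
ALIAS_BASE = 0xF000
WINDOW_SIZE = 0x100
NIC_WINDOWS = {0: 0xA000, 1: 0xB000}   # nic_id -> explicit send-window base
CPU_SWITCH = {0: 0, 1: 0, 2: 1, 3: 1}  # cpu -> switch (== local nic_id here)

def resolve_nic(cpu, addr):
    """Return the NIC that receives a send-window write from this CPU."""
    if ALIAS_BASE <= addr < ALIAS_BASE + WINDOW_SIZE:
        return CPU_SWITCH[cpu]                 # alias: always the local NIC
    for nic, base in NIC_WINDOWS.items():
        if base <= addr < base + WINDOW_SIZE:
            return nic                         # explicit window: that NIC
    return None
```

    Because the destination is fully determined by the write's own address (and the identity of the writing CPU for the alias), each write is self-defining: no state from a preceding write is needed to route the packet.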