Abstract:
A computer system with a processor cache that stores remote cache presence information. In one embodiment, a plurality of presence vectors are stored to indicate whether particular blocks of data mapped to another node are being remotely cached. Rather than storing the presence vectors in a dedicated storage, the remote cache presence vectors may be stored in designated locations of a cache memory subsystem, such as an L2 cache, associated with a processor core. For example, a designated way of the cache memory subsystem may be allocated for storing remote cache presence vectors, while the remaining ways of the cache are used to store normal processor data. New data blocks may be remotely cached in response to evictions from the cache memory subsystem. In yet a further embodiment, additional entries of the cache memory subsystem may be used for storing directory entries to filter probe command and response traffic.
Abstract:
A method and apparatus for controlling a first and second cache is provided. A cache entry is received in the first cache, and the entry is identified as having an untouched status. Thereafter, the status of the cache entry is updated to accessed in response to receiving a request for at least a portion of the cache entry, and the cache entry is subsequently cast out according to a preselected cache line replacement algorithm. The cast out cache entry is stored in the second cache according to the status of the cast out cache entry.
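The status-driven cast-out described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `TwoLevelCache`, the use of LRU as the "preselected" replacement algorithm, and the policy of keeping only accessed entries in the second cache are all hypothetical choices, since the abstract does not specify them.

```python
from collections import OrderedDict

UNTOUCHED, ACCESSED = "untouched", "accessed"


class TwoLevelCache:
    """Toy model: a first cache that tags entries, and a second cache for cast-outs."""

    def __init__(self, l1_capacity):
        self.l1_capacity = l1_capacity
        self.l1 = OrderedDict()  # address -> status, least recently used first
        self.l2 = set()          # addresses held in the second cache

    def fill(self, address):
        """Receive an entry in the first cache; it starts out untouched."""
        if address in self.l1:
            return
        if len(self.l1) >= self.l1_capacity:
            self._cast_out()
        self.l1[address] = UNTOUCHED

    def request(self, address):
        """A request for the entry updates its status to accessed."""
        if address not in self.l1:
            return False
        self.l1[address] = ACCESSED
        self.l1.move_to_end(address)  # LRU bookkeeping (assumed algorithm)
        return True

    def _cast_out(self):
        """Evict the LRU entry and place it in the second cache based on its status."""
        victim, status = self.l1.popitem(last=False)
        if status == ACCESSED:  # assumed policy: untouched entries are dropped
            self.l2.add(victim)
```

With a two-entry first cache, an entry that was requested before being evicted ends up in `l2`, while one that was never requested does not.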
Abstract:
A system and method for pre-fetching data from system memory. A multi-core processor accesses a cache hit predictor concurrently with sending a memory request to a cache subsystem. The predictor has two tables. The first table is indexed by a portion of a memory address and provides a hit prediction based on a first counter value. The second table is indexed by a core number and provides a hit prediction based on a second counter value. If neither table predicts a hit, a pre-fetch request is sent to memory. If the prediction is subsequently detected to be incorrect, the pre-fetch is cancelled.
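A minimal sketch of the two-table predictor, assuming saturating counters, a 64-byte line size, and a fixed threshold. None of these parameters appear in the abstract, and the names `HitPredictor`, `should_prefetch`, and `update` are hypothetical.

```python
class HitPredictor:
    """Toy two-table cache hit predictor with saturating counters (assumed)."""

    def __init__(self, num_cores, table_bits=4, threshold=2, max_count=3):
        self.mask = (1 << table_bits) - 1
        self.threshold = threshold
        self.max_count = max_count
        self.by_address = [0] * (1 << table_bits)  # first table
        self.by_core = [0] * num_cores             # second table

    def _index(self, address):
        # Use a few bits above the (assumed) 64-byte line offset as the index.
        return (address >> 6) & self.mask

    def predicts_hit(self, address, core):
        """Either table may predict a hit once its counter reaches the threshold."""
        return (self.by_address[self._index(address)] >= self.threshold
                or self.by_core[core] >= self.threshold)

    def should_prefetch(self, address, core):
        """A pre-fetch is sent only when neither table predicts a hit."""
        return not self.predicts_hit(address, core)

    def update(self, address, core, was_hit):
        """Train both counters on the actual cache outcome (assumed update rule)."""
        delta = 1 if was_hit else -1
        i = self._index(address)
        self.by_address[i] = min(self.max_count, max(0, self.by_address[i] + delta))
        self.by_core[core] = min(self.max_count, max(0, self.by_core[core] + delta))
```

In this model, a caller that issued a pre-fetch and then observes an actual cache hit would cancel the pre-fetch and call `update(..., was_hit=True)`; the cancellation itself is left outside the sketch.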
Abstract:
A system and method for selectively transmitting probe commands and reducing network traffic. Directory entries are maintained to filter probe command and response traffic for certain coherent transactions. Rather than storing directory entries in a dedicated directory storage, directory entries may be stored in designated locations of a shared cache memory subsystem, such as an L3 cache. Directory entries are stored within the shared cache memory subsystem to provide indications of lines (or blocks) that may be cached in exclusive-modified, owned, shared, shared-one, or invalid coherency states. The absence of a directory entry for a particular line may imply that the line is not cached anywhere in a computing system.
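The filtering idea, that a missing directory entry implies no probes are needed, might be modelled as below. This is a sketch only: the `ProbeFilter` class, its methods, and the choice to drop an entry when a line becomes invalid are assumptions made for illustration, not details taken from the abstract.

```python
from enum import Enum


class State(Enum):
    EXCLUSIVE_MODIFIED = "EM"
    OWNED = "O"
    SHARED = "S"
    SHARED_ONE = "S1"
    INVALID = "I"


class ProbeFilter:
    """Toy directory: tracks lines that may be cached somewhere in the system."""

    def __init__(self):
        self.entries = {}  # line address -> State

    def record(self, line, state):
        """Record the coherency state of a line."""
        if state is State.INVALID:
            # Assumed simplification: an invalid line needs no directory entry.
            self.entries.pop(line, None)
        else:
            self.entries[line] = state

    def must_probe(self, line):
        """No entry implies the line is not cached anywhere, so probes are skipped."""
        return line in self.entries
```

A coherent request for a line with no entry can then go straight to memory, which is where the reduction in probe and response traffic would come from.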
Abstract:
A data processor (300) is adapted for use in a non-uniform memory access (NUMA) data processing system (10) having a local memory (320) and a remote memory. The data processor (300) includes a central processing unit (302) and a communication link controller (310). The central processing unit (302) executes a plurality of instructions including an atomic instruction on a lock variable, and generates an access request that includes a lock acquire attribute in response to executing the atomic instruction on the lock variable. The communication link controller (310) is coupled to the central processing unit (302) and has an output adapted to be coupled to the remote memory, and selectively provides the access request with the lock acquire attribute to the remote memory if an address of the access request corresponds to the remote memory.
Abstract:
In one embodiment, a method comprises assigning a unique node number to each of a first plurality of nodes in a first partition of a system and a second plurality of nodes in a second partition of the system. A first memory address space spans first memory included in the first partition and a second memory address space spans second memory included in the second partition. The first memory address space and the second memory address space are generally logically distinct. The method further comprises programming a first address map in the first partition to map the first memory address space to node numbers, wherein the programming comprises mapping a first memory address range within the first memory address space to a first node number assigned to a first node of the second plurality of nodes in the second partition, whereby the first memory address range is mapped to the second partition.
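One way to picture the address map is as a list of address ranges, each bound to a node number. The sketch below is hypothetical: the `AddressMap` class, the inclusive `base`/`limit` representation, and the example node numbers and addresses are assumptions for illustration.

```python
class AddressMap:
    """Toy address map for one partition: address ranges -> node numbers."""

    def __init__(self):
        self.ranges = []  # list of (base, limit, node), limit inclusive

    def program(self, base, limit, node):
        """Map the range [base, limit] to the given node number."""
        self.ranges.append((base, limit, node))

    def lookup(self, address):
        """Return the node number that owns the address, or None if unmapped."""
        for base, limit, node in self.ranges:
            if base <= address <= limit:
                return node
        return None


def build_example_map():
    """Assumed layout: nodes 0 and 1 are in the first partition, node 2 in the second."""
    amap = AddressMap()
    amap.program(0x0000_0000, 0x3FFF_FFFF, 0)
    amap.program(0x4000_0000, 0x7FFF_FFFF, 1)
    # A range of the first partition's address space mapped to a node of the
    # second partition, so accesses to it are routed across partitions.
    amap.program(0x8000_0000, 0x8FFF_FFFF, 2)
    return amap
```

Because node numbers are unique across both partitions, a lookup that returns node 2 unambiguously identifies a node outside the first partition.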
Abstract:
A data processing system (100, 600) has a memory hierarchy including a cache (124, 624) and a lower-level memory system (170, 650). A data element having a special write with inject attribute is received from a data producer (160, 640), such as an Ethernet controller. The data element is forwarded to the cache (124, 624) without accessing the lower-level memory system (170, 650). Subsequently at least one cache line containing the data element is updated in the cache (124, 624).
Abstract:
A data processor (120) recognizes a special data processing operation in which data will be stored in a cache (124) for one use only. The data processor (120) allocates a memory location to at least one cache line of the cache (124). A data producer such as a data communication driver program running on a central processing unit (122) then writes a data element to the allocated memory location. A data consumer (160) reads the data element by sending a READ ONCE request to a host bridge (130). The host bridge (130) provides the READ ONCE request to a memory controller (126), which reads the data from the cache (124) and de-allocates the at least one cache line without performing a writeback from the cache to a main memory (170). In one form the memory controller (126) de-allocates the at least one cache line by issuing a probe marking the next state of the associated cache line as invalid.
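The difference between an ordinary eviction and a READ ONCE access can be sketched with a toy cache model. The `Cache` class and its `writebacks` log are hypothetical; the sketch only illustrates that the line is de-allocated without a writeback to main memory.

```python
class Cache:
    """Toy cache holding dirty lines, with a log of writebacks to main memory."""

    def __init__(self):
        self.lines = {}       # address -> data
        self.writebacks = []  # (address, data) pairs written back to memory

    def write(self, address, data):
        """The data producer writes a data element into an allocated line."""
        self.lines[address] = data

    def evict(self, address):
        """Ordinary eviction: the dirty line is written back before removal."""
        data = self.lines.pop(address)
        self.writebacks.append((address, data))

    def read_once(self, address):
        """READ ONCE: return the data and invalidate the line with no writeback."""
        return self.lines.pop(address)
```

Since the data is consumed exactly once, skipping the writeback avoids main-memory traffic for a value that will not be read again.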
Abstract:
In a distributed multi-node computer system, each switch provides routing of data packets between CPU nodes, I/O nodes, and memory nodes. Each switch is connected through a corresponding I/O node to a network interface controller (NIC) for transferring data packets on a network. Each NIC is memory-mapped. Part of the system address space forms a send window for each NIC connected to a corresponding switch. A mechanism for controlling data packet transmission is defined such that each CPU write to a NIC send window is atomic and self-defining, i.e., it does not rely on an immediately preceding write to determine where the data packet should be sent. Using “address aliasing”, CPU writes to the aliased part of the NIC send window are always directed to the NIC connected to the same switch as the CPU that performed the write.
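Address aliasing can be illustrated with a small address decoder. The layout here is entirely assumed: one NIC per switch, NIC numbers equal to switch numbers, and the constants `SEND_BASE`, `ALIAS_BASE`, and `WINDOW_SIZE` are hypothetical values chosen for the sketch.

```python
WINDOW_SIZE = 0x1000      # assumed size of each NIC send window
SEND_BASE = 0x8000_0000   # assumed base of the per-NIC send windows
ALIAS_BASE = 0x9000_0000  # assumed base of the aliased send window


def target_nic(address, cpu_switch, num_switches):
    """Return the NIC (numbered by switch) that receives a CPU write.

    Writes to the aliased window always go to the NIC on the writer's own
    switch; writes to a direct window go to the NIC that the window names.
    """
    if ALIAS_BASE <= address < ALIAS_BASE + WINDOW_SIZE:
        return cpu_switch
    offset = address - SEND_BASE
    if 0 <= offset < num_switches * WINDOW_SIZE:
        return offset // WINDOW_SIZE
    raise ValueError("address is outside every NIC send window")
```

Under these assumptions, the same aliased address resolves to a different NIC depending on which switch the writing CPU is attached to, so software can use one address on every CPU and still reach its local NIC.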