Abstract:
A physically distributed cache memory system includes an interconnection network, first level cache memory slices, and second level cache memory slices. The first level cache memory slices are coupled to the interconnection network to generate tagged ordered store requests. Each tagged ordered store request has a tag that includes a requester identification and a store sequence token. The second level cache memory slices are coupled to the interconnection network to execute ordered store requests in order across the physically distributed cache memory system in response to the tag of each tagged ordered store request.
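
The ordering role of the tag can be illustrated with a minimal sketch. It assumes a single tracker object shared by the second level slices and sequence tokens that start at 0 for each requester; the class names, fields, and the buffering policy are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedStore:
    requester_id: int  # which first level slice issued the store
    seq_token: int     # position of the store in that requester's order
    address: int
    value: int


@dataclass
class OrderedStoreTracker:
    """Hypothetical stand-in for the ordering state of the second level slices."""
    memory: dict = field(default_factory=dict)
    next_token: dict = field(default_factory=dict)  # requester_id -> next expected token
    pending: dict = field(default_factory=dict)     # (requester_id, token) -> store
    executed: list = field(default_factory=list)    # execution log, for inspection

    def receive(self, store: TaggedStore) -> None:
        """Accept a store that may arrive out of order over the network."""
        self.pending[(store.requester_id, store.seq_token)] = store
        self._drain(store.requester_id)

    def _drain(self, requester_id: int) -> None:
        """Execute every buffered store of this requester whose turn has come."""
        expected = self.next_token.get(requester_id, 0)
        while (requester_id, expected) in self.pending:
            store = self.pending.pop((requester_id, expected))
            self.memory[store.address] = store.value
            self.executed.append((requester_id, expected))
            expected += 1
        self.next_token[requester_id] = expected
```

A store carrying token 1 that arrives before token 0 is held back, then both execute in token order once token 0 arrives.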
Abstract:
A multi-core processor includes a plurality of processors and a shared cache. Cache control logic implements an inclusive cache scheme among the shared cache and the local caches for the processors. Counters are maintained to track instances, per set, when a processor chooses to delay eviction from the local cache. While the counter indicates that one or more delayed evictions are pending for a set, the cache control logic treats the set as non-inclusive, broadcasting foreign snoops to all of the local caches, regardless of whether the snoop hits in the shared cache. Other embodiments are also described and claimed.
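
The per-set counter and its effect on snoop filtering might look roughly like the following sketch. The class and method names are hypothetical, and the choice to forward a shared-cache hit to every core (no per-core presence bits) is an assumption made to keep the example small.

```python
from typing import List


class InclusiveSnoopFilter:
    """Toy model of the shared-cache control logic; all names are hypothetical."""

    def __init__(self, num_sets: int, num_cores: int) -> None:
        self.num_cores = num_cores
        self.delayed = [0] * num_sets                 # pending delayed evictions, per set
        self.tags = [set() for _ in range(num_sets)]  # lines present in the shared cache

    def fill(self, set_index: int, tag: int) -> None:
        self.tags[set_index].add(tag)

    def delay_eviction(self, set_index: int) -> None:
        """A processor chose to keep a line locally instead of evicting it now."""
        self.delayed[set_index] += 1

    def complete_eviction(self, set_index: int) -> None:
        if self.delayed[set_index] == 0:
            raise ValueError("no delayed eviction pending for this set")
        self.delayed[set_index] -= 1

    def snoop_targets(self, set_index: int, tag: int) -> List[int]:
        """Return the cores that must see a foreign snoop for (set_index, tag)."""
        everyone = list(range(self.num_cores))
        if self.delayed[set_index] > 0:
            return everyone  # set is treated as non-inclusive: always broadcast
        if tag in self.tags[set_index]:
            return everyone  # hit (assumption: no per-core presence bits)
        return []            # inclusive set and a miss: the snoop is filtered
```

While a set's counter is nonzero, a snoop that misses in the shared cache is still broadcast; once the counter returns to zero, the same miss is filtered again.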
Abstract:
Communicating among nodes in a network includes sending a packet from an origin node to a destination node over a route including plural nodes. At each node in the route, routing of the packet is initiated according to a predicted path concurrently with verifying the correctness of the predicted path based on analyzing route information in the packet. In response to the results of verifying the correctness of the predicted path, either the routing of the packet is completed according to the predicted path, or routing of the packet is initiated according to an actual path based on the route information in the packet.
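
The predict-then-verify step can be sketched in software as below. The abstract does not say how the prediction is made or how the actual path is computed, so the sketch assumes a 2D mesh, a "keep going straight" predictor, and X-then-Y decoding of the destination; all of these, and every name in the code, are illustrative assumptions. In hardware the prediction and the verification would proceed concurrently, whereas here they run one after the other.

```python
# Direction -> (dx, dy) on a hypothetical 2D mesh; y grows toward "N".
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def actual_port(node, dest):
    """Decode the route information: X first, then Y (an assumed policy)."""
    (x, y), (dest_x, dest_y) = node, dest
    if x != dest_x:
        return "E" if dest_x > x else "W"
    if y != dest_y:
        return "N" if dest_y > y else "S"
    return "LOCAL"


def forward(node, dest, heading):
    """Route one hop and return (port_used, mispredicted).

    `heading` is the direction the packet was travelling, or None at the origin.
    """
    predicted = heading               # speculative setup starts with this port
    actual = actual_port(node, dest)  # verification of the prediction
    if predicted == actual:
        return predicted, False       # complete routing on the predicted path
    return actual, predicted is not None  # fall back to the actual path


def send(origin, dest):
    """Walk the packet to its destination, counting mispredictions."""
    node, heading, hops, mispredictions = origin, None, [], 0
    while True:
        port, missed = forward(node, dest, heading)
        mispredictions += missed
        if port == "LOCAL":
            return hops, mispredictions
        hops.append(port)
        dx, dy = STEP[port]
        node = (node[0] + dx, node[1] + dy)
        heading = port
```

Under these assumptions the prediction only fails where the packet turns or arrives, so long straight runs are routed without waiting for the header to be decoded.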
Abstract:
Communicating among cores in a computing system comprising a plurality of cores, each core comprising a processor and a switch, includes: routing a packet from a core or from a device coupled to at least one core to a destination over a route including one or more cores, with an order of dimensions associated with the route being selected dynamically upon construction of the packet; routing the packet to a first core in the route over the first selected dimension; and routing the packet from the first core to the destination over the second dimension.
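
A minimal sketch of choosing the dimension order when the packet is built, assuming a 2D mesh of cores addressed by (x, y) coordinates. The load estimates used to pick the order and all function names are hypothetical; the abstract does not state what drives the dynamic selection.

```python
def build_packet(src, dst, x_load, y_load):
    """Pick the dimension order at construction time (assumed load-based rule)."""
    order = ("x", "y") if x_load <= y_load else ("y", "x")
    return {"src": src, "dst": dst, "order": order}


def route(packet):
    """Return every core visited, travelling fully along each dimension in turn."""
    pos = list(packet["src"])
    dst = packet["dst"]
    path = [tuple(pos)]
    for dim in packet["order"]:
        axis = 0 if dim == "x" else 1
        step = 1 if dst[axis] > pos[axis] else -1
        while pos[axis] != dst[axis]:
            pos[axis] += step
            path.append(tuple(pos))
    return path
```

For a packet from (0, 0) to (2, 1), a lower X load yields the path through (2, 0), while a lower Y load yields the path through (0, 1); in both cases the packet turns once, at the core where the first dimension is exhausted.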
Abstract:
A multicore processor comprises a plurality of cache memories, and a plurality of processor cores, each associated with one of the cache memories. Each of at least some of the cache memories is configured to maintain at least a portion of the cache memory in which each cache line is dynamically managed as either local to the associated processor core or shared among multiple processor cores.
Abstract:
Methods of operating two or more devices in lockstep by generating requests at each device, comparing the requests, and forwarding matching requests to a servicing node are described and claimed. A redundant execution system using the methods is also described and claimed.
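
The compare-and-forward step can be sketched as follows for two devices. The `LockstepChecker` name, the tuple request format, and the policy of recording a mismatch and forwarding nothing are assumptions for illustration; the abstract does not describe how mismatches are handled.

```python
class LockstepChecker:
    """Compares paired requests and forwards only those that match."""

    def __init__(self, service):
        self.service = service  # callable standing in for the servicing node
        self.mismatches = []    # (request_a, request_b) pairs that disagreed

    def submit(self, request_a, request_b):
        """Forward the request if both devices generated the same one."""
        if request_a == request_b:
            return self.service(request_a)
        # Assumed policy: record the divergence and forward nothing.
        self.mismatches.append((request_a, request_b))
        return None
```

A caller would pass one request from each device per step, for example `checker.submit(("read", 0x40), ("read", 0x40))`, and inspect `checker.mismatches` to detect divergence.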
Abstract:
A method for predicting early write back of owned cache blocks in a shared memory computer system is described. The system predicts which written blocks are more likely to be requested by another CPU, and the owning CPU writes those blocks back to memory as soon as possible after updating the data in the block. If another processor requests the data, this can reduce the latency of obtaining it, which reduces synchronization overhead and increases the throughput of parallel programs.
Abstract:
Techniques for interconnect structures for a multi-core processor that includes at least two multi-core integrated circuits include forming the at least two multi-core integrated circuits, each on a respective substrate, into a stack, and disposing connections through the stack between a circuit of a first one of the integrated circuits and a circuit of a second, different one of the integrated circuits. The integrated circuits are arranged in the stack with connections of the first one connected to a receiving pad of the second one.