Abstract:
An advanced processor comprises a plurality of multithreaded processor cores, each having a data cache and an instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and to a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores through its respective data cache, and the messaging network is coupled to each of the processor cores through its respective message station. Advantages of the invention include the ability to provide high-bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
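As a rough illustration of the topology described above, the following C++ sketch models cores that expose a data cache to the data switch interconnect and a message station to the messaging network. All type and member names are invented for this sketch; they are not taken from the patent.

    // Behavioral sketch (not the patented design): modeling the described topology.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct DataCache        { std::vector<uint8_t> lines; };
    struct InstructionCache { std::vector<uint8_t> lines; };
    struct MessageStation   { int core_id = 0; };

    struct Core {
        int id = 0;
        DataCache dcache;         // data switch interconnect attaches here
        InstructionCache icache;
        MessageStation station;   // messaging network attaches here
    };

    // The data switch interconnect is coupled to each core through that
    // core's data cache, so it only keeps pointers to the data caches.
    struct DataSwitchInterconnect {
        std::vector<DataCache*> endpoints;
        void attach(Core& c) { endpoints.push_back(&c.dcache); }
    };

    // The messaging network is coupled to each core through its message
    // station and also serves a set of communication ports.
    struct MessagingNetwork {
        std::vector<MessageStation*> stations;
        std::vector<int> comm_ports;
        void attach(Core& c) { stations.push_back(&c.station); }
    };

    int main() {
        std::vector<Core> cores(8);
        DataSwitchInterconnect dsi;
        MessagingNetwork mn;
        for (int i = 0; i < 8; ++i) {
            cores[i].id = i;
            cores[i].station.core_id = i;
            dsi.attach(cores[i]);
            mn.attach(cores[i]);
        }
        mn.comm_ports = {0, 1, 2, 3};   // external communication ports
        std::cout << "cores on interconnect: " << dsi.endpoints.size()
                  << ", stations on messaging network: " << mn.stations.size() << "\n";
    }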
Abstract:
Systems, apparatuses, and methods are disclosed for improving the performance of a stride-based prefetcher on an out-of-order central processing unit (CPU). The present disclosure teaches a processor system that employs out-of-order stride prefetch units, which issue prefetches for out-of-order stride access patterns. In one or more embodiments, the out-of-order stride prefetch units examine the offsets between past virtual address (VA) accesses and the directions of those accesses in order to generate an estimate of the underlying VA access stride of the executed program code (PC). In at least one embodiment, the out-of-order stride prefetch units use this estimate of the VA access stride to generate a prediction of future VA accesses. In some embodiments, after generating the prediction, the out-of-order stride prefetch units prefetch the predicted future VA accesses.
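The following C++ sketch illustrates one possible (and deliberately simple) way such an estimate could be formed in software: it derives a stride magnitude and direction from the offsets between recently observed virtual addresses and then predicts the next few addresses. The window size, the smallest-offset heuristic, and all names are assumptions for illustration, not the patented estimation method.

    // Illustrative only: a toy stride estimator tolerant of out-of-order accesses.
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <vector>

    class StridePredictor {
        std::deque<uint64_t> history_;              // recent virtual addresses
        static constexpr std::size_t kDepth = 8;    // observation window (assumed)
    public:
        void observe(uint64_t va) {
            history_.push_back(va);
            if (history_.size() > kDepth) history_.pop_front();
        }

        // Estimate stride magnitude as the smallest nonzero offset seen in the
        // window, and direction as the sign of the net address movement.
        int64_t estimate_stride() const {
            if (history_.size() < 2) return 0;
            uint64_t magnitude = 0;
            int64_t net = 0;
            for (std::size_t i = 1; i < history_.size(); ++i) {
                int64_t d = (int64_t)history_[i] - (int64_t)history_[i - 1];
                net += d;
                uint64_t ad = d < 0 ? -d : d;
                if (ad != 0 && (magnitude == 0 || ad < magnitude)) magnitude = ad;
            }
            if (magnitude == 0) return 0;
            return net >= 0 ? (int64_t)magnitude : -(int64_t)magnitude;
        }

        // Predict the next few addresses; real hardware would prefetch them.
        std::vector<uint64_t> predict(int degree) const {
            std::vector<uint64_t> out;
            int64_t stride = estimate_stride();
            if (stride == 0 || history_.empty()) return out;
            uint64_t va = history_.back();
            for (int i = 1; i <= degree; ++i) out.push_back(va + i * stride);
            return out;
        }
    };

    int main() {
        StridePredictor p;
        // Accesses arrive out of program order but follow a 64-byte stride.
        for (uint64_t va : {0x1000, 0x1080, 0x1040, 0x10C0, 0x1100})
            p.observe(va);
        for (uint64_t va : p.predict(3))
            std::cout << "prefetch 0x" << std::hex << va << "\n";
    }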
Abstract:
A network-on-chip system, method, and computer program product are provided for transmitting messages utilizing a centralized on-chip shared memory switch. In operation, a message is sent from one of a plurality of agents connected to a messaging network. The message is received at a central shared memory switch, which is in communication with each of the plurality of agents. Further, the message is transmitted from the central shared memory switch to a destination agent, the destination agent being one of the plurality of agents.
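A minimal software model of that message flow might look like the C++ sketch below, where every agent hands messages to one central switch that buffers them in a shared store and forwards each to its destination. The class and method names are illustrative assumptions, not taken from the patent.

    // Sketch only: a centralized shared-memory switch modeled in software.
    #include <deque>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Message { int src; int dst; std::string payload; };

    struct Agent {
        int id = 0;
        std::deque<Message> inbox;
        void deliver(const Message& m) { inbox.push_back(m); }
    };

    class CentralSwitch {
        std::vector<Agent*> agents_;          // every agent on the messaging network
        std::deque<Message> shared_buffer_;   // centralized on-chip shared memory
    public:
        void attach(Agent& a) { agents_.push_back(&a); }
        void receive(Message m) { shared_buffer_.push_back(std::move(m)); }

        // Forward each buffered message to its destination agent.
        void forward_all() {
            while (!shared_buffer_.empty()) {
                Message m = shared_buffer_.front();
                shared_buffer_.pop_front();
                for (Agent* a : agents_)
                    if (a->id == m.dst) { a->deliver(m); break; }
            }
        }
    };

    int main() {
        Agent a0{0}, a1{1}, a2{2};
        CentralSwitch sw;
        sw.attach(a0); sw.attach(a1); sw.attach(a2);
        sw.receive({0, 2, "hello"});   // agent 0 sends to agent 2 via the switch
        sw.forward_all();
        std::cout << "agent 2 inbox size: " << a2.inbox.size() << "\n";
    }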
Abstract:
In operation, a first request for data is sent to a cache of a first node. Additionally, it is determined whether the first request can be satisfied within the first node, where the determining includes at least one of determining a type of the first request and determining a state of the data in the cache. Furthermore, a second request for the data is conditionally sent to a second node, based on the determination.
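One plausible rendering of that decision in code, assuming MESI-style cache states and read/write request types (neither of which is specified in the abstract), is sketched below in C++.

    // Hedged sketch: conditional forwarding based on request type and line state.
    #include <iostream>

    enum class ReqType   { Read, Write };
    enum class LineState { Invalid, Shared, Exclusive, Modified };

    // Assumed rule: a read is satisfied locally by any valid copy; a write
    // needs an exclusive or modified copy. Otherwise forward to the second node.
    bool satisfied_locally(ReqType type, LineState state) {
        switch (type) {
            case ReqType::Read:
                return state != LineState::Invalid;
            case ReqType::Write:
                return state == LineState::Exclusive || state == LineState::Modified;
        }
        return false;
    }

    void handle_request(ReqType type, LineState local_state) {
        if (satisfied_locally(type, local_state)) {
            std::cout << "served from first node's cache\n";
        } else {
            // The second request is sent only when the first node cannot satisfy it.
            std::cout << "forwarding request to second node\n";
        }
    }

    int main() {
        handle_request(ReqType::Read,  LineState::Shared);   // local hit
        handle_request(ReqType::Write, LineState::Shared);   // needs ownership -> forward
    }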
Abstract:
A system, method, and computer program product are provided for optimal packet flow in a multi-processor system on a chip. In operation, a credit is allocated for each of a plurality of agents coupled to a messaging network, the allocating including reserving one or more entries in a receive queue of at least one of the plurality of agents. Additionally, a first credit is decremented in response to a first agent sending a message to a second agent, the plurality of agents including the first and second agents. Furthermore, one of the first credit or a second credit is incremented in response to a signal from the second agent.
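A minimal sketch of such credit-based flow control, assuming one credit counter per sender/destination pair backed by reserved receive-queue entries, is shown below in C++; the actual credit bookkeeping in the claimed system may differ.

    // Sketch only: credits reserved at allocation, consumed on send, restored on signal.
    #include <iostream>
    #include <vector>

    struct Agent {
        std::vector<int> credits;   // credits this agent holds toward each destination
    };

    // Allocation: reserving N receive-queue entries at each destination backs
    // N credits handed to every sender.
    void allocate_credits(std::vector<Agent>& agents, int reserved_entries) {
        for (auto& a : agents)
            a.credits.assign(agents.size(), reserved_entries);
    }

    bool send(std::vector<Agent>& agents, int src, int dst) {
        if (agents[src].credits[dst] == 0) return false;   // no reserved entry, stall
        --agents[src].credits[dst];                        // decrement credit on send
        return true;
    }

    // When the destination drains an entry it signals back, restoring a credit.
    void credit_return(std::vector<Agent>& agents, int src, int dst) {
        ++agents[src].credits[dst];
    }

    int main() {
        std::vector<Agent> agents(4);
        allocate_credits(agents, 2);    // two receive-queue entries reserved per destination
        send(agents, 0, 1);             // first agent sends a message to second agent
        std::cout << "credits 0->1 after send: " << agents[0].credits[1] << "\n";
        credit_return(agents, 0, 1);    // signal from the second agent
        std::cout << "credits 0->1 after return: " << agents[0].credits[1] << "\n";
    }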
Abstract:
An apparatus and method are provided for allocating a plurality of packets to different processor threads. In operation, a plurality of packets are parsed to gather packet information. Additionally, a parse operation is performed utilizing the packet information to generate a key, and a hash algorithm is performed on this key to produce a hash. Further, the packets are allocated to different processor threads, utilizing the hash or the key.
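The C++ sketch below walks through that pipeline with invented specifics: a handful of header fields form the key, an FNV-1a hash is computed over the key, and the hash selects a thread. The field choice and hash function are illustrative assumptions only, not the patented parser or hash.

    // Toy sketch: parse fields -> key -> hash -> thread allocation.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Packet { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

    // Parse step: gather the fields that will form the flow key.
    uint64_t make_key(const Packet& p) {
        return (uint64_t(p.src_ip) << 32) ^ (uint64_t(p.dst_ip) << 16)
             ^ (uint64_t(p.src_port) << 8) ^ p.dst_port;
    }

    // Hash step: FNV-1a over the eight key bytes.
    uint32_t hash_key(uint64_t key) {
        uint32_t h = 2166136261u;
        for (int i = 0; i < 8; ++i) {
            h ^= uint8_t(key >> (i * 8));
            h *= 16777619u;
        }
        return h;
    }

    int main() {
        const unsigned num_threads = 8;
        std::vector<Packet> packets = {
            {0x0A000001, 0x0A000002, 1234, 80},
            {0x0A000003, 0x0A000002, 5678, 443},
        };
        for (const Packet& p : packets) {
            uint64_t key    = make_key(p);
            uint32_t hash   = hash_key(key);
            unsigned thread = hash % num_threads;   // allocate the packet by hash
            std::cout << "packet -> thread " << thread << "\n";
        }
    }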