摘要:
An apparatus and method for optimizing a non-inclusive hierarchical cache memory system that includes a first and second cache for storing information. The first and second cache are arranged in an hierarchical manner such as a level two and level three cache in a cache system having three levels of cache. The level two and level three cache hold information non-inclusively, while a dual directory holds tags and states that are duplicates of the tags and states held for the level two cache. All snoop requests (snoops) are passed to the dual directory by a snoop queue. The dual directory is used to determine whether a snoop request sent by snoop queue is relevant to the contents of level two cache, avoiding the need to send the snoop request to level two cache if there is a "miss" in the dual directory. This increases the available cache bandwidth that can be made available by second cache since the number of snoops appropriating the cache bandwidth of second cache are reduced by the filtering effect of dual directory. Also, the third cache is limited to holding read-only information and receiving write-invalidation snoop requests. Only snoops relating to write-invalidation requests are passed to a directory holding tags and state information corresponding to the third cache. Limiting snoop requests to write invalidation requests minimizes snoop requests to third cache, increasing the amount of cache memory bandwidth available for servicing catch fetches from third cache. In the event that a cache hit occurs in third cache, the information found in third cache must be transferred to second cache before a modification can be made to that information.
摘要:
A non-inclusive multi-level cache memory system is optimized by removing a first cache content from a first cache, so as to provide cache space in the first cache. In response to a cache miss in the first and second caches, the removed first cache content is stored in a second cache. All cache contents that are stored in the second cache are limited to have read-only attributes so that if any copies of the cache contents in the second cache exist in the cache memory system, a processor or equivalent device must seek permission to access the location in which that copy exists, ensuring cache coherency. If the first cache content is required by a processor (e.g., when a cache hit occurs in the second cache for the first cache content), room is again made available, if required, in the first cache by selecting a second cache content from the first cache and moving it to the second cache. The first cache content is then moved from the second cache to the first cache, rendering the first cache available for write access. Limiting the second cache to read-only access reduces the number of status bits per tag that are required to maintain cache coherency. In a cache memory system using a MOESI protocol, the number of status bits per tag is reduced to a single bit for the second cache, reducing tag overhead and minimizing silicon real estate used when placed on-chip to improve cache bandwidth.
摘要:
A cache system including a data cache memory comprising a plurality of cache lines. A tag store has an entry representing each line in the cache memory where each entry comprises tag information for accessing the data cache. The tag information includes state information indicating whether the represented cache line includes dirty data. A speculative write back unit monitors the state information and is operative to initiate a write back of a cache line having more than a preselected amount of dirty data.
摘要:
Methods, systems and computer program products to maintain cache coherency, in a System On Chip (SOC) which is part of a distributed shared memory system are described. A local SOC unit that includes a local controller and an on-chip memory is provided. In response to receiving a request from a remote controller of a remote SOC to access a memory location, the local controller determines whether the local SOC has exclusive ownership of the requested memory location, sends data from the memory location if the local SOC has exclusive ownership of the memory location and stores an entry in the on-chip memory that identifies the remote SOC as having requested data from the memory location. The entry specifies whether the request from the remote SOC is for exclusive ownership of the memory location. The entry also includes a field that identifies the remote SOC as the requester. The requested memory location may be external or internal to the local SOC unit.
摘要:
A system includes a first bus segment and a second bus segment. The first bus segment is operatively coupled to one or more first bus agents, where the first bus agents are configured for writing messages to the first bus segment and reading messages from the first bus segment and the second bus segment, which is separate from the first bus segment, is operatively coupled to one or more second bus agents. The first bus agents are configured for writing messages to the first bus segment and reading messages from the first bus segment. The system also includes first electrical circuitry operably coupled to the first bus segment and the second bus segment and configured to read messages written on the first bus segment and to write the messages onto the second bus segment and second electrical circuitry operably coupled to the first bus segment and the second bus segment and configured to read messages written on the second bus segment and to write the messages onto the first bus segment.
摘要:
Aspects of a method and system for hash table based routing via prefix transformation are provided. Aspects of the invention may enable translating one or more network addresses as a coefficient set of a polynomial, and routing data in a network based on a quotient and a remainder derived from the coefficient set. In this regard, the quotient and the remainder may be calculated via modulo 2 division of the polynomial by a primitive generator polynomial. In one example, the remainder may be calculated with the aid of a remainder table. The primitive generator polynomial may be x16+x8+x6+x5+x4+x2+1. Additionally, entries in one or more hash tables may comprise a calculated quotient and may be indexed by a calculated remainder. In this manner, the hash tables may be accessed to determine a longest prefix match for the one or more network addresses. The hash tables may comprise 2deg(g(x)) sets, where deg(g(x)) is the degree of the primitive generator polynomial. Accordingly, the hash tables may be set associative and multiple entries may be indexed by the same remainder. Furthermore, entries in the hash tables may comprise a next hop address utilized in routing network traffic.
摘要:
Methods, systems and computer program products for global address space management are described herein. A System on Chip (SOC) unit configured for a global address space is provided. The SOC includes an on-chip memory, a first controller and a second controller. The first controller is enabled to decode addresses that map to memory locations in the on-chip memory and the second controller is enabled to decode addresses that map to memory locations in an off-chip memory.
摘要:
Managing data traffic among three or more bus agents configured in a topological ring includes numbering each bus agent sequentially and injecting messages that include a binary polarity value from the bus agents into the ring in a sequential order according to the numbering of the bus agents during cycles of bus agent activity. Messages from the ring are received into two or more receive buffers of a receiving bus agent, and the value of the binary polarity value is alternated after succeeding cycles of bus ring activity. The received messages are ordered for processing by the receiving bus agent based on the polarity value of the messages and a time at which each message was received.
摘要:
Disclosed herein is an apparatus which may comprise a plurality of nodes. In one example embodiment, each of the plurality of nodes may include one or more central processing units (CPUs), a random access memory device, and a parallel link input/output port. The random access memory device may include a local memory address space and a global memory address space. The local memory address space may be accessible to the one or more CPUs of the node that comprises the random access memory device. The global memory address space may be accessible to CPUs of all the nodes. The parallel link input/output port may be configured to send data frames to, and receive data frames from, the global memory address space comprised by the random access memory device(s) of the other nodes.
摘要:
Certain embodiments of the invention may be found in a method for a high performance hardware network protocol processing engine. The method may comprise processing TCP packets via a plurality of pipelined hardware stages on a single network chip. Headers of received TCP packets may be parsed, and Ethernet frame CRC digests, IP checksums and TCP checksums may be validated, at a first stage of the parallel, pipelined hardware stages. IP addresses of the TCP packets that are received may also be validated at the first stage. TCB index of the TCP packets that are received may be looked up at a second stage. TCB data for TCP packets may be looked up at a third stage and receive processing of the TCP packets may be performed at a fourth stage. A fifth stage may initiate transfer of the processed TCP packets that are received to an application layer.