摘要:
Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
摘要:
Methods and apparatus relating to directory cache allocation that is based on snoop response information are described. In one embodiment, an entry in a directory cache may be allocated for an address in response to a determination that another caching agent has a copy of the data corresponding to the address. Other embodiments are also disclosed.
摘要:
A system and method avoids deadlock, such as circular routing deadlock, in a computer system by providing a virtual buffer at main memory. The computer system has an interconnection network that couples a plurality of processors having access to main memory. The interconnection network includes one or more routing agents each having at least one buffer for storing packets that are to be forwarded. When the routing agent's buffer becomes full, thereby preventing it from accepting any additional packets, the routing agent transfers at least one packet into the virtual buffer. By transferring a packet out of the buffer, the routing agent frees up space allowing it to accept a new packet. If the newly accepted packet also results in the buffer becoming full, another packet is transferred into the virtual buffer. This process is repeated until the deadlock condition is resolved. Packets are then retrieved from the virtual buffer.
摘要:
A technique selectively imposes inter-reference ordering between memory reference operations issued by a processor of a multiprocessor system to addresses within a page pertaining to a page table entry (PTE) that is affected by a translation buffer (TB) miss flow routine. The TB miss flow is used to retrieve information contained in the PTE for mapping a virtual address to a physical address and, subsequently, to allow retrieval of data at the mapped physical address. The PTE that is retrieved in response to a memory reference (read) operation is not loaded into the TB until a commit-signal associated with that read operation is returned to the processor. Once the PTE and associated commit-signal are returned, the processor loads the PTE into the TB so that it can be used for a subsequent read operation directed to the data at the physical address.
摘要:
One disclosed embodiment may comprise a system that includes a home node that provides a transaction reference to a requester in response to a request from the requester. The requester provides an acknowledgement message to the home node in response to the transaction reference, the transaction reference enabling the requester to determine an order of requests at the home node relative to the request from the requester.
摘要:
A system comprises a first node including data having an associated D-state and a second node operative to provide a source broadcast requesting the data. The first node is operative in response to the source broadcast to provide the data to the second node and transition the state associated with the data at the first node from the D-state to an O-state without concurrently updating memory. An S-state is associated with the data at the second node.
摘要:
A channel-based mechanism resolves race conditions in a computer system between a first processor writing modified data back to memory and a second processor trying to obtain a copy of the modified data. In addition to a Q0 channel for carrying requests for data, a Q1 channel for carrying probes in response to Q0 requests, and a Q2 channel for carrying responses to Q0 requests, a new channel, the QWB channel, which has a higher priority than Q1 but lower than Q2, is also defined. When a forwarded Read command from the second processor results in a miss at the first processor's cache, because the requested memory block was written back to memory, a Loop command is issued to memory by the first processor on the QWB virtual channel. In response to the Loop command, memory sends the written back version of the memory block to the second processor.
摘要:
A processor includes a cache-side address monitor unit corresponding to a first cache portion of a distributed cache that has a total number of cache-side address monitor storage locations less than a total number of logical processors of the processor. Each cache-side address monitor storage location is to store an address to be monitored. A core-side address monitor unit corresponds to a first core and has a same number of core-side address monitor storage locations as a number of logical processors of the first core. Each core-side address monitor storage location is to store an address, and a monitor state for a different corresponding logical processor of the first core. A cache-side address monitor storage overflow unit corresponds to the first cache portion, and is to enforce an address monitor storage overflow policy when no unused cache-side address monitor storage location is available to store an address to be monitored.
摘要:
A technique to enable secure application and data integrity within a computer system. In one embodiment, one or more secure enclaves are established in which an application and data may be stored and executed.
摘要:
A technique to enable secure application and data integrity within a computer system. In one embodiment, one or more secure enclaves are established in which an application and data may be stored and executed.