Abstract:
In one embodiment, a method includes receiving a read request from a first caching agent, determining whether a directory entry associated with the memory location indicates that the information is not present in a remote caching agent, and if so, transmitting the information from the memory location to the first caching agent before snoop processing with respect to the read request is completed. Other embodiments are described and claimed.
Abstract:
Systems and techniques for virtual device sharing. A failure of one of a plurality of memory devices corresponding to a first rank in a memory system is detected. The memory system has a plurality of ranks, each rank having a plurality of memory devices used to store a cache line. A portion of the cache line corresponding to the failed memory device is stored in a memory device in a second rank in the memory system and the remaining portion of the cache line in the first rank of the memory system.
Abstract:
Methods and apparatus relating to dynamically routing data responses directly to a requesting processor core are described. In one embodiment, data returned in response to a data request is to be directly transmitted to a requesting agent based on information stored in a route back table. Other embodiments are also disclosed.
Abstract:
In one embodiment, a method includes receiving a read request from a first caching agent, determining whether a directory entry associated with the memory location indicates that the information is not present in a remote caching agent, and if so, transmitting the information from the memory location to the first caching agent before snoop processing with respect to the read request is completed. Other embodiments are described and claimed.
Abstract:
The apparatus and method described herein are for handling shared memory accesses between multiple processors utilizing lock-free synchronization through transactional-execution. A transaction demarcated in software is speculatively executed. During execution invalidating remote accesses/requests to addresses loaded from and to be written to shared memory are tracked by a transaction buffer. If an invalidating access is encountered, the transaction is re-executed. After a pre-determined number of times re-executing the transaction, the transaction may be re-executed non-speculatively with locks/semaphores.
Abstract:
In one embodiment, the present invention includes a method of assigning a location within a shared variable for each of multiple threads and writing a value to a corresponding location to indicate that the corresponding thread has reached a barrier. In such manner, when all the threads have reached the barrier, synchronization is established. In some embodiments, the shared variable may be stored in a cache accessible by the multiple threads. Other embodiments are described and claimed.
Abstract:
An multi-threading processor is provided. The multi-threading processor includes a first instruction fetch unit to receive a first thread and a second instruction fetch unit to receive a second thread. A multi-thread scheduler coupled to the instruction fetch units and a execution unit. The multi-thread scheduler determines the width of the execution unit and the execution unit executes the threads accordingly.
Abstract:
A simultaneous multithreaded processor that reduces the number of hardware components necessary as well as the complexity of design over current systems is disclosed. As opposed to requiring individual storage elements for saving instruction pointer information for each re-steer logic component within a processor pipeline, the present invention allows for instruction pointer information of an inactive thread to be stored in a single, ‘inactive thread’ storage element until the thread becomes active again.
Abstract:
A method and apparatus for permitting out-of-order execution of predicated instructions is disclosed. In one embodiment, a predicated instruction may be decoded into a related predicated instruction and a move instruction contingent on the complementary value of the predicate of the predicated instruction. The destination register of both the related predicated instruction and the move instruction may be mapped to the same physical register, and only one of the two instructions may update machine state with its results.
Abstract:
Methods and apparatus are disclosed for accessing multiple data cache lines for scatter/gather operations. Embodiment of apparatus may comprise address generation logic to generate an address from an index of a set of indices for each of a set of corresponding mask elements having a first value. Line or bank match ordering logic matches addresses in the same cache line or different banks, and orders an access sequence to permit a group of addresses in multiple cache lines and different banks. Address selection logic directs the group of addresses to corresponding different banks in a cache to access data elements in multiple cache lines corresponding to the group of addresses in a single access cycle. A disassembly/reassembly buffer orders the data elements according to their respective bank/register positions, and a gather/scatter finite state machine changes the values of corresponding mask elements from the first value to a second value.