Abstract:
A multiprocessing system including multiple processing nodes employs various implementations of hierarchical back-off locks. A thread attempting to obtain a software lock may determine whether the lock is currently owned by a different node than the node in which the thread is executing. If the lock is not owned by a different node, the thread executes code to perform a fast spin operation. On the other hand, if the lock is owned by a different node, the thread executes code to perform a slow spin operation. In this manner, node locality may result: a thread executing within the same node in which a lock has already been obtained is more likely than contending threads executing in other nodes to acquire the lock when it is freed.
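The fast-spin and slow-spin choice described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the disclosed implementation: the class name `HBOLock`, the `FREE` sentinel, the two delay values, and the use of a `threading.Lock` to emulate an atomic compare-and-swap are all hypothetical.

```python
import threading
import time

FREE = -1  # hypothetical sentinel: the lock word holds the owner's node id, or FREE


class HBOLock:
    """Toy hierarchical back-off lock; the lock word records the owner's node id."""

    def __init__(self, fast_delay=1e-6, slow_delay=1e-4):
        self._owner_node = FREE
        self._guard = threading.Lock()  # stands in for an atomic compare-and-swap
        self.fast_delay = fast_delay    # assumed back-off for same-node contention
        self.slow_delay = slow_delay    # assumed back-off for remote-node contention

    def _cas(self, expected, new):
        """Emulated compare-and-swap; returns the value seen before the attempt."""
        with self._guard:
            seen = self._owner_node
            if seen == expected:
                self._owner_node = new
            return seen

    def spin_delay(self, seen_owner, my_node):
        """Fast spin unless the lock is owned by a different node."""
        return self.fast_delay if seen_owner == my_node else self.slow_delay

    def acquire(self, my_node):
        while True:
            seen = self._cas(FREE, my_node)
            if seen == FREE:
                return  # the lock word now holds my_node
            time.sleep(self.spin_delay(seen, my_node))

    def release(self):
        with self._guard:
            self._owner_node = FREE
```

Because threads on the owner's node retry sooner than remote threads, they tend to win the lock when it is released, which is the node-locality effect the abstract describes.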
Abstract:
In one embodiment, a processor comprises a coherence trap unit and a trap logic coupled to the coherence trap unit. The coherence trap unit is also coupled to receive data accessed in response to the processor executing a memory operation. The coherence trap unit is configured to detect that the data matches a designated value indicating that a coherence trap is to be initiated to coherently perform the memory operation. The trap logic is configured to trap to a designated software routine responsive to the coherence trap unit detecting the designated value. In some embodiments, a cache tag in a cache may track whether or not the corresponding cache line has the designated value, and the cache tag may be used to trigger a trap in response to an access to the corresponding cache line.
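As a rough software model of the detection step, the sketch below checks each loaded value against a designated value and, on a match, hands control to a software routine. The constant `DESIGNATED_VALUE`, the `Processor` class, and the `fetch_coherent_copy` routine are hypothetical names chosen for illustration; real hardware would perform this check in the load path rather than in software.

```python
DESIGNATED_VALUE = 0xDEADBEEF  # hypothetical marker meaning "local copy is not valid"


class Processor:
    """Toy model: loads that return the designated value trap to software."""

    def __init__(self, memory, coherence_routine):
        self.memory = memory                        # address -> value
        self.coherence_routine = coherence_routine  # designated software routine
        self.traps = 0

    def load(self, address):
        data = self.memory[address]
        if data == DESIGNATED_VALUE:   # role of the coherence trap unit
            self.traps += 1            # role of the trap logic
            return self.coherence_routine(self.memory, address)
        return data


# Hypothetical remote copy that the software routine consults.
remote = {4: 99}


def fetch_coherent_copy(memory, address):
    """Install the coherent value locally, then complete the load."""
    memory[address] = remote[address]
    return memory[address]


cpu = Processor({0: 7, 4: DESIGNATED_VALUE}, fetch_coherent_copy)
```

In this model a second load of the same address no longer traps, since the routine has replaced the designated value with real data.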
Abstract:
The disclosed embodiments provide a system that uses unused bits in a memory pointer. During operation, the system determines a set of address bits in an address space that will not be needed for addressing purposes during program operation. Subsequently, the system stores data associated with the memory pointer in this set of address bits. The system masks this set of address bits when using the memory pointer to access the memory address associated with the memory pointer. Storing additional data in unused pointer bits can reduce the number of memory accesses for a program and improve program performance and/or reliability.
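The store-then-mask idea can be illustrated with plain integers standing in for 64-bit pointers. The 48-bit address width and the helper names `pack`, `address_of`, and `tag_of` are assumptions made for this sketch; the abstract does not specify which bits are unused.

```python
ADDRESS_BITS = 48                       # assumed number of bits needed for addressing
TAG_BITS = 64 - ADDRESS_BITS            # remaining high bits are treated as unused
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def pack(address, tag):
    """Store `tag` in the unused high bits of a 64-bit pointer value."""
    if address & ~ADDRESS_MASK:
        raise ValueError("address does not fit in the addressable bits")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in the unused bits")
    return (tag << ADDRESS_BITS) | address


def address_of(pointer):
    """Mask off the data bits before using the pointer as an address."""
    return pointer & ADDRESS_MASK


def tag_of(pointer):
    """Recover the data stored alongside the address."""
    return pointer >> ADDRESS_BITS


p = pack(0x7FFFDEADBEEF, 0x2A)
```

Here `address_of(p)` yields the original address and `tag_of(p)` yields the stored data, so the extra information travels with the pointer without a separate memory access.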
Abstract:
A system and method for reducing shared memory write overhead in a multiprocessor system. In one embodiment, a multiprocessing system implements a method comprising storing an indication of obtained store permission corresponding to a particular address in a store buffer. The indication may be, for example, the address of a cache line for which a write permission has been obtained. Obtaining the write permission may include locking and modifying an MTAG or other coherence state entry. The method further comprises determining whether the indication of obtained store permission corresponds to an address of a write operation to be performed. In response to the indication corresponding to the address of the write operation to be performed, the write operation is performed without invoking corresponding global coherence operations.
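A minimal model of the check described above: the store buffer remembers the cache line for which write permission was last obtained, and a write to that same line skips the global coherence step. The `Node` class, the 64-byte line size, and the `global_ops` counter are hypothetical, and the MTAG update is reduced to a counter increment.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


class Node:
    """Toy node whose store buffer caches one 'permission obtained' indication."""

    def __init__(self):
        self.permitted_line = None  # indication of obtained store permission
        self.global_ops = 0         # number of global coherence operations invoked
        self.memory = {}

    def _obtain_write_permission(self, line):
        """Stand-in for locking and modifying the MTAG or coherence state entry."""
        self.global_ops += 1
        self.permitted_line = line

    def write(self, address, value):
        line = address // CACHE_LINE_BYTES
        if line != self.permitted_line:
            # Indication does not correspond to this address: go global.
            self._obtain_write_permission(line)
        # Otherwise the write proceeds without a global coherence operation.
        self.memory[address] = value


node = Node()
for addr in (0, 8, 56):     # three writes to the same cache line
    node.write(addr, 1)
```

After the three writes above, `node.global_ops` is 1, since only the first write had to obtain permission.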
Abstract:
Systems and methods for efficient memory corruption detection in a processor. A processor detects a first data structure is to be allocated in a physical memory. The physical memory may be a DRAM with a spare bank of memory reserved for a hardware failover mechanism. Either the processor or an operating system (OS) determines a first version number corresponding to the first data structure. During initialization of the first data structure, the first version number may be stored in a first location in the spare bank. The processor receives from the OS a pointer holding the first version number. When the processor executes memory access operations targeting the first data structure, the processor compares the first version number with a third version number stored in a location in the physical memory indicated by the memory access address. The processor may set a trap in response to determining a mismatch.
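The compare-and-trap step can be sketched as follows, assuming the version number rides in the upper bits of the pointer. The `VersionedMemory` class, the 48-bit shift, and the per-address `versions` dictionary (a stand-in for the spare bank) are illustrative assumptions rather than details taken from the abstract.

```python
VERSION_SHIFT = 48                       # assumed position of the version field
ADDRESS_MASK = (1 << VERSION_SHIFT) - 1


class MemoryCorruptionTrap(Exception):
    """Raised when the pointer's version and the stored version disagree."""


class VersionedMemory:
    def __init__(self):
        self.data = {}      # address -> value
        self.versions = {}  # stand-in for the spare bank: address -> version

    def allocate(self, base, size, version):
        """Initialize a data structure and return a pointer holding its version."""
        for addr in range(base, base + size):
            self.data[addr] = 0
            self.versions[addr] = version
        return (version << VERSION_SHIFT) | base

    def load(self, pointer, offset=0):
        version = pointer >> VERSION_SHIFT
        addr = (pointer & ADDRESS_MASK) + offset
        if self.versions.get(addr) != version:
            raise MemoryCorruptionTrap(f"version mismatch at {addr:#x}")
        return self.data[addr]


mem = VersionedMemory()
a = mem.allocate(0x1000, 4, version=5)
b = mem.allocate(0x1004, 4, version=6)
```

With this setup `mem.load(a, 3)` succeeds, while `mem.load(a, 4)` runs past the end of the first structure into one with a different version and raises the trap.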
Abstract:
A method and processor supporting architected instructions for tracking and determining set membership, such as by implementing Bloom filters. The apparatus includes storage arrays (e.g., registers) and an execution core configured to store an indication that a given value is a member of a set, including by executing an architected instruction having an operand specifying the given value, wherein executing comprises applying a hash function to the value to determine an index into one of the storage arrays and setting a bit of the storage array corresponding to the index. An architected query instruction is later executed to determine if a query value is not a member of the set, including by applying the hash function to the query value to determine an index into the storage array and determining whether a bit at the index of the storage array is set.
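In software terms, the insert and query operations correspond to a Bloom filter over a few fixed-width bit arrays. The sketch below is an assumption-laden illustration: four 64-bit arrays, SHA-256 as the hash function, and the method names `insert` and `definitely_absent` are all choices made here, not details from the abstract.

```python
import hashlib

ARRAY_BITS = 64  # assumed width of each storage array (e.g., a 64-bit register)
NUM_ARRAYS = 4   # assumed number of storage arrays


class BloomFilter:
    def __init__(self):
        self.arrays = [0] * NUM_ARRAYS  # each int models one bit array

    def _index(self, which, value):
        """Hash the value, salted by array number, to a bit index in that array."""
        digest = hashlib.sha256(f"{which}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % ARRAY_BITS

    def insert(self, value):
        """Set one bit per storage array to record membership."""
        for which in range(NUM_ARRAYS):
            self.arrays[which] |= 1 << self._index(which, value)

    def definitely_absent(self, value):
        """True if any expected bit is clear, so the value was never inserted."""
        return any(
            ((self.arrays[which] >> self._index(which, value)) & 1) == 0
            for which in range(NUM_ARRAYS)
        )


bf = BloomFilter()
bf.insert("alpha")
```

The query can prove only non-membership: a `False` result from `definitely_absent` means the value may be in the set, since unrelated insertions can set the same bits.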
Abstract:
Systems and methods for maximizing a number of available states for a version number used for memory corruption detection. A physical memory may be a DRAM comprising a plurality of regions. Version numbers associated with data structures allocated in the physical memory may be generated so that version numbers of adjacent data structures in a virtual address space are different. A reserved set and an available set of version numbers are associated with each one of the plurality of regions. A version number in a reserved set of a given region may be in an available set of another region. The processor detects no memory corruption error in response to at least determining a version number stored in a memory location in a first region identified by a memory access operation is also in a reserved set associated with the first region.
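One way to picture the reserved and available sets is the sketch below. The region names, the concrete version values, the round-robin assignment in `assign_versions`, and the rule in `access_ok` that a non-reserved stored version must match the pointer's version are all assumptions for illustration; the abstract states only that a stored version in the region's reserved set leads to no error being detected.

```python
# Hypothetical per-region sets: a value reserved in one region is available in the other.
REGIONS = {
    "low":  {"reserved": {0},  "available": list(range(1, 16))},
    "high": {"reserved": {15}, "available": list(range(0, 15))},
}


def assign_versions(count, available):
    """Cycle through the available versions so adjacent structures differ."""
    if len(available) < 2:
        raise ValueError("need at least two available version numbers")
    return [available[i % len(available)] for i in range(count)]


def access_ok(pointer_version, stored_version, region):
    """No error if the stored version is reserved for this region or matches."""
    reserved = REGIONS[region]["reserved"]
    return stored_version in reserved or stored_version == pointer_version


versions = assign_versions(5, REGIONS["low"]["available"][:3])
```

Because each region reserves a different value, the union of usable version numbers across regions is larger than it would be with a single global reserved set.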
Abstract:
Methods and apparatuses are disclosed for improving speculation success in processors. In some embodiments, the method may include executing a plurality of threads of program code, the plurality of threads comprising a first speculative load request, setting an indicator bit corresponding to a cache line in response to the first speculative load request, and in the event that a second speculative load request from the plurality of threads refers to a first cache line with the indicator bit set, determining if a second cache line is available.