摘要:
Systems and methods described herein for performing incremental register checkpointing may employ a special register to indicate which registers have already been checkpointed. This register may include one bit per register. These systems may also include a special pointer register whose value identifies a location in user memory or in dedicated on-chip storage at which a copy of a register's value should be saved by a checkpointing operation. Only registers modified during speculative execution or execution of a transaction may be checkpointed (e.g., when register modifying instructions are encountered) and subsequently restored (e.g., due to misspeculation or transaction abort), rather than all of the registers of the processor. Each register may be checkpointed at most once for a given speculative episode or atomic transaction. Setting a bit in the special register may prevent checkpointing of the corresponding register. Setting all of the bits in the special register may disable checkpointing.
摘要:
Techniques for controlling a thread on a computerized system having multiple processors involve accessing state information of a blocked thread, and maintaining the state information of the blocked thread at current values when the state information indicates that less than a predetermined amount of time has elapsed since the blocked thread ran on the computerized system. Such techniques further involve setting the state information of the blocked thread to identify affinity for a particular processor of the multiple processors when the state information indicates that at least the predetermined amount of time has elapsed since the blocked thread ran on the computerized system. Such operation enables the system to place a cold blocked thread which shares data with another thread on the same processor of that other thread so that, when the blocked thread awakens and runs, that thread is closer to the shared data.
摘要:
In shared-memory computer systems, threads may communicate with one another using shared memory. A receiving thread may poll a message target location repeatedly to detect the delivery of a message. Such polling may cause excessive cache coherency traffic and/or congestion on various system buses and/or other interconnects. A method for inter-processor communication may reduce such bus traffic by reducing the number of reads performed and/or the number of cache coherency messages necessary to pass messages. The method may include a thread reading the value of a message target location once, and determining that this value has been modified by detecting inter-processor messages, such as cache coherence messages, indicative of such modification. In systems that support transactional memory, a thread may use transactional memory primitives to detect the cache coherence messages. This may be done by starting a transaction, reading the target memory location, and spinning until the transaction is aborted.
摘要:
Adaptive modifications of spinning and blocking behavior in spin-then-block mutual exclusion include limiting spinning time to no more than the duration of a context switch. Also, the frequency of spinning versus blocking is limited to a desired amount based on the success rate of recent spin attempts. As an alternative, spinning is bypassed if spinning is unlikely to be successful because the owner is not progressing toward releasing the shared resource, as might occur if the owner is blocked or spinning itself. In another aspect, the duration of spinning is generally limited, but longer spinning is permitted if no other threads are ready to utilize the processor. In another aspect, if the owner of a shared resource is ready to be executed, a thread attempting to acquire ownership performs a “directed yield” of the remainder of its processing quantum to the other thread, and execution of the acquiring thread is suspended.
摘要:
A method, apparatus and computer program product for providing page-protection based memory access barrier traps is presented. A value for a user-mode bit (u-bit) is computed for each extant virtual page in an address space, the u-bit indicative that an object on the virtual page is being moved by a Garbage Collector process. An instruction is executed which causes an access protection fault. The state of the u-bit for the virtual page associated with the access protection fault is consulted when the access protection fault is encountered. Additionally, the access protection fault is translated into a user-trap (utrap) and the utrap is serviced when the u-bit is set.
摘要:
For each of multiple processes executing in parallel, as long as corresponding version information associated with a respective set of one or more shared variables used for computational purposes has not changed during execution of a respective transaction, results of the respective transaction can be globally committed to memory without causing data corruption. If version information associated with one or more respective shared variables (used to produce the transaction results) happens to change during a process of generating respective results, then a respective process can identify that another process modified the one or more respective shared variables during execution and that its transaction results should not be committed to memory. In this latter case, the transaction repeats itself until it is able to commit respective results without causing data corruption.
摘要:
A computer system includes multiple processing threads that execute in parallel. The multiple processing threads have access to a global environment including different types of metadata enabling the processing threads to carry out simultaneous execution depending on a currently selected type of lock mode. A mode controller monitoring the processing threads initiates switching from one type of lock mode to another depending on current operating conditions such as an amount of contention amongst the multiple processing threads to modify the shared data. The mode controller can switch from one lock mode another regardless of whether any of the multiple processes are in the midst of executing a respective transaction. A most efficient lock mode can be selected to carry out the parallel transactions. In certain cases, switching of lock modes causes one or more of the processing threads to abort and retry a respective transaction according to the new mode.
摘要:
Cache logic associated with a respective one of multiple processing threads executing in parallel updates corresponding data fields of a cache to uniquely mark its contents. The marked contents represent a respective read set for a transaction. For example, at an outset of executing a transaction, a respective processing thread chooses a data value to mark contents of the cache used for producing a transaction outcome for the processing thread. Upon each read of shared data from main memory, the cache stores a copy of the data and marks it as being used during execution of the processing thread. If uniquely marked contents of a respective cache line happen to be displaced (e.g., overwritten) during execution of a processing thread, then the transaction is aborted (rather than being committed to main memory) because there is a possibility that another transaction overwrote a shared data value used during the respective transaction.
摘要:
A technique for accessing a shared resource of a computerized system involves running a first portion of a first thread within the computerized system, the first portion (i) requesting a lock on the shared resource and (ii) directing the computerized system to make operations of a second thread visible in a correct order. The technique further involves making operations of the second thread visible in the correct order in response to the first portion of the first thread running within the computerized system, and running a second portion of the first thread within the computerized system to determine whether the first thread has obtained the lock on the shared resource. Such a technique alleviates the need for using a MEMBAR instruction in the second thread.
摘要:
An operating system call control subsystem is disclosed for use in a computer that includes a processor for processing a program, the program instructions of an operating system call instruction type identifying one of a plurality of types of operating system calls, each type of operating system call being associable with an operating system call type identifier value within a predetermined range of values. The operating system call control subsystem comprises a crossover table, an operating system call instruction type address resolution module, and an operating system call instruction type processing module. The crossover table has a number of entries corresponding to a predetermined fraction of the predetermined range, each entry in the crossover table having an instruction for enabling the processor to save a value corresponding to an offset of the entry into the crossover table. The operating system call instruction type address resolution module provides the instructions of the operating system call instruction type with respective target addresses that include an operating system call set identifier in a set of operating system call set identifiers, the number of operating system call set identifiers multiplied by the number of crossover table entries corresponding to the predetermined range and an offset value corresponding to an offset to an entry into the crossover table. The operating system call instruction type processing module, in response to the processor processing an instruction of the operating system call instruction type, (a) saves the operating system call set identifier from the target address, (b) selects one of the entries in the crossover table using the offset value of the target address, (c) processes the instruction from the selected entry of the crossover table to save the value corresponding to the offset of the selected entry in the crossover table, and (d) generates the operating system call type identifier value in connection with the saved operating system call set identifier and the saved value corresponding to the offset of the selected entry in the crossover table.