Abstract:
A processor may include an address monitor table and an atomic update table to support speculative threading. The processor may also include one or more registers to maintain state associated with execution of speculative threads. The processor may support one or more of the following primitives: an instruction to write to a register of the state, an instruction to trigger the committing of buffered memory updates, an instruction to read a status register of the state, and/or an instruction to clear one of the state bits associated with trap/exception/interrupt handling. Other embodiments are also described and claimed.
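A minimal sketch of how software might use such primitives, assuming hypothetical C intrinsics (spec_write_state, spec_commit, spec_read_status, spec_clear_trap_bit) that stand in for the instructions named above; names, encodings, and status-bit meanings are assumptions, not taken from the abstract:

    #include <stdint.h>

    extern void     spec_write_state(uint64_t value);   /* write a register of the speculative state */
    extern void     spec_commit(void);                   /* commit buffered memory updates */
    extern uint64_t spec_read_status(void);              /* read a status register of the state */
    extern void     spec_clear_trap_bit(unsigned bit);   /* clear a trap/exception/interrupt state bit */

    int run_speculative_region(void (*body)(void))
    {
        spec_write_state(1);                 /* mark the thread as speculative */
        body();                              /* memory updates are buffered, not yet visible */
        if (spec_read_status() & 0x1) {      /* assumed "conflict/trap pending" status bit */
            spec_clear_trap_bit(0);          /* clear the pending trap state and abort */
            return -1;
        }
        spec_commit();                       /* make the buffered updates architecturally visible */
        return 0;
    }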
Abstract:
A technique for thread synchronization and communication. More particularly, embodiments of the invention pertain to managing communication and synchronization among two or more threads of instructions being executed by one or more microprocessors or microprocessor cores.
Abstract:
A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe, and a logical power throttling unit coupled to the device to receive the output signal and coupled to the decode pipe. When the logical power throttling unit receives a throttling power output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages.
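A behavioral sketch (a software model, not a hardware description) of logical throttling as described, assuming a four-wide decoder and a simple alternate-cycle policy; both are illustrative assumptions:

    #include <stdbool.h>

    #define DECODE_WIDTH 4   /* assumed decode slots per processor cycle */

    /* Returns how many instructions to decode this cycle. When the power
       signal satisfies the criterion, decode only on alternate cycles,
       halving the average decode rate while clock and voltages stay fixed. */
    int decode_slots_this_cycle(bool throttle_signal_asserted, unsigned cycle)
    {
        if (!throttle_signal_asserted)
            return DECODE_WIDTH;
        return (cycle & 1) ? 0 : DECODE_WIDTH;
    }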
Abstract:
A technique for operating a computing apparatus includes allocating a working register file entry corresponding to a register in a working register file when an instruction referencing the register proceeds through a particular stage of the computing apparatus. The technique maintains the working register file entry until at least a predetermined number of subsequent instructions have similarly proceeded through the particular stage.
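A behavioral sketch of the retention rule, assuming an illustrative table size and retention distance; the structures and names are not taken from the abstract:

    #include <stdbool.h>
    #include <stdint.h>

    #define WRF_ENTRIES 32
    #define RETAIN_N    8    /* assumed number of subsequent instructions */

    struct wrf_entry {
        bool     valid;
        uint64_t alloc_seq;   /* sequence number when the owning instruction passed the stage */
    };

    static struct wrf_entry wrf[WRF_ENTRIES];
    static uint64_t stage_pass_count;   /* instructions that have passed the particular stage */

    /* Allocate an entry when an instruction referencing the register passes the stage. */
    void on_instruction_passes_stage(unsigned dest_entry)
    {
        stage_pass_count++;
        wrf[dest_entry].valid     = true;
        wrf[dest_entry].alloc_seq = stage_pass_count;
    }

    /* The entry is maintained until at least RETAIN_N later instructions have passed the stage. */
    bool entry_may_be_reclaimed(unsigned e)
    {
        return wrf[e].valid && stage_pass_count >= wrf[e].alloc_seq + RETAIN_N;
    }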
Abstract:
Mechanisms have been developed for providing great flexibility in processor instruction handling, sequencing and execution. In particular, it has been discovered that a programmable pre-decode mechanism can be employed to alter the behavior of a processor. For example, pre-decode hints for sequencing, synchronization or speculation control may be altered, or mappings of ISA instructions to native instructions or operation sequences may be altered. Such techniques may be employed to adapt a processor implementation (in the field) to varying memory models, implementations or interfaces or to varying memory latencies or timing characteristics. Similarly, such techniques may be employed to adapt a processor implementation to correspond to an extended/adapted instruction set architecture. In some realizations, instruction pre-decode functionality may be adapted at processor run-time to handle or mitigate a timing, concurrency or speculation issue. In some realizations, operation of pre-decode may be reprogrammed post-manufacture, at (or about) initialization, or at run-time.
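A sketch of one way a programmable pre-decode map could be modeled in software, assuming a 256-entry opcode table with hint and native-operation fields; the layout and names are illustrative assumptions:

    #include <stdint.h>

    struct predecode_entry {
        uint8_t  hint_bits;       /* e.g. sequencing/synchronization/speculation hints */
        uint8_t  native_op_count;
        uint16_t native_ops[4];   /* native operation sequence for the ISA instruction */
    };

    static struct predecode_entry predecode_table[256];

    /* Reprogram one mapping post-manufacture, at initialization, or at run-time. */
    void predecode_program(uint8_t isa_opcode, const struct predecode_entry *e)
    {
        predecode_table[isa_opcode] = *e;
    }

    const struct predecode_entry *predecode_lookup(uint8_t isa_opcode)
    {
        return &predecode_table[isa_opcode];
    }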
Abstract:
One embodiment of the present invention provides a system that facilitates storing results of resolvable branches during speculative execution, and then using the results to predict the same branches during non-speculative execution. During operation, the system executes code within a processor. Upon encountering a stall condition, the system speculatively executes the code from the point of the stall, without committing results of the speculative execution to the architectural state of the processor. Upon encountering a branch instruction that is resolved during speculative execution, the system stores the result of the resolved branch in a branch queue, so that the result can be subsequently used to predict the branch during non-speculative execution.
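A sketch of the branch-queue idea, with FIFO sizing and interfaces chosen purely for illustration:

    #include <stdbool.h>

    #define BQ_SIZE 64

    static bool     branch_queue[BQ_SIZE];
    static unsigned bq_head, bq_tail;

    /* Called when a branch resolves during speculative execution. */
    void bq_record(bool taken)
    {
        branch_queue[bq_tail % BQ_SIZE] = taken;
        bq_tail++;
    }

    /* Called for the same branch during subsequent non-speculative execution. */
    bool bq_predict(bool fallback_prediction)
    {
        if (bq_head == bq_tail)
            return fallback_prediction;              /* queue empty: use the normal predictor */
        return branch_queue[bq_head++ % BQ_SIZE];    /* replay the stored outcome */
    }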
Abstract:
For each memory location in a set of memory locations associated with a thread, an indication associated with the memory location is set to request a signal if data from the memory location is evicted from a cache; in response to the signal, the set of memory locations is reloaded into the cache.
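A sketch of this reload flow, assuming hypothetical interfaces (monitor_on_evict, prefetch_into_cache, and an eviction-signal handler) that stand in for the per-location indication and signal described above:

    #include <stddef.h>

    extern void monitor_on_evict(const void *addr);    /* set the indication: signal on eviction (assumed) */
    extern void prefetch_into_cache(const void *addr); /* reload a location into the cache (assumed) */

    static const void *watched[16];
    static size_t      watched_count;

    void watch_set(const void **locs, size_t n)
    {
        if (n > 16)
            n = 16;                    /* illustrative fixed capacity */
        watched_count = n;
        for (size_t i = 0; i < n; i++) {
            watched[i] = locs[i];
            monitor_on_evict(locs[i]);
        }
    }

    /* Invoked when the eviction signal fires for any watched location. */
    void on_eviction_signal(void)
    {
        for (size_t i = 0; i < watched_count; i++)
            prefetch_into_cache(watched[i]);    /* reload the whole set */
    }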
Abstract:
Various usage models are provided to utilize a Monitor and Call (“mcall”) instruction that incorporates user-level asynchronous signaling. The various usage models utilize the mcall instruction in a multithreading system in order to enhance concurrent thread execution. Other embodiments are also described and claimed.
Abstract:
One embodiment of the present invention provides a system that facilitates avoiding locks by speculatively executing critical sections of code. During operation, the system allows a process to speculatively execute a critical section of code within a program without first acquiring a lock associated with the critical section. If the process subsequently completes the critical section without encountering an interfering data access from another process, the system commits changes made during the speculative execution, and resumes normal non-speculative execution of the program past the critical section. Otherwise, if an interfering data access from another process is encountered during execution of the critical section, the system discards changes made during the speculative execution, and attempts to re-execute the critical section.
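A sketch of this flow using hypothetical primitives spec_begin and spec_commit for the speculative region; the retry bound and the eventual fall-back to the conventional lock are assumptions added for completeness:

    #include <stdbool.h>
    #include <pthread.h>

    extern bool spec_begin(void);    /* start speculation; false if it cannot start (assumed) */
    extern bool spec_commit(void);   /* false if an interfering access forced an abort (assumed) */

    void run_critical_section(pthread_mutex_t *lock, void (*critical)(void))
    {
        for (int attempt = 0; attempt < 3; attempt++) {
            if (spec_begin()) {
                critical();              /* executed without first acquiring the lock */
                if (spec_commit())
                    return;              /* no interference: changes committed */
            }
            /* Interference: speculative changes discarded; re-execute. */
        }
        pthread_mutex_lock(lock);        /* fall back to normal locked execution */
        critical();
        pthread_mutex_unlock(lock);
    }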
Abstract:
A computer-based method configures a hardware circuit to transfer a message to a message queue in an operating system. The hardware circuit is used to transfer a message to the message queue in the operating system without requiring use of either the operating system or a hypervisor associated with the operating system. The configuring includes tying (i) a value in a head pointer register with a head pointer for the message queue stored in the hardware circuit and (ii) a value in a tail pointer register for the message queue with a tail pointer for the message queue stored in the hardware circuit, so that the value in the head pointer register equals the stored head pointer and the value in the tail pointer register equals the stored tail pointer. The hardware circuit uses a logical identifier associated with the message to select an entry in a mapping table of the hardware circuit. A value in the entry in the mapping table is used to select an entry in an action table. The entry in the action table is used to determine a tail pointer for the message queue. The hardware circuit appends the message to a location indicated by the tail pointer without requiring cycles of a hypervisor associated with the strand.
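A software model of the enqueue path described above, with table layouts and field names chosen only for illustration:

    #include <stdint.h>
    #include <string.h>

    struct action_entry {
        uint64_t *queue_base;   /* base of the message queue */
        uint32_t *tail;         /* tail pointer kept equal to the tail pointer register */
        uint32_t  size;         /* queue size in 64-bit words */
    };

    static uint32_t            mapping_table[256];   /* logical identifier -> action-table index */
    static struct action_entry action_table[64];

    /* Append a message without operating-system or hypervisor involvement. */
    void hw_enqueue(uint8_t logical_id, const uint64_t *msg, uint32_t words)
    {
        struct action_entry *a = &action_table[mapping_table[logical_id]];
        uint32_t t = *a->tail;                               /* tail determined via the action entry */
        memcpy(&a->queue_base[t], msg, words * sizeof msg[0]);
        *a->tail = (t + words) % a->size;                    /* advance the tail pointer (wrap) */
    }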