Abstract:
A register file in a processor includes a first plurality of registers of a first size, n bits. A decoder uses a mapping that divides the register file into a second plurality M of registers having a second size. Each register having the second size is assigned a different name in a continuous name space. Each register of the second size includes a plurality N of registers of the first size, n bits, and each register in that plurality N is assigned the same name as the register of the second size that includes it. State information is maintained in the register file for each n-bit register. The dependence of an instruction on other instructions is detected through the continuous name space, and the state information allows the processor to determine when the information in any portion, or all, of a register is valid.
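
The naming and validity scheme lends itself to a short simulation. The following is a minimal Python sketch, not the patented implementation: the parameters (M = 32 wide registers, N = 4 subregisters of n = 16 bits each) and identifiers such as RegisterFile and ready are illustrative assumptions.

```python
# Minimal sketch of a register file with per-subregister state bits.
# All parameters and names here are illustrative assumptions.

class RegisterFile:
    def __init__(self, m_regs=32, n_sub=4, n_bits=16):
        self.n_sub = n_sub
        self.n_bits = n_bits
        # One state ("valid") bit per n-bit subregister; a wide register
        # shares its name with the N subregisters it contains.
        self.valid = [[False] * n_sub for _ in range(m_regs)]
        self.data = [[0] * n_sub for _ in range(m_regs)]

    def write(self, name, sub, value):
        """A producer writes one n-bit portion of register `name`."""
        self.data[name][sub] = value & ((1 << self.n_bits) - 1)
        self.valid[name][sub] = True

    def ready(self, name, subs=None):
        """A consumer depends on register `name` through the continuous
        name space; it may need all of the register or only a portion."""
        subs = range(self.n_sub) if subs is None else subs
        return all(self.valid[name][s] for s in subs)

rf = RegisterFile()
rf.write(3, 0, 0xBEEF)          # write the low 16 bits of wide register 3
print(rf.ready(3, subs=[0]))    # True: that portion is valid
print(rf.ready(3))              # False: the full register is not yet valid
```

Because readiness is checked through the shared name, a dependence on only one portion of a register can resolve before the whole register is valid.
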
Abstract:
A processor includes a device providing a throttling power output signal, which is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe and a logical power throttling unit coupled to the device, to receive the output signal, and to the decode pipe. When the logical power throttling unit receives an output signal satisfying a predetermined criterion, it causes the decode pipe to reduce the average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltage.
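
As a rough illustration of logical (as opposed to physical) throttling, here is a minimal Python sketch; the threshold value, the four-wide decode, and the alternate-cycle throttling pattern are assumptions made for illustration, not details from the abstract.

```python
# Minimal sketch of logical power throttling at the decode stage.
# Threshold, width, and the throttling pattern are illustrative.

POWER_THRESHOLD = 0.9   # stand-in for the "predetermined criterion"
FULL_DECODE_WIDTH = 4   # instructions decoded per cycle when unthrottled

def decode_width(power_signal, cycle):
    """Reduce the *average* number of instructions decoded per cycle
    without changing clock frequency or supply voltage."""
    if power_signal >= POWER_THRESHOLD:
        # e.g. decode on alternate cycles only: average width is halved
        return FULL_DECODE_WIDTH if cycle % 2 == 0 else 0
    return FULL_DECODE_WIDTH

decoded = sum(decode_width(0.95, c) for c in range(100))
print(decoded / 100)  # ~2.0 instructions/cycle instead of 4.0
```
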
Abstract:
One embodiment of the present invention provides a system that selectively monitors store instructions to support transactional execution of a process, wherein changes made during the transactional execution are not committed to the architectural state of a processor until the transactional execution successfully completes. Upon encountering a store instruction during transactional execution of a block of instructions, the system determines whether the store instruction is a monitored store instruction or an unmonitored store instruction. If the store instruction is a monitored store instruction, the system performs the store operation, and store-marks a cache line associated with the store instruction to facilitate subsequent detection of an interfering data access to the cache line from another process. If the store instruction is an unmonitored store instruction, the system performs the store operation without store-marking the cache line.
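
The monitored/unmonitored distinction can be sketched in a few lines of Python. This is a toy model, assuming a 64-byte cache line and a dictionary standing in for the cache; the helper names (do_store, interferes) are hypothetical.

```python
# Minimal sketch of selective store monitoring during transactional
# execution. Line size, names, and the dict-based cache are illustrative.

store_marks = {}   # cache line index -> process that store-marked it
memory = {}

LINE = 64  # assumed cache line size in bytes

def do_store(addr, value, monitored, pid):
    memory[addr] = value                  # perform the store either way
    if monitored:
        # Store-mark the line so an interfering access by another
        # process can be detected before the transaction commits.
        store_marks[addr // LINE] = pid

def interferes(addr, pid):
    """True if another process touches a line we store-marked."""
    owner = store_marks.get(addr // LINE)
    return owner is not None and owner != pid

do_store(0x1000, 42, monitored=True, pid=1)   # marks line 0x40
do_store(0x2000, 7, monitored=False, pid=1)   # no mark
print(interferes(0x1008, pid=2))  # True: same line as the monitored store
print(interferes(0x2008, pid=2))  # False: unmonitored store left no mark
```
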
Abstract:
The present invention relates to a method and a system for identifying an application type from encrypted traffic transported over an IP network. The method and system extract at least a portion of the IP flow parameters from encrypted traffic of at least one specific target encryption type. The extracted IP flow parameters are then transmitted to a learning-based classification engine, which has been trained with unencrypted traffic and infers at least one corresponding application type for the extracted IP flow parameters.
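
A minimal sketch of this flow, assuming scikit-learn as the learning-based engine and a toy feature set of packet sizes and inter-arrival times, which remain observable under encryption; the feature choices, model, and training values are illustrative, not taken from the claims.

```python
# Minimal sketch: extract flow parameters that survive encryption, then
# infer an application type with a model trained on unencrypted traffic.

from sklearn.ensemble import RandomForestClassifier

def flow_features(pkt_sizes, inter_arrivals):
    """IP flow parameters extractable even from encrypted payloads."""
    return [
        sum(pkt_sizes) / len(pkt_sizes),            # mean packet size
        max(pkt_sizes),
        sum(inter_arrivals) / len(inter_arrivals),  # mean inter-arrival
        len(pkt_sizes),                             # packets in the flow
    ]

# Training set built from *unencrypted* traffic, where the application
# type is known (toy values for illustration).
X_train = [flow_features([1400, 1400, 1200], [0.01, 0.01]),   # video
           flow_features([80, 120, 90], [0.5, 0.7])]          # chat
y_train = ["video", "chat"]

clf = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)

# At inference time, the same parameters are extracted from an
# encrypted flow and handed to the trained engine.
encrypted_flow = flow_features([1350, 1400, 1380], [0.012, 0.011])
print(clf.predict([encrypted_flow]))  # -> ['video']
```
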
Abstract:
A user is provided with means to sample the memory hierarchy via software, which allows the user to enhance memory-level parallelism via software. A status of information needed for execution of a second computer program instruction is read in response to execution of a first computer program instruction. Execution continues with the second computer program instruction when the status is a first status. Alternatively, a third computer program instruction is executed when the status is a second status different from the first status. Thus, execution of the first computer program instruction allows the memory hierarchy to be sampled, which in turn gives the user control of memory-level parallelism.
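
A software analogue of the three-instruction pattern might look like the following Python sketch; the HIT/MISS statuses, the set-based cache model, and the prefetch stand-in are assumptions made for illustration.

```python
# Minimal sketch of branching on memory-hierarchy status. The "first
# instruction" samples whether the operand of the next load is already
# cached; software then chooses between two paths.

cache = {0x100}  # set of addresses currently resident in the cache

def status_of(addr):
    """First instruction: sample the memory hierarchy for `addr`."""
    return "HIT" if addr in cache else "MISS"

def run(addr):
    if status_of(addr) == "HIT":
        return f"second instruction: use value at {hex(addr)} now"
    # Second status: take the alternate path, e.g. prefetch and do other
    # useful work in the meantime, enhancing memory-level parallelism.
    cache.add(addr)  # stand-in for issuing a prefetch
    return f"third instruction: prefetch {hex(addr)} and continue elsewhere"

print(run(0x100))  # hit path
print(run(0x200))  # miss path
```
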
Abstract:
Embodiments of the present invention provide a system that executes a transaction on a simultaneous speculative threading (SST) processor. In these embodiments, the processor includes a primary strand and a subordinate strand. Upon encountering a transaction with the primary strand while executing instructions non-transactionally, the processor checkpoints the primary strand and executes the transaction with the primary strand while continuing to non-transactionally execute deferred instructions with the subordinate strand. When the subordinate strand non-transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate a first strand ID. When the primary strand transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate a second strand ID.
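
A minimal sketch of the per-cache-line records follows; the strand ID values, record layout, and checkpoint representation are chosen for illustration rather than taken from the embodiments.

```python
# Minimal sketch of per-cache-line strand records during an SST
# transaction. IDs and data structures are illustrative.

SUBORDINATE_ID = 0  # first strand ID: non-transactional deferred work
PRIMARY_ID = 1      # second strand ID: transactional accesses

line_records = {}   # cache line -> strand ID of the last access
checkpoint = None

def begin_transaction(arch_state):
    global checkpoint
    checkpoint = dict(arch_state)   # checkpoint the primary strand

def access(line, strand):
    """Record which strand touched the line, so transactional and
    non-transactional accesses stay distinguishable."""
    line_records[line] = PRIMARY_ID if strand == "primary" else SUBORDINATE_ID

begin_transaction({"r1": 5})
access(0x40, "subordinate")  # deferred, non-transactional access
access(0x80, "primary")      # transactional access
print(line_records)          # {64: 0, 128: 1}
```
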
Abstract:
A processor reduces wasted cycle time resulting from stalling and idling, and increases the proportion of execution time, by supporting and implementing both vertical multithreading and horizontal multithreading. Vertical multithreading permits overlapping or “hiding” of cache miss wait times; in vertical multithreading, multiple hardware threads share the same processor pipeline. A hardware thread is typically a process, a lightweight process, a native thread, or the like in an operating system that supports multithreading. Horizontal multithreading increases parallelism within the processor circuit structure, for example within a single integrated circuit die that makes up a single-chip processor. To further increase system parallelism, some processor embodiments form multiple processor cores in a single die. Advances in on-chip multiprocessor horizontal threading are gained as processor core sizes shrink through improvements in process technology.
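
Vertical multithreading's miss-hiding behavior can be roughed out in Python; the switch-on-miss policy, round-robin order, and thread names below are illustrative simplifications of what a hardware scheduler might do.

```python
# Minimal sketch of vertical multithreading: hardware threads share one
# pipeline, and a cache miss in one thread is hidden by switching to
# another ready thread. Policy and names are illustrative.

import collections

ready = collections.deque(["T0", "T1", "T2"])  # threads sharing the pipeline

def cycle(active, misses):
    """Issue from `active`; on a miss, rotate to the next ready thread."""
    if active in misses:
        ready.rotate(-1)        # hide the miss wait time behind the next thread
        return ready[0]
    return active

thread = ready[0]
for c, misses in enumerate([set(), {"T0"}, set(), {"T1"}]):
    thread = cycle(thread, misses)
    print(f"cycle {c}: issuing from {thread}")
```
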
Abstract:
A processor including a large register file utilizes a dirty bit storage coupled to the register file and dirty bit logic that controls resetting of the dirty bit storage. The dirty bit logic determines whether a register or group of registers in the register file has been written since the process was loaded or the context was last restored and, if so, generates a value in the dirty bit storage that designates the written condition of the register or group of registers. When the context is next saved, the dirty bit logic saves a particular register or group of registers only when the dirty bit storage indicates that the register or group of registers was written. If the register or group of registers was not written, the context is switched without saving the register or group of registers. The dirty bit storage is initialized when a process is loaded or the context changes.
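
A minimal sketch of dirty-bit-driven context saving, assuming one dirty bit per group of eight registers; the group size, register count, and class name are illustrative assumptions.

```python
# Minimal sketch of lazy context save via dirty bits. Parameters and
# names are illustrative.

GROUP = 8   # registers covered by each dirty bit

class DirtyBitFile:
    def __init__(self, n_regs=64):
        self.regs = [0] * n_regs
        self.dirty = [False] * (n_regs // GROUP)

    def load_context(self):
        """Dirty bits are initialized when a process is loaded or the
        context is restored."""
        self.dirty = [False] * len(self.dirty)

    def write(self, idx, value):
        self.regs[idx] = value
        self.dirty[idx // GROUP] = True    # record the written condition

    def save_context(self):
        """Save only the register groups whose dirty bit is set."""
        return {g: self.regs[g * GROUP:(g + 1) * GROUP]
                for g, d in enumerate(self.dirty) if d}

rf = DirtyBitFile()
rf.load_context()
rf.write(10, 99)                  # dirties group 1 only
print(sorted(rf.save_context()))  # [1]: the other 7 groups are skipped
```
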
Abstract:
One embodiment of the present invention provides a system that facilitates deferring execution of instructions with unresolved data dependencies as they are issued for execution in program order. During a normal execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system generates a checkpoint that can subsequently be used to return execution of the program to the point of the instruction. Next, the system executes the instruction and subsequent instructions in an execute-ahead mode, wherein instructions that cannot be executed because of an unresolved data dependency are deferred, and wherein other non-deferred instructions are executed in program order. Upon encountering a store during the execute-ahead mode, the system determines whether the store buffer is full. If so, the system prefetches a cache line for the store and defers execution of the store. Because the store buffer is gated during execute-ahead mode, it will never free space to accept additional stores; consequently, if the number of stores encountered during execute-ahead mode exceeds the capacity of the store buffer, the system enters the scout mode directly, without waiting for the deferred queue to eventually fill.
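
The store-handling decision can be sketched as follows; the buffer capacity, the string-valued mode names, and the prefetch stand-in are illustrative assumptions.

```python
# Minimal sketch of store handling in execute-ahead mode. Sizes and
# names are illustrative.

STORE_BUFFER_CAP = 4
store_buffer, deferred_stores = [], []
mode = "execute-ahead"

def prefetch(addr):
    print(f"prefetch cache line for {hex(addr)}")

def handle_store(addr, value):
    global mode
    if len(store_buffer) < STORE_BUFFER_CAP:
        store_buffer.append((addr, value))
        return
    # Store buffer full: prefetch the line and defer the store.
    prefetch(addr)
    deferred_stores.append((addr, value))
    # The buffer is gated during execute-ahead, so it will never free
    # space; once the store count exceeds its capacity, enter scout
    # mode directly rather than waiting for the deferred queue to fill.
    mode = "scout"

for i in range(5):
    handle_store(0x1000 + i * 64, i)
print(mode)  # 'scout': the fifth store exceeded the gated buffer's capacity
```
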
Abstract:
One embodiment of the present invention provides a system that facilitates a fast execution restart following speculative execution. During normal operation of the system, a processor executes code in a non-speculative mode. Upon encountering a stall condition, the system checkpoints the state of the processor and executes the code in a speculative mode from the point of the stall. As the processor commences execution in speculative mode, it stores copies of instructions into a recovery queue as they are issued. When the stall condition is ultimately resolved, execution in non-speculative mode recommences and the execution units are initially loaded with instructions from the recovery queue, thereby avoiding the delay involved in waiting for instructions to propagate through the fetch and decode stages of the pipeline. At the same time, the processor begins fetching subsequent instructions following the last instruction in the recovery queue. When all the instructions have been loaded from the recovery queue, the execution units begin receiving the subsequent instructions that have propagated through the fetch and decode stages of the pipeline.
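
A minimal Python sketch of the recovery-queue restart, with the pipeline modeled only as an ordering effect; the cycle budget, program representation, and identifier names are illustrative assumptions.

```python
# Minimal sketch of a fast restart from a recovery queue. Names and
# sizes are illustrative.

import collections

recovery_queue = collections.deque()

def speculative_issue(insn):
    """Keep a copy of each instruction as it issues in speculative mode."""
    recovery_queue.append(insn)

def restart(program, resume_pc):
    """Stall resolved: the execution units drain the recovery queue
    immediately, hiding the fetch/decode refill delay, while fetch
    begins at the instruction after the last one in the queue."""
    pc = resume_pc + len(recovery_queue)
    for cycle in range(6):
        if recovery_queue:
            print(f"cycle {cycle}: execute {recovery_queue.popleft()} (recovery queue)")
        elif pc < len(program):
            print(f"cycle {cycle}: execute {program[pc]} (via fetch/decode)")
            pc += 1

program = [f"i{n}" for n in range(8)]
for insn in program[2:5]:     # stall at i2; speculatively issue i2..i4
    speculative_issue(insn)
restart(program, resume_pc=2)
```
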