摘要:
Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The first request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries.
摘要:
Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The first request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries.
摘要:
In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.
摘要:
In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.
摘要:
In one embodiment, a processor comprises a plurality of processor cores and an interconnect to which the plurality of processor cores are coupled. Each of the plurality of processor cores comprises at least one translation lookaside buffer (TLB). A first processor core is configured to broadcast a demap command on the interconnect responsive to executing a demap operation. The demap command identifies one or more translations to be invalidated in the TLBs, and remaining processor cores are configured to invalidate the translations in the respective TLBs. The remaining processor cores transmit a response to the first processor core, and the first processor core is configured to delay continued processing subsequent to the demap operation until the responses are received from each of the remaining processor cores.
摘要:
In one embodiment, a processor comprising at least one translation lookaside buffer (TLB) and a control unit coupled to the TLB. The control unit is configured to track whether or not at least one update to the TLB is pending for at least one of a plurality of strands. Each strand comprises hardware to support a different thread of a plurality of concurrently activateable threads in the processor. The strands share the TLB, and the control unit is configured to delay a demap operation issued from one of the estrands responsive to the pending update, if any.
摘要:
Techniques and structures are disclosed relating to a branch target cache (BTC) in a processor. In one embodiment, the BTC is usable to predict whether a control transfer instruction is to be taken, and, if applicable, a target address for the instruction. The BTC may operate in conjunction with a delayed branch predictor (DBP) that is more accurate but slower than the BTC. If the BTC indicates that a control transfer instruction is predicted to be taken, the processor begins to fetch instructions at the target address indicated by the BTC, but may discard those instructions if the DBP subsequently determines that the control transfer instruction was predicted incorrectly. Branch prediction information output from the BTC and the DBP may be used to update the branch target cache for subsequent predictions. In various embodiments, the BTC may simultaneously store entries for multiple processor threads, and may be fully associative.
摘要:
Techniques relating to a processor that provides instruction-level support for a stream cipher are disclosed. In one embodiment, the processor supports a first instruction executable to perform an alpha multiplication, an alpha division, and an exclusive-OR operation using a result of the alpha multiplication and a result of the alpha division. In one embodiment, the processor supports a second instruction executable to perform a modular addition of a value R1 and a value S, and to perform a first exclusive-OR operation on a result of the modular addition and a value R2. In one embodiment, the processor supports a third instruction executable to perform a substitution-box (S-Box) operation on a value R1 to produce a value R2′, and to perform a modular addition using a value R2 to produce a value R1'.
摘要:
A multithreaded microprocessor includes an instruction fetch unit that may fetch and maintain a plurality of instructions belonging to one or more threads and one or more execution units that may concurrently execute the one or more threads. The instruction fetch unit includes a target branch prediction unit that may provide a predicted branch target address in response to receiving an instruction fetch address of a current indirect branch instruction. The branch prediction unit includes a primary storage and a control unit. The storage includes a plurality of entries, and each entry may store a predicted branch target address corresponding to a previous indirect branch instruction. The control unit may generate an index value for accessing the storage using a portion of the instruction fetch address of the current indirect branch instruction, and branch direction history information associated with a currently executing thread of the one or more threads.
摘要:
A multiple-core processor providing flexible mapping of processor cores to cache banks. In one embodiment, a processor may include a cache including a number of cache banks. The processor may further include a number of processor cores configured to access the cache banks, as well as core/bank mapping logic coupled to the cache banks and processor cores. The core/bank mapping logic may be configurable to map a cache bank select portion of a memory address specified by a given one of the processor cores to any one of the cache banks.