Abstract:
Contention handling apparatus which receives access request signals from a number of users and processes these requests to allow controlled access to a shared resource. The contention handling apparatus includes a number of access blocks, with one access block associated with each user. A busy line of each access block is connected to receive a busy signal; the busy signal is an access request signal from a higher-priority user, indicating that the shared resource is unavailable. Each access block that receives a busy signal latches the corresponding access request signal until the busy signal is deasserted. If the busy signal and the access request signal occur at the same time, the corresponding access block generates a wait output signal. The busy signal for a given access block is the logical OR of the wait output of the access block associated with the next-higher-priority user and the access request signals of all higher-priority users.
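The priority rule above can be summarized in a few lines of Python. This is a behavioral sketch only; the function and signal names (arbitrate, latched, waits) are illustrative assumptions, not taken from the disclosure:

```python
# Behavioral sketch of the fixed-priority contention scheme: index 0 is
# the highest-priority user. One call models one evaluation step.

def arbitrate(requests, latched, waits):
    """requests/latched/waits are lists of bools from the previous step.
    Returns (grants, new_latched, new_waits)."""
    grants, new_latched, new_waits = [], [], []
    for i, req in enumerate(requests):
        pending = req or latched[i]          # latched requests persist
        # Busy = OR ("logical sum") of all higher-priority requests and
        # the wait output of the next-higher-priority block.
        busy = any(requests[j] or latched[j] for j in range(i)) or \
               (waits[i - 1] if i > 0 else False)
        if pending and busy:
            grants.append(False)
            new_latched.append(True)         # hold until busy deasserts
            new_waits.append(True)           # request met busy: wait out
        else:
            grants.append(pending)           # resource free for this user
            new_latched.append(False)
            new_waits.append(False)
    return grants, new_latched, new_waits

g, l, w = arbitrate([True, True, False], [False] * 3, [False] * 3)
assert g == [True, False, False]             # user 0 wins; user 1 waits
```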
Abstract:
A method and apparatus are disclosed for reducing the propagation delay associated with the critical speed path of a binary logic circuit by using "multiplexing logic". More specifically, the inputs to the logic circuit are classified as either critical or non-critical, and the product terms are manipulated so that the non-critical inputs are mutually exclusive. The non-critical inputs are supplied to one or more first logic gate structures whose ultimate outputs control multiplexer couplers. The critical inputs are supplied to one or more second logic gate structures whose ultimate outputs are provided as the inputs to the multiplexer couplers.
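As a rough illustration, the restructuring can be sketched in Python with booleans standing in for gate outputs; the particular gate functions chosen here are assumptions for the example, not the patent's circuits:

```python
# Sketch: late-arriving (critical) inputs pass through only a short gate
# structure plus the multiplexer, while the slower select logic is
# computed early from the non-critical inputs.

def mux(select, d0, d1):
    return d1 if select else d0              # multiplexer coupler

def logic_output(nc_a, nc_b, crit_x, crit_y):
    # First logic gate structure: built from the non-critical inputs,
    # arranged to be mutually exclusive; it settles before the critical
    # inputs arrive, so it can drive the mux select early.
    select = nc_a and not nc_b
    # Second logic gate structures: shallow logic on the critical inputs
    # feeds the mux data inputs.
    d0 = crit_x or crit_y
    d1 = crit_x and crit_y
    # Critical path: one gate level plus the multiplexer.
    return mux(select, d0, d1)
```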
Abstract:
A technique for operating a processor includes translating, using an associated translation lookaside buffer, a first virtual address into a first physical address through a first entry number in the translation lookaside buffer. The technique also includes translating, using the translation lookaside buffer, a second virtual address into a second physical address through a second entry number in the translation lookaside buffer. The technique further includes, in response to the first entry number being the same as the second entry number, determining that the first and second virtual addresses point to the same physical address in memory and reference the same data.
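A toy software model makes the entry-number argument concrete. Everything here (the fully associative table, the stand-in page-table walk) is an assumption for illustration:

```python
# Toy TLB: two virtual addresses that hit the same entry number must map
# to the same physical page, so no physical-address compare is needed.

PAGE_SHIFT = 12

class TLB:
    def __init__(self):
        self.entries = []                    # (virtual page, physical page)

    def translate(self, vaddr):
        """Return (entry number, physical address); fill on a miss."""
        vpage = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        for n, (vp, pp) in enumerate(self.entries):
            if vp == vpage:
                return n, (pp << PAGE_SHIFT) | offset
        pp = vpage ^ 0x5                     # stand-in for a page-table walk
        self.entries.append((vpage, pp))
        return len(self.entries) - 1, (pp << PAGE_SHIFT) | offset

tlb = TLB()
n1, _ = tlb.translate(0x40000123)
n2, _ = tlb.translate(0x40000456)
assert n1 == n2    # same entry number: same physical page; with equal
                   # page offsets the two addresses reference the same data
```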
Abstract:
In some embodiments, a data processing system includes a processing unit, a first load/store unit (LSU), and a second LSU configured to operate independently of the first LSU in single-thread and multi-thread modes. A first store buffer is coupled to the first and second LSUs, and a second store buffer is coupled to the first and second LSUs. The first store buffer is used to execute a first thread in multi-thread mode. The second store buffer is used to execute a second thread in multi-thread mode. Both store buffers are used when executing a single thread in single-thread mode.
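A minimal sketch of the buffer-sharing policy, with invented names (StoreBuffers, allocate) and a simple emptiest-buffer choice in single-thread mode, both assumptions:

```python
from collections import deque

class StoreBuffers:
    """Two store buffers: partitioned per thread in multi-thread mode,
    pooled behind the lone thread in single-thread mode."""

    def __init__(self, depth=8):
        self.buf = [deque(maxlen=depth), deque(maxlen=depth)]
        self.multi_thread = False

    def allocate(self, thread_id, store):
        if self.multi_thread:
            target = self.buf[thread_id]     # buffer 0 <-> thread 0, etc.
        else:
            # Single-thread mode: both buffers serve the one thread;
            # pick the emptier buffer (this policy is an assumption).
            target = min(self.buf, key=len)
        if len(target) == target.maxlen:
            return False                     # buffer full: stall the store
        target.append(store)
        return True
```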
Abstract:
A processor reduces the likelihood of stalls at an instruction pipeline by dynamically extending the size of a full execution queue. To extend the full execution queue, the processor temporarily repurposes another execution queue to store instructions on behalf of the full execution queue. The execution queue to be repurposed can be selected based on a number of factors, including the type of instructions it is generally designated to store, whether it is empty of other instruction types, and the rate of cache hits at the processor. By selecting the repurposed queue based on dynamic factors such as the cache hit rate, the likelihood of stalls at the dispatch stage is reduced for different types of program flows, improving overall efficiency of the processor.
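The selection logic might look like the following sketch; the queue names, depth, and the cache-hit-rate threshold are illustrative assumptions:

```python
from collections import deque

class Dispatcher:
    def __init__(self, depth=4):
        self.queues = {"int": deque(), "fp": deque(), "load": deque()}
        self.depth = depth
        self.borrowed_by = {}                # donor queue -> owner queue

    def pick_donor(self, full_queue, cache_hit_rate):
        """Choose an empty, unlent queue to repurpose for overflow."""
        for name, q in self.queues.items():
            if name == full_queue or q or name in self.borrowed_by:
                continue
            # Dynamic factor: with a high hit rate, loads complete soon,
            # so avoid lending the load queue (threshold is illustrative).
            if name == "load" and cache_hit_rate > 0.9:
                continue
            return name
        return None

    def dispatch(self, kind, instr, cache_hit_rate=0.5):
        q = self.queues[kind]
        if len(q) < self.depth:
            q.append(instr)
            return True
        donor = self.pick_donor(kind, cache_hit_rate)
        if donor is None:
            return False                     # no donor: dispatch stalls
        self.borrowed_by[donor] = kind       # temporarily repurpose
        self.queues[donor].append(instr)     # extend the full queue
        return True
```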
Abstract:
A system and method for power-efficient memory caching. Some illustrative embodiments may include a system comprising: a hash address generator coupled to an address bus (the hash address generator converts a bus address present on the address bus into a current hashed address); a cache memory coupled to the address bus (the cache memory comprises a tag stored in one of a plurality of tag cache ways and data stored in one of a plurality of data cache ways); and a hash memory coupled to the address bus (the hash memory comprises a saved hashed address associated with the data and the tag). Fewer than all of the plurality of tag cache ways are enabled when the current hashed address matches the saved hashed address. An enabled tag cache way comprises the tag.
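A behavioral sketch of the way-enable decision, assuming a 4-way cache and a toy 6-bit hash (both assumptions):

```python
NUM_WAYS, SETS = 4, 64
saved_hash = [[None] * NUM_WAYS for _ in range(SETS)]
saved_tag  = [[None] * NUM_WAYS for _ in range(SETS)]

def hash_addr(tag):
    return (tag ^ (tag >> 7)) & 0x3F         # toy 6-bit hash (assumption)

def install(addr, way):
    """Store a tag and its hashed address together; the hash memory
    associates the saved hash with the tag and data."""
    set_idx, tag = (addr >> 6) % SETS, addr >> 12
    saved_tag[set_idx][way] = tag
    saved_hash[set_idx][way] = hash_addr(tag)

def lookup(addr):
    set_idx, tag = (addr >> 6) % SETS, addr >> 12
    current = hash_addr(tag)
    # Power savings: enable only the tag ways whose saved hashed address
    # matches the current hashed address, rather than all ways.
    enabled = [w for w in range(NUM_WAYS)
               if saved_hash[set_idx][w] == current]
    for w in enabled:                        # full tag compare on few ways
        if saved_tag[set_idx][w] == tag:
            return ("hit", w)
    return ("miss", enabled)
```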
Abstract:
An instruction cache employing a cache holding register is provided. When a cache line of instruction bytes is fetched from main memory, the instruction bytes are temporarily stored into the cache holding register as they are received from main memory. The instruction bytes are predecoded as they are received. If a predicted-taken branch instruction is encountered, the instruction fetch mechanism within the instruction cache begins fetching instructions from the target instruction path. This fetching may be initiated prior to receiving the complete cache line containing the predicted-taken branch instruction. As long as instruction fetches from the target instruction path continue to hit in the instruction cache, these instructions may be fetched and dispatched into a microprocessor employing the instruction cache. The remaining portion of the cache line containing the predicted-taken branch instruction is received by the cache holding register. To reduce the number of ports on the instruction bytes storage used to store cache lines of instructions, the cache holding register retains the cache line until an idle cycle occurs in the instruction bytes storage. The same port ordinarily used for fetching instructions is then used to store the cache line into the instruction bytes storage. In one embodiment, the instruction cache prefetches the cache line succeeding the cache line that misses. A second cache holding register is employed for storing the prefetched cache line.
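A behavioral sketch of the holding register's port sharing; the names and the dict-based storage are assumptions:

```python
class CacheHoldingRegister:
    """Accumulates a fill line from main memory and writes it into the
    instruction bytes storage only when the shared fetch port is idle."""

    def __init__(self, line_size=32):
        self.line_size = line_size
        self.addr, self.data, self.valid = None, [], False

    def receive(self, addr, chunk):
        # Instruction bytes arrive (and would be predecoded) piecewise.
        if not self.valid:
            self.addr, self.data, self.valid = addr, [], True
        self.data.extend(chunk)

    def try_writeback(self, storage, fetch_port_busy):
        """Retire the held line through the ordinary fetch port, but
        only on an idle cycle, so the storage needs no extra port."""
        if (self.valid and not fetch_port_busy
                and len(self.data) == self.line_size):
            storage[self.addr] = bytes(self.data)
            self.valid = False
            return True
        return False
```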
Abstract:
A computer system is provided including a microprocessor employing a reorder buffer which stores a last in buffer (LIB) indication corresponding to each instruction. The last in buffer indication indicates whether or not the corresponding instruction is the last, in program order, of the instructions within the buffer to update the storage location defined as the destination of that instruction. The LIB indication is included in the dependency checking comparisons. A dependency is indicated for a given source operand and a destination operand within the reorder buffer if the operand specifiers match and the corresponding LIB indication indicates that the instruction corresponding to the destination operand is the last to update the corresponding storage location. At most one of the dependency comparisons for a given source operand can indicate dependency. According to one embodiment, the reorder buffer employs a line-oriented configuration. Concurrently decoded instructions are stored into a line of storage and retired as a unit. A last in line (LIL) indication is stored for each instruction in the line, indicating whether or not that instruction is the last within the line to update the storage location defined as its destination. The LIL indications for a line can be used as write enables for the register file.
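The uniqueness property follows directly from maintaining the LIB bit at allocation, as in this sketch (field names are assumptions):

```python
class ReorderBuffer:
    def __init__(self):
        self.entries = []                    # program order, oldest first

    def allocate(self, dest_reg):
        # Only the newest writer of a register keeps LIB = 1; any older
        # writer of the same register has its LIB bit cleared.
        for e in self.entries:
            if e["dest"] == dest_reg:
                e["lib"] = False
        self.entries.append({"dest": dest_reg, "lib": True})

    def dependency(self, src_reg):
        """Specifier match AND LIB set: at most one entry can match."""
        hits = [i for i, e in enumerate(self.entries)
                if e["dest"] == src_reg and e["lib"]]
        assert len(hits) <= 1
        return hits[0] if hits else None

rob = ReorderBuffer()
rob.allocate("r1")
rob.allocate("r1")                           # second writer of r1
assert rob.dependency("r1") == 1             # only the newest matches
```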
Abstract:
A set-associative cache memory configured to use multiple portions of a requested address in parallel to quickly access data from a data array based upon stored way predictions. The cache memory comprises a plurality of memory locations, a plurality of storage locations configured to store way predictions, a decoder, a plurality of pass transistors, and a sense amp unit. A subset of the storage locations is selected according to a first portion of the requested address. The decoder is configured to receive and decode a second portion of the requested address. The decoded portion of the address is used to select a first subset of the data array based upon the way predictions stored within the selected subset of storage locations. The pass transistors are configured to select a second subset of the data array according to a third portion of the requested address. The sense amp unit then reads a cache line from the intersection of the first subset and the second subset within the data array.
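As a very loose functional analogy (array shapes and field positions invented), the three address portions cooperate like this:

```python
SETS, WAYS, WORDS = 64, 4, 8
way_pred   = [0] * SETS                      # stored way predictions
data_array = [[[0] * WORDS for _ in range(WAYS)] for _ in range(SETS)]

def predicted_read(addr):
    pred_idx = addr & (SETS - 1)             # portion 1: selects the
                                             # stored way prediction
    row      = (addr >> 6) & (SETS - 1)      # portion 2: decoded to pick
                                             # a row of the data array
    word     = (addr >> 12) & (WORDS - 1)    # portion 3: pass transistors
                                             # narrow the columns
    way = way_pred[pred_idx]                 # predicted way, no tag wait
    # The sense amps read at the intersection of the two selections.
    return data_array[row][way][word]
```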
Abstract:
An apparatus for prediction of loop instructions. Loop instructions decrement the value in a counter register and branch to a target address (specified by an instruction operand) if the decremented value of the counter register is greater than zero. The apparatus comprises a loop detection unit that detects the presence of a loop instruction in the instruction stream. An indication of the loop instruction is conveyed to a reorder buffer which stores speculative register values. If the apparatus is not currently processing the loop instruction, a compare value corresponding to the counter register prior to execution of the loop instruction is conveyed to a loop prediction unit. The loop prediction unit also increments a counter value upon receiving each indication of the loop instruction. This counter value is then compared to the compare value conveyed from the reorder buffer. If the counter value is one less than the compare value, a signal is asserted that indicates that the loop instruction should be predicted not-taken upon a next iteration of the loop. In this manner, loop prediction accuracy may be increased by correctly predicting the loop instruction not-taken. Because loops are commonly found in a variety of applications, increasing the accuracy of loop prediction, even slightly, may have a beneficial effect on performance. The loop operation is particularly important in scientific applications where it may be used to perform various digital signal processing routines and to traverse arrays.
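A small model of the prediction rule (class and method names are assumptions): count sightings of the loop instruction against the counter register's value captured before the loop began:

```python
class LoopPredictor:
    def __init__(self):
        self.active = False
        self.compare_value = 0               # counter reg before the loop
        self.count = 0                       # sightings so far

    def observe(self, counter_before_loop):
        """Call once per sighting of the loop instruction; returns True
        when the next iteration should be predicted not-taken."""
        if not self.active:
            self.active = True
            self.count = 0
            # Compare value comes from the reorder buffer's speculative
            # copy of the counter register prior to the loop.
            self.compare_value = counter_before_loop
        self.count += 1
        if self.count >= self.compare_value:
            self.active = False              # counter hits zero: loop ends
        # count == compare - 1: the next iteration is the final one.
        return self.count == self.compare_value - 1

lp = LoopPredictor()
hints = [lp.observe(3) for _ in range(3)]
assert hints == [False, True, False]         # not-taken flagged once
```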