摘要:
In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprises instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as is a hardware accelerator and a processor coupled to the hardware accelerator.
摘要:
In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand; and the execution circuitry is configured to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as is a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instruction which, when executed, implement a portion of the method.
摘要:
A method for repairing a pipeline in response to a branch instruction having a branch, includes the steps of providing a branch repair table having a plurality of entries, allocating an entry in the branch repair table for the branch instruction, storing a target address, a fall-through address, and repair information in the entry in the branch repair table, processing the branch instruction to determine whether the branch was taken, and repairing the pipeline in response to the repair information and the fall-through address in the entry in the branch repair table when the branch was not taken.
摘要:
One embodiment of the present invention provides a method and an apparatus for predicting the target of a branch instruction. This method and apparatus operate by using a translation lookaside buffer (TLB) to store page numbers for predicted branch target addresses. In this embodiment, a branch target address table stores a small index to a location in the translation lookaside buffer, and this index is used retrieve a page number from the location in the translation lookaside buffer. This page number is used as the page number portion of a predicted branch target address. Thus, a small index into a translation lookaside buffer can be stored in a predicted branch target address table instead of a larger page number for the predicted branch target address. This technique effectively reduces the size of a predicted branch target table by eliminating much of the space that is presently wasted storing redundant page numbers. Another embodiment maintains coherence between the branch target address table and the translation lookaside buffer. This makes it possible to detect a miss in the translation lookaside buffer at least one cycle earlier by examining the branch target address table.
摘要:
A microprocessor is provided with an instruction fetch mechanism that simultaneously predicts multiple control-flow instructions. The instruction fetch unit farther is capable of handling multiple types of control-flow instructions. The instruction fetch unit uses predecode data and branch prediction data to select the next instruction fetch bundle address. If a branch misprediction is detected, a corrected branch target address is selected as the next fetch bundle address. If no branch misprediction occurs and the current fetch bundle includes a taken control-flow instruction, then the next fetch bundle address is selected based on the type of control-flow instruction detected. If the first taken control-flow instruction is a return instruction, a return address from the return address stack is selected as the next fetch bundle address. If the first taken control-flow instruction is an unconditional branch or predicted taken conditional branch, a predicted branch target address is selected as the next fetch bundle address. If no branch misprediction is detected and the current fetch bundle does not include a taking control-flow instruction, then a sequential address is selected as the next fetch bundle address.
摘要:
A cache memory array stores two-way set associative data. An odd set data bank stores odd number sets of the two-way set associative data, where the two ways of each odd number set are aligned horizontally within the odd set data bank. An even set data bank stores even number sets of the two-way set associative data, where the two ways of each even number set are aligned horizontally within the even set data bank. Also, the odd set data bank is aligned horizontally with the even set data bank such that each odd number set is aligned horizontally with a next even number set. The horizontally aligned ways are interleaved for data path width reduction. Set and way selection circuits extract lines of data from the array. The array may be structurally implemented by single-ported RAM cells.
摘要:
A method of sampling instructions executing in a multi-threaded processor which includes selecting an instruction for sampling, storing information relating to the instruction, determining whether the instruction includes an event of interest, and reporting the instruction if the instruction includes an event of interest on a per-thread basis. The event of interest includes information relating to a thread to which the instruction is bound.
摘要:
In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand; and the execution circuitry is configured to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as is a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instruction which, when executed, implement a portion of the method.
摘要:
One embodiment of the present invention provides a system that reduces the time required to access registers from a register file within a processor. During operation, the system receives an instruction to be executed, wherein the instruction identifies at least one operand to be accessed from the register file. Next, the system looks up the operands in a register pane, wherein the register pane is smaller and faster than the register file and contains copies of a subset of registers from the register file. If the lookup is successful, the system retrieves the operands from the register pane to execute the instruction. Otherwise, if the lookup is not successful, the system retrieves the operands from the register file, and stores the operands into the register pane. This triggers the system to reissue the instruction to be executed again, so that the re-issued instruction retrieves the operands from the register pane.
摘要:
A method of handling an exception in a processor includes setting a state upon detection of an exception, signaling a trap for the exception if the state is set, and based on a class of the exception, processing the exception differently before signaling the trap. The method may include replaying an instruction causing the exception before signaling the trap for the exception based on the class of the exception. The method may include replaying the instruction causing the exception after the instruction causing the exception becomes an oldest, unretired instruction. The method may include signaling the trap for the exception after an instruction causing the exception becomes an oldest, unretired instruction. The method may include marking an instruction causing the exception as complete without issuing the instruction causing the exception. An apparatus for handling exceptions in a processor includes an instruction scheduler for setting a state upon detection of an exception and signaling a trap for the exception if the state is set. The instruction scheduler, based on a class of the exception, processes the exception differently before signaling the trap.