摘要:
In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprises instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as is a hardware accelerator and a processor coupled to the hardware accelerator.
摘要:
In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprises instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as is a hardware accelerator and a processor coupled to the hardware accelerator.
摘要:
A system and method for profiling runtime system events of a computer system may include associating a data source type with detected system events. The system events may be detected dependent on information included in a reply message received by a processor in response to a data request or other transaction request message. The reply message may include information characterizing a source type of a source of data included in the reply message. The source type information may indicate that the source is remote or local; that it is a shared or a private storage location; that the data is supplied via a cache-to-cache transfer; or that the data is sourced from a coherency domain other than that of the requesting process. Instructions, events, messages, and replies may be sampled, and extended address information corresponding to the samples may be stored in an event set database for performance analysis.
摘要:
In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand; and the execution circuitry is configured to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as is a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instruction which, when executed, implement a portion of the method.
摘要:
In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand; and the execution circuitry is configured to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as is a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instruction which, when executed, implement a portion of the method.
摘要:
A method for repairing a pipeline in response to a branch instruction having a branch, includes the steps of providing a branch repair table having a plurality of entries, allocating an entry in the branch repair table for the branch instruction, storing a target address, a fall-through address, and repair information in the entry in the branch repair table, processing the branch instruction to determine whether the branch was taken, and repairing the pipeline in response to the repair information and the fall-through address in the entry in the branch repair table when the branch was not taken.
摘要:
One embodiment of the present invention provides a method and an apparatus for predicting the target of a branch instruction. This method and apparatus operate by using a translation lookaside buffer (TLB) to store page numbers for predicted branch target addresses. In this embodiment, a branch target address table stores a small index to a location in the translation lookaside buffer, and this index is used retrieve a page number from the location in the translation lookaside buffer. This page number is used as the page number portion of a predicted branch target address. Thus, a small index into a translation lookaside buffer can be stored in a predicted branch target address table instead of a larger page number for the predicted branch target address. This technique effectively reduces the size of a predicted branch target table by eliminating much of the space that is presently wasted storing redundant page numbers. Another embodiment maintains coherence between the branch target address table and the translation lookaside buffer. This makes it possible to detect a miss in the translation lookaside buffer at least one cycle earlier by examining the branch target address table.
摘要:
A cache memory array stores two-way set associative data. An odd set data bank stores odd number sets of the two-way set associative data, where the two ways of each odd number set are aligned horizontally within the odd set data bank. An even set data bank stores even number sets of the two-way set associative data, where the two ways of each even number set are aligned horizontally within the even set data bank. Also, the odd set data bank is aligned horizontally with the even set data bank such that each odd number set is aligned horizontally with a next even number set. The horizontally aligned ways are interleaved for data path width reduction. Set and way selection circuits extract lines of data from the array. The array may be structurally implemented by single-ported RAM cells.
摘要:
In one embodiment, a processor comprises a core configured to execute instructions; a register file comprising a plurality of storage locations; and a window management unit. The window management unit is configured to operate the plurality of storage locations as a plurality of windows, wherein register addresses encoded into the instructions identify storage locations among a subset of the plurality of storage locations that are within a current window. Additionally, the window management unit is configured to allocate a second window in response to a predetermined event. One of the current window and the second window serves as a checkpoint of register state, and the other one of the current window and the second window is updated in response to instructions processed subsequent to the checkpoint. The checkpoint may be restored if the speculative execution results are discarded.
摘要:
A method and apparatus for performing multiple branch predictions per cycle is disclosed. The method and apparatus according to the present invention determine, within one fetch cycle, which instructions in a plurality of fetch instructions are branches and whether such branches are taken or not taken thereby finding the oldest taken branch, which has a target address that is fetched within the same fetch cycle.