Abstract:
Method and apparatus for performing table lookup are disclosed. In one embodiment, the method includes providing a lookup table, where the lookup table includes a plurality of translation modes and each translation mode includes a corresponding translation table tree supporting a plurality of page sizes. The method further includes receiving a search request from a requester, determining a translation table tree for conducting the search request, determining a lookup sequence based on the translation table tree, generating a search output using the lookup sequence, and transmitting the search output to the requester. The plurality of translation modes includes a first set of page sizes for 32-bit operating system software and a second set of page sizes for 64-bit operating system software. The plurality of page sizes includes non-global pages, global pages, and both non-global and global pages.
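The abstract above describes a translation-table-tree walk that supports multiple page sizes. The sketch below illustrates the general idea in C++ with a two-level tree in which a leaf at the upper level maps a large page. The level count, index bit fields, and page sizes (2 MiB / 4 KiB) are illustrative assumptions, not the table format used by the patent.

```cpp
// Minimal sketch of a translation-table-tree walk supporting two page sizes.
// The two-level layout, index bit fields, and page sizes are illustrative
// assumptions, not the patent's table format.
#include <cstdint>
#include <optional>
#include <unordered_map>

struct TableEntry {
    bool     valid  = false;
    bool     isLeaf = false;   // leaf at level 0 => large page; at level 1 => small page
    uint64_t next   = 0;       // next-level table id, or physical frame number if a leaf
};

using Table = std::unordered_map<uint32_t, TableEntry>;

// Walk: level 0 indexed by VA bits [29:21], level 1 by bits [20:12];
// assumed 2 MiB large pages and 4 KiB small pages.
std::optional<uint64_t> walk(const Table& level0,
                             const std::unordered_map<uint64_t, Table>& level1Tables,
                             uint64_t va) {
    uint32_t i0 = (va >> 21) & 0x1FF;
    auto e0 = level0.find(i0);
    if (e0 == level0.end() || !e0->second.valid) return std::nullopt;
    if (e0->second.isLeaf)                                   // large-page translation
        return (e0->second.next << 21) | (va & 0x1FFFFFu);
    auto t1 = level1Tables.find(e0->second.next);
    if (t1 == level1Tables.end()) return std::nullopt;
    uint32_t i1 = (va >> 12) & 0x1FF;
    auto e1 = t1->second.find(i1);
    if (e1 == t1->second.end() || !e1->second.valid || !e1->second.isLeaf)
        return std::nullopt;
    return (e1->second.next << 12) | (va & 0xFFFu);          // small-page translation
}
```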
Abstract:
A circuit for tracking memory operations with trace-based execution is disclosed. Each trace includes a sequence of operations that includes zero or more of the memory operations. The memory operations being executed form a set of active memory operations that have a predefined program order among them. At least some of the active memory operations access the memory in an execution order that is different from the program order. Checkpoint entries are associated with each trace. Each entry refers to a checkpoint location. Executing one of the active memory operations updates a checkpoint location. During the operation of the circuit, none of the operations of a given trace has any effect on the execution unit's architectural state prior to committing that trace. Each trace becomes eligible for commitment after all operations in the trace complete executing. After the trace is committed, all of the checkpoint entries associated with the trace are invalidated.
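As a rough illustration of the checkpointing described above, the following sketch assumes each trace carries a list of checkpoint entries, a store saves the prior contents of its checkpoint location before updating memory speculatively, and committing a trace invalidates (here, discards) its entries. The structure and function names are hypothetical, not taken from the patent.

```cpp
// Minimal sketch of per-trace checkpoint bookkeeping: executing a store
// saves the prior memory value in a checkpoint entry; committing the trace
// invalidates its entries. Names are illustrative assumptions.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct CheckpointEntry { uint64_t addr; uint64_t oldValue; };

struct Trace {
    std::vector<CheckpointEntry> checkpoints;  // entries associated with this trace
    bool committed = false;
};

struct Memory { std::unordered_map<uint64_t, uint64_t> words; };

// Executing an active store updates a checkpoint location before writing.
void executeStore(Trace& t, Memory& mem, uint64_t addr, uint64_t value) {
    t.checkpoints.push_back({addr, mem.words[addr]});  // remember prior contents
    mem.words[addr] = value;                           // speculative update
}

// A trace becomes eligible for commitment once all its operations complete;
// committing invalidates (discards) all of its checkpoint entries.
void commit(Trace& t) {
    t.committed = true;
    t.checkpoints.clear();
}
```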
Abstract:
An instruction processing circuit for a processor is disclosed. The instruction processing circuit is adapted to provide one or more sequences of operations, based on one or more sequences of instructions, to an execution unit of the processor. The instruction processing circuit comprises at least one cache circuit, and the processing circuit includes a sequencer and a page translation buffer coupled to the sequencer for trace verification and for maintaining coherency between a memory and the at least one cache circuit.
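One plausible reading of the coherency role of the page translation buffer is sketched below: it records which instruction pages each cached trace was built from, so that a write hitting one of those pages can identify the traces that must be discarded. This is only an interpretation for illustration; the names and the mapping structure are assumptions, not the patent's design.

```cpp
// Minimal sketch of a page translation buffer used for trace coherence:
// it maps instruction pages to the traces built from them, so a write to a
// page can invalidate the affected traces. All names are assumptions.
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

struct PageTranslationBuffer {
    // page number -> ids of traces whose instructions came from that page
    std::unordered_map<uint64_t, std::set<int>> pageToTraces;

    void recordTrace(int traceId, const std::vector<uint64_t>& pages) {
        for (uint64_t p : pages) pageToTraces[p].insert(traceId);
    }

    // Called when a memory write hits an instruction page: returns the
    // traces that must be discarded to keep the cached traces coherent.
    std::set<int> invalidatePage(uint64_t page) {
        auto it = pageToTraces.find(page);
        if (it == pageToTraces.end()) return {};
        std::set<int> stale = it->second;
        pageToTraces.erase(it);
        return stale;
    }
};
```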
Abstract:
This invention includes a circuit for tracking memory operations with trace-based execution. Each trace includes a sequence of operations that includes zero or more of the memory operations. Traces execute atomically and become eligible for commitment after all the operations in the trace have executed. The memory operations being executed form a set of active memory operations that have a predefined program order among them and corresponding ordering constraints. At least some of the active memory operations access the memory in an execution order that is different from the program order. Checkpoint entries are associated with each trace. Each entry refers to a checkpoint location. Memory operation ordering entries correspond to each one of the active memory operations. Rollback requests result in overwriting the checkpoint locations associated with the selected trace as well as the checkpoint locations associated with traces that are younger than the selected trace.
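The rollback behavior described above can be illustrated roughly as follows: restore the checkpoint locations of the selected trace and of every younger trace, youngest first, then discard the squashed traces. Ordering traces by index and undoing entries in reverse are assumptions made for the sketch.

```cpp
// Minimal sketch of rollback across trace ages: checkpoint locations of the
// selected trace and of all younger traces are overwritten with their saved
// values, youngest trace first. Names and the index-based age ordering are
// illustrative assumptions.
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

struct CkptEntry { uint64_t addr; uint64_t oldValue; };
struct Trace { std::vector<CkptEntry> checkpoints; };
struct Memory { std::unordered_map<uint64_t, uint64_t> words; };

// traces[0] is the oldest trace; higher indices are younger.
void rollback(std::deque<Trace>& traces, size_t selected, Memory& mem) {
    for (size_t i = traces.size(); i-- > selected; ) {           // youngest first
        auto& cps = traces[i].checkpoints;
        for (auto it = cps.rbegin(); it != cps.rend(); ++it)     // undo in reverse order
            mem.words[it->addr] = it->oldValue;                  // overwrite checkpoint location
        cps.clear();
    }
    traces.erase(traces.begin() + selected, traces.end());       // discard squashed traces
}
```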
Abstract:
Reference architecture instructions are translated into target architecture operations. Sequences of operations, in a predicted execution order in some embodiments, form traces. In some embodiments, a trace is based on a plurality of basic blocks. In some embodiments, a trace is committed or aborted as a single entity. Sequences of operations are optimized by fusing collections of operations; fused operations specify the same observable function as the respective collections, but advantageously enable more efficient processing. In some embodiments, a collection comprises multiple register operations. Fusing a register operation with a branch operation in a trace forms a fused reg-op/branch operation. In some embodiments, branch instructions translate into assert operations. Fusing an assert operation with another operation forms a fused assert operation. In some embodiments, fused operations set only architectural state, such as high-order portions of registers, that is subsequently read before being written.
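A small illustration of the fusing idea: a register operation (a subtract) and a branch that depends on it are combined into one fused reg-op/branch that yields the same observable result, i.e., the same destination register value and the same taken/not-taken outcome. The operation choice and encoding are hypothetical, not the patent's operation set.

```cpp
// Minimal sketch of fusing a register operation with a branch: the fused
// reg-op/branch computes the same observable function as executing the two
// operations separately. Operation encodings are illustrative assumptions.
#include <cassert>
#include <cstdint>

struct RegFile { int64_t r[16] = {}; };

// Separate operations: subtract, then branch if the result is zero.
bool subThenBranchEq(RegFile& rf, int dst, int a, int b) {
    rf.r[dst] = rf.r[a] - rf.r[b];
    return rf.r[dst] == 0;        // branch taken?
}

// Fused reg-op/branch: one operation, same observable function.
bool fusedSubBranchEq(RegFile& rf, int dst, int a, int b) {
    int64_t diff = rf.r[a] - rf.r[b];
    rf.r[dst] = diff;
    return diff == 0;
}

int main() {
    RegFile x, y;
    x.r[1] = y.r[1] = 7; x.r[2] = y.r[2] = 7;
    assert(subThenBranchEq(x, 3, 1, 2) == fusedSubBranchEq(y, 3, 1, 2));
    assert(x.r[3] == y.r[3]);      // same architectural state either way
}
```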
Abstract:
Reference architecture instructions are translated into target architecture operations. In some embodiments, an execution unit of a processor executes a function determined from a collection of operations, the function specifying functionality based on instructions, the collection selected from operations translated from the instructions. In further embodiments, the function is specified as a fused operation. Sequences of operations are optimized by fusing collections of operations; fused operations specify the same observable function as the respective collections, but advantageously enable more efficient processing. In some embodiments, a collection comprises multiple register operations. Sequences of operations, in a predicted execution order in some embodiments, form traces. In some embodiments, fusing operations requires setting only final architectural state, such as final flag state; intermediate architectural state is used implicitly in a fused operation. In some embodiments, fused operations set only architectural state, such as high-order portions of registers, that is subsequently read before being written.
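The "final state only" aspect can be illustrated with two flag-writing operations: unfused, each writes the architectural flags in turn; fused, the intermediate flags are consumed implicitly and only the final flag state is set. The flag layout and the add/subtract pair are assumptions for the sketch, not the claimed operation set.

```cpp
// Minimal sketch of final-state-only fusing: two flag-writing operations are
// fused so only the final flag state is architecturally set; the intermediate
// flags exist only inside the fused operation. Flag layout is an assumption.
#include <cstdint>

struct Flags { bool zero = false; bool negative = false; };

static Flags flagsFor(int64_t v) { return {v == 0, v < 0}; }

// Unfused: each operation writes the architectural flags in turn.
int64_t addThenSub(int64_t a, int64_t b, int64_t c, Flags& f) {
    int64_t t = a + b;
    f = flagsFor(t);        // intermediate flag write
    int64_t r = t - c;
    f = flagsFor(r);        // final flag write
    return r;
}

// Fused: the intermediate flags are used implicitly and never stored;
// only the final flag state reaches architectural state.
int64_t fusedAddSub(int64_t a, int64_t b, int64_t c, Flags& f) {
    int64_t r = (a + b) - c;
    f = flagsFor(r);
    return r;
}
```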
Abstract:
A pipeline control system for implementing a virtual architecture having a complex instruction set is distributed over RISC-like semi-autonomous functional units in a processor. Decoder logic fetches instructions of the target architecture and translates them into simpler RISC-like operations. These operations, each having an associated tag, are issued to the functional units. An address processing unit computes addresses of the instructions and operands, performs segment relocation, and manages the processor's memory. Operations are executed by the units in a manner that is generally independent of operation processing by the other units. The units report termination information back to the decoder logic, but do not irrevocably change the state of the machine. Based on the termination information, the decoder logic retires normally terminated operations in order. Thus, the functional units enable multiple operations to be executed in a speculative and out-of-order manner to fully utilize the resources of the processor.
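A rough model of tagged issue with in-order retirement follows: functional units may report terminations in any order, but retirement only advances through the oldest outstanding operations whose terminations have arrived. The data structures and names are illustrative, not the patent's design.

```cpp
// Minimal sketch of tagged issue and in-order retirement: terminations can
// arrive out of order, but operations are retired strictly in program order.
// Names are illustrative assumptions.
#include <cstdint>
#include <deque>
#include <unordered_set>

struct IssuedOp { uint32_t tag; };

struct RetireLogic {
    std::deque<IssuedOp> outstanding;          // program order, oldest first
    std::unordered_set<uint32_t> terminated;   // tags reported by functional units

    void issue(uint32_t tag) { outstanding.push_back({tag}); }
    void reportTermination(uint32_t tag) { terminated.insert(tag); }

    // Retire normally terminated operations strictly in order.
    int retireReady() {
        int retired = 0;
        while (!outstanding.empty() && terminated.count(outstanding.front().tag)) {
            terminated.erase(outstanding.front().tag);
            outstanding.pop_front();           // machine state becomes irrevocable here
            ++retired;
        }
        return retired;
    }
};
```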
Abstract:
A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local central processing unit (CPU) bus to a conventional processor. The MEU employs vector registers, a vector arithmetic logic unit (ALU), and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands.
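The operand-routing-then-vector-ALU flow can be sketched as a permutation of source slots followed by a slot-wise add; the concurrent hardware lanes are modeled here as a loop. The slot count, element width, and routing pattern are arbitrary choices for illustration, not the MEU's actual instruction formats.

```cpp
// Minimal sketch of operand routing before a partitioned vector add: a
// routing pattern selects which source slot feeds each ALU slot, then the
// vector ALU operates on all slots (modeled as a loop here). Slot count and
// routing encoding are illustrative assumptions.
#include <array>
#include <cstdint>
#include <cstdio>

constexpr int kSlots = 8;
using Vec = std::array<int16_t, kSlots>;

// Operand routing unit: slot i of the output takes slot routeMap[i] of the input.
Vec route(const Vec& src, const std::array<int, kSlots>& routeMap) {
    Vec out{};
    for (int i = 0; i < kSlots; ++i) out[i] = src[routeMap[i]];
    return out;
}

// Vector ALU: slot-wise add of the two (already aligned) operands.
Vec vadd(const Vec& a, const Vec& b) {
    Vec out{};
    for (int i = 0; i < kSlots; ++i) out[i] = static_cast<int16_t>(a[i] + b[i]);
    return out;
}

int main() {
    Vec a{1, 2, 3, 4, 5, 6, 7, 8};
    Vec b{10, 20, 30, 40, 50, 60, 70, 80};
    // Swap adjacent slots of b before adding, as a flowgraph might require.
    Vec alignedB = route(b, {1, 0, 3, 2, 5, 4, 7, 6});
    Vec r = vadd(a, alignedB);
    std::printf("%d %d\n", r[0], r[1]);   // prints "21 12"
}
```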
Abstract:
A pipeline control system is distributed over the functional units (15, 17, 20, 25) in a processor (10). Decoder logic (12) issues operations, each with an associated tag, to the functional units, with up to n operations allowed to be outstanding. The units execute the operations and report termination information back to the decoder logic, but do not irrevocably change the state of the machine. Based on the termination information, the decoder logic retires normally terminated operations in order. If an operation terminates abnormally, the decoder logic instructs the units to back out of those operations that include and are later than the operation that terminated abnormally.
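The back-out path can be sketched as follows: given the tag of an abnormally terminated operation, the decoder logic discards that operation and every later outstanding operation, leaving machine state unchanged. The queue-based model and names are assumptions made for illustration.

```cpp
// Minimal sketch of the back-out path: on an abnormal termination, the
// decoder logic backs out of that operation and all later outstanding ones.
// Names are illustrative assumptions.
#include <cstdint>
#include <deque>

struct PendingOp { uint32_t tag; };

struct DecoderLogic {
    std::deque<PendingOp> outstanding;   // program order, oldest first; up to n entries

    // Back out of the abnormally terminated operation and every later one.
    void backOutFrom(uint32_t badTag) {
        for (size_t i = 0; i < outstanding.size(); ++i) {
            if (outstanding[i].tag == badTag) {
                outstanding.erase(outstanding.begin() + i, outstanding.end());
                break;
            }
        }
    }
};
```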
Abstract:
Various aspects provide for implementing a cache coherence protocol. A system comprises at least one processing component and a centralized controller. The at least one processing component comprises a cache controller. The cache controller is configured to manage a cache memory associated with a processor. The centralized controller is configured to communicate with the cache controller based on a power state of the processor.
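A sketch of power-aware coherence messaging under assumed power states (Active, Retention, Off): the centralized controller forwards a snoop to a cache controller only while its processor is active, defers it while the processor retains cache state at low power, and skips it when the cache is off. The states, message type, and policy are illustrative assumptions, not the protocol claimed.

```cpp
// Minimal sketch of power-aware coherence messaging: the centralized
// controller communicates with each cache controller based on the power
// state of its processor. States and message types are assumptions.
#include <cstdint>
#include <queue>
#include <vector>

enum class PowerState { Active, Retention, Off };
struct Snoop { uint64_t addr; };

struct CacheController {
    PowerState state = PowerState::Active;
    std::queue<Snoop> deferred;                      // replayed when the processor wakes
    void handleSnoop(const Snoop& s) { (void)s; }    // probe/invalidate the local line
};

struct CentralizedController {
    std::vector<CacheController*> caches;

    void broadcastSnoop(const Snoop& s) {
        for (auto* c : caches) {
            if (c->state == PowerState::Active)         c->handleSnoop(s);
            else if (c->state == PowerState::Retention) c->deferred.push(s);
            // Off: the cache is assumed flushed, so no message is needed.
        }
    }
};
```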