摘要:
A translation lookaside buffer may include control functionality coupled to a first storage and a second storage. The first storage includes a first plurality of entries for storing address translations corresponding to a plurality of page sizes. The second storage includes a second plurality of entries for storing address translations corresponding to the plurality of page sizes. In response to receiving a first address translation associated with a first page size, the control functionality may allocate the first plurality of entries to store address translations corresponding to the first page size. In addition, in response to receiving a request including an address that matches an address translation stored within the first storage, the control functionality may copy a matching address translation from the first storage to the second storage.
摘要:
A system and method for branch prediction in a microprocessor. A hybrid device stores branch prediction information in a sparse cache for no more than a common smaller number of branches within each entry of the instruction cache. For the less common case wherein an i-cache line comprises additional branches, the device stores the corresponding branch prediction information in a dense cache. Each entry of the sparse cache stores a bit vector indicating whether or not a corresponding instruction cache line includes additional branch instructions. This indication may also be used to select an entry in the dense cache for storage. A second sparse cache stores entire evicted entries from the first sparse cache.
摘要:
A processor includes a trace cache memory coupled to a trace generator. The trace generator may be configured to generate a plurality of traces each including one or more operations that may be decoded from one or more instructions. Each of the operations may be associated with a respective address. The trace cache memory is coupled to the trace generator and includes a plurality of entries each configured to store one of the traces. The trace generator may be further configured to restrict each of the traces to include only operations having respective addresses that fall within one or more predetermined ranges of contiguous addresses.
摘要:
An apparatus comprises a buffer comprising a plurality of entries, an insert pointer, a delete pointer, a plurality of first control circuits coupled to the buffer, and a second control circuit coupled to the buffer. The entries are logically divided into a plurality of groups. Each of the first control circuits corresponds to a respective group and selects an entry from the respective group for potential reading from the buffer. Furthermore, each of the first control circuits, in the event that the delete pointer indicates a first entry in the respective group and the insert pointer wraps around the buffer and indicates a second entry in the respective group, selects the first entry if the first entry is eligible for selection. The second control circuit selects a first group, and the entry selected from the first group by the first control circuits is the entry read from the buffer.
摘要:
A predecode unit is configured to predecode a fixed number of instruction bytes of variable length instructions per clock cycle. The predecode unit outputs predecode bits which identify the start byte of an instruction. An instruction alignment unit uses the start bits to dispatch the instructions to a plurality of decode units that form fixed issue positions. In one embodiment, the predecode unit identifies a plurality of length vectors. Each length vector is associated with one of the instruction bytes predecoded in a clock cycle and identifies the length of an instruction if an instruction starts at the instruction byte corresponding to the length vector. A tree circuit determines in which instruction bytes instructions start.
摘要:
A core snoop buffer apparatus is provide which stores addresses of pages from which instructions have been fetched but not yet retired (i.e. the instructions are outstanding within the instruction processing pipeline). Addresses corresponding to memory locations being modified are compared to the addresses stored in the core snoop buffer on a page basis. If a match is detected, then instructions are flushed from the instruction processing pipeline and refetched. In this manner, the instructions executed to the point of modifying registers or memory are correct in self-modifying code or multiprocessor environments. Instructions may be speculatively fetched and executed while retaining coherency with respect to changes to memory. The number of pages from which instructions are concurrently outstanding within the microprocessor are typically small compared to the number of cache lines outstanding or the number of instructions outstanding. Therefore, a relatively small hardware structure may be employed to perform the instruction coherency functionality.
摘要:
A pipelined or superscalar processor (10) that executes operations utilizing operand data of variable bit widths improves parallel performance by partitioning a fixed bit width operand (200) into several partial operand fields (215, 216 and 217), and checking for data dependencies, tagging and forwarding data in these fields independently of one another. An instruction decoder (18) concurrently dispatches multiple ROPs to various functional units (20, 21, 22 and 80). Conflicts which arise with respect to register resources are resolved through register renaming. However, implementation of register renaming is difficult when register structures are overlapping. The present invention supports independent dependency checking, tagging and forwarding of partial bit fields of a register operand which, in combination, allow renaming of registers. Therefore, the variable width register operand structure greatly assists the processor to resolve data dependencies. Operands are tagged by a reorder buffer (26) and supplied with data when it becomes available without regard for the type of data. This method of dependency resolution supports parallel performance of operations and provides a substantial improvement in overall speed of processing. Thus, the processor promotes parallel processing of operations that act upon overlapping data structures which otherwise resist parallel handling.
摘要:
A hierarchical encoding format for coding repairs to devices within a computing system. A device, such as a cache memory, is logically partitioned into a plurality of sub-portions. Various portions of the sub-portions are identifiable as different levels of hierarchy of the device. A first sub-portion may corresponds to a particular cache, a second sub-portion may correspond to a particular way of the cache, and so on. The encoding format comprises a series of bits with a first portion corresponding to a first level of the hierarchy, and a second portion of the bits corresponds to a second level of the hierarchy. Each of the first and second portions of bits are preceded by a different valued bit which serves to identify the hierarchy to which the following bits correspond. A sequence of repairs are encoded as string of bits. The bit which follows a complete repair encoding indicates whether a repair to the currently identified cache is indicated or whether a new cache is targeted by the following repair. Therefore, certain repairs may be encoded without respecifying the entire hierarchy.
摘要:
An apparatus and method are presented for programmable built-in self-test (BIST) and built-in self-repair (BISR) of an embedded memory (i.e., a memory formed with random logic upon a semiconductor substrate). A semiconductor device may include a memory unit, a BIST logic unit coupled to the memory unit, and a master test unit coupled to the BIST logic unit and the memory unit. The memory unit stores data input signals in response to a first set of address and control signals, and provides the stored data input signals as data output signals in response to a second set of address and control signals. The master test unit provides the memory test pattern to the BIST logic unit and generates the first and second sets of address and control signals. The BIST logic unit stores the memory test pattern, produces the data input signals dependent upon the memory test pattern, provides the data input signals to the memory unit, receives the data output signals from the memory unit, and compares the data output signals to the data input signals to form BIST results. The BIST system may perform a hardwired BIST routine when an asserted RESET signal is received by the semiconductor device and/or a programmable BIST routine under software control. The BIST logic unit may include a redundant memory structure, and may be configured to functionally replace a defective memory structure of the memory unit with one of the redundant memory structures dependent upon the BIST results.
摘要:
A branch prediction unit includes a local branch prediction and a global branch prediction. A global branch prediction utilizes a global history shift register to record the behavior of conditional branches. In some cases, a conditional branch may behave in a static manner, either always being taken or not taken, while resident in an instruction cache. Such static behaving conditional branches do not need a global history for prediction and contend with other conditional branches for global branch history training. By utilizing a dynamic branch classification scheme, branches requiring global history prediction can be identified and static behaving conditional branches may be prevented from polluting the global history. All conditional branches are initially classified as local and do not participate in global history training. Only after two mispredictions are branches recognized as exhibiting dynamic behavior and classified as global. These branches classified as global may then participate in global history training and utilize a global history based branch prediction.