Abstract:
In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing entries in the queue to a window of addresses derived from the received address and spanning 2M contiguous cache blocks. The prefetch engine suppresses allocation of a new entry when any entry in the prefetch queue falls within the address window, and further suppresses allocation when the data address of the store instruction lies in a border area of the address window.
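A minimal C sketch of this allocation filter, assuming an aligned address window (the abstract does not specify alignment), an illustrative block size and value of M, and a simple array-backed queue; all names here (may_allocate, BLOCK_SHIFT, and so on) are hypothetical, not taken from the patent.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SHIFT 7                 /* illustrative: 128-byte cache blocks */
#define M           4                 /* window spans 2*M = 8 blocks         */
#define WIN_BLOCKS  (2 * M)           /* must be a power of two for the mask */

/* Each queue entry records the block address of an allocated stream. */
typedef struct {
    uint64_t block[16];
    size_t   count;
} prefetch_queue;

/* Decide whether a store at 'addr' may allocate a new stream entry. */
static bool may_allocate(const prefetch_queue *q, uint64_t addr)
{
    uint64_t blk = addr >> BLOCK_SHIFT;
    uint64_t lo  = blk & ~(uint64_t)(WIN_BLOCKS - 1);  /* aligned window base */
    uint64_t hi  = lo + WIN_BLOCKS - 1;

    /* Suppress allocation if any existing stream lies inside the window. */
    for (size_t i = 0; i < q->count; i++)
        if (q->block[i] >= lo && q->block[i] <= hi)
            return false;

    /* Suppress allocation if the store falls in the window's border
       blocks, where an adjacent stream may be about to advance into it. */
    if (blk == lo || blk == hi)
        return false;

    return true;
}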
Abstract:
A method of prefetching data in a microprocessor includes identifying a data stream associated with a process and determining a depth for the data stream based upon prefetch factors including the number of concurrent data streams and the data consumption rates associated with those streams. Data prefetch requests are then allocated to the data stream to reflect the determined depth. Allocating data prefetch requests may include issuing prefetch requests for a number of cache lines ahead of the cache line currently being referenced, where that number equals the determined depth. The method may include, responsive to determining the depth of a data stream, configuring the prefetch hardware to reflect the determined depth for that stream. Prefetch control bits in an instruction executed by the processor control the prefetch hardware configuration.
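The depth calculation can be sketched in C. The formula below (consumption rate divided by stream count, then clamped) is an assumption for illustration; the abstract names the factors but not how they combine. __builtin_prefetch is the GCC/Clang intrinsic, standing in here for hardware prefetch requests.

#include <stddef.h>

#define MAX_DEPTH 8U   /* illustrative cap on prefetch run-ahead */

/* Assumed heuristic: prefetch deeper for fast consumers, shallower
   when many concurrent streams compete for prefetch resources. */
static unsigned stream_depth(unsigned concurrent_streams,
                             unsigned lines_consumed_per_interval)
{
    unsigned depth = lines_consumed_per_interval /
                     (concurrent_streams ? concurrent_streams : 1U);
    if (depth == 0U)        depth = 1U;
    if (depth > MAX_DEPTH)  depth = MAX_DEPTH;
    return depth;
}

/* Issue prefetches up to 'depth' cache lines beyond the current line. */
static void prefetch_ahead(const char *base, size_t line,
                           size_t line_bytes, unsigned depth)
{
    for (unsigned d = 1; d <= depth; d++)
        __builtin_prefetch(base + (line + d) * line_bytes, 0, 1);
}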
Abstract:
An information handling system includes a processor that may perform issue-queue virtual load/store instruction operations. The issue queue maintains load and store instructions with a real/virtual dependency flag and provides storage resources for both real and virtual load/store instructions. Real load/store instructions execute in a load/store unit (LSU); virtual load/store instructions are pending execution in the LSU. The LSU may keep track of each virtual load/store instruction within the issue queue by thread, type, and pointer data. Once all dependencies are clear for a pending virtual load/store instruction, the LSU marks it as real, and it may then issue to the LSU as a real load/store instruction.
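A C sketch of the issue-queue bookkeeping, assuming a flat array of entries; the fields mirror the thread, type, and pointer data the abstract says the LSU tracks, while the field names and the promotion routine are hypothetical.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical issue-queue entry: the real/virtual flag plus the
   thread, type, and pointer data tracked per the abstract. */
typedef enum { OP_LOAD, OP_STORE } ls_type;

typedef struct {
    uint8_t  thread;      /* owning hardware thread                 */
    ls_type  type;        /* load or store                          */
    uint16_t lsu_ptr;     /* pointer into LSU queues (illustrative) */
    bool     is_real;     /* false while still virtual              */
    bool     deps_clear;  /* set when all dependencies resolve      */
} iq_entry;

/* Promote virtual entries whose dependencies have cleared; a real
   entry is then eligible to issue to the LSU. */
static void promote_virtual(iq_entry *iq, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!iq[i].is_real && iq[i].deps_clear)
            iq[i].is_real = true;   /* now issues as a real load/store */
}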
Abstract:
A hardware design technique allows checking of a design system language (DSL) specification of an element against the schematics of large macros with embedded arrays and registers. The hardware organization reduces CPU time for logical verification by an exponential order of magnitude without blowing up the verification process or logic simulation. The hardware is organized at the horizontal word level rather than the bit level. By using an elimination process for elements that are difficult to extract in Boolean form, the logic around and inside a memory structure can be verified. The resulting register array hardware organization can be verified to all pins and nets up to the storage element.
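The abstract gives no algorithm, but the word-level idea can be illustrated with a toy C model: the storage array is treated as a black box checked one word at a time, so the verifier never expands per-bit Boolean equations for the cells themselves. Everything here (sizes, names, the write/read check) is hypothetical.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS 64

/* Black-box model of the array: storage is checked at word
   granularity rather than as per-bit Boolean expressions. */
typedef struct { uint32_t word[WORDS]; } reg_array;

static void array_write(reg_array *a, size_t addr, uint32_t data)
{ a->word[addr] = data; }

static uint32_t array_read(const reg_array *a, size_t addr)
{ return a->word[addr]; }

/* Word-level check: a write followed by a read of the same address
   must return the written word -- one comparison per word instead
   of 32 separate bit-level checks. */
static bool check_write_read(reg_array *a, size_t addr, uint32_t data)
{
    array_write(a, addr, data);
    return array_read(a, addr) == data;
}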
Abstract:
A circuit and method provide rename register reallocation for simultaneous multi-threaded (SMT) processors, redistributing rename (mapped) resources between one thread during single-threaded (ST) execution and multiple threads during multi-threaded execution. The processor receives an instruction specifying a transition from single-threaded to multi-threaded mode, or vice versa, and halts execution of all threads executing on the processor. Internal control logic then signals the resources to perform the reallocation. Rename resources are reallocated by directing an action at the rename mapper: when switching from SMT to ST mode, the mapper is directed to drop entries for the dying thread, while on a switch from ST to SMT mode, "dummy" instruction group dispatch indications are sent to the mapper indicating use of all architected registers for each thread.
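A C sketch of the two mapper actions, assuming a simple per-thread map table and a caller-supplied physical-register allocator; the sizes and names are illustrative, not from the patent.

#include <stdint.h>

#define ARCH_REGS 32   /* architected registers per thread (illustrative) */

/* Hypothetical rename mapper: map[thread][arch] -> physical register,
   with -1 meaning unmapped (physical register back in the free pool). */
typedef struct {
    int16_t map[2][ARCH_REGS];
} rename_mapper;

/* SMT -> ST: drop every entry belonging to the dying thread. */
static void drop_thread(rename_mapper *m, int dying)
{
    for (int r = 0; r < ARCH_REGS; r++)
        m->map[dying][r] = -1;
}

/* ST -> SMT: a "dummy" dispatch group marks all architected registers
   of a thread as in use, so the mapper allocates physical registers
   for the incoming thread's full architected state. */
static void dummy_dispatch(rename_mapper *m, int thread,
                           int16_t (*alloc_phys)(void))
{
    for (int r = 0; r < ARCH_REGS; r++)
        m->map[thread][r] = alloc_phys();
}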
Abstract:
A hardware design technique allows checking of a design system language (DSL) specification of an element against the schematics of large macros with embedded arrays and registers. The hardware organization reduces CPU time for logical verification by an exponential order of magnitude without blowing up the verification process or logic simulation. The hardware is organized at the horizontal word level rather than the bit level. A memory array cell comprises a pair of cross-coupled inverters forming a first latch for storing data. The first latch has an output connected to a read bit line and receives true and complement write word and bit line inputs. A first set of pass gates is connected between the true and complement write word and bit line inputs and the input of the first latch, and is responsive to a first clock via a second pass gate. A pair of cross-coupled inverters forms a second latch of a Level Sensitive Scan Design (LSSD); the second latch has an output connected to an LSSD output for design verification. The second pass gate is connected between the output of the first set of pass gates and the input of the first latch and is responsive to the first clock. A third pass gate is connected between the output of the first latch and the input of the second latch and is responsive to a second clock. The first and second clocks are responsive to a black-boxing process for incremental verification.
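A behavioral C model of the cell just described, with the pass-gate stages reduced to clock-gated assignments; this abstracts away the transistor-level true/complement paths, and all signal names are illustrative.

#include <stdbool.h>

/* Two latches and their clocked paths, modeled behaviorally. */
typedef struct {
    bool l1;   /* first latch: the storage cell, drives the read bit line */
    bool l2;   /* second latch: LSSD observation point                    */
} array_cell;

/* First clock high: the write-path pass gates (first set plus the
   second pass gate in series) open, and the first latch captures the
   value selected by the write word line. The complement bit line is
   implicitly the inverse of 'bit_true' in this model. */
static void tick_clk1(array_cell *c, bool write_word, bool bit_true)
{
    if (write_word)
        c->l1 = bit_true;
}

/* Second clock high: the third pass gate copies the stored value into
   the LSSD latch, where it can be scanned out for design verification. */
static void tick_clk2(array_cell *c)
{
    c->l2 = c->l1;
}

/* The read bit line simply observes the first latch. */
static bool read_bit(const array_cell *c) { return c->l1; }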