Abstract:
A server includes a first module that receives information from a plurality of systems. Each system of the plurality of systems includes functional units that are dynamically configurable during operation of the system. The information from each system of the plurality of systems includes performance data collected while executing a program when the functional units are configured according to a configuration setting respective to the system. The server also includes a second module that analyzes the received information to select a best-performing configuration setting of the configuration settings received from the plurality of systems. The server also includes a third module that provides a new configuration setting to the plurality of systems. The new configuration setting is a modification of the best-performing configuration. The server iterates on receiving the information from the plurality of systems, analyzing the received information and providing the new configuration setting to the plurality of systems.
Abstract:
A microprocessor includes a translation lookaside buffer and a first request to load into the microprocessor a page table entry in response to a miss of a virtual address in the translation lookaside buffer. The requested page table entry is included in a page table. The page table encompasses a plurality of cache lines including a first cache line that includes the requested page table entry. The microprocessor also includes hardware logic that makes a determination whether a second cache line physically sequential to the first cache line is outside the page table, and a second request to prefetch the second cache line into the microprocessor. The second request is selectively generated based at least on the determination made by the hardware logic.
Abstract:
A microprocessor includes a plurality of processing cores each comprises a corresponding memory physically located inside the core and readable by the core but not readable by the other cores (“core memory”). The microprocessor also includes a memory physically located outside all of the cores and readable by all of the cores (“uncore memory”). For each core, the uncore memory and corresponding core memory collectively provide M words of storage for microcode instructions fetchable by the core as follows: the uncore memory provides J of the M words of microcode instruction storage, and the corresponding core memory provides K of the M words of microcode instruction storage. J, K and M are counting numbers, and M=J+K. The memories are non-architecturally-visible and accessed using a fetch address provided by a non-architectural program counter, and the microcode instructions are non-architectural instructions that implement architectural instructions.
Abstract:
A processor includes a decoder that decodes an instruction that instructs the processor to perform subsequent computations in an approximate manner and a functional unit that performs the subsequent computations in the approximate manner in response to the instruction. An instruction instructs the processor to clear an error amount associated with a value stored in a general purpose register of the processor. The error amount indicates an amount of error associated with a result of a computation performed by the processor in an approximate manner. The processor also clears the error amount in response to the instruction. Another instruction specifies a computation to be performed and includes a prefix that indicates the processor is to perform the computation in an approximate manner. The functional unit performs the computation specified by the instruction in the approximate manner specified by the prefix.
Abstract:
A microprocessor includes a first and second hardware data prefetchers configured to prefetch data into the microprocessor according to first and second respective algorithms, which are different. The second prefetcher is configured to detect a memory access pattern within a memory region and responsively prefetch data from the memory region according the second algorithm. The second prefetcher is further configured to provide to the first prefetcher a descriptor of the memory region. The first prefetcher is configured to stop prefetching data from the memory region in response to receiving the descriptor of the memory region from the second prefetcher. The second prefetcher also provides to the first prefetcher a communication to resume prefetching data from the memory region, such as when the second prefetcher subsequently detects that a predetermined number of memory accesses to the memory region are not in the memory access pattern.
Abstract:
A microprocessor includes a plurality of processing cores each comprises a corresponding memory physically located inside the core and readable by the core but not readable by the other cores (“core memory”). The microprocessor also includes a memory physically located outside all of the cores and readable by all of the cores (“uncore memory”). For each core, the uncore memory and corresponding core memory collectively provide M words of storage for microcode instructions fetchable by the core as follows: the uncore memory provides J of the M words of microcode instruction storage, and the corresponding core memory provides K of the M words of microcode instruction storage. J, K and M are counting numbers, and M=J+K. The memories are non-architecturally-visible and accessed using a fetch address provided by a non-architectural program counter, and the microcode instructions are non-architectural instructions that implement architectural instructions.
Abstract:
A microprocessor includes a first and second hardware data prefetchers configured to prefetch data into the microprocessor according to first and second respective algorithms, which are different. The second prefetcher is configured to detect a memory access pattern within a memory region and responsively prefetch data from the memory region according the second algorithm. The second prefetcher is further configured to provide to the first prefetcher a descriptor of the memory region. The first prefetcher is configured to stop prefetching data from the memory region in response to receiving the descriptor of the memory region from the second prefetcher. The second prefetcher also provides to the first prefetcher a communication to resume prefetching data from the memory region, such as when the second prefetcher subsequently detects that a predetermined number of memory accesses to the memory region are not in the memory access pattern.