Abstract:
Embodiments include systems and methods for optimization of micro-benchmark analysis for microprocessor designs. For example, embodiments seek to generate a suite of micro-benchmarks and associated weighting factors, which can be used to effectively define a weighted aggregate workload condition for a fine-grained (e.g., RTL) simulation in a manner that is a sufficient proxy for predicted commercial workload conditions. The suite of micro-benchmarks can be appreciably more efficient to simulate than the commercial workload, so that using the suite of micro-benchmarks as a proxy for the commercial workload can provide many benefits, including more efficient iterative design.
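A minimal sketch of the weighting idea described above, under stated assumptions: micro-benchmark and commercial-workload behavior are each summarized as a vector of performance metrics, and the weighting factors are fit by non-negative least squares so the weighted combination of micro-benchmark profiles approximates the workload profile. The metric names, example numbers, and choice of fitting method are illustrative assumptions, not details taken from the abstract.

# Sketch only: fit non-negative weights so a weighted combination of
# micro-benchmark metric profiles approximates a commercial workload profile.
import numpy as np
from scipy.optimize import nnls

# Rows: metrics (e.g., IPC, cache miss rate, branch rate); columns: micro-benchmarks.
microbench_profiles = np.array([
    [1.80, 0.60, 1.20],   # IPC of each micro-benchmark (assumed values)
    [0.02, 0.15, 0.05],   # cache miss rate
    [0.10, 0.01, 0.20],   # branch rate
])
commercial_profile = np.array([1.10, 0.09, 0.12])  # same metrics for the workload

# Non-negative least squares gives one plausible set of weighting factors.
weights, residual = nnls(microbench_profiles, commercial_profile)
print("weights:", weights, "fit residual:", residual)

# The weighted suite can then stand in for the workload in fine-grained (e.g., RTL)
# simulation: simulate each micro-benchmark and aggregate results by weight.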
Abstract:
Techniques for high-performance parallel data sorting are provided. In a first phase, a plurality of unordered data elements to be sorted is divided into K unordered lists, each preferably having approximately M elements, where K, M, and N each exceed 1. Each of these K unordered lists can be independently sorted in parallel using any algorithm, such as quicksort, to generate K ordered lists. In a second phase, N balanced workloads are determined from the K ordered lists by using an iterative converging process capped by a maximum number of iterations. Thus, any non-uniform or skewed data distribution can be load balanced with minimal processing time. Once the N balanced workloads are determined, they can be independently sorted in parallel, for example by using a merge sort, and then combined with a fast concatenation to provide the final sorted result. Thus, sorting operations are fully parallelized while avoiding expensive data scanning steps.
Abstract:
A method, apparatus, and system for improved high-performance parallel data sorting are provided. In a first phase, a plurality of unordered data elements to be sorted is divided into K unordered lists, each preferably having approximately M elements. Each of these K unordered lists can be independently sorted in parallel using any algorithm, such as quicksort, to generate K ordered lists. In a second phase, N balanced workloads are determined from the K ordered lists by using an iterative converging process capped by a maximum number of iterations. Thus, any non-uniform or skewed data distribution can be load balanced with minimal processing time. Once the N balanced workloads are determined, they can be independently sorted in parallel, for example by using a merge sort, and then combined with a fast concatenation to provide the final sorted result. Thus, sorting operations are fully parallelized while avoiding any expensive data scanning steps.
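A minimal sketch of the two-phase idea in the sorting abstracts above: K locally sorted lists, then N balanced workloads found by an iterative, iteration-capped search for splitter values, then parallel merges and a final concatenation. The value-based binary search used here to balance the workloads is an assumption about the "iterative converging process," not a quoted detail, and the thread pool merely illustrates the independent, parallelizable steps.

# Illustrative sketch, not the claimed implementation.
import bisect
import heapq
import random
from concurrent.futures import ThreadPoolExecutor

def two_phase_parallel_sort(data, k=4, n=4, max_iters=32):
    # Phase 1: split into K unordered lists and sort each independently.
    chunks = [data[i::k] for i in range(k)]
    with ThreadPoolExecutor(max_workers=k) as pool:
        ordered = list(pool.map(sorted, chunks))

    # Phase 2: choose N-1 splitter values so each workload holds ~len(data)/N
    # elements, converging by binary search on value, capped at max_iters.
    lo, hi = min(data), max(data)
    total = len(data)
    splitters = []
    for j in range(1, n):
        target = total * j // n
        a, b = lo, hi
        for _ in range(max_iters):
            mid = (a + b) / 2
            count = sum(bisect.bisect_left(lst, mid) for lst in ordered)
            if count < target:
                a = mid
            else:
                b = mid
        splitters.append(b)

    # Slice each ordered list at the splitters, merge each workload in parallel,
    # then concatenate the already ordered workloads.
    bounds = [float("-inf")] + splitters + [float("inf")]
    def merge_workload(i):
        pieces = []
        for lst in ordered:
            start = bisect.bisect_left(lst, bounds[i])
            stop = bisect.bisect_left(lst, bounds[i + 1])
            pieces.append(lst[start:stop])
        return list(heapq.merge(*pieces))

    with ThreadPoolExecutor(max_workers=n) as pool:
        workloads = list(pool.map(merge_workload, range(n)))
    return [x for w in workloads for x in w]

data = [random.random() for _ in range(10_000)]
assert two_phase_parallel_sort(data) == sorted(data)

The final step is a plain concatenation because every element in workload i is strictly less than the splitter bounding workload i+1, so no cross-workload comparison or data scan is needed.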
Abstract:
Embodiments of the invention provide adaptive power ramp control (APRC) in microprocessors. One implementation of the APRC can compute a present core power and a present power ramp condition in the microprocessor, for example, to determine whether the present power is in a particular predefined control zone and whether the present power ramp is greater than a predefined threshold for that control zone. Those determinations can indicate a likelihood of an imminent, undesirable power ramp condition and can inform entry into a control mode. The APRC can generate an appropriate stall control signal in response to its present control mode, and the stall control signal can stall operation of at least one functional unit of the microprocessor according to a predefined stall pattern. This can effectively combat the imminent power ramp condition by reducing the power usage of the microprocessor.
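A behavioral sketch (not RTL, and not the claimed circuit) of the adaptive power ramp control flow described above: track the present power and power ramp, compare the ramp against a per-zone threshold, and emit a stall control signal that follows a predefined stall pattern while in control mode. The zone boundaries, ramp thresholds, exit condition, and the one-in-four stall pattern are made-up example values.

ZONES = [            # (power floor, power ceiling, allowed ramp per sample) - assumed
    (0.0, 50.0, 20.0),
    (50.0, 80.0, 10.0),
    (80.0, float("inf"), 5.0),
]
STALL_PATTERN = [1, 0, 0, 0]  # stall one out of every four cycles while controlling

class AdaptivePowerRampControl:
    def __init__(self):
        self.prev_power = 0.0
        self.in_control = False
        self.pattern_idx = 0

    def step(self, present_power):
        ramp = present_power - self.prev_power
        self.prev_power = present_power
        # Find the control zone for the present power and its ramp threshold.
        threshold = next(t for lo, hi, t in ZONES if lo <= present_power < hi)
        if ramp > threshold:          # likely imminent, undesirable power ramp
            self.in_control = True
        elif ramp <= 0:               # power falling again: leave control mode
            self.in_control = False
        # Stall control signal: follow the predefined pattern while controlling.
        if self.in_control:
            stall = STALL_PATTERN[self.pattern_idx % len(STALL_PATTERN)]
            self.pattern_idx += 1
        else:
            stall = 0
        return stall

aprc = AdaptivePowerRampControl()
for power in [10, 12, 40, 75, 78, 95, 90, 60]:
    print(power, "-> stall" if aprc.step(power) else "-> run")

Stalling a functional unit on the patterned cycles lowers activity, and therefore power, which is how the sketch models combating the imminent ramp.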
Abstract:
Embodiments for a processor that selectively enables and disables branch prediction are disclosed. The processor may include counters to track a number of fetched instructions, a number of branches, and a number of mispredicted branches. A misprediction threshold may be calculated dependent upon the tracked number of branches and a predefined misprediction ratio. Branch prediction may then be disabled when the number of mispredictions exceeds the calculated threshold, with the decision further dependent upon the branch rate.
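A minimal behavioral sketch of the counters and decision described above: counts of fetched instructions, branches, and mispredicted branches, a misprediction threshold derived from the branch count and a predefined misprediction ratio, and a disable decision that also considers the branch rate. The specific ratio, the minimum branch rate, the evaluation window, and the exact way the branch rate enters the decision are illustrative assumptions.

MISPREDICTION_RATIO = 0.20   # predefined: tolerate up to 20% mispredicted branches
MIN_BRANCH_RATE = 0.05       # assumed: only act when >=5% of fetched instructions branch
WINDOW = 1024                # assumed evaluation window, in fetched instructions

class BranchPredictionGovernor:
    def __init__(self):
        self.fetched = 0
        self.branches = 0
        self.mispredicted = 0
        self.prediction_enabled = True

    def on_instruction(self, is_branch, mispredicted=False):
        self.fetched += 1
        if is_branch:
            self.branches += 1
            if mispredicted:
                self.mispredicted += 1
        if self.fetched >= WINDOW:
            self._evaluate()

    def _evaluate(self):
        threshold = self.branches * MISPREDICTION_RATIO
        branch_rate = self.branches / self.fetched
        # Disable prediction when mispredictions exceed the threshold and the
        # branch rate is high enough to matter; otherwise (re-)enable it.
        self.prediction_enabled = not (
            self.mispredicted > threshold and branch_rate >= MIN_BRANCH_RATE
        )
        self.fetched = self.branches = self.mispredicted = 0

import random
gov = BranchPredictionGovernor()
# Synthetic stream: every 10th instruction is a branch, ~30% of them mispredicted.
for i in range(4096):
    is_branch = (i % 10 == 0)
    gov.on_instruction(is_branch, mispredicted=is_branch and random.random() < 0.3)
print("prediction enabled:", gov.prediction_enabled)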