摘要:
A performance scaling device, a processor having the same, and a performance scaling method thereof are provided. The performance scaling device includes an adaptive voltage scaling unit, a latency prediction unit, and a variable-latency datapath. The adaptive voltage scaling unit generates a plurality of operation voltages and transmits the operation voltages to the variable-latency datapath. The variable-latency datapath operates with different latencies according to the operation voltages and generates an operation latency. The latency prediction unit receives the operation latency and a system latency tolerance and generates a voltage scaling signal for the adaptive voltage scaling unit according to the operation latency and the system latency tolerance. The adaptive voltage scaling unit outputs and scales the operation voltages thereof according to the voltage scaling signal.
摘要:
A performance scaling device, a processor having the same, and a performance scaling method thereof are provided. The performance scaling device includes an adaptive voltage scaling unit, a latency prediction unit, and a variable-latency datapath. The adaptive voltage scaling unit generates a plurality of operation voltages and transmits the operation voltages to the variable-latency datapath. The variable-latency datapath operates with different latencies according to the operation voltages and generates an operation latency. The latency prediction unit receives the operation latency and a system latency tolerance and generates a voltage scaling signal for the adaptive voltage scaling unit according to the operation latency and the system latency tolerance. The adaptive voltage scaling unit outputs and scales the operation voltages thereof according to the voltage scaling signal.
摘要:
An embodiment of a processing device includes a function unit and a control unit. The function unit receives input data and performs a specific operation to the input data to generate result data. The control unit receives the result data and generates an output signal. The control unit latches the result data according to a first clock signal to generate first data and latches the result data according to a second clock signal to generate second data. The control unit compares the first data with the second data to generate a control signal and selects the first data or the second data to serve as data of the output signal according to the control signal. The second clock signal is delayed from the first clock signal by a predefined time period.
摘要:
A method and corresponding apparatus for compiling high-level languages into specific processor architectures are provided. In this embodiment, the specific processor is encapsulated in a virtual processor interface with simple instruction set architecture, and a compiler translates application programs into corresponding assembly codes. Further, the difficulty of the compiler design is reduced.
摘要:
The present invention is a method for inter-cluster communication that employs register permutation by dynamically mapping the registers to the functional units. Because only the mapping between registers and functional units is changed and no actual data movement occurs, the present invention greatly diminishes the power consumption. Owing to the inter-cluster communication mechanism, a centralized register file can be replaced with small register sub-blocks, where the silicon area is greatly reduced, and the access time and the power consumption are also diminished.
摘要:
A pipelined datapath with dynamically reconfigurable pipeline stages is provided, having a pipeline controller which generates clock signals and selects signals based on a system clock and a valid data signal to control each of the registers and each of the multiplexers in the pipeline circuit. In other words, when a valid datum is being processed, the pipeline register is activated to latch the output of the combinational logic circuit; otherwise, when an invalid datum is received, the register is not activated and the datum bypasses the register through a multiplexer. Therefore, the pipeline stages of the pipelined datapath are dynamically reconfigured to save the power dissipation effectively.
摘要:
A pipelined datapath with dynamically reconfigurable pipeline stages is provided, having a pipeline controller which generates clock signals and selects signals based on a system clock and a valid data signal to control each of the registers and each of the multiplexers in the pipeline circuit. In other words, when a valid datum is being processed, the pipeline register is activated to latch the output of the combinational logic circuit; otherwise, when an invalid datum is received, the register is not activated and the datum bypasses the register through a multiplexer. Therefore, the pipeline stages of the pipelined datapath are dynamically reconfigured to save the power dissipation effectively.
摘要:
An inter-cluster communication module using the memory access network is provided, including a plurality of clusters, a memory subsystem, a controller and a switch device. When some clusters issue a load instruction and some clusters issue a store instruction of an identical memory address concurrently, the controller controls the switch device which connects the clusters and the memory banks of the memory subsystem, so that the data item is transmitted from the cluster issuing the store instruction to the cluster issuing the load instruction through the switch device, thereby achieving data exchange between the clusters. Herein, the data item is selectively stored in the memory module depending on the address. Furthermore, the data item is also transmitted between the memory and the clusters over the switch device.
摘要:
Disclosed is a virtual cluster architecture and method. The virtual cluster architecture includes N virtual clusters, N register files, M sets of function units, a virtual cluster control switch, and an inter-cluster communication mechanism. This invention uses a way of time sharing or time multiplexing to alternatively execute a single program thread across multiple parallel clusters. It minimizes the hardware resources for complicated forwarding circuitry or bypassing mechanism by greatly increasing the tolerance of instruction latency in the datapath. This invention may distribute function units serially into pipeline stages to support composite instructions. The performance and the code sizes of application programs can therefore be significantly improved with these composite instructions, of which the introduced latency can be completely hidden in this invention. This invention also has the advantage of being compatible with the program codes developed on conventional multi-cluster architectures.
摘要:
A unified single-core and multi-mode processor and its program execution method are provided. In an embodiment of this processor, a single instruction stream is different types of instructions randomly arranged in thereof. The processor switches its modes based on the type of a fetched instruction to execute the program corresponding to the fetched instruction.