摘要:
A method for supporting speculative execution includes designating operations as speculative or non-speculative, and then deferring exceptions generated by speculative operations while immediately reporting exceptions by non-speculative operations. If a speculative operation uses a result of a speculative operation that has generated an exception, the exception is propagated. Deferred exceptions are detected and reported using a check operation either incorporated into a non-speculative operation or inserted as a separate check operation. A system for supporting speculative execution includes a functional unit for recognizing a speculative operation and deferring any exceptions generated by such an operation. The functional unit may defer an exception by storing information indicating an error has occurred in the register file. To check for deferred exceptions, the functional unit then reads the register file. If an exception is detected, then the exception is processed and one or more of the speculative operation are re-executed (in a non-speculative mode) where necessary to process the exception.
摘要:
A method for supporting speculative execution includes designating operations as speculative or non-speculative, and then deferring exceptions generated by speculative operations while immediately reporting exceptions by non-speculative operations. If a speculative operation uses a result of a speculative operation that has generated an exception, the exception is propagated. Deferred exceptions are detected and reported using a check operation either incorporated into a non-speculative operation or inserted as a separate check operation. A system for supporting speculative execution includes a functional unit for recognizing a speculative operation and deferring any exceptions generated by such an operation. The functional unit may defer an exception by storing information indicating an error has occurred in the register file. To check for deferred exceptions, the functional unit then reads the register file.
摘要:
A memory processor which prevents errors when the compiler advances long latency load instructions in the instruction sequence to reduce the loss of efficiency resulting from the latency time. The memory processor intercepts all load and store instructions prior to the instructions entering the memory pipeline. The memory processor stores load instructions for a period of time sufficient to determine if any subsequent store instruction that would have been executed prior to the load instruction, had the load instruction not been moved, references the same address as that specified in the load instruction. If a store instruction references the load instruction address, the invention returns the same data as the load instruction would have if it was not moved by the compiler.
摘要:
A VLIW processor design system automates the design of programmable and non-programmable VLIW processors. The system takes as input an opcode repertoire, the I/O format of the opcodes, a register file specification, and instruction-level parallelism constraints. With this input specification, the system constructs a datapath, including functional units, register files and their interconnect components from a macrocell database. The system uses the input and the datapath to generate an instruction format design. The instruction format may then be used to construct the processor control path. The abstract input and datapath may be used to extract a machine description suitable to re-target a compiler to the processor. To optimize the processor for a particular application program, the system selects custom instruction templates based on operation issue statistics for the application program generated by the re-targeted compiler.
摘要:
A VLIW processor design system automates the design of programmable and non-programmable VLIW processors. The system takes as input an opcode repertoire, the I/O format of the opcodes, a register file specification, and instruction-level parallelism constraints. With this input specification, the system constructs a datapath, including functional units, register files and their interconnect components from a macrocell database. The system uses the input and the datapath to generate an instruction format design. The instruction format may then be used to construct the processor control path. The abstract input and datapath may be used to extract a machine description suitable to re-target a compiler to the processor. To optimize the processor for a particular application program, the system selects custom instruction templates based on operation issue statistics for the application program generated by the re-targeted compiler.
摘要:
A VLIW processor design system automates the design of programmable and non-programmable VLIW processors. The system takes as input an opcode repertoire, the I/O format of the opcodes, a register file specification, and instruction-level parallelism constraints. With this input specification, the system constructs a datapath, including functional units, register files and their interconnect components from a macrocell database. The system uses the input and the datapath to generate an instruction format design. The instruction format may then be used to construct the processor control path. The abstract input and datapath may be used to extract a machine description suitable to re-target a compiler to the processor. To optimize the processor for a particular application program, the system selects custom instruction templates based on operation issue statistics for the application program generated by the re-targeted compiler.
摘要:
The improved cache system reduces the effects of latency times by utilizing a preload instruction inserted by the compiler into the code. The preload instruction is sent sufficiently in advance of the corresponding load instruction to guarantee that the relevant data is in the cache memory when the load instruction is received. In addition, the invention prevents the pollution of the cache with data that will only be used once during the expected lifetime of the data in the cache. This second feature of the invention assures that a large number of references to data that will only be used once does not result in the contents of the cache being replaced with the subsequent need to reload the contents after the data references have been completed.
摘要:
An improved data processing system for executing branch instructions which has lower latency times and which only rarely requires the instruction pipeline to be flushed is disclosed. The data processing system utilizes a register file to hold the information needed to execute a branch instruction. The information is loaded into the register file in advance of the branch instruction. This allows the system to prepare more than one branch instruction at any given time. The present invention may be used to cause the cache line containing the target address of the branch instruction to be loaded soon as the target address is available for the branch instruction. Since the outcome of the branch instruction is almost always known when the branch instruction enters the instruction pipeline, the instruction pipeline only rarely needs to be flushed.
摘要:
A method and system are disclosed which allow a computer program to execute properly in object code compatible processing systems which have latencies different from those with which the program was created or compiled. This resulting compatibility of the computer program is achieved because the invention protects the precedence of operations within the computer program using latency assumptions which were used when creating the computer program. When the computer program is created, latency assumption information is efficiently provided within the computer program. Thereafter, when the computer program is executed, it is able to advise the processing system of the latency assumptions with which it was created. Various ways are described in which the processing system can utilize the latency assumptions when executing the computer program so as to ensure compatibility.
摘要:
An automated design system for VLIW processors explores a parameterized design space to assist in identifying candidate processor designs that satisfy desired design constraints, such as processor cost and performance. A VLIW synthesis process takes as input a specification of processor parameters and synthesizes a datapath specification, an instruction format design, and a control path specification. The synthesis process also extracts a machine description suitable to re-target a compiler. The re-targeted compiler generates operation issue statistics for an application program or set of programs. Using these statistics, a procedure for searching the design space can extract internal resources utilization information that is used to determine new candidate processors for evaluation.