摘要:
A profiler which provides information to optimize an application specific architecture processor and a program for the processor is provided. The profiler includes: an architecture analyzer which analyzes an architecture description, and generates architecture analysis information, the architecture description describing an architecture of an application specific architecture processor which comprises a plurality of processing elements; a static analyzer which analyzes program static information that describes static information of a program, and generates static analysis information; a dynamic analyzer which analyzes program dynamic information that describes dynamic information of the program, and generates dynamic analysis information, the dynamic information of the program being generated by simulating the program; and a cross profiling analyzer which generates information for optimizing the application specific architecture processor to implement the program based on at least one of the architecture analysis information, the static analysis information, and the dynamic analysis information.
摘要:
A profiler which provides information to optimize an application specific architecture processor and a program for the processor is provided. The profiler includes: an architecture analyzer which analyzes an architecture description, and generates architecture analysis information, the architecture description describing an architecture of an application specific architecture processor which comprises a plurality of processing elements; a static analyzer which analyzes program static information that describes static information of a program, and generates static analysis information; a dynamic analyzer which analyzes program dynamic information that describes dynamic information of the program, and generates dynamic analysis information, the dynamic information of the program being generated by simulating the program; and a cross profiling analyzer which generates information for optimizing the application specific architecture processor to implement the program based on at least one of the architecture analysis information, the static analysis information, and the dynamic analysis information.
摘要:
Provided are a loop accelerator and a data processing system having the loop accelerator. The data processing system includes a loop accelerator which executes a loop part of a program, a processor core which processes a remaining part of the program except the loop part, and a central register file which transmits data between the processor core and the loop accelerator. The loop accelerator includes a plurality of processing elements (PEs) each of which performs an operation on each word to execute the program, a configuration memory which stores configuration bits indicating operations, states, etc. of the PEs, and a plurality of context memories, installed in a column or row direction of the PEs, which transmit the configuration bits along a direction toward which the PEs are arrayed. Thus, a connection structure between the configuration memory and the PEs can be simplified to easily modify a structure of the loop accelerator so as to extend the loop accelerator.
摘要:
A reconfigurable processor comprising a configuration memory for storing a configuration bit for at least one loop configuration; a valid information memory for storing bit information indicating whether an operation in a loop is a delay operation; and at least one processing unit for determining whether an operation in a next cycle is the delay operation by referring to the bit information transmitted from the valid information memory, and selectively performing a change and an implementation of a configuration according to the configuration bit from the configuration memory based on the determined results.
摘要:
A memory access method includes: obtaining a, b, and c from a program code for accessing a memory with a triple loop in a program, a being a number of values which an inner-most loop variable of the triple loop may have, b being a number of values which a middle loop variable of the triple loop may have, and c being a number of values which an outer-most loop variable of the triple loop may have; obtaining a starting address of the memory accessed by the triple loop; and obtaining an a×b×c number of addresses of the memory accessed by the triple loop using the starting address and a function.
摘要:
A reconfigurable processor comprising a configuration memory for storing a configuration bit for at least one loop configuration; a valid information memory for storing bit information indicating whether an operation in a loop is a delay operation; and at least one processing unit for determining whether an operation in a next cycle is the delay operation by referring to the bit information transmitted from the valid information memory, and selectively performing a change and an implementation of a configuration according to the configuration bit from the configuration memory based on the determined results.
摘要:
Provided is a processor and method of performing speculative load instructions of the processor in which a load instruction is performed only in the case where the load instruction substantially accesses a memory. A load instruction for canceling operations is performed in other cases except the above case, so that problems occurring by accessing an input/output (I/O) mapped memory area and the like at the time of performing speculative load instructions can be prevented using only a software-like method, thereby improving the performance of a processor.
摘要:
Provided is a processor and method of performing speculative load instructions of the processor in which a load instruction is performed only in the case where the load instruction substantially accesses a memory. A load instruction for canceling operations is performed in other cases except the above case, so that problems occurring by accessing an input/output (I/O) mapped memory area and the like at the time of performing speculative load instructions can be prevented using only a software-like method, thereby improving the performance of a processor.
摘要:
A loop coalescing method and a loop coalescing device are disclosed. The loop coalescing method comprises removing an inner-most loop from among nested loops, so that an outer operation provided outside of the inner-most loop is performed when a condition of a conditional statement is satisfied, generating a guard code by applying an if-conversion method to the conditional statement, and converting a guard by using an instruction calculating the guard of the guard code, the instruction calculating the guard using a register where information related to a period of time corresponding to the number of iterations of the inner-most loop is stored.
摘要:
An apparatus and a method are provided for a parallel processing very long instruction word (VLIW) computer. The apparatus includes: an index code generation unit sequentially generating an index code, which is associated with a number of no operation (NOP) instruction word between effective instruction words, with respect to each of instruction word groups to be executed in a VLIW computer; an instruction compression unit sequentially deleting the NOP instruction word which corresponds to the index code with respect to each of instruction word groups; and an instruction word conversion unit converting the effective instruction words to include the index code, the effective instruction words corresponding to the NOP instruction words.