Abstract:
A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.
Abstract:
A computer-implemented probabilistic pointer analysis method using SSA form comprises the steps of: evaluating a program in an SSA form comprising a target pointer to determine pointer relations between the target pointer, a plurality of aliased pointers related to the target pointer and at least a probable location of the target pointer; and generating a direct probabilistic relation between the target pointer and the at least a probable location of the target pointer according to the pointer relation.
Abstract:
The current disclosure discloses a power aware simulation system comprising an embedded multi-core simulation module, a power abstract interpretation module and a C power estimation (CPE) power profiling module. The embedded multi-core simulation module comprises a plurality of digital signal processors (DSP), an external memory and a direct memory access. Each of the plurality of DSPs comprises a DSP core, an instruction cache and a local memory. The power abstract interpretation module is coupled to the plurality of DSPs, the external memory, the DMA and the CPE profiling module, respectively.
Abstract:
A test method for a master-slave concurrent system running on a multicore processor includes the steps of establishing a PFA, otherwise called probabilistic finite automata, or probabilistic finite state machine, for a given regular expression; generating test patterns by running the PFA; splitting and merging the test patterns to generate an interleaved test pattern; and performing test on the master-slave system according to the interleaved test pattern. In an embodiment, the method further includes a step of debugging failures of the multicore processor during testing.
Abstract:
A method of streaming remote procedure invocation for multi-core systems to execute a transmitting thread and an aggregating thread of a multi-core system comprises the steps of: temporarily storing data to be transmitted; activating the aggregating thread if the amount of the temporarily stored data is equal to or greater than a threshold and the aggregating thread is at pause status; pausing the transmitting thread if there is no space to temporarily store the data to be transmitted; retrieving data to be aggregated; activating the transmitting thread if the amount of the data to be aggregated is less than a threshold and the transmitting thread is at pause status; and pausing the aggregating thread if there is no data to be retrieved.
Abstract:
A method for pipelining instructions on a PAC processor includes determining a minimum initial interval, and grouping the instructions so that the operands of dependent instructions are assigned to the same local register file. The virtual registers of the instructions that have data dependency across the first functional unit and the second functional unit are assigned to a global register file. The instructions are then modulo scheduled based on a current value of initial interval. The virtual registers of the scheduled instructions are allocated to the corresponding register files. If the allocation fails, a set of virtual registers is transferred from the first or second register file to the global register file.
Abstract:
A method of scheduling a plurality of instructions for a processor comprises the steps of: establishing a functional unit resource table comprising a plurality of columns, each of which corresponds to one of a plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a functional unit of the processor; establishing a ping-pong resource table comprising a plurality of columns, each of which corresponds to one of the plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a read port or a write port of a register bank of the processor; and allotting the plurality of instructions to the plurality of operation cycles of the processor and registering the functional units and the ports of the register banks corresponding to the allotted instructions on the functional unit resource table and the ping-pong resource table.
Abstract:
A method of allocating registers for a processor based on cycle information is disclosed. The processor comprises a first cluster and a second cluster. Each cluster comprises a first functional unit, a second functional unit, a first local register file connected to the first functional unit, a second local register file connected to the second register file, and a global register file having a ping-pong structure formed by a first register bank and a second register bank. After building a Component/Register Type Associated Data Dependency Graph (CRTA-DDG), a functional unit assignment, register file assignment, ping-pong register bank assignment, and cluster assignment are performed to take full advantage of the properties of a processor as well as cycle information.