Abstract:
An attribute group storage unit acquires and holds the attribute groups set for the respective data blocks. A scenario determination unit determines the transfer system of each data block between the memory of the lowest hierarchy and a memory of another hierarchy, based on those attribute groups and on the configuration of the arithmetic unit, which is the parallel processor. It then controls the transfer of each data block according to the determined transfer system, as well as the parallel arithmetic operation corresponding to that transfer. Each attribute group is needed to determine the transfer system and includes one or more attributes that do not depend on the configuration of the parallel processor. The attribute groups of the write blocks are set on the assumption that each write block is already located in the memory of the other hierarchy and is transferred to the memory of the lowest hierarchy.
Abstract:
A computing device-implemented method includes initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The method also includes transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The method further includes receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
Abstract:
One example method includes identifying synchronous code including instructions specifying a computing operation to be performed on a set of data; transforming the synchronous code into a pipeline application including one or more pipeline objects; identifying a first input data set on which to execute the pipeline application; executing the pipeline application on the first input data set to produce a first output data set; after executing the pipeline application on the first input data set, identifying a second input data set on which to execute the pipeline application; determining a set of differences between the first input data set and the second input data set; and executing the pipeline application on the set of differences to produce a second output data set.
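The incremental step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the keyed-dictionary data model and the names `make_pipeline`, `diff` and `incremental_run` are assumptions introduced here, and merging the recomputed records into the first output is one possible way to obtain the second output data set.

```python
from typing import Callable, Dict, Hashable, List, Tuple

DataSet = Dict[Hashable, int]  # hypothetical data model: records addressed by key


def make_pipeline(op: Callable[[int], int]) -> Callable[[DataSet], DataSet]:
    """Wrap a per-record synchronous operation as a pipeline over a keyed data set."""
    def pipeline(data: DataSet) -> DataSet:
        return {key: op(value) for key, value in data.items()}
    return pipeline


def diff(old: DataSet, new: DataSet) -> Tuple[DataSet, List[Hashable]]:
    """Return the records that were added or changed, and the keys that were removed."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return changed, removed


def incremental_run(pipeline: Callable[[DataSet], DataSet],
                    first_input: DataSet,
                    first_output: DataSet,
                    second_input: DataSet) -> DataSet:
    """Run the pipeline only on the differences and merge into the earlier output."""
    changed, removed = diff(first_input, second_input)
    second_output = dict(first_output)         # start from the first output data set
    for key in removed:
        second_output.pop(key, None)           # drop results for deleted records
    second_output.update(pipeline(changed))    # recompute only what differs
    return second_output
```

With a squaring operation, a first input of `{"a": 1, "b": 2, "c": 3}` and a second input of `{"a": 1, "b": 5, "d": 4}`, only `b` and `d` are recomputed, and the merged result matches a full run on the second input.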
Abstract:
A formalized method for mapping applications on a multiprocessor system is provided. In particular, re-use possibilities are explored, focusing for example on data transfer and memory access issues, with the aim of obtaining low-power and low-energy mappings and/or overcoming memory performance or energy bottlenecks.
Abstract:
The invention relates to a method for optimising the parallel processing of data on a hardware platform comprising at least one calculation unit comprising a plurality of processing units capable of executing a plurality of executable tasks in parallel, wherein all the data to be processed is broken down into subsets of data, the same sequence of operations being carried out on each subset of data. The method of the invention comprises obtaining (50, 52) the maximum number of subsets of data to be processed by the same sequence of operations, and a maximum number of tasks that can be executed in parallel by a calculation unit of the hardware platform, determining (54) at least two processing partitions, each of said processing partitions corresponding to the partition of all the data into a number of data groups, and to the assignment of at least one executable task, capable of executing said sequence of operations, to each subset of data from said data group, and selecting (60, 62) the processing partition that makes it possible to obtain an optimal measurement value depending on a predetermined criterion. Programming code instructions implementing said selected processing partition are then obtained. One use of the method of the invention is the selection of an optimal hardware platform according to a measurement of execution performance.
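The selection step in this abstract can be illustrated with a small sketch. Everything named here is hypothetical: `Partition`, `candidate_partitions`, `select_partition` and the cost model `toy_cost` are assumptions made for illustration. In a real setting the measurement would more plausibly be an observed execution time on the target hardware rather than a formula.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Partition:
    groups: int           # number of data groups the full data set is split into
    tasks_per_group: int  # executable tasks launched together for one group


def candidate_partitions(num_subsets: int, max_parallel_tasks: int) -> List[Partition]:
    """Enumerate partitions whose per-group task count fits the calculation unit."""
    candidates = []
    for tasks in range(1, min(num_subsets, max_parallel_tasks) + 1):
        groups = math.ceil(num_subsets / tasks)  # groups needed to cover every subset
        candidates.append(Partition(groups, tasks))
    return candidates


def select_partition(candidates: List[Partition],
                     measure: Callable[[Partition], float]) -> Partition:
    """Pick the partition with the lowest measured value (lower is better here)."""
    return min(candidates, key=measure)


def toy_cost(p: Partition) -> float:
    """Assumed cost model: fixed launch overhead per group plus per-task work."""
    return p.groups * (5.0 + 0.5 * p.tasks_per_group)


best = select_partition(candidate_partitions(64, 16), toy_cost)
```

Under this assumed cost model, 64 subsets on a unit that runs at most 16 tasks in parallel are best served by 4 groups of 16 tasks. A different criterion, passed as `measure`, could select a different partition.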
Abstract:
A high level programming language provides a tile communication operator that decomposes a computational space into sub-spaces (i.e., tiles) that may be mapped to execution structures (e.g., thread groups) of data parallel compute nodes. An indexable type with a rank and element type defines the computational space. For an input indexable type, the tile communication operator produces an output indexable type with the same rank as the input indexable type and an element type that is a tile of the input indexable type. The output indexable type provides a local view structure of the computational space that enables coalescing of global memory accesses in a data parallel compute node.
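The decomposition described in this abstract can be illustrated in plain Python. This is only a rough analogue under stated assumptions: the function `tile` is a hypothetical name, nested lists stand in for the indexable type, and the tiles are copies rather than views, so the sketch shows the shape of the result but not the memory-coalescing behaviour.

```python
from typing import List

Grid = List[List[int]]


def tile(grid: Grid, tile_rows: int, tile_cols: int) -> List[List[Grid]]:
    """Split a rank-2 grid into a rank-2 grid whose elements are tiles.

    The output has the same rank as the input (two indices), and each
    element is itself a tile_rows x tile_cols block of the input.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("grid extent must be divisible by the tile extent")
    return [
        [
            [row[c:c + tile_cols] for row in grid[r:r + tile_rows]]
            for c in range(0, cols, tile_cols)
        ]
        for r in range(0, rows, tile_rows)
    ]


# A 4 x 4 computational space holding the values 0..15 in row-major order.
space = [[4 * r + c for c in range(4)] for r in range(4)]
tiles = tile(space, 2, 2)
```

Here `tiles[i][j]` selects a tile and `tiles[i][j][y][x]` selects an element by its local coordinates inside that tile, which loosely mirrors mapping a tile to a thread group and a local index to a thread.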
Abstract:
A computing device-implemented method includes receiving a program created by a technical computing environment, analyzing the program, generating multiple program portions based on the analysis of the program, dynamically allocating the multiple program portions to multiple software units of execution for parallel programming, receiving multiple results associated with the multiple program portions from the multiple software units of execution, and providing the multiple results or a single result to the program.
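The split, dispatch and gather flow in this abstract can be sketched with Python's standard `concurrent.futures` module. This is an illustration under simplifying assumptions: the "program" is reduced to a loop body applied to a list of inputs, threads stand in for the software units of execution, and the helper names `split_into_portions`, `run_portion` and `run_in_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_portions(items: Sequence[int], portion_size: int) -> List[Sequence[int]]:
    """Cut the iteration space into contiguous portions of at most portion_size items."""
    if portion_size < 1:
        raise ValueError("portion_size must be positive")
    return [items[i:i + portion_size] for i in range(0, len(items), portion_size)]


def run_portion(body: Callable[[int], int], portion: Sequence[int]) -> List[int]:
    """Execute the loop body over one portion; this is what a single worker runs."""
    return [body(x) for x in portion]


def run_in_parallel(items: Sequence[int], body: Callable[[int], int],
                    portion_size: int = 3, workers: int = 4) -> List[int]:
    """Dispatch portions to a worker pool and combine the partial results in order."""
    portions = split_into_portions(items, portion_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool hands each submitted portion to whichever worker is free.
        futures = [pool.submit(run_portion, body, p) for p in portions]
        partial_results = [f.result() for f in futures]
    # Flatten the per-portion results into the single result the caller expects.
    return [value for part in partial_results for value in part]
```

Calling `run_in_parallel(list(range(10)), lambda x: x * x)` returns the squares in input order, because results are gathered in submission order regardless of which worker finishes first.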
Abstract:
An information processing apparatus is one of a plurality of information processing apparatuses and is assigned one of the pieces of local data obtained by dividing global data shared by the plurality of information processing apparatuses. The apparatus includes: a storage unit that includes a first storage area sectioned into prescribed units and stores local data; and a processor that executes a process including: detecting, on the basis of storage area information that identifies the data in the global data to which target local data corresponds, a plurality of continuous sections to which the target local data is to be written in a second storage area that is sectioned into the prescribed units in a different information processing apparatus; and extracting as many pieces of local data as specified by the number of the continuous sections and transmitting the data to the different information processing apparatus.
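The detection of continuous sections can be illustrated with a short sketch. The representation is an assumption: destination sections are modelled as integer indices, and the names `continuous_sections` and `pack_messages` are hypothetical. The idea shown is only that adjacent destination sections can be grouped so that each run is sent as one transfer.

```python
from typing import Dict, Iterable, List, Tuple


def continuous_sections(section_indices: Iterable[int]) -> List[Tuple[int, int]]:
    """Group destination section indices into (start, length) runs of adjacent sections."""
    runs: List[List[int]] = []
    for idx in sorted(set(section_indices)):
        if runs and idx == runs[-1][1] + 1:
            runs[-1][1] = idx          # extend the current run by one section
        else:
            runs.append([idx, idx])    # start a new run at this section
    return [(start, end - start + 1) for start, end in runs]


def pack_messages(local_data: Dict[int, str],
                  runs: List[Tuple[int, int]]) -> List[Tuple[int, List[str]]]:
    """Build one message per run: the start section and the pieces of data to write there."""
    return [
        (start, [local_data[i] for i in range(start, start + length)])
        for start, length in runs
    ]


# Hypothetical example: six pieces of local data destined for remote sections.
local = {3: "a", 4: "b", 5: "c", 9: "d", 10: "e", 12: "f"}
runs = continuous_sections(local.keys())
messages = pack_messages(local, runs)
```

In this example the six sections collapse into three runs, so three transfers would be issued instead of six.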