摘要:
A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified to limit parallelization of the loop. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
摘要:
A data processing system is adapted to execute at least one workshare construct in a parallel region. The data processing system uses at least one thread for executing a corresponding subsection of the workshare construct and provides control blocks for managing corresponding workshare constructs in the parallel region. A method of managing the control blocks comprises: adding an array of control blocks to a control block queue; assigning control blocks in the initialized array to corresponding workshare constructs in the parallel region until a barrier is reached; and waiting at the barrier for all threads in the parallel region to complete their corresponding subsections and then resetting the control block to the beginning of the control block queue. Also provided are a computer program product and a data processing system for implementing the method.
摘要:
A computer-implemented method for creating a threaded package of computer executable instructions from software compiler generated code includes allocating, through a computer processor, the computer executable instructions into a plurality of stacks, differentiating between different types of computer executable instructions for each computer executable instruction allocated to each stack of the plurality of stacks, creating switch points for each stack of the plurality of stacks based upon the differentiating, and inserting the switch points within each stack of the plurality of stacks.
摘要:
A method and apparatus for emulating stream clock signal in asynchronous data transmission. The inventive subject matter proposes a system consisting of a transmitter module, a receiver module, and a link or network in between. A scheme to generate the emulated stream clock across a wide frequency range is also proposed with the property of controllable deviation from the original stream frequency to meet jitter requirement and fast frequency convergence (minimal number of converging steps). The scheme includes an optional first step to derive a frequency estimation of the stream clock and a second step of continuous adjusting the emulated clock frequency to keep the average frequency equals that of the original stream clock.
摘要:
A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case of no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallel the outermost loop is provided.
摘要:
Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing segmentation offloading, such as TCP segmentation offloading (TSO). An interface performs direct virtual memory addressing of a user memory space of a system memory on behalf of a network processor to fetch payload data originated by a user process running on a host processor. Then, the network processor segments the payload data across one or more packets.
摘要:
A method for promotion of a child procedure in a software application for a heterogeneous architecture, wherein the heterogeneous architecture comprises a first architecture type and a second architecture type, comprises inserting a parameter representing a parallel frame pointer to a parent procedure of the child procedure into the child procedure; and modifying a reference in the child procedure to a stack variable of the parent procedure to include an indirect access to the parent procedure via the parallel frame pointer.
摘要:
The present disclosure is directed to a method for providing an OpenMP reduction implementation. The method may comprise creating an aggregate of at least one reduction variable in a parallel region or a work-sharing construct; defining a pointer variable, the pointer variable pointing to a dynamic array of the aggregate; creating an initialization routine, an outlined routine and a reduction accumulation routine; replacing the parallel region or the work-sharing construct with a runtime routine, the runtime routine taking a plurality of arguments including an address of the initialization routine, an address of the outlined routine, an address of the reduction accumulation routine, an address of the pointer variable, and a size of the aggregate; and executing the runtime routine when the at least one reduction variable is in the parallel region or the work-sharing construct.
摘要:
A method, system and apparatus for barrier synchronization using distributed counters and a centralized sensor. The system can include multiple distributed counters coupled to corresponding application processes in a computing application. The barrier synchronization system further can include a centralized sensor coupled for observation by the application processes. Preferably, the application processes can be separate threads of execution in the computing application. The barrier synchronization centralized sensor yet further can be managed by a designated master one of the application processes. Moreover, preferably the system further can include a backup sensor coupled for observation by the application processes and managed by the designated master one of the application processes.
摘要:
A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case of no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallel the outermost loop is provided.