Abstract:
A computer-implemented method for creating a threaded package of computer executable instructions from software compiler generated code includes allocating, through a computer processor, the computer executable instructions into a plurality of stacks, differentiating between different types of computer executable instructions for each computer executable instruction allocated to each stack of the plurality of stacks, creating switch points for each stack of the plurality of stacks based upon the differentiating, and inserting the switch points within each stack of the plurality of stacks.
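A minimal sketch in C of the flow this abstract describes, under the assumption that "stacks" are per-thread instruction sequences and that a switch point simply marks where the instruction type changes; all type and function names here (instr, instr_stack, build_threaded_package) are illustrative, not taken from the patent:

```c
#include <stdlib.h>

/* Hypothetical instruction categories; names are illustrative only. */
typedef enum { INSTR_COMPUTE, INSTR_MEMORY, INSTR_BRANCH } instr_type;

typedef struct {
    instr_type type;
    int        is_switch_point;   /* set where a thread switch is allowed */
} instr;

typedef struct {
    instr  *items;                /* assumed pre-allocated with capacity n */
    size_t  count;
} instr_stack;

/* Allocate compiler-generated instructions round-robin into the stacks,
 * then mark a switch point wherever the instruction type changes. */
static void build_threaded_package(const instr *code, size_t n,
                                   instr_stack *stacks, size_t nstacks)
{
    for (size_t i = 0; i < n; i++) {              /* allocation step */
        instr_stack *s = &stacks[i % nstacks];
        s->items[s->count] = code[i];
        s->items[s->count].is_switch_point = 0;
        s->count++;
    }
    for (size_t k = 0; k < nstacks; k++) {        /* differentiation step */
        for (size_t j = 1; j < stacks[k].count; j++) {
            if (stacks[k].items[j].type != stacks[k].items[j - 1].type)
                stacks[k].items[j].is_switch_point = 1;   /* insert switch point */
        }
    }
}
```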
Abstract:
Embodiments of the present invention address deficiencies of the art in respect to loop parallelization for a target architecture implementing a shared memory model and provide a novel and non-obvious method, system and computer program product for SIMD code generation for parallel loops using versioning and scheduling. In an embodiment of the invention, within a code compilation data processing system a parallel SIMD loop code generation method can include identifying a loop in a representation of source code as a parallel loop candidate, either through a user directive or through auto-parallelization. The method also can include selecting a trip count condition responsive to a scheduling policy set for the code compilation data processing system and to a minimal simdizable threshold, determining a trip count and an alignment constraint for the selected loop, and generating a version of a parallel loop in the source code according to the alignment constraint and a comparison of the trip count to the trip count condition.
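A rough illustration in C of what a versioned parallel loop of this kind might look like after code generation; the threshold value, the 16-byte alignment test, and the function itself are assumptions for the sake of the example, not the patented implementation:

```c
#include <stdint.h>

#define MIN_SIMDIZABLE_TRIPS 16   /* illustrative minimal simdizable threshold */

/* Versioned output a compiler might emit for
 * "for (i = 0; i < n; i++) c[i] = a[i] + b[i];":
 * the trip-count condition and the alignment constraint select between
 * a SIMD version of the parallel loop and a scalar fallback. */
void add_versioned(const float *a, const float *b, float *c, long n)
{
    int aligned = (((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) & 15) == 0;

    if (n >= MIN_SIMDIZABLE_TRIPS && aligned) {
        /* SIMD version of the loop (vectorized when built with OpenMP SIMD support). */
        #pragma omp simd
        for (long i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    } else {
        /* Scalar version retained for short or misaligned trip counts. */
        for (long i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
}
```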
Abstract:
Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing segmentation offloading, such as TCP segmentation offloading (TSO). An interface performs direct virtual memory addressing of a user memory space of a system memory on behalf of a network processor to fetch payload data originated by a user process running on a host processor. Then, the network processor segments the payload data across one or more packets.
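A small sketch in C of the segmentation arithmetic implied by this abstract: the payload fetched on behalf of the network processor is divided into MSS-sized slices, one per outgoing packet. The helper name and signature are illustrative:

```c
#include <stddef.h>

/* Number of packets needed to carry payload_len bytes at a given MSS. */
static size_t segment_count(size_t payload_len, size_t mss)
{
    return (payload_len + mss - 1) / mss;   /* ceiling division */
}
```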
Abstract:
A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. For a typical loop iterating from 1 to N, in which N is loop invariant, a max{0, N} expression is used for the loop iteration count when no information is known about the value of N. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops, starting from the innermost loop and proceeding to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallelize the outermost loop is provided.
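A before/after sketch in C of the induction-variable substitution described above, assuming a nested counter k and an inner loop whose trip count may be zero; the MAX0 macro stands in for the max{0, N} expression and the functions are illustrative only:

```c
#define MAX0(v) ((v) > 0 ? (v) : 0)

/* Before: the nested basic induction variable k carries a dependence
 * across iterations, so the outer loop cannot be parallelized. */
void before(double *x, int n, int m)
{
    int k = 0;
    for (int i = 1; i <= n; i++)          /* zero-trip if n < 1 */
        for (int j = 1; j <= m; j++)      /* zero-trip if m < 1 */
            x[k++] = 0.0;
}

/* After induction-variable substitution: k is rewritten as a closed form
 * of i and j using max{0, m} as the inner trip count, so outer-loop
 * iterations are independent and a parallelizing compiler can parallelize
 * the outermost loop. A later copy-propagation pass can remove the max
 * once the compiler can prove m is non-negative. */
void after(double *x, int n, int m)
{
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            x[(i - 1) * MAX0(m) + (j - 1)] = 0.0;
}
```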
Abstract:
A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor, which is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
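A minimal sketch in C with OpenMP of selecting the thread count before the loop runs, based on a minimum-iterations-per-thread parameter; the parameter value, the function, and its name are assumptions made for illustration:

```c
#include <omp.h>

#define MIN_ITERS_PER_THREAD 1000   /* illustrative value of the parameter */

/* Choose the thread count before the first iteration executes: enough
 * threads that each one gets at least MIN_ITERS_PER_THREAD iterations,
 * capped by the threads available to the runtime. */
void scale(double *a, long n, double f)
{
    int  max_threads = omp_get_max_threads();
    long by_work     = n / MIN_ITERS_PER_THREAD;
    int  nthreads    = (int)(by_work < 1 ? 1 :
                             (by_work > max_threads ? max_threads : by_work));

    #pragma omp parallel for num_threads(nthreads)
    for (long i = 0; i < n; i++)
        a[i] *= f;
}
```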
Abstract:
A packet classification apparatus and method using field level tries includes a main processing part for generating and maintaining the field level tries, which organize a multi-field packet by field in a hierarchical structure for classifications; and classification engines, each of which is provided with a first classification part for performing queries and updates and processing a prefix lookup represented by an IP source/destination address lookup, and a second classification part for proceeding with classifications by corresponding field, based on a result of the first classification part, in order to process a range lookup belonging to the result. Accordingly, tries are developed in the unit of a field so that packet classifications for high-speed networking with excellent query performance are secured, and approximately a half-million classifier rules can be processed.
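A simplified sketch in C of the two-stage lookup described above: a first-stage longest-prefix walk over a binary trie on the destination address, followed by a second-stage range lookup on a port field at the matching node. The data structures and the classify() function are hypothetical simplifications, not the patented design:

```c
#include <stdint.h>
#include <stddef.h>

struct range_rule { uint16_t lo, hi; int rule_id; };   /* second-stage range */

struct trie_node {
    struct trie_node  *child[2];   /* one child per address bit            */
    struct range_rule *ranges;     /* field-level rules attached to prefix */
    size_t             nranges;
};

/* Stage 1: longest-prefix walk over the destination address.
 * Stage 2: range lookup over the matched node's port rules. */
int classify(const struct trie_node *root, uint32_t dst_ip, uint16_t dst_port)
{
    const struct trie_node *best = NULL;
    for (const struct trie_node *n = root; n; ) {
        if (n->nranges) best = n;               /* remember deepest match */
        n = n->child[(dst_ip >> 31) & 1];
        dst_ip <<= 1;
    }
    if (!best) return -1;                       /* no prefix matched */
    for (size_t i = 0; i < best->nranges; i++)
        if (dst_port >= best->ranges[i].lo && dst_port <= best->ranges[i].hi)
            return best->ranges[i].rule_id;
    return -1;
}
```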
Abstract:
Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing transport layer protocol segmentation offloading. Multiple buffer descriptors are stored in a system memory of a network device. The buffer descriptors contain information indicative of a starting address of a payload buffer stored in a user memory space of the system memory. The payload buffers contain payload data originated by a user process running on a host processor of the network device. The payload data is retrieved from the payload buffers on behalf of a network processor of the network device without copying the payload data from the user memory space to a kernel memory space of the system memory by performing direct virtual memory addressing of the user memory space. Finally, the payload data is segmented across one or more transport layer protocol packets.
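A sketch in C of the descriptor-driven, zero-copy segmentation flow this abstract describes; the buf_desc layout, the tso_segment() function, and the emit_packet callback are assumed names used only to illustrate the idea:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical buffer descriptor: records the user-space virtual address
 * and length of a payload buffer so the network processor can fetch the
 * data directly, without copying it into kernel memory space. */
struct buf_desc {
    uint64_t user_vaddr;   /* starting address of payload in user memory */
    uint32_t length;       /* payload bytes available at that address    */
};

/* Walk the descriptor ring and emit one transport-layer packet per
 * MSS-sized slice of each payload buffer. emit_packet() stands in for
 * the transmit path of the driver or firmware. */
void tso_segment(const struct buf_desc *ring, size_t ndesc, uint32_t mss,
                 void (*emit_packet)(uint64_t vaddr, uint32_t len))
{
    for (size_t d = 0; d < ndesc; d++) {
        uint64_t addr = ring[d].user_vaddr;
        uint32_t left = ring[d].length;
        while (left > 0) {
            uint32_t seg = left < mss ? left : mss;
            emit_packet(addr, seg);   /* data read directly from user space */
            addr += seg;
            left -= seg;
        }
    }
}
```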