摘要:
A method for recognizing a parallel executable region within a sequence of source code. The parallel executable region is identified by locating a loop structure within the sequence of source code. The loop structure includes a field controlled by an induction variable. Furthermore, the field is set in every iteration of the loop structure.
摘要:
A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.
摘要:
In some embodiments, a data structure may be received in a first processing system. The data structure may represent a plurality of instructions for a second processing system. For at least one instruction of the plurality of instructions, a determination may be made as to whether the instruction can be replaced by a compact instruction for the second processing system. A compact instruction may be generated if the instruction can be replaced by a compact instruction. In some embodiments, an instruction may be received in a processing system. A determination may be made as to whether the instruction is a compact instruction. A decompacted instruction may be generated if the instruction is a compact instruction.
摘要:
An apparatus and method for vectorization of detected saturation and clipping operations in serial code loops of a source program are described. In one embodiment, the method includes the analysis of source program code to identify source code utilizing conditional constructs to perform saturation/clipping operations. Once analysis is complete, identified source code is vectorized to implement identified saturation/clipping operations utilizing single instruction, multiple data (SIMD) saturation/clipping instructions. Accordingly, utilizing embodiments of the present invention, conditional statements utilized to implement saturation arithmetic, as well as clipping of data values, such as pixel values within graphics applications, are replaced with SIMD saturation arithmetic instructions, as well as clipping instructions.
摘要:
A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.
摘要:
A method and apparatus for finding loop_level parallelism in a sequence of instructions. In one embodiment, the method includes the steps of determining if a variable which identifies a memory address of a data structure is an induction variable; and determining if execution of the sequence of instructions terminates in response to a comparison of the variable to an invariant value. If the two conditions of the present invention are found to be true, the respective sequence of instructions is a candidate to be flagged for multi-threading execution, assuming the loop of the instructions terminates.