Abstract:
A system and method for compiling source code (e.g., with a compiler). The method includes accessing a portion of device source code and determining whether the portion of the device source code comprises a piece of work to be launched on a device from the device. The method further includes determining a plurality of application programming interface (API) calls based on the piece of work to be launched on the device and generating compiled code based on the plurality of API calls. The compiled code comprises a first portion operable to execute on a central processing unit (CPU) and a second portion operable to execute on the device (e.g., GPU).
Abstract:
A computer system manages the allocation of memory to an application program using a dependency tree. The dependency tree informs a memory manager of data inputs, data outputs, and intermediate values associated with execution of the application program. The memory manager allocates a single heap structure within a physical memory. Data associated with each node of the dependency tree is allocated to the heap structure so that data input values are allocated in a contiguous block, and intermediate values are allocated separately. In various examples, as execution of the application program proceeds, the separation of intermediate values from non-intermediate values within the heap reduces memory fragmentation providing improved performance of the computer system as a whole.
Abstract:
A system for and method of retrieving values of captured local variables for a lambda function in Java. In one embodiment, the system includes: (1) a Java virtual machine and (2) a captured variable retriever that interacts with the Java virtual machine and configured to retrieve a signature of the lambda function from a classfile of a Java class containing the lambda function, compare the signature with a declaration of the lambda function to identify arguments corresponding to the captured local variables, modify the lambda function and cause the Java virtual machine to execute the modified lambda function.
Abstract:
A system for and method of retrieving values of captured local variables for a lambda function in Java. In one embodiment, the system includes: (1) a Java virtual machine and (2) a captured variable retriever that interacts with the Java virtual machine and configured to retrieve a signature of the lambda function from a classfile of a Java class containing the lambda function, compare the signature with a declaration of the lambda function to identify arguments corresponding to the captured local variables, modify the lambda function and cause the Java virtual machine to execute the modified lambda function.
Abstract:
A system and method for optimizing multiple invocations of a graphics processing unit (GPU) program in Java. In one embodiment, the system includes: (1) a frontend component in a computer system and configured to compile Java bytecode associated with the a class object that implements a functional interface into Intermediate Representation (IR) code and store the IR code with the associated jogArray and (2) a collector/composer component in the computer system, associated with the frontend and configured to traverse a tree containing the multiple invocations from the result to collect the IR code and compose the IR code collected in the traversing into aggregate IR code when a result of the GPU program is explicitly requested to be transferred to a host.
Abstract:
System and method of compiling a program having a mixture of host code and device code to enable code coverage data collection for device code execution. An exemplary integrated compiler can compile source code programmed to be executed by a host processor (e.g., CPU) and a co-processor (e.g., a GPU) concurrently. The compilation can generate an instrumented executable code which includes: coverage instrumentation counters for the device functions; mapping information that maps the counters with the instrumented source points; and instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected code coverage information from the device memory to the host memory. Execution of the instrumented executable can yield a coverage report on the device code functions.
Abstract:
A system and method for optimizing multiple invocations of a graphics processing unit (GPU) program in Java. In one embodiment, the system includes: (1) a frontend component in a computer system and configured to compile Java bytecode associated with the a class object that implements a functional interface into Intermediate Representation (IR) code and store the IR code with the associated jogArray and (2) a collector/composer component in the computer system, associated with the frontend and configured to traverse a tree containing the multiple invocations from the result to collect the IR code and compose the IR code collected in the traversing into aggregate IR code when a result of the GPU program is explicitly requested to be transferred to a host.
Abstract:
A system and method for compiling source code (e.g., with a compiler). The method includes accessing a portion of device source code and determining whether the portion of the device source code comprises a piece of work to be launched on a device from the device. The method further includes determining a plurality of application programming interface (API) calls based on the piece of work to be launched on the device and generating compiled code based on the plurality of API calls. The compiled code comprises a first portion operable to execute on a central processing unit (CPU) and a second portion operable to execute on the device (e.g., GPU).
Abstract:
A computer system manages the allocation of memory to an application program using a dependency tree. The dependency tree informs a memory manager of data inputs, data outputs, and intermediate values associated with execution of the application program. The memory manager allocates a single heap structure within a physical memory. Data associated with each node of the dependency tree is allocated to the heap structure so that data input values are allocated in a contiguous block, and intermediate values are allocated separately. In various examples, as execution of the application program proceeds, the separation of intermediate values from non-intermediate values within the heap reduces memory fragmentation providing improved performance of the computer system as a whole.
Abstract:
A computation graph is accessed. In the computation graph, operations to be performed are represented as interior nodes, inputs to the operations are represented as leaf nodes, and a result of the operations is represented as a root. Selected sets of the operations are combined to form respective kernels of operations. Code is generated execute the kernels of operations. The code is executed to determine the result.