摘要:
A computing platform may include heterogeneous processors (e.g., CPU and a GPU) to support sharing of virtual functions between such processors. In one embodiment, a CPU side vtable pointer used to access a shared object from the CPU 110 may be used to determine a GPU vtable if a GPU-side table exists. In other embodiment, a shared non-coherent region, which may not maintain data consistency, may be created within the shared virtual memory. The CPU and the GPU side data stored within the shared non-coherent region may have a same address as seen from the CPU and the GPU side. However, the contents of the CPU-side data may be different from that of GPU-side data as shared virtual memory may not maintain coherency during the run-time. In one embodiment, the vptr may be modified to point to the CPU vtable and GPU vtable stored in the shared virtual memory.
摘要:
A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit (GPU) and a shared virtual memory supported by a physical private memory space of at least one heterogeneous processor or a physical shared memory shared by the heterogeneous processor. The CPU (producer) may create shared multi-version data and store such shared multi-version data in the physical private memory space or the physical shared memory. The GPU (consumer) may acquire or access the shared multi-version data.
摘要:
A page table entry dirty bit system may be utilized to record dirty information for a software distributed shared memory system. In some embodiments, this may improve performance without substantially increasing overhead because the dirty bit recording system is already available in certain processors. By providing extra bits, coherence can be obtained with respect to all the other uses of the existing page table entry dirty bits.
摘要:
A computing platform may include heterogeneous processors (e.g., CPU and a GPU) to support sharing of virtual functions between such processors. In one embodiment, a CPU side vtable pointer used to access a shared object from the CPU 110 may be used to determine a GPU vtable if a GPU-side table exists. In other embodiment, a shared non-coherent region, which may not maintain data consistency, may be created within the shared virtual memory. The CPU and the GPU side data stored within the shared non-coherent region may have a same address as seen from the CPU and the GPU side. However, the contents of the CPU-side data may be different from that of GPU-side data as shared virtual memory may not maintain coherency during the run-time. In one embodiment, the vptr may be modified to point to the CPU vtable and GPU vtable stored in the shared virtual memory.
摘要:
A page table entry dirty bit system may be utilized to record dirty information for a software distributed shared memory system. In some embodiments, this may improve performance without substantially increasing overhead because the dirty bit recording system is already available in certain processors. By providing extra bits, coherence can be obtained with respect to all the other uses of the existing page table entry dirty bits.
摘要:
A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU and a shared virtual memory supported by a physical private memory space of at least one heterogeneous processor or a physical shared memory shared by the heterogeneous processor. The CPU (producer) may create shared multi-version data and store such shared multi-version data in the physical private memory space or the physical shared memory. The GPU (consumer) may acquire or access the shared multi-version data.
摘要:
A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.
摘要:
A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.
摘要:
Embodiments of the invention provide language support for CPU-GPU platforms. In one embodiment, code can be flexibly executed on both the CPU and GPU. CPU code can offload a kernel to the GPU. That kernel may in turn call preexisting libraries on the CPU, or make other calls into CPU functions. This allows an application to be built without requiring the entire call chain to be recompiled. Additionally, in one embodiment data may be shared seamlessly between CPU and GPU. This includes sharing objects that may have virtual functions. Embodiments thus ensure the right virtual function gets invoked on the CPU or the GPU if a virtual function is called by either the CPU or GPU.
摘要:
Embodiments of the invention provide a programming model for CPU-GPU platforms. In particular, embodiments of the invention provide a uniform programming model for both integrated and discrete devices. The model also works uniformly for multiple GPU cards and hybrid GPU systems (discrete and integrated). This allows software vendors to write a single application stack and target it to all the different platforms. Additionally, embodiments of the invention provide a shared memory model between the CPU and GPU. Instead of sharing the entire virtual address space, only a part of the virtual address space needs to be shared. This allows efficient implementation in both discrete and integrated settings.