Abstract:
Disclosed herein are systems, apparatuses, and methods for enabling efficient reads to a local memory of a processing unit. In an embodiment, a processing unit includes an interface and a buffer. The interface is configured to (i) send a request for a portion of data in a region of a local memory of another processing unit and (ii) receive, responsive to the request, all the data from the region. The buffer is configured to store the data from the region of the local memory of the other processing unit.
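
The read path described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `RemoteMemory`, `ReadBuffer`, and `REGION_SIZE`, the 64-byte region size, and the single-region buffer are all assumptions made for illustration. A request for a few bytes causes the whole enclosing region to be returned and buffered, so a later read in the same region needs no second request.

```python
REGION_SIZE = 64  # bytes per region; an assumed value for illustration


class RemoteMemory:
    """Stands in for the local memory of the other processing unit."""

    def __init__(self, data: bytes):
        self.data = data
        self.requests_served = 0

    def read_region(self, region_index: int) -> bytes:
        # A request for any portion of a region returns the whole region.
        self.requests_served += 1
        start = region_index * REGION_SIZE
        return self.data[start:start + REGION_SIZE]


class ReadBuffer:
    """Holds the most recently fetched region on the requesting side."""

    def __init__(self, remote: RemoteMemory):
        self.remote = remote
        self.region_index = None
        self.region = b""

    def read(self, address: int, length: int) -> bytes:
        region_index, offset = divmod(address, REGION_SIZE)
        if offset + length > REGION_SIZE:
            # Sketch limitation: reads may not cross a region boundary.
            raise ValueError("read crosses a region boundary")
        if region_index != self.region_index:
            # Miss: fetch the entire region and keep it in the buffer.
            self.region = self.remote.read_region(region_index)
            self.region_index = region_index
        return self.region[offset:offset + length]


remote = RemoteMemory(bytes(range(256)))
buffer = ReadBuffer(remote)
first = buffer.read(4, 4)   # miss: fetches all of region 0
second = buffer.read(8, 4)  # hit: served from the buffer
```

In this sketch the second read is satisfied locally, so `remote.requests_served` stays at one after both reads.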
Abstract:
A method, system, and computer program product are disclosed for providing improved access to accelerated processing device compute resources to user mode applications. The functionality disclosed allows user mode applications to provide commands to an accelerated processing device without the need for kernel mode transitions in order to access a unified ring buffer. Instead, applications are each provided with their own buffers, which the accelerated processing device hardware can access to process commands. With full operating system support, user mode applications are able to utilize the accelerated processing device in much the same way as a CPU.
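
The per-application buffer arrangement described above can be sketched as follows. This is a simplified model under stated assumptions, not the disclosed design: `UserQueue`, `Device`, the fixed capacity, and the round-robin servicing order are hypothetical. Each application appends commands to its own buffer directly, and the device model drains those buffers without any shared ring buffer.

```python
from collections import deque


class UserQueue:
    """Per-application command buffer, written directly by the application."""

    def __init__(self, app_name: str, capacity: int = 8):
        self.app_name = app_name
        self.capacity = capacity
        self.commands = deque()

    def submit(self, command: str) -> bool:
        # No kernel mode transition is modelled: the application appends
        # to its own buffer.
        if len(self.commands) >= self.capacity:
            return False  # buffer full; the caller may retry later
        self.commands.append(command)
        return True


class Device:
    """Stands in for the accelerated processing device hardware."""

    def __init__(self):
        self.queues = []
        self.executed = []

    def register(self, queue: UserQueue) -> None:
        self.queues.append(queue)

    def process_round(self) -> None:
        # Assumed policy: take at most one command from each buffer per round.
        for queue in self.queues:
            if queue.commands:
                command = queue.commands.popleft()
                self.executed.append((queue.app_name, command))


device = Device()
app_a, app_b = UserQueue("app_a"), UserQueue("app_b")
device.register(app_a)
device.register(app_b)
app_a.submit("dispatch kernel_1")
app_a.submit("dispatch kernel_2")
app_b.submit("copy buffer")
device.process_round()
device.process_round()
```

Because each application owns its buffer, one application filling its queue does not block submissions from another in this model.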
Abstract:
A method, computer program product, and system are disclosed that include a virtual function module with an emulated display timing device, a first independent resource, and a second independent resource. The first and second independent resources signal a physical function module that a new surface has been rendered. The physical function module signals the virtual function module, via the emulated timing device and the first and second independent resources, when the rendered new surface has been displayed, copied, used, or otherwise consumed.
Abstract:
Methods and apparatus are provided, as an aspect of a combined CPU/APD architecture system, for discovering and reporting properties of devices and system topology that are relevant to efficiently scheduling and distributing computational tasks to the various computational resources of a combined CPU/APD architecture system. The combined CPU/APD architecture unifies CPUs and APDs in a flexible computing environment. In some embodiments, the combined CPU/APD architecture capabilities are implemented in a single integrated circuit, elements of which can include one or more CPU cores and one or more APD cores. The combined CPU/APD architecture creates a foundation upon which existing and new programming frameworks, languages, and tools can be constructed.
Abstract:
A method for executing processes within a computer system is provided. The method includes determining when to switch from a first process, executing within the computer system, to executing another process. Execution of the first process corresponds to a computer system storage location. The method also includes switching to executing the other process based upon a time quantum and resuming execution of the first process after the time quantum has lapsed, the resuming corresponding to the storage location.
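
The time-quantum switching described above can be sketched as follows. This is a minimal model under stated assumptions, not the disclosed method: the `Scheduler` class, the use of Python generators as processes, and the `saved` dictionary standing in for the storage location are all hypothetical. A process runs for at most one quantum, its state stays in the storage location while another process runs, and it later resumes from that saved state.

```python
def count_to(limit):
    """A toy process that yields once per unit of work."""
    for value in range(1, limit + 1):
        yield value


class Scheduler:
    def __init__(self, quantum: int):
        self.quantum = quantum  # units of work allowed per turn
        self.saved = {}         # storage location per process (assumed model)
        self.trace = []

    def add(self, name: str, process) -> None:
        self.saved[name] = process

    def run(self) -> None:
        ready = list(self.saved)
        while ready:
            name = ready.pop(0)
            process = self.saved[name]  # resume from the storage location
            for _ in range(self.quantum):
                try:
                    self.trace.append((name, next(process)))
                except StopIteration:
                    del self.saved[name]  # process finished
                    break
            else:
                # Quantum used up: state stays saved, process is requeued.
                ready.append(name)


scheduler = Scheduler(quantum=2)
scheduler.add("first", count_to(3))
scheduler.add("other", count_to(2))
scheduler.run()
```

With a quantum of two, the first process is interrupted after two steps, the other process runs, and the first process then resumes at its third step.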
Abstract:
In a CPU of the combined CPU/APD architecture system, the CPU has multiple CPU cores. Each core has a first machine specific register for receiving a physical page table/page directory base address, a second machine specific register for receiving a physical address pointing to a location controlled by an IOMMUv2 that is communicatively coupled to an APD, and microcode which, when executed, causes a write notification to be issued to the physical address contained in the second machine specific register. A method includes: receiving, in the first machine specific register of a CPU core, a physical page table/page directory base address; receiving, in the second machine specific register of the CPU core, a physical address pointing to a location controlled by the IOMMUv2; determining that a control register of the CPU core has been updated; and, responsive to that determination, executing microcode in the CPU core that causes a write notification to be issued to the physical address contained in the second machine specific register. The physical address is able to receive writes that affect IOMMUv2 page table invalidations.
Abstract:
In a CPU having multiple CPU cores, each core has a first machine specific register, a second machine specific register, and microcode which, when executed, causes a write notification to be issued to the physical address contained in the second machine specific register. A method includes: receiving, in the first machine specific register of a CPU core, a physical page table/page directory base address; receiving, in the second machine specific register of the CPU core, a physical address pointing to a location controlled by an IOMMUv2; determining that a control register of the CPU core has been updated; and, responsive to that determination, executing microcode in the CPU core that causes a write notification to be issued to the physical address contained in the second machine specific register. The physical address is able to receive writes that affect IOMMUv2 page table invalidations.
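
The notification sequence described above can be modelled in software as a rough sketch. It is not the disclosed hardware or microcode: the `CpuCore` and `Iommu` classes and the address values are hypothetical, and the abstract does not state what the write notification carries, so passing the page table base as the payload is an assumption.

```python
class Iommu:
    """Stands in for the IOMMUv2; records the write notifications it receives."""

    def __init__(self, notify_address: int):
        self.notify_address = notify_address
        self.invalidations = []

    def write(self, address: int, payload: int) -> None:
        # Only writes to the location the IOMMU controls have an effect.
        if address == self.notify_address:
            self.invalidations.append(payload)


class CpuCore:
    def __init__(self, iommu: Iommu):
        self.iommu = iommu
        self.msr_page_table_base = 0  # first machine specific register
        self.msr_notify_address = 0   # second machine specific register
        self.control_register = 0

    def write_control_register(self, value: int) -> None:
        self.control_register = value
        # Models the microcode path: a control register update triggers a
        # write notification to the address held in the second register.
        # The payload (page table base) is an assumption of this sketch.
        self.iommu.write(self.msr_notify_address, self.msr_page_table_base)


NOTIFY_ADDRESS = 0x2000  # hypothetical location controlled by the IOMMU
iommu = Iommu(NOTIFY_ADDRESS)
core = CpuCore(iommu)
core.msr_page_table_base = 0x1000    # hypothetical page table base address
core.msr_notify_address = NOTIFY_ADDRESS
core.write_control_register(0x1000)  # update triggers the notification
```

In this model each control register update produces exactly one entry in `iommu.invalidations`, which stands in for an IOMMUv2 page table invalidation.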