Abstract:
A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.
Abstract:
Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.
Abstract:
A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a plurality of processors. The method includes receiving a memory operation from a processor that receives a memory operation from a processor that references an address in a shared memory, mapping the received memory operation to at least one of a plurality of virtual memory pools to produce a mapping result, and providing the mapping result to the processor.
Abstract:
A register protection mechanism for a virtualized accelerated processing device (“APD”) is disclosed. The mechanism protects registers of the accelerated processing device designated as physical-function-or-virtual-function registers (“PF-or-VF* registers”), which are single architectural instance registers that are shared among different functions that share the APD in a virtualization scheme whereby each function can maintain a different value in these registers. The protection mechanism for these registers comprises comparing the function associated with the memory address specified by a particular register access request to the “currently active” function for the APD and disallowing the register access request if a match does not occur.
Abstract:
A technique for efficient time-division of resources in a virtualized accelerated processing device (“APD”) is provided. In a virtualization scheme implemented on the APD, different virtual machines are assigned different “time-slices” in which to use the APD. When a time-slice expires, the APD performs a virtualization context switch by stopping operations for a current virtual machine (“VM”) and starting operations for another VM. Typically, each VM is assigned a fixed length of time, after which a virtualization context switch is performed. This fixed length of time can lead to inefficiencies. Therefore, in some situations, in response to a VM having no more work to perform on the APD and the APD being idle, a virtualization context switch is performed “early.” This virtualization context switch is “early” in the sense that the virtualization context switch is performed before the fixed length of time for the time-slice expires.
Abstract:
A device may receive a direct memory access request that identifies a virtual address. The device may determine whether the virtual address is within a particular range of virtual addresses. The device may selectively perform a first action or a second action based on determining whether the virtual address is within the particular range of virtual addresses. The first action may include causing a first address translation algorithm to be performed to translate the virtual address to a physical address associated with a memory device when the virtual address is not within the particular range of virtual addresses. The second action may include causing a second address translation algorithm to be performed to translate the virtual address to the physical address when the virtual address is within the particular range of virtual addresses. The second address translation algorithm may be different from the first address translation algorithm.
Abstract:
Systems, apparatuses, and methods for migrating memory pages are disclosed herein. In response to detecting that a migration of a first page between memory locations is being initiated, a first page table entry (PTE) corresponding to the first page is located and a migration pending indication is stored in the first PTE. In one embodiment, the migration pending indication is encoded in the first PTE by disabling read and write permissions. If a translation request targeting the first PTE is received by the MMU and the translation request corresponds to a read request, a read operation is allowed to the first page. Otherwise, if the translation request corresponds to a write request, a write operation to the first page is blocked and a silent retry request is generated and conveyed to the requesting client.
Abstract:
An apparatus includes a unified system/graphics memory and a memory controller. The memory controller is operative to receive client data access requests associated with one or more clients and a central processing unit (CPU) data access request associated with a CPU, to a plurality of memory channels for accessing the unified system/graphics memory. The memory controller is operative to provide access to the plurality of memory channels, in parallel, by the CPU and at least one client of the one or more clients. The memory controller is operative to prioritize the CPU data access request to the unified memory over the client data access requests to the unified memory and control the plurality of memory channels to access, in parallel, data for the CPU and data for the at least one client based on a request of the client data access requests and the CPU data access request.
Abstract:
A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.
Abstract:
A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.