Abstract:
A processing system includes a primary processor and a co-processor. The primary processor is couplable to a memory subsystem having at least one memory and operates to execute system software employing memory address translations based on one or more page tables stored in the memory subsystem. The co-processor is likewise couplable to the memory subsystem and operates to perform iterations of a page table walk through the one or more page tables maintained for the memory subsystem and to perform one or more page management operations on behalf of the system software based on the iterations of the page table walk. The page management operations performed by the co-processor include analytic data aggregation, free list management and page allocation, page migration management, page table error detection, and the like.
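As a rough sketch of the co-processor's role, the C fragment below performs one iteration of a walk over a single leaf page table, aggregating analytic data (present/accessed/dirty counts) and rearming the accessed bits for the next pass. The PTE layout, bit positions, and table size are illustrative x86-style assumptions, not details from the abstract.

    /* Illustrative sketch: one iteration of the co-processor's walk over
     * a leaf page table, aggregating analytic data for the system software. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define PTE_PRESENT  (1u << 0)
    #define PTE_ACCESSED (1u << 5)
    #define PTE_DIRTY    (1u << 6)
    #define ENTRIES      512

    typedef uint64_t pte_t;

    struct walk_stats {              /* analytic data aggregated per walk */
        size_t present, accessed, dirty;
    };

    static void walk_leaf(pte_t *table, struct walk_stats *st)
    {
        for (size_t i = 0; i < ENTRIES; i++) {
            if (!(table[i] & PTE_PRESENT))
                continue;            /* unmapped: candidate for the free list */
            st->present++;
            if (table[i] & PTE_ACCESSED) {
                st->accessed++;
                table[i] &= ~(pte_t)PTE_ACCESSED;  /* rearm for next pass */
            }
            if (table[i] & PTE_DIRTY)
                st->dirty++;         /* candidate for migration/write-back */
        }
    }

    int main(void)
    {
        static pte_t leaf[ENTRIES];
        leaf[0] = PTE_PRESENT | PTE_ACCESSED;
        leaf[1] = PTE_PRESENT | PTE_DIRTY;
        struct walk_stats st = {0};
        walk_leaf(leaf, &st);
        printf("present=%zu accessed=%zu dirty=%zu\n",
               st.present, st.accessed, st.dirty);
        return 0;
    }
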
Abstract:
Systems, apparatuses, and methods for sharing a field programmable gate array (FPGA) compute engine are disclosed. A system includes one or more processors and one or more FPGAs. The system receives a request, generated by a first user process, to allocate a portion of processing resources on a first FPGA. The system maps the portion of processing resources of the first FPGA into an address space of the first user process. The system prevents other user processes from accessing the portion of processing resources of the first FPGA. Later, the system detects a release of the portion of the processing resources on the first FPGA by the first user process. Then, the system receives a second request to allocate the first FPGA from a second user process. In response to the second request, the system maps the first FPGA into an address space of the second user process.
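A minimal sketch of the exclusive-allocation handshake the abstract describes, assuming a driver-side record of which process owns an FPGA region; the fpga_region type, field names, and PIDs are invented for illustration, and the actual mapping step would use something like mmap() on a device file.

    /* Sketch only: exclusive allocation/release of an FPGA resource region. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/types.h>

    struct fpga_region {
        pid_t owner;          /* 0 when free */
        size_t mmio_len;      /* size of the region that would be mapped */
    };

    /* Grant the region to `pid` only if no other process holds it. */
    static bool fpga_alloc(struct fpga_region *r, pid_t pid)
    {
        if (r->owner != 0)
            return false;     /* another user process owns these resources */
        r->owner = pid;       /* driver would now map the region into the
                                 caller's address space */
        return true;
    }

    static void fpga_release(struct fpga_region *r, pid_t pid)
    {
        if (r->owner == pid)
            r->owner = 0;     /* region may now go to a second process */
    }

    int main(void)
    {
        struct fpga_region r = { .owner = 0, .mmio_len = 4096 };
        fpga_alloc(&r, 1234);                /* first user process */
        printf("alloc while held: %d\n", fpga_alloc(&r, 5678));
        fpga_release(&r, 1234);              /* release detected */
        printf("alloc after release: %d\n", fpga_alloc(&r, 5678));
        return 0;
    }
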
Abstract:
Methods and apparatuses are provided for avoiding cold translation lookaside buffer (TLB) misses in a computer system. A typical system is configured as a heterogeneous computing system having at least one central processing unit (CPU) and one or more graphics processing units (GPUs) that share a common memory address space. Each processing unit (CPU and GPU) has an independent TLB. When offloading a task from a particular CPU to a particular GPU, translation information is sent along with the task assignment. The translation information allows the GPU to load the address translation data into its TLB prior to executing the task. Preloading the GPU's TLB reduces or avoids cold TLB misses that could otherwise occur without the benefits offered by the present disclosure.
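The sketch below illustrates the idea of shipping translation information with the offloaded task so the GPU-side TLB can be warmed before the task runs; the task structure, TLB size, and direct-mapped indexing are assumptions made for illustration.

    /* Sketch: preload the GPU's TLB from translations sent with the task. */
    #include <stdint.h>
    #include <stddef.h>

    struct tlb_entry { uint64_t vpn, pfn; };  /* virtual/physical page numbers */

    struct gpu_task {
        void (*kernel)(void *);
        void *args;
        struct tlb_entry xlate[8];   /* translations the task will touch */
        size_t n_xlate;
    };

    static struct tlb_entry gpu_tlb[64];  /* stand-in for the GPU's TLB */

    static void gpu_dispatch(struct gpu_task *t)
    {
        /* Preload: install the shipped translations so the kernel's
         * first accesses do not take cold TLB misses. */
        for (size_t i = 0; i < t->n_xlate; i++)
            gpu_tlb[t->xlate[i].vpn % 64] = t->xlate[i];
        t->kernel(t->args);
    }

    static void demo_kernel(void *arg) { (void)arg; }

    int main(void)
    {
        struct gpu_task t = {
            .kernel = demo_kernel,
            .xlate = { { 0x10, 0x80 }, { 0x11, 0x81 } },
            .n_xlate = 2,
        };
        gpu_dispatch(&t);
        return 0;
    }
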
Abstract:
A device may receive a direct memory access request that identifies a virtual address. The device may determine whether the virtual address is within a particular range of virtual addresses. The device may selectively perform a first action or a second action based on determining whether the virtual address is within the particular range of virtual addresses. The first action may include causing a first address translation algorithm to be performed to translate the virtual address to a physical address associated with a memory device when the virtual address is not within the particular range of virtual addresses. The second action may include causing a second address translation algorithm to be performed to translate the virtual address to the physical address when the virtual address is within the particular range of virtual addresses. The second address translation algorithm may be different from the first address translation algorithm.
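A sketch of the range check that steers each DMA translation to one of the two algorithms; the range bounds and the two stub translation functions are placeholders, not the algorithms the abstract has in mind.

    /* Sketch: select a translation algorithm based on a virtual-address range. */
    #include <stdint.h>
    #include <stdio.h>

    #define RANGE_LO 0x100000000ULL   /* assumed bounds of the special range */
    #define RANGE_HI 0x200000000ULL

    /* First algorithm: placeholder standing in for a full page-table walk. */
    static uint64_t translate_page_walk(uint64_t va) { return va ^ 0xFFF000; }

    /* Second algorithm: a cheaper fixed-offset translation for the range. */
    static uint64_t translate_offset(uint64_t va) { return va - RANGE_LO; }

    static uint64_t dma_translate(uint64_t va)
    {
        if (va >= RANGE_LO && va < RANGE_HI)
            return translate_offset(va);      /* second action */
        return translate_page_walk(va);       /* first action */
    }

    int main(void)
    {
        printf("%llx\n", (unsigned long long)dma_translate(0x100000040ULL));
        printf("%llx\n", (unsigned long long)dma_translate(0x40ULL));
        return 0;
    }
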
Abstract:
A page service request is received from a peripheral device requesting that a memory page be loaded into system memory. Page service request information corresponding to the received page service request is written as a queue entry into a queue structure in system memory. A processor is notified that the page service request is present in the queue; for example, the processor may be notified of the new queue entry with an interrupt. The processor processes the page service request and the peripheral device is notified of the completion of the processing of the request.
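The flow resembles a producer/consumer ring, loosely modeled here on IOMMU peripheral page request (PPR) queues; the entry layout, queue depth, and handler callbacks are illustrative assumptions.

    /* Sketch: device posts page service requests; the CPU drains them. */
    #include <stdint.h>
    #include <stdbool.h>

    struct ppr_entry { uint64_t gva; uint16_t device_id; uint16_t tag; };

    struct ppr_queue {
        struct ppr_entry ring[256];
        volatile uint32_t head, tail;   /* head: device writes; tail: CPU */
    };

    /* Device side: write the request into the in-memory queue, then
     * raise an interrupt to notify the processor (elided here). */
    static bool ppr_post(struct ppr_queue *q, struct ppr_entry e)
    {
        uint32_t next = (q->head + 1) % 256;
        if (next == q->tail)
            return false;               /* queue full */
        q->ring[q->head] = e;
        q->head = next;                 /* interrupt the CPU here */
        return true;
    }

    /* Processor side: drain the queue, load each requested page, and
     * notify the peripheral that processing is complete. */
    static void ppr_service(struct ppr_queue *q,
                            void (*load_page)(uint64_t),
                            void (*complete)(uint16_t))
    {
        while (q->tail != q->head) {
            struct ppr_entry e = q->ring[q->tail];
            load_page(e.gva);           /* bring the page into system memory */
            complete(e.tag);            /* completion notice to the device */
            q->tail = (q->tail + 1) % 256;
        }
    }

    static void demo_load(uint64_t gva) { (void)gva; }
    static void demo_complete(uint16_t tag) { (void)tag; }

    int main(void)
    {
        static struct ppr_queue q;
        struct ppr_entry e = { .gva = 0x1000, .device_id = 7, .tag = 1 };
        ppr_post(&q, e);
        ppr_service(&q, demo_load, demo_complete);
        return 0;
    }
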
Abstract:
The described embodiments include a computing device with two or more translation lookaside buffers (TLBs). During operation, the computing device updates an entry in the TLB based on a virtual address to physical address translation and metadata from a page table entry that were acquired during a page table walk. The computing device then computes, based on a lease length expression, a lease length for the entry in the TLB. Next, the computing device sets, for the entry in the TLB, a lease value to the lease length, wherein the lease value represents a time until a lease for the entry in the TLB expires, wherein the entry in the TLB is invalid when the associated lease has expired. The computing device then uses the lease value to control operations that are allowed to be performed using information from the entry in the TLB.
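A sketch of lease-based TLB validity, assuming a simple lease-length expression (a fixed quantum scaled by per-page metadata, which is an invented placeholder) and a global tick counter standing in for hardware time.

    /* Sketch: a TLB entry is usable only while its lease is unexpired. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct tlb_entry {
        uint64_t vpn, pfn;        /* virtual/physical page numbers */
        uint64_t lease_expiry;    /* absolute time the lease runs out */
    };

    static uint64_t now;          /* stand-in for a hardware timestamp */

    static void tlb_fill(struct tlb_entry *e, uint64_t vpn, uint64_t pfn,
                         uint64_t hint)
    {
        e->vpn = vpn;
        e->pfn = pfn;
        /* Lease length "expression": a base quantum scaled by metadata
         * carried in the page table entry (placeholder formula). */
        e->lease_expiry = now + 1000 * (hint + 1);
    }

    /* The entry may be used only while its lease is unexpired. */
    static bool tlb_lookup(const struct tlb_entry *e, uint64_t vpn,
                           uint64_t *pfn)
    {
        if (e->vpn != vpn || now >= e->lease_expiry)
            return false;         /* lease expired: entry is invalid */
        *pfn = e->pfn;
        return true;
    }

    int main(void)
    {
        struct tlb_entry e;
        uint64_t pfn;
        tlb_fill(&e, 42, 7, 3);       /* lease = 1000 * (3 + 1) ticks */
        now += 3999;
        printf("before expiry: %d\n", tlb_lookup(&e, 42, &pfn));
        now += 1;
        printf("after expiry:  %d\n", tlb_lookup(&e, 42, &pfn));
        return 0;
    }
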
Abstract:
In a virtualized computer system without an IOMMU, all application IO requests must be processed by the guest operating system and by the hypervisor so that addresses are translated (twice) and validated (twice) properly. In a virtualized computer system with an IOMMU containing one "stage" of translation, the peripheral can safely be assigned directly to a guest OS because the IOMMU can be programmed to translate and check addresses issued by the device. As a result, IO routing overhead due to hypervisor intervention can be eliminated. In one example, in a virtualized computer system with an IOMMU supporting two "stages" of translation, the peripheral can safely be assigned directly to an application within a guest OS. As a result, IO routing overhead due to both hypervisor and guest OS processing can be eliminated. This allows an application to achieve higher IO performance.
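The two-stage translation can be pictured as two chained lookups: stage 1 (guest-managed) maps a guest virtual address to a guest physical address, and stage 2 (hypervisor-managed) maps that to a system physical address. The flat tables below stand in for real multi-level page tables and are purely illustrative.

    /* Sketch: nested (two-stage) IOMMU translation of a device address. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGES 16
    static uint64_t stage1[PAGES];   /* guest OS / application owned */
    static uint64_t stage2[PAGES];   /* hypervisor owned */

    static uint64_t iommu_translate(uint64_t gva)
    {
        /* Stage 1: guest virtual -> guest physical (guest tables). */
        uint64_t gpa = stage1[(gva >> 12) % PAGES] | (gva & 0xFFF);
        /* Stage 2: guest physical -> system physical (host tables). */
        uint64_t spa = stage2[(gpa >> 12) % PAGES] | (gpa & 0xFFF);
        return spa;   /* device DMA proceeds with no hypervisor or guest exit */
    }

    int main(void)
    {
        stage1[1] = 0x5000;   /* GVA page 1 maps to GPA 0x5000 */
        stage2[5] = 0x9000;   /* GPA page 5 maps to SPA 0x9000 */
        printf("spa=%llx\n", (unsigned long long)iommu_translate(0x1234));
        return 0;
    }
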
Abstract:
A macro scheduler includes a resource tracking module, a communication interface, resource allocation logic, and configuration logic. The resource tracking module is configured to update a database enumerating a plurality of macro components of a set of field programmable gate array (FPGA) devices. The communication interface is configured to receive from a first client device a first design definition indicating one or more specified macro components for a design. The resource allocation logic is configured to allocate a first set of macro components for the design by allocating one of the plurality of macro components for each of the one or more specified macro components indicated in the first design definition. The configuration logic is configured to implement the design in the set of FPGA devices by configuring the first set of allocated macro components according to the first design definition.
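A toy version of the resource-allocation step: for each macro component named in a design definition, claim one free instance from the tracked inventory. The macro types and database layout are invented for illustration.

    /* Sketch: allocate tracked FPGA macro components per a design definition. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    enum macro_type { MACRO_DSP, MACRO_BRAM, MACRO_SERDES };

    struct macro {
        enum macro_type type;
        bool allocated;
    };

    /* Allocate one free macro of the requested type; -1 if none remain. */
    static int alloc_macro(struct macro *db, size_t n, enum macro_type want)
    {
        for (size_t i = 0; i < n; i++) {
            if (!db[i].allocated && db[i].type == want) {
                db[i].allocated = true;
                return (int)i;   /* index handed to the configuration logic */
            }
        }
        return -1;
    }

    int main(void)
    {
        struct macro db[] = {
            { MACRO_DSP, false }, { MACRO_DSP, false }, { MACRO_BRAM, false },
        };
        enum macro_type design[] = { MACRO_DSP, MACRO_BRAM };  /* definition */
        for (size_t i = 0; i < 2; i++)
            printf("macro %zu -> slot %d\n", i, alloc_macro(db, 3, design[i]));
        return 0;
    }
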
Abstract:
The described embodiments include an input-output memory management unit (IOMMU) with two or more memory elements and a controller. The controller is configured to select, based on one or more factors, one or more selected memory elements from among the two or more memory elements for performing virtual address to physical address translations in the IOMMU. The controller then performs the virtual address to physical address translations using the one or more selected memory elements.
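As a sketch of the controller's selection step, the fragment below picks one of several translation memory elements using a single factor, relative occupancy; a real controller would weigh multiple factors, and the structure here is assumed.

    /* Sketch: choose which memory element services a translation. */
    #include <stddef.h>
    #include <stdio.h>

    struct mem_element {
        unsigned occupancy;   /* live translation entries */
        unsigned capacity;    /* total entries */
    };

    /* Pick the least-occupied element; cross-multiply to compare
     * occupancy ratios without floating point. */
    static size_t select_element(const struct mem_element *el, size_t n)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (el[i].occupancy * el[best].capacity <
                el[best].occupancy * el[i].capacity)
                best = i;
        return best;
    }

    int main(void)
    {
        struct mem_element banks[] = { {40, 64}, {10, 64}, {30, 32} };
        printf("selected element: %zu\n", select_element(banks, 3));
        return 0;
    }
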
Abstract:
A method includes updating contents of a value storage element indicating a number of occurrences of an event. The updating is based on contents of a match storage element storing event qualification information. The method includes providing the contents of the value storage element to a first software module executing on at least one processor. The providing is based on contents of a protect storage element indicating access information. In at least one embodiment, the method includes executing the first software module on the at least one processor in a first mode of operation. In at least one embodiment, the method includes executing a second software module on the at least one processor in a second mode of operation. In at least one embodiment, the second mode is more privileged than the first mode.
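The three storage elements can be sketched as follows, with invented bit layouts: the match element qualifies events, the value element counts them, and the protect element decides whether the less-privileged first software module may read the count.

    /* Sketch: value/match/protect storage elements of an event counter. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct counter {
        uint64_t value;     /* number of qualifying occurrences */
        uint64_t match;     /* event qualification mask */
        uint64_t protect;   /* bit 0: allow reads from the less privileged mode */
    };

    /* Count the event only if it qualifies under the match element. */
    static void count_event(struct counter *c, uint64_t event_bits)
    {
        if (event_bits & c->match)
            c->value++;
    }

    /* A read from the first (less privileged) software module succeeds only
     * when the protect element permits it; privileged reads always succeed. */
    static bool read_counter(const struct counter *c, bool privileged,
                             uint64_t *out)
    {
        if (!privileged && !(c->protect & 1))
            return false;
        *out = c->value;
        return true;
    }

    int main(void)
    {
        struct counter c = { .value = 0, .match = 0x4, .protect = 0 };
        uint64_t v = 0;
        count_event(&c, 0x4);   /* qualifies: counted */
        count_event(&c, 0x2);   /* does not qualify */
        printf("user read allowed: %d\n", read_counter(&c, false, &v));
        bool ok = read_counter(&c, true, &v);
        printf("privileged read allowed: %d, value: %llu\n",
               ok, (unsigned long long)v);
        return 0;
    }
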