Abstract:
Transitions to ring 0 each time an application wants to use an adjunct processor are avoided, saving central processor operating cycles and improving efficiency. Instead, each application is initially registered and set up to use adjunct processor resources from ring 3.
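A minimal C sketch of the pattern the abstract implies: one privileged registration call, then work submission from ring 3 through a shared memory queue. The queue layout and the ap_register_process call are invented for illustration; the abstract does not specify an interface.

    #include <stdatomic.h>
    #include <stdint.h>

    #define QDEPTH 64

    /* Hypothetical ring-3 interface: names and layout are illustrative only. */
    struct ap_queue {
        _Atomic uint32_t head;   /* consumed by the adjunct processor */
        _Atomic uint32_t tail;   /* produced by the application, in ring 3 */
        uint64_t slots[QDEPTH];  /* work descriptors */
    };

    /* One-time registration: the only transition to ring 0 (a hypothetical
     * call that maps a submission queue into the application's address space). */
    extern struct ap_queue *ap_register_process(void);

    /* Steady state: submit work entirely from ring 3, with no system call. */
    static int ap_submit(struct ap_queue *q, uint64_t desc)
    {
        uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
        uint32_t next = (tail + 1) % QDEPTH;
        if (next == atomic_load_explicit(&q->head, memory_order_acquire))
            return -1;                          /* queue full */
        q->slots[tail] = desc;
        atomic_store_explicit(&q->tail, next, memory_order_release);
        return 0;
    }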
Abstract:
A method and apparatus for reordering memory requests for page coherency. Separate data streams frequently reside in separate areas of physical memory (i.e., each data stream occupies its own memory “page”). When a client requests several separate streams of data, requests from the different streams become intermixed and page coherency between consecutive requests diminishes. The resulting page “breaks,” which occur when consecutive requests come from different data streams and therefore require accesses to different memory pages, introduce latency. A reordering device regains the lost page coherency, thereby reducing latency and increasing overall system performance.
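An illustrative (not the patented) reordering policy in C: among pending requests, prefer one that targets the same memory page as the last request issued, so consecutive accesses stay page-coherent where possible. The 4 KiB page size is an assumption.

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT 12   /* assumed 4 KiB memory pages */

    /* Pick the next request to issue: a page-coherent candidate avoids a
     * page break; otherwise fall back to the oldest pending request. */
    static size_t pick_next(const uint64_t *pending, size_t n, uint64_t last_addr)
    {
        uint64_t last_page = last_addr >> PAGE_SHIFT;
        for (size_t i = 0; i < n; i++)
            if ((pending[i] >> PAGE_SHIFT) == last_page)
                return i;            /* same page: no page break */
        return 0;                    /* oldest request: accept one page break */
    }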
Abstract:
Methods and apparatus are disclosed for efficient TLB (translation look-aside buffer) shoot-downs for heterogeneous devices sharing virtual memory in a multi-core system. Embodiments of an apparatus for efficient TLB shoot-downs may include a TLB to store virtual address translation entries, and a memory management unit, coupled with the TLB, to maintain PASID (process address space identifier) state entries corresponding to the virtual address translation entries. The PASID state entries may include an active reference state and a lazy-invalidation state. The memory management unit may perform atomic modification of PASID state entries responsive to receiving PASID state update requests from devices in the multi-core system and read the lazy-invalidation state of the PASID state entries. Responsive to the respective lazy-invalidation state, the memory management unit may send PASID state update responses to the devices so that TLB entries are synchronized prior to activation.
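A hedged C sketch of how an active-reference count and a lazy-invalidation flag might share one atomically updated word; the bit layout, names, and policy are assumptions, not the patented encoding.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define LAZY_INVAL  (1u << 31)   /* lazy-invalidation pending */
    #define REF_MASK    0x7fffffffu  /* active reference count */

    typedef _Atomic uint32_t pasid_state_t;

    /* A device takes an active reference before using translations for a
     * PASID; a true return tells it to synchronize (flush stale TLB
     * entries) prior to activation. */
    static bool pasid_activate(pasid_state_t *st)
    {
        uint32_t old = atomic_fetch_add(st, 1);    /* take an active reference */
        return (old & LAZY_INVAL) != 0;            /* true: synchronize first */
    }

    /* Shoot-down path: if no device holds an active reference, defer the
     * invalidation by setting the lazy flag instead of interrupting devices. */
    static bool pasid_lazy_invalidate(pasid_state_t *st)
    {
        uint32_t old = atomic_load(st);
        while ((old & REF_MASK) == 0)
            if (atomic_compare_exchange_weak(st, &old, old | LAZY_INVAL))
                return true;                       /* deferred: no shoot-down */
        return false;                              /* active users: shoot down now */
    }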
Abstract:
Mid-command buffer preemption is described for graphics workloads in a graphics processing environment. In one example, instructions of a first context are executed at a graphics processor; the first context has a sequence of instructions in an addressable buffer, and at least one of the instructions is a preemption instruction. Upon executing the preemption instruction, execution of the first context is stopped before the sequence of instructions is completed. An address is stored for the instruction with which the first context will be resumed. A second context is then executed, and upon completion of the execution of the second context, the execution of the first context is resumed at the stored address.
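An illustrative command-stream walker in C showing the preemption mechanics described above; the opcodes and context layout are invented for the sketch, not taken from any real GPU.

    #include <stdint.h>
    #include <stdbool.h>

    enum { OP_DRAW, OP_PREEMPT, OP_END };

    struct cmd { uint8_t op; uint32_t arg; };

    struct context {
        const struct cmd *buf;   /* addressable command buffer */
        uint32_t resume_at;      /* stored address (index) for resumption */
    };

    /* Run a context until it ends or reaches a preemption instruction.
     * Returns true if preempted; the resume address is stored so the
     * context can continue after another context has run. */
    static bool run(struct context *ctx)
    {
        for (uint32_t pc = ctx->resume_at; ; pc++) {
            const struct cmd *c = &ctx->buf[pc];
            if (c->op == OP_END)
                return false;            /* sequence completed */
            if (c->op == OP_PREEMPT) {
                ctx->resume_at = pc + 1; /* store where execution will resume */
                return true;             /* stop before the sequence completes */
            }
            /* ... execute c (e.g., OP_DRAW) ... */
        }
    }

Usage follows the abstract: run(&first) returns true at the preemption instruction, the second context runs to completion, and a second run(&first) picks up at the stored address.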
Abstract:
A mechanism implemented by a controller enables efficient access to an interleaved memory system that includes M modules, M being (2^n+1) or (2^n−1), n being a positive integer. Upon receiving an address N, the controller performs shift and add/subtract operations to obtain a quotient of N divided by M based on a binomial series expansion of N over M. The controller computes a remainder of N divided by M based on the quotient. The controller then accesses one of the modules in the memory based on the remainder.
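A C sketch of the shift-and-add scheme for M = 2^n − 1, using the expansion 1/(2^n − 1) = 2^-n + 2^-2n + 2^-3n + …; for M = 2^n + 1 the series alternates in sign, so the adds become alternating add/subtract steps. Function names are illustrative. For example, with n = 3 (M = 7) and N = 100 it yields quotient 14 and remainder 2, so module 2 is accessed.

    #include <stdint.h>

    /* Quotient and remainder of N by M = 2^n - 1 using only shifts and adds.
     * The truncated series can undershoot the true quotient slightly, so a
     * short correction loop finishes the job. */
    static void divmod_pow2m1(uint32_t N, unsigned n, uint32_t *q, uint32_t *r)
    {
        uint32_t M = (1u << n) - 1;
        uint32_t quot = 0;
        for (uint32_t t = N >> n; t != 0; t >>= n)
            quot += t;               /* (N >> n) + (N >> 2n) + (N >> 3n) + ... */
        uint32_t rem = N - quot * M;
        while (rem >= M) {           /* correct the truncation error */
            rem -= M;
            quot += 1;
        }
        *q = quot;                   /* offset within the selected module */
        *r = rem;                    /* selects which of the M modules to access */
    }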
Abstract:
Technologies are presented that allow a portion of a cache to be used as a front memory dynamically, as system demand requires. A computing system may include at least one processor, a memory controlled by a controller and communicatively coupled with the at least one processor, a cache communicatively coupled with the at least one processor and the memory, and mapping logic communicatively coupled with the at least one processor, the memory, and the cache. The mapping logic may map a portion of the cache to a portion of the memory, wherein the portion of the cache is to be used by the at least one processor as a local memory, and wherein the mapping is dynamic based on system demand and managed by the controller in a physical address domain.
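An illustrative routing check in the physical address domain, in C; the structure and the direct offset mapping are assumptions about how such a dynamically enabled window could be expressed.

    #include <stdint.h>
    #include <stdbool.h>

    struct cache_map {
        bool     enabled;      /* set/cleared by the controller on demand */
        uint64_t window_base;  /* physical range served from the cache */
        uint64_t window_size;
        uint64_t cache_base;   /* offset of the reserved cache portion */
    };

    /* Route a physical address: inside the window with mapping enabled,
     * the access is served by the cache portion acting as local memory. */
    static bool route(const struct cache_map *m, uint64_t pa, uint64_t *loc)
    {
        if (m->enabled && pa - m->window_base < m->window_size) {
            *loc = m->cache_base + (pa - m->window_base);
            return true;       /* serviced from cache-as-memory */
        }
        return false;          /* serviced from ordinary memory */
    }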
Abstract:
Memory-based semaphores are described that are useful for synchronizing operations between different processing engines. In one example, operations include executing a context at a producer engine, the executing including updating a memory register, and sending a signal from the producer engine to a consumer engine that the memory register has been updated, the signal including a Context ID to identify a context to be executed by the consumer engine to update the register.
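A minimal C sketch of the producer/consumer handshake, assuming an invented send_signal doorbell and a polling consumer; a real engine would sleep and be rewoken rather than spin.

    #include <stdatomic.h>
    #include <stdint.h>

    struct mem_semaphore {
        _Atomic uint64_t value;          /* the memory register */
    };

    struct signal { uint32_t context_id; };  /* producer -> consumer message */

    /* hypothetical doorbell from the producer engine to the consumer engine */
    extern void send_signal(struct signal s);

    /* Producer: update the memory register, then signal the consumer,
     * naming the context that depends on the update via its Context ID. */
    static void producer_update(struct mem_semaphore *s, uint64_t v, uint32_t ctx_id)
    {
        atomic_store_explicit(&s->value, v, memory_order_release);
        send_signal((struct signal){ .context_id = ctx_id });
    }

    /* Consumer: the signaled context waits until the register reaches the
     * value it depends on, then proceeds. */
    static void consumer_wait(struct mem_semaphore *s, uint64_t expected)
    {
        while (atomic_load_explicit(&s->value, memory_order_acquire) < expected)
            ;  /* poll; a real engine would block until re-signaled */
    }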
Abstract:
Systems, apparatus, and methods are described including distributing batches of geometric objects to the cores of a multi-core system; performing, at each processor core, vertex processing and geometry setup processing on the corresponding batch of geometric objects; storing the vertex processing results in shared memory accessible to all of the cores; and storing the geometry setup processing results in local storage. Each particular core may then perform rasterization using geometry setup results obtained from local storage within the particular core and from the local storage of at least one of the other processor cores.
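An illustrative placement of the two result types in C, assuming an invented peer_setup helper for reading another core's local storage; sizes and types are placeholders.

    #include <stddef.h>

    #define NUM_CORES 4

    struct vertex_out { float pos[4]; };         /* vertex processing results */
    struct setup_out  { float edge_eq[3][3]; };  /* geometry setup results */

    /* vertex results: shared memory visible to all cores */
    struct vertex_out shared_vertices[NUM_CORES][1024];

    /* setup results: per-core local storage (one instance per core) */
    static struct setup_out local_setup[1024];

    /* hypothetical fetch of setup results from another core's local storage */
    extern const struct setup_out *peer_setup(int core, size_t i);

    static void rasterize_core(int self, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            /* setup results: own local storage plus a peer core's */
            const struct setup_out *own  = &local_setup[i];
            const struct setup_out *peer = peer_setup((self + 1) % NUM_CORES, i);
            /* vertex results: read from the memory shared by all cores */
            const struct vertex_out *v = &shared_vertices[self][i];
            /* ... scan-convert using own/peer edge equations and v->pos ... */
            (void)own; (void)peer; (void)v;
        }
    }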
Abstract:
Techniques are disclosed for processing the rendering engine workload of a graphics system in a secure fashion, wherein at least some security processing of the workload is offloaded from software-based security parsing to hardware-based security parsing. In some embodiments, commands from a given application are received by a user-mode driver (UMD), which is configured to generate a command buffer delineated into privileged and/or non-privileged command sections. The delineated command buffer can then be passed by the UMD to a kernel-mode driver (KMD), which is configured to parse and validate only privileged buffer sections, and to issue all other batch buffers with a privilege indicator set to non-privileged. A graphics processing unit (GPU) can receive the privilege-designated batch buffers from the KMD and is configured to disallow execution of any privileged command from a non-privileged batch buffer, while privileged commands from privileged batch buffers are unrestricted by the GPU.
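A hedged C sketch of the split between software and hardware parsing; is_privileged_cmd and validate_cmd are hypothetical classifiers, and the batch layout is invented.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct batch {
        const uint32_t *cmds;
        size_t          len;
        bool            privileged;  /* privilege indicator set by the KMD */
    };

    extern bool is_privileged_cmd(uint32_t cmd);  /* hypothetical classifier */
    extern bool validate_cmd(uint32_t cmd);       /* hypothetical rule check */

    /* KMD: parse and validate only sections the UMD marked privileged; all
     * other buffers are issued with the indicator set to non-privileged. */
    static bool kmd_validate(struct batch *b, bool umd_marked_privileged)
    {
        b->privileged = umd_marked_privileged;
        if (!umd_marked_privileged)
            return true;                         /* hardware enforces the rest */
        for (size_t i = 0; i < b->len; i++)
            if (is_privileged_cmd(b->cmds[i]) && !validate_cmd(b->cmds[i]))
                return false;                    /* reject the whole buffer */
        return true;
    }

    /* GPU-side admission: privileged commands are disallowed from
     * non-privileged batch buffers; privileged buffers run unrestricted. */
    static bool gpu_admit(const struct batch *b, uint32_t cmd)
    {
        return b->privileged || !is_privileged_cmd(cmd);
    }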
Abstract:
A method and apparatus for creating, updating, and using guest physical address (GPA) to host physical address (HPA) shadow translation tables for translating the GPAs of graphics-data direct memory access (DMA) requests in a computing environment that implements a virtual machine monitor to support virtual machines. The requests may be sent through a render or display path of the computing environment from one or more virtual machines, transparently with respect to the virtual machine monitor. The creating, updating, and using may be performed by a memory controller that detects entries sent to existing global and page directory tables, forks off shadow table entries from the detected entries, and translates GPAs to HPAs for the shadow table entries.
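An illustrative fork of a shadow entry from a detected guest page-directory write, in C, assuming a hypothetical gpa_to_hpa mapping provided by the virtual machine monitor and a conventional low-bit flag layout.

    #include <stdint.h>

    #define ENTRY_FLAGS 0xfffull        /* low bits: present, rw, etc. (assumed) */

    extern uint64_t gpa_to_hpa(uint64_t gpa);  /* hypothetical VMM-provided map */

    /* Called when a write to a guest page-directory/table entry is detected;
     * the guest's own tables are left untouched, and the graphics DMA path
     * walks the shadow table instead. */
    static void fork_shadow_entry(uint64_t guest_entry, uint64_t *shadow_entry)
    {
        uint64_t gpa  = guest_entry & ~ENTRY_FLAGS;        /* extract the GPA  */
        uint64_t hpa  = gpa_to_hpa(gpa);                   /* translate to HPA */
        *shadow_entry = hpa | (guest_entry & ENTRY_FLAGS); /* keep guest flags */
    }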