Abstract:
Embodiments of the present invention provide methods, systems, and computer readable media for input output memory management unit (IOMMU) two-layer addressing in the context of memory address translations for I/O devices. According to an embodiment, a method includes translating a guest virtual address (GVA) to a corresponding guest physical address (GPA) using a guest address translation table according to a process address space identifier associated with an address translation transaction associated with an I/O device, and translating the GPA to a corresponding system physical address (SPA) using a system address translation table according to a device identifier associated with the address translation transaction.
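The two translation layers described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the flat dictionary page tables, the 4 KiB page size, and the names `TwoLayerIommu`, `guest_tables`, and `system_tables` are all assumptions made for the example. The guest table is selected by the process address space identifier (PASID) and the system table by the device identifier, as the abstract states.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def translate(page_table, address):
    """Translate an address through a {page_number: frame_number} table."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"no mapping for page {page:#x}")
    return page_table[page] * PAGE_SIZE + offset


class TwoLayerIommu:
    """Hypothetical model: guest tables keyed by PASID, system tables by device ID."""

    def __init__(self):
        self.guest_tables = {}   # pasid -> {GVA page: GPA frame}
        self.system_tables = {}  # device_id -> {GPA page: SPA frame}

    def translate_gva(self, device_id, pasid, gva):
        # Layer 1: GVA -> GPA, table chosen by the PASID of the transaction.
        gpa = translate(self.guest_tables[pasid], gva)
        # Layer 2: GPA -> SPA, table chosen by the device identifier.
        return translate(self.system_tables[device_id], gpa)


two_layer = TwoLayerIommu()
two_layer.guest_tables[1] = {0x10: 0x20}
two_layer.system_tables[7] = {0x20: 0x300}
spa = two_layer.translate_gva(device_id=7, pasid=1, gva=0x10 * PAGE_SIZE + 0x2A)
```

In this example the page offset (`0x2A`) is carried unchanged through both layers, while the page number is remapped twice.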
Abstract:
A method, system, and computer program product are disclosed for providing improved access to accelerated processing device compute resources to user mode applications. The functionality disclosed allows user mode applications to provide commands to an accelerated processing device without the need for kernel mode transitions in order to access a unified ring buffer. Instead, applications are each provided with their own buffers, which the accelerated processing device hardware can access to process commands. With full operating system support, user mode applications are able to utilize the accelerated processing device in much the same way as a CPU.
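The per-application buffer arrangement can be illustrated with a small model. This is a sketch under stated assumptions: the class names `UserQueue` and `Device`, the fixed capacity, and the round-robin draining policy are hypothetical and are not taken from the abstract, which says only that each application has its own buffer that the device hardware can access.

```python
from collections import deque


class UserQueue:
    """A per-application command buffer, written without a kernel transition."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.commands = deque()

    def submit(self, command):
        # Reject the command when the buffer is full; the caller may retry.
        if len(self.commands) >= self.capacity:
            return False
        self.commands.append(command)
        return True


class Device:
    """Hypothetical device that reads every registered queue directly."""

    def __init__(self):
        self.queues = []
        self.executed = []

    def register(self, queue):
        self.queues.append(queue)

    def process(self):
        # Assumed policy: take one command per queue per pass (round-robin).
        pending = True
        while pending:
            pending = False
            for queue in self.queues:
                if queue.commands:
                    self.executed.append(queue.commands.popleft())
                    pending = True


device = Device()
app_a, app_b = UserQueue(capacity=2), UserQueue(capacity=2)
device.register(app_a)
device.register(app_b)
app_a.submit("a1")
app_a.submit("a2")
app_b.submit("b1")
overflow = app_a.submit("a3")  # rejected: app_a's buffer is full
device.process()
```

Because each application owns its buffer, one application filling its queue does not block submissions from another.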
Abstract:
Disclosed herein are systems, apparatuses, and methods for enabling efficient reads to a local memory of a processing unit. In an embodiment, a processing unit includes an interface and a buffer. The interface is configured to (i) send a request for a portion of data in a region of a local memory of another processing unit and (ii) receive, responsive to the request, all the data from the region. The buffer is configured to store the data from the region of the local memory of the other processing unit.
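The request-a-portion, receive-the-region behavior can be sketched as below. The 64-byte region size, the class names `RemoteMemory` and `Reader`, and the reuse of the buffer for later reads in the same region are assumptions for illustration; the abstract states only that the whole region is returned and stored in the buffer.

```python
REGION_SIZE = 64  # assumed region size in bytes


class RemoteMemory:
    """Stands in for the local memory of the other processing unit."""

    def __init__(self, data):
        self.data = bytes(data)
        self.requests = 0

    def read_region(self, address):
        # Return the whole region containing `address`, not just the requested bytes.
        self.requests += 1
        start = (address // REGION_SIZE) * REGION_SIZE
        return start, self.data[start:start + REGION_SIZE]


class Reader:
    """Hypothetical requesting unit with a one-region buffer."""

    def __init__(self, remote):
        self.remote = remote
        self.buffer_start = None
        self.buffer = b""

    def read(self, address, length):
        # Assumes the requested bytes lie within a single region.
        in_buffer = (
            self.buffer_start is not None
            and self.buffer_start <= address
            and address + length <= self.buffer_start + len(self.buffer)
        )
        if not in_buffer:
            self.buffer_start, self.buffer = self.remote.read_region(address)
        offset = address - self.buffer_start
        return self.buffer[offset:offset + length]


remote = RemoteMemory(range(256))
reader = Reader(remote)
first = reader.read(70, 4)    # triggers one request; region 64..127 is buffered
second = reader.read(100, 2)  # served from the buffer, no new request
```

In this model the second read costs no additional transfer, which is one way the buffered region could make reads more efficient.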
Abstract:
In an embodiment, an input/output memory management unit (IOMMU) is configured to receive a completion wait command defined to ensure that one or more preceding invalidation commands are completed by the IOMMU prior to a completion of the completion wait command. The IOMMU is configured to respond to the completion wait command by delaying completion of the completion wait command until: (1) a read response corresponding to each outstanding memory read operation that depends on a translation entry that is invalidated by the preceding invalidation commands is received; and (2) the control unit transmits one or more operations upstream to ensure that each memory write operation that depends on the translation table entry that is invalidated by the preceding invalidation commands has at least reached a bridge to a coherent fabric in the computer system and has become visible to the system.
Abstract:
Described are systems and methods for communication between a plurality of electronic devices and an aggregation device. An aggregation device processes instructions related to a configuration of an electronic device in communication with the aggregation device. One or more virtual devices are generated in response to processing the instructions. The electronic device enumerates a configuration space to determine devices for use by the electronic device. The aggregation device detects an access of the configuration space by the electronic device. The one or more virtual devices are presented from the aggregation device to the electronic device in accordance with the instructions.
Abstract:
Embodiments of the present invention provide for the execution of threads and/or work-items on multiple processors of a heterogeneous computing system in a manner that allows them to share data correctly and efficiently. Disclosed method, system, and article of manufacture embodiments include, responsive to an instruction from a sequence of instructions of a work-item, determining an ordering of visibility to other work-items of one or more other data items in relation to a particular data item, and performing at least one cache operation upon at least one of the particular data item or the other data items present in any one or more cache memories in accordance with the determined ordering. The semantics of the instruction include a memory operation upon the particular data item.
Abstract:
In an embodiment, a computer system comprises a processor; a memory management module comprising a plurality of instructions executable on the processor; a memory coupled to the processor; and an input/output memory management unit (IOMMU) coupled to the memory. The IOMMU is configured to implement address translation and memory protection for memory operations sourced by one or more input/output (I/O) devices. The memory stores a command queue during use. The memory management module is configured to write one or more control commands to the command queue, and the IOMMU is configured to read the control commands from the command queue and execute the control commands.
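The producer/consumer relationship between the memory management module and the IOMMU can be modeled as a circular queue. This is a sketch only: the head/tail pointer scheme, the `invalidate_page` command name, and the dictionary standing in for cached translations are assumptions chosen for the example, not details given in the abstract.

```python
class CommandQueue:
    """Fixed-size circular queue held in (simulated) memory."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # next slot the IOMMU reads
        self.tail = 0  # next slot software writes

    def write(self, command):
        # One slot is kept empty so that head == tail always means "empty".
        next_tail = (self.tail + 1) % len(self.slots)
        if next_tail == self.head:
            raise BufferError("command queue full")
        self.slots[self.tail] = command
        self.tail = next_tail

    def read(self):
        if self.head == self.tail:
            return None  # queue is empty
        command = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return command


class IommuModel:
    """Hypothetical IOMMU that drains the queue and executes each command."""

    def __init__(self, queue):
        self.queue = queue
        self.translation_cache = {0x1000: 0x8000, 0x2000: 0x9000}

    def run(self):
        executed = 0
        while (command := self.queue.read()) is not None:
            op, arg = command
            if op == "invalidate_page":
                self.translation_cache.pop(arg, None)
            # Unknown operations are counted but otherwise ignored in this sketch.
            executed += 1
        return executed


queue = CommandQueue(size=4)
queue.write(("invalidate_page", 0x1000))
queue.write(("invalidate_page", 0x3000))  # no cached entry; harmless
iommu_model = IommuModel(queue)
count = iommu_model.run()
```

Keeping the queue in ordinary memory lets software post several commands without waiting for the IOMMU to consume each one.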
Abstract:
Described are a system and method for lossless message delivery between two processing devices. Each device includes a remote direct memory access (RDMA) messaging interface. The RDMA messaging interface at the first device generates one or more messages that are processed by the RDMA messaging interface of the second device. The RDMA messaging interface of the first device outputs a notification to the second device that a message of the one or more messages is available at the first device. A determination is made that the second device has resources to accommodate the message. The second device performs an operation in response to determining that the processing device has the resources to accommodate the message.
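One way to read the notify-then-check flow is sketched below. The abstract does not say which operation the second device performs; this example assumes it pulls the message from the first device, so a message stays at the sender until the receiver has room. The class names and the slot-based resource count are hypothetical.

```python
class Receiver:
    """Hypothetical second device with a limited number of message slots."""

    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.inbox = []

    def on_notify(self, sender):
        # Act only if resources are available; otherwise the message
        # remains at the sender, so nothing is dropped.
        if self.free_slots == 0 or not sender.outbox:
            return False
        self.inbox.append(sender.outbox.pop(0))  # assumed pull (read) operation
        self.free_slots -= 1
        return True

    def release_slot(self):
        self.free_slots += 1


class Sender:
    """Hypothetical first device that holds messages until they are pulled."""

    def __init__(self):
        self.outbox = []

    def post(self, message, receiver):
        self.outbox.append(message)
        # Notify the receiver that a message is available at the sender.
        return receiver.on_notify(self)


sender = Sender()
receiver = Receiver(free_slots=1)
delivered_first = sender.post("m1", receiver)   # receiver has room
delivered_second = sender.post("m2", receiver)  # no room; stays at sender
held = list(sender.outbox)
receiver.release_slot()
delivered_retry = receiver.on_notify(sender)    # now pulled
```

Under this reading, delivery is lossless because the sender never pushes data the receiver cannot hold.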
Abstract:
Described are systems and methods for interconnecting devices. A switch fabric is in communication with a plurality of electronic devices. A rendezvous memory is in communication with the switch fabric. Data is transferred to the rendezvous memory from a first electronic device of the plurality of electronic devices in response to a determination that the data is ready for output from a memory at the first electronic device and in response to a location allocated in the rendezvous memory for the data.
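The two conditions that gate a transfer (data ready at the source, and a location allocated in the rendezvous memory) can be sketched as follows. The slot-based allocator and the names `RendezvousMemory` and `try_transfer` are assumptions for illustration, and the switch fabric itself is not modeled.

```python
class RendezvousMemory:
    """Hypothetical shared memory with a fixed number of slots."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count
        self.allocated = set()

    def allocate(self):
        # Return the index of a free slot, or None if every slot is taken.
        for index in range(len(self.slots)):
            if index not in self.allocated:
                self.allocated.add(index)
                return index
        return None


def try_transfer(device_memory, data_ready, rendezvous):
    """Move one item to the rendezvous memory only if both conditions hold."""
    if not data_ready or not device_memory:
        return None  # data is not ready for output
    slot = rendezvous.allocate()
    if slot is None:
        return None  # no location allocated; data stays at the device
    rendezvous.slots[slot] = device_memory.pop(0)
    return slot


rendezvous = RendezvousMemory(slot_count=1)
device_memory = ["packet-1", "packet-2"]
not_ready = try_transfer(device_memory, data_ready=False, rendezvous=rendezvous)
first_slot = try_transfer(device_memory, data_ready=True, rendezvous=rendezvous)
no_space = try_transfer(device_memory, data_ready=True, rendezvous=rendezvous)
```

When either condition fails, the data remains in the device's own memory and the transfer can be retried later.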