摘要:
A mechanism for allowing a single physical IB node to virtualize a plurality of host channel adapters is provided. This includes providing the appearance of both a router and multiple virtual HCA's residing behind that router, to the external REAL subnet components. Each virtual host channel adapter will have unique access control levels. One or more InfiniBand subnets are virtualized in such a way that nodes residing both within the virtual subnets and in separate physical subnets are completely unaware of the virtualization. This virtualization of InfiniBand subnets significantly increases the horizontal scaling capabilities of a single InfiniBand physical component, while at the same time provides “native” network throughput for all the virtual hosts.
摘要:
A distributed computing system having (host and I/O) end nodes, switches, routers, and links interconnecting these components is provided. The end nodes use send and receive queue pairs to transmit and receive messages. The end nodes use completion queues to inform the end user when a message has been completely sent or received and whether an error occurred during the message transmission or reception process. A mechanism implements these queue pairs and completion queues in hardware. A mechanism for controlling the transfer of work requests from the consumer to the CA hardware and work completions from the CA hardware to the consumer using head and tail pointers that reference circular buffers is also provided. The QPs and CQs do not contain Work Queue Entries and Completion Queue Entries respectively, but instead contain references to these entries. This allows them to be efficient and constant in size, while the Work Queue Entries and Completion Queue Entries themselves can vary in size, for example to include a variable number of data segments. Additionally, several mechanisms are provided to improve the overall efficiency of this process under different memory configurations.
摘要:
An apparatus and method for managing work and completion queues using head and tail circular pointers. With the apparatus and method, queue head and tail pointers are maintained in the channel interface and the host channel adapter. The head and tail pointers in the host channel adapter include a queue pointer table index and a queue page index for identifying a position within the queue. For work queues, the tail pointer in the channel interface is used to identify a next position where a work queue entry may be written. The head pointer in the channel interface is used only to determine whether the work queue is full or not. The head pointer in the host channel adapter is used to identify a next work queue entry for processing by the host channel adapter. The tail pointer in the host channel adapter is used by the host channel adapter to determine if the queue is empty. For completion queues, the head pointer in the channel interface is used to identify a next completion queue entry to be processed. The tail pointer in the host channel adapter is used to identify a next position in the completion queue to which the host channel adapter may post a completion queue entry.
摘要:
An apparatus and method for managing reliable datagram work queues, and associated completion queues, using head and tail pointers with end-to-end context error cache are provided. Reliable datagram (RD) queue head and tail pointers are maintained in the channel interface and the host channel adapter. The head and tail pointers in the host channel adapter include a RD queue page table index and a RD queue page index for identifying a position within the RD queue. For RD work queues, in the channel interface, the tail pointer is used to identify a next position where a work queue entry may be written and the head pointer is used only to determine whether the work queue is full. In the host channel adapter, the head pointer is used to identify a next work queue entry for processing and the tail pointer is used to determine if the queue is empty.
摘要:
An apparatus and method for swapping out real memory by inhibiting input/output (I/O) operations to a memory region are provided. The apparatus and method provide a mechanism in which a quiesce indicator is provided in a field containing the current outstanding I/O count associated with the memory region whose real memory is to be swapped out. The current I/O field and the quiesce indicator are used as a means for communicating between a shared resource arbitrator and a guest consumer. When the quiesce indicator is set, the guest consumer is informed that it should not send any further I/O operations to that memory region. When the number of pending I/O operations against the memory region is zero, a valid bit in a protection table is set to invalid, and the real memory associated with the memory region may be swapped out. Thereafter, when the memory region is swapped back in, an address translation table is updated, the valid bit is reset, and the quiesce indicator is reset so that further I/O operations to the memory region may occur.
摘要:
A method and apparatus for accessing a memory. Access rights for a memory operation are verified using a first data structure in response to receiving a request to perform the operation, wherein the request includes a virtual address for the operation. Responsive to access rights being verified for the memory operation, the virtual address translated into a real address using a second data structure.
摘要:
A method for routing System Area Network (SAN) packets to multiple partitions within a single end node is provided. A range of Local Identification addresses (LIDs) are assigned to a channel adapter port within the SAN. Lower order bits within the LID are then assigned to select the particular partition in the end node. The Local ID Mask Control (LMC) field is used to assign multiple LIDs to a single port, using those low order bits to then route the message to the appropriate partition in the end node.
摘要:
A method, system and program for controlling access to memory areas within a computer are provided. The invention comprises placing a first Bind Work Queue Element (WQE) at the head of a work queue, wherein the first Bind WQE defines parameters associated with a first Memory Window. A set of Work Requests is then placed on the work queue, behind the first Bind WQE wherein the work requests invoke operations that access the first Memory Window. A second Bind WQE is then placed on the work queue, behind the first set of Work Requests. This second Bind WQE defines parameters associated with a second Memory Window. A second set of Work Requests is placed on the work queue behind the second Bind WQE and invoke operations that access the second memory window. The Memory Windows can be associated with a common Memory Region and have different addresses and lengths or different access rights. In another embodiment, the first and second Memory Windows can be associated with different Memory Regions.
摘要:
A method, system and program for controlling access to computer memory are provided. The present invention comprises receiving a work request from a user, wherein the work request comprises an index portion and a protection portion. The index portion of the work request is used to locate an element in an address translation and protection table. The protection portion of the work request is then compared with a protection key in the table element, and access to memory is granted only if the protection portion and protection key match.
摘要:
A method, program and system for associating memory windows with memory regions in an infiniband data storage system are provided. The invention comprises registering a Memory Region, wherein the Memory Region is a set of virtually contiguous memory addresses defined by a virtual address and length. The system then establishes and maintains a Window Reference Count (WRC) for the Memory Region, which tracks the number of Memory Windows which are bound to the Memory Region. When the system binds a Memory Window to the Memory Region, the value of the WRC is incremented. When a Memory Window is unbound from the Memory Region, the value of the WRC is decremented. If no Memory Windows are bound to the Memory Region, the value of the WRC is zero. The Memory Region is not deregistered unless the value of the WRC equals zero.