Abstract:
Hardware component repair in a computing system while a workload continues to execute on the computing system includes: receiving an indication that an operational parameter of a first hardware resource of said computing system does not meet operational acceptability criteria; migrating the workload of the computing system from said first hardware resource to a second hardware resource within the computing system; and halting operation of said first hardware resource for repair.
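The three steps of this abstract (detect an out-of-criteria parameter, migrate the workload, then halt the failing resource) can be sketched in Python as follows. This is a minimal illustration, not the patented method. Several names are hypothetical assumptions: `HardwareResource`, `repair_in_place`, and the use of temperature with an inclusive 0 to 85 °C range standing in for the "operational parameter" and "operational acceptability criteria".

```python
from dataclasses import dataclass, field

@dataclass
class HardwareResource:
    """Hypothetical model of a hardware resource (e.g. a CPU or memory module)."""
    name: str
    temperature_c: float                      # example operational parameter (assumed)
    workloads: list = field(default_factory=list)
    halted: bool = False

# Assumed acceptability criteria: an inclusive operating range for the parameter.
ACCEPTABLE_TEMP_C = (0.0, 85.0)

def meets_criteria(resource: HardwareResource) -> bool:
    """Return True if the resource's parameter lies within the acceptable range."""
    low, high = ACCEPTABLE_TEMP_C
    return low <= resource.temperature_c <= high

def repair_in_place(first: HardwareResource, second: HardwareResource) -> bool:
    """Migrate workload off `first` and halt it if its parameter is out of range.

    Returns True if a repair was initiated, False if `first` is healthy.
    """
    if meets_criteria(first):
        return False
    # Move every workload to the second resource so execution can continue there.
    second.workloads.extend(first.workloads)
    first.workloads.clear()
    # Only after migration is the failing resource halted for repair.
    first.halted = True
    return True
```

For example, with `cpu0 = HardwareResource("cpu0", 97.0, ["db", "web"])` and a healthy `cpu1`, calling `repair_in_place(cpu0, cpu1)` moves both workloads to `cpu1` before marking `cpu0` as halted. The ordering matters: halting before migrating would interrupt the workload, which is what the approach sets out to avoid.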
Abstract:
Configuring NVMe devices for redundancy and scaling includes: identifying, by a first SSD (‘Solid State Drive’) driver executing on a first CPU (‘Central Processing Unit’), an address space of a first SSD coupled to the first CPU by a first PCI (‘Peripheral Component Interconnect’) switch, the first PCI switch including one or more non-transparent bridges (‘NTBs’); partitioning, by the first SSD driver, the address space of the first SSD amongst the NTBs of the first PCI switch and the first CPU, where each NTB is configured to translate CPU memory addresses received from a CPU into drive addresses in the address space partitioned to that NTB; and partitioning, by the first SSD driver, a plurality of namespaces of the first SSD amongst the first CPU and the NTBs.
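The partitioning and translation described in this abstract can be sketched in Python. This is a minimal model under stated assumptions, not the patented driver: the abstract does not specify a splitting policy, so the sketch assumes an equal, contiguous split of the drive's address space (remainder to the last owner) and a round-robin assignment of namespaces. The names `Partition`, `NTB`, `window_base`, `partition_address_space`, and `partition_namespaces` are hypothetical, and translation is modeled as a simple base-plus-offset mapping from an NTB's CPU-side memory window into its drive partition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    owner: str   # "cpu0" or an NTB identifier (hypothetical names)
    base: int    # first drive address in this partition
    size: int    # number of addresses in the partition

def partition_address_space(drive_size: int, owners: list[str]) -> list[Partition]:
    """Split [0, drive_size) into contiguous, non-overlapping partitions, one per owner.

    Assumed policy: equal shares, with any remainder given to the last owner.
    """
    share = drive_size // len(owners)
    parts = []
    for i, owner in enumerate(owners):
        base = i * share
        size = share if i < len(owners) - 1 else drive_size - base
        parts.append(Partition(owner, base, size))
    return parts

@dataclass
class NTB:
    """Hypothetical non-transparent bridge with a CPU-side memory window."""
    name: str
    window_base: int       # CPU memory address where the window starts (assumed)
    partition: Partition   # drive address range partitioned to this NTB

    def translate(self, cpu_addr: int) -> int:
        """Map a CPU memory address in the window to a drive address in the partition."""
        offset = cpu_addr - self.window_base
        if not 0 <= offset < self.partition.size:
            raise ValueError(f"{cpu_addr:#x} is outside the {self.name} window")
        return self.partition.base + offset

def partition_namespaces(namespace_ids: list[int], owners: list[str]) -> dict[str, list[int]]:
    """Assign namespaces round-robin (assumed policy) amongst the CPU and NTBs."""
    assignment = {owner: [] for owner in owners}
    for i, nsid in enumerate(namespace_ids):
        assignment[owners[i % len(owners)]].append(nsid)
    return assignment
```

For instance, splitting a 1000-address drive amongst `["cpu0", "ntb0", "ntb1"]` yields partitions starting at 0, 333, and 666, so an NTB whose window begins at `0x8000_0000` and owns the second partition translates `0x8000_000A` to drive address 343. Keeping each owner's range disjoint is what allows several CPUs to reach the same SSD through different NTBs without colliding.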