Inter-host communication without data copy in disaggregated systems

    Publication No.: US10915370B2

    Publication date: 2021-02-09

    Application No.: US16204550

    Filing date: 2018-11-29

    Abstract: Direct inter-processor communication is enabled with respect to data in a memory location without having to switch specific circuits through a switching element (e.g., an optical switch). Instead, a memory pool is augmented to include a dedicated portion that serves as a disaggregated memory common space for communicating processors. This obviates the need to switch physical memory modules through the optical switch to enable processor-to-processor communication: processors communicating with one another have an overlapping ability to access the same memory module in the pool, so physical optical switch circuits no longer need to be changed to facilitate the communication. The disaggregated memory common space is shared among the processors, which can access it for reads and writes, although the particular locations in the common space used for reads and for writes are different.
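
The shared common space described in this abstract can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions: the class name `CommonSpace`, the fixed per-processor slot layout, and the slot size are hypothetical and do not come from the patent, and a `bytearray` stands in for the dedicated portion of the memory pool. Each processor writes only to its own slot, and peers read that slot through a `memoryview`, so the payload is not copied on the read path.

```python
class CommonSpace:
    """Toy stand-in for the dedicated shared portion of a memory pool."""

    def __init__(self, processors, slot_size=64):
        self.slot_size = slot_size
        # Hypothetical layout: one fixed-size write slot per processor.
        self.offsets = {p: i * slot_size for i, p in enumerate(processors)}
        self.lengths = {p: 0 for p in processors}
        self.buf = bytearray(slot_size * len(processors))

    def write(self, writer, payload):
        """Store payload in the writer's own slot (its write location)."""
        if len(payload) > self.slot_size:
            raise ValueError("payload does not fit in one slot")
        start = self.offsets[writer]
        self.buf[start:start + len(payload)] = payload
        self.lengths[writer] = len(payload)

    def read(self, source):
        """Return a zero-copy view of the slot last written by source."""
        start = self.offsets[source]
        return memoryview(self.buf)[start:start + self.lengths[source]]


space = CommonSpace(["cpu_a", "cpu_b"])
space.write("cpu_a", b"hello")       # cpu_a writes to its own slot
message = space.read("cpu_a")        # cpu_b reads the same memory, no copy
```

Keeping the write locations disjoint means neither processor overwrites the other's data, which mirrors the abstract's point that read and write locations differ.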

    Optimizing simultaneous startup or modification of inter-dependent machines with specified priorities

    Publication No.: US10268512B2

    Publication date: 2019-04-23

    Application No.: US15629836

    Filing date: 2017-06-22

    Abstract: Identify individual machines of a multi-machine computing system. Construct a graph of dependencies among the machines. Obtain estimated total administration times and administration priorities for each of the machines. Identify availability of administration resources to assist in administration of one or more of the machines. Select a first set of machines for administration in response to the graph, administration priorities, estimated total administration times, and availability of a first set of administration resources, and administer the first set of machines in parallel using the first set of administration resources. Update the graph in response to administration of the first set of machines. Select a subsequent set of machines for administration in response to the updated graph, administration priorities, estimated total administration times, and availability of a subsequent set of administration resources. Administer the subsequent set of machines in parallel using the subsequent set of administration resources.
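
The select, administer, and update loop in this abstract can be sketched as a batch planner. The abstract does not say how priorities, estimated times, and resource availability are combined, so the ordering rule here (higher priority first, then longer estimated time) and the fixed resource count per round are assumptions made for illustration. The function name `plan_batches` and the machine names are hypothetical.

```python
def plan_batches(deps, priority, est_time, resources):
    """Group machines into batches that can be administered in parallel.

    deps maps each machine to the machines it depends on.
    resources is the number of machines that can be handled per round
    (an assumed, constant stand-in for resource availability).
    """
    remaining = {machine: set(d) for machine, d in deps.items()}
    batches = []
    while remaining:
        # A machine is ready once all of its dependencies are done.
        ready = [m for m, d in remaining.items() if not d]
        if not ready:
            raise ValueError("dependency cycle detected")
        # Assumed ordering: priority first, then longest estimated time.
        ready.sort(key=lambda m: (-priority[m], -est_time[m], m))
        batch = ready[:resources]
        batches.append(batch)
        # Update the graph: drop finished machines and their edges.
        for machine in batch:
            del remaining[machine]
        for pending in remaining.values():
            pending.difference_update(batch)
    return batches


deps = {"db": [], "cache": [], "app": ["db", "cache"], "web": ["app"]}
priority = {"db": 3, "cache": 1, "app": 2, "web": 1}
est_time = {"db": 10, "cache": 2, "app": 5, "web": 1}
plan = plan_batches(deps, priority, est_time, resources=2)
```

With two resources, `db` and `cache` start together, followed by `app` and then `web`. With a single resource the same graph is processed one machine per round, with `db` ahead of `cache` because of its higher priority.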

    Automated fault and recovery system
    Type: granted invention patent (status: in force)

    Publication No.: US09058265B2

    Publication date: 2015-06-16

    Application No.: US13710710

    Filing date: 2012-12-11

    IPC classification: G06F11/00 G06F11/07

    Abstract: A mechanism is provided for handling incidents occurring in a managed environment. An incident is detected in a resource in the managed environment. A set of incident handling actions is identified based on incident handling rules for an incident type of the incident. From the set of incident handling actions, one incident handling action is identified to be executed based on a set of impact indicators associated with the set of incident handling rules. The identified incident handling action is then executed to address the failure of the resource.
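
The rule lookup and impact-based selection in this abstract can be sketched as follows. The abstract does not define the impact indicators, so this example assumes a single numeric score per rule where a lower value means a less disruptive action. The `Rule` type, the `select_action` function, and the sample rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    incident_type: str
    action: str
    impact: int  # assumed indicator: lower means less disruptive


def select_action(incident_type, rules):
    """Pick the lowest-impact action among the rules for this incident type."""
    candidates = [r for r in rules if r.incident_type == incident_type]
    if not candidates:
        return None  # no rule covers this incident type
    return min(candidates, key=lambda r: r.impact).action


RULES = [
    Rule("disk_full", "purge_temp_files", impact=1),
    Rule("disk_full", "restart_service", impact=5),
    Rule("process_hung", "restart_service", impact=5),
    Rule("process_hung", "reboot_host", impact=9),
]
chosen = select_action("disk_full", RULES)
```

A real system would likely weigh several indicators together, such as downtime or affected users; collapsing them into one score keeps the selection step easy to follow.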

    Rolling upgrades in disaggregated systems

    Publication No.: US10970061B2

    Publication date: 2021-04-06

    Application No.: US16660676

    Filing date: 2019-10-22

    IPC classification: G06F8/65

    Abstract: Embodiments are provided for performing rolling software upgrades in a disaggregated computing environment. A rolling upgrade manager is provided for upgrading one or more disaggregated servers. A designated memory area is used for storing an updated software component, and a disaggregated server is switched to the designated memory area from a currently assigned memory area when performing the software upgrade. Process state and program data are maintained in the currently assigned memory area while the updated software component is maintained in the designated memory area, such that the process state and program data are read from the currently assigned memory area and the updated software component is read from the designated memory area during currently executing operations of the disaggregated server.
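
The split between the two memory areas in this abstract can be sketched with plain dictionaries standing in for memory areas. Everything named here (`DisaggregatedServer`, `switch_code`, the `handler` and `requests` keys) is hypothetical and only illustrates the idea: after the switch, the software component is read from the designated area while process state keeps living in the currently assigned area.

```python
class DisaggregatedServer:
    """Toy server whose code and state can live in different memory areas."""

    def __init__(self, assigned_area):
        self.state_area = assigned_area  # process state and program data
        self.code_area = assigned_area   # software component, initially co-located

    def switch_code(self, designated_area):
        """Point the server at the updated component; state stays where it is."""
        self.code_area = designated_area

    def handle(self):
        handler = self.code_area["handler"]   # read code from the code area
        self.state_area["requests"] += 1      # state stays in the assigned area
        return handler(self.state_area)


assigned = {"handler": lambda s: f"v1 served {s['requests']}", "requests": 0}
designated = {"handler": lambda s: f"v2 served {s['requests']}"}

server = DisaggregatedServer(assigned)
before = server.handle()        # runs the old component
server.switch_code(designated)  # upgrade step: only the code pointer moves
after = server.handle()         # new component, same request counter
```

Because the request counter is never moved or reset, the second call continues from the state the old component left behind, which is the property that lets the upgrade happen during ongoing operation.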