LOSSLESS FAILOVER FOR DATA RECOVERY
    Published patent application

    Publication No.: US20240354206A1

    Publication date: 2024-10-24

    Application No.: US18759557

    Filing date: 2024-06-28

    Applicant: Rubrik, Inc.

    Abstract: Some users of a data management system (DMS) may use multiple computing environments to replicate and store virtual machines (VMs), such as for backup and recovery purposes. For example, different replication environments may include one or more private data centers, one or more cloud environments, or any combination thereof. A user may schedule a failover procedure for an application. A DMS may perform a failover procedure that reduces downtime and eliminates data loss. The DMS may capture and replicate a snapshot of a VM running on a source computing environment to a target computing environment, power down the VM on the source computing environment, capture and replicate a second snapshot of the VM to the target computing environment, and power on the VM at the target computing environment. Because the second snapshot includes a relatively small amount of data, replication to the target computing environment may proceed quickly, reducing downtime.
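The four-step sequence in this abstract can be illustrated with a minimal sketch. Everything here is hypothetical and is not taken from the application: the `Environment` class, the block-level `changed_blocks` diff, and the `writes_during_first_replication` parameter only model the idea that the second snapshot carries just the data written after the first one.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical stand-in for a source or target computing environment."""
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> contents
    powered_on: bool = False


def changed_blocks(source: Environment, target: Environment) -> dict:
    """Return the source blocks that differ from what the target already holds."""
    return {k: v for k, v in source.blocks.items() if target.blocks.get(k) != v}


def failover(source: Environment, target: Environment,
             writes_during_first_replication: dict) -> list:
    """Run the snapshot, power-down, snapshot, power-on sequence.

    Returns a log of (step, number of blocks replicated).
    """
    log = []

    # 1. Capture and replicate a first snapshot while the VM is still running.
    first = changed_blocks(source, target)
    target.blocks.update(first)
    log.append(("snapshot-1", len(first)))

    # The VM keeps serving writes while the first snapshot replicates
    # (simulated here by applying the given writes to the source).
    source.blocks.update(writes_during_first_replication)

    # 2. Power down the VM on the source so no further writes can occur.
    source.powered_on = False

    # 3. Capture and replicate a second snapshot: only the remaining delta.
    second = changed_blocks(source, target)
    target.blocks.update(second)
    log.append(("snapshot-2", len(second)))

    # 4. Power on the VM at the target.
    target.powered_on = True
    return log
```

In this model, downtime spans only steps 2 through 4, and the amount of data moved in step 3 is bounded by the writes that arrived during step 1.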

    Fault tolerant distributed computing system based on dynamic reconfiguration

    Publication No.: US12124335B1

    Publication date: 2024-10-22

    Application No.: US18350358

    Filing date: 2023-07-11

    IPC classification: G06F11/14 G06F11/16 G06F11/18

    Abstract: A fault tolerant distributed computing system includes a communication link and a plurality of nodes in electronic communication with one another by the communication link. Each node executes at least one node-specific application, includes a standby database that stores a standby copy corresponding to one of the node-specific applications executed by one of the remaining nodes that are part of the distributed computing system, and includes a spare computational capacity sufficient to execute at least one standby copy of one of the node-specific applications stored in the standby database. In response to determining a specific node is non-operational, the remaining nodes execute all the standby copies of the one or more node-specific applications that were previously executed by the specific node that is now non-operational.
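The reconfiguration rule in this abstract can be sketched as follows. The `Node` class and `reconfigure` function are hypothetical names chosen for illustration; the sketch assumes each application is identified by a string and ignores capacity accounting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: runs its own apps and holds standby copies of others'."""
    name: str
    active: set = field(default_factory=set)    # apps currently executing
    standby: set = field(default_factory=set)   # standby copies held locally
    operational: bool = True


def reconfigure(nodes: list, failed_name: str) -> set:
    """Mark one node as non-operational and start its apps from standby copies.

    Returns the apps that no remaining node held a standby copy for
    (empty when the standby layout fully covers the failed node).
    """
    failed = next(n for n in nodes if n.name == failed_name)
    failed.operational = False
    orphaned = set(failed.active)
    failed.active.clear()

    for node in nodes:
        if not node.operational:
            continue
        # Promote every standby copy that matches an orphaned app.
        takeover = node.standby & orphaned
        node.active |= takeover
        node.standby -= takeover
        orphaned -= takeover
    return orphaned
```

With a ring layout, where each node holds the standby copy of its neighbor's application, the loss of any single node is absorbed by exactly one surviving node.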

    DETECTION AND MITIGATION OF MALFUNCTIONING COMPONENTS IN A CLUSTER COMPUTING ENVIRONMENT

    Publication No.: US20240303146A1

    Publication date: 2024-09-12

    Application No.: US18118195

    Filing date: 2023-03-07

    IPC classification: G06F11/07 G06F11/16

    CPC classification: G06F11/0772 G06F11/1658

    Abstract: Techniques are provided for detection and mitigation of malfunctioning components in a cluster computing environment. One method comprises obtaining, by a virtual infrastructure monitor, from a cluster monitor, an indication of a malfunctioning component in a cluster computing environment; selecting a virtual infrastructure server type for a replacement virtual infrastructure server based on a type of the malfunctioning component; creating a replacement virtual infrastructure server based on the selected virtual infrastructure server type and properties of a virtual infrastructure server associated with the malfunctioning component; applying settings to the replacement virtual infrastructure server according to rules for the replacement virtual infrastructure server; deploying a replacement component on the replacement virtual infrastructure server; and providing a notification to the cluster monitor of the replacement component and credentials of the replacement component. The cluster monitor may add the replacement component to the cluster computing environment responsive to the notification.
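The method steps listed in this abstract can be traced in a short sketch. All names are hypothetical: the `SERVER_TYPE_FOR` and `RULES` tables, the dictionary-based server model, and the `ClusterMonitor` class are illustrative assumptions, and the credentials are just a random token.

```python
import secrets

# Hypothetical lookup tables; a real system would load these from configuration.
SERVER_TYPE_FOR = {"storage": "disk-optimized", "compute": "cpu-optimized"}
RULES = {
    "disk-optimized": {"raid": "enabled"},
    "cpu-optimized": {"hyperthreading": "on"},
}


class ClusterMonitor:
    """Hypothetical cluster monitor that admits components when notified."""

    def __init__(self):
        self.members = {}

    def notify(self, component: str, credentials: str) -> None:
        # Add the replacement component to the cluster in response to the notification.
        self.members[component] = credentials


def mitigate(indication: dict, old_server: dict, monitor: ClusterMonitor) -> dict:
    """Replace a malfunctioning component, following the order of steps in the abstract."""
    # Select a server type based on the type of the malfunctioning component.
    server_type = SERVER_TYPE_FOR[indication["type"]]
    # Create the replacement server from the selected type and the old server's properties.
    server = {**old_server, "type": server_type}
    # Apply settings according to the rules for that server type.
    server["settings"] = dict(RULES[server_type])
    # Deploy a replacement component on the new server.
    component = f"{indication['component']}-replacement"
    server["component"] = component
    # Notify the cluster monitor of the replacement and its credentials.
    monitor.notify(component, secrets.token_hex(16))
    return server
```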

    Memory error detection
    Published patent application

    Publication No.: US20240296088A1

    Publication date: 2024-09-05

    Application No.: US18433897

    Filing date: 2024-02-06

    Applicant: Rambus Inc.

    Abstract: Systems and methods are provided for detecting and correcting address errors in a memory system. In the memory system, a memory device generates an error-detection code based on an address transmitted via an address bus and transmits the error-detection code to a memory controller. The memory controller transmits an error indication to the memory device in response to the error-detection code. The error indication causes the memory device to remove the received address and prevent a memory operation
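The exchange described in the visible part of this abstract can be modeled in a few lines. This is a sketch under stated assumptions: CRC-32 from Python's `zlib` stands in for the unspecified error-detection code, addresses are assumed to fit in 32 bits, and the `MemoryDevice` and `MemoryController` classes and the `bus` fault-injection hook are hypothetical.

```python
import zlib


def edc(address: int) -> int:
    """Error-detection code over a 32-bit address (CRC-32 is an assumption)."""
    return zlib.crc32(address.to_bytes(4, "big"))


class MemoryDevice:
    def __init__(self):
        self.cells = {}
        self.pending = None  # (address, data) awaiting the controller's verdict

    def receive_write(self, address: int, data: int) -> int:
        # Hold the operation and report a code computed from the address as received.
        self.pending = (address, data)
        return edc(address)

    def resolve(self, error: bool) -> None:
        if not error:
            address, data = self.pending
            self.cells[address] = data
        # On an error indication the received address is dropped and nothing is written.
        self.pending = None


class MemoryController:
    def __init__(self, device: MemoryDevice, bus=lambda address: address):
        self.device = device
        self.bus = bus  # hypothetical hook that can corrupt the address in transit

    def write(self, address: int, data: int) -> bool:
        code = self.device.receive_write(self.bus(address), data)
        # Compare the device's code with the code for the address actually intended.
        error = code != edc(address)
        self.device.resolve(error)
        return not error
```

A controller built with `bus=lambda a: a ^ 1` simulates a single flipped address bit; the write is then rejected instead of landing at the wrong location.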

    Hardware control path redundancy for functional safety of peripherals

    Publication No.: US12072776B2

    Publication date: 2024-08-27

    Application No.: US18166787

    Filing date: 2023-02-09

    Abstract: A circuit includes a primary register region and a primary shadow register; a secondary register region and a secondary shadow register; and a safety controller having multiple states. The safety controller transitions to a first write state when a first write signal to write a first value to the primary register region is detected, and copies the first value written to the primary register region to the primary shadow register; transitions to a second write state when a second write signal to write a second value to the secondary register region is detected within a set amount of time of detection of the first write signal, and in the second write state, copies the second value written to the secondary register region to the secondary shadow register; transitions to a compare state to receive a comparison signal indicating whether the first value is the same as the second value; and transitions to an update state when the first value is the same as the second value.
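The safety controller's transitions can be sketched as a small software state machine. The state names, the `window` parameter, and the use of plain numeric timestamps are illustrative assumptions. The abstract does not say what happens on a late second write or a mismatch, so returning to `IDLE` in those cases is also an assumption.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    FIRST_WRITE = auto()
    SECOND_WRITE = auto()
    COMPARE = auto()
    UPDATE = auto()


class SafetyController:
    def __init__(self, window: float):
        self.window = window            # allowed delay between the two writes
        self.state = State.IDLE
        self.primary_shadow = None
        self.secondary_shadow = None
        self._first_time = None

    def first_write(self, value: int, now: float) -> None:
        # A write to the primary region is mirrored into the primary shadow register.
        self.state = State.FIRST_WRITE
        self.primary_shadow = value
        self._first_time = now

    def second_write(self, value: int, now: float) -> None:
        in_time = (self.state is State.FIRST_WRITE
                   and now - self._first_time <= self.window)
        if not in_time:
            self.state = State.IDLE     # assumption: abort on a late or stray write
            return
        self.state = State.SECOND_WRITE
        self.secondary_shadow = value

    def compare(self) -> State:
        if self.state is not State.SECOND_WRITE:
            return self.state
        self.state = State.COMPARE
        same = self.primary_shadow == self.secondary_shadow
        # Only matching values reach the update state; a mismatch returns to IDLE (assumption).
        self.state = State.UPDATE if same else State.IDLE
        return self.state
```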

    PROGRAMMING FAILURE HANDLING DURING DATA FOLDING

    Publication No.: US20240281346A1

    Publication date: 2024-08-22

    Application No.: US18434461

    Filing date: 2024-02-06

    IPC classification: G06F11/16 G06F11/07 G06F11/10

    Abstract: Methods, systems, and devices for programming failure handling during data folding are described. A memory system may support a non-blocking exception handling process for handling program failures that occur during folding. For example, if a program failure occurs at a given page, the memory system may mark the failed page as storing uncorrectable data (e.g., associated with an uncorrectable error correction code (UECC) error) rather than as being associated with the program failure. Based on the marking, the memory system may continue the folding operation, allowing the data to be moved to another page of a physical destination block. After the folding operation is complete, the memory system may replace a failed physical destination block that includes the failed page with a spare block and retire the failed physical destination block.
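The non-blocking handling described in this abstract can be modeled roughly as follows. The list-based block model, the `failing_pages` fault-injection parameter, and the `fold` function are hypothetical. The abstract does not describe how the spare block is populated, so copying the valid pages into it is an assumption.

```python
UECC = object()  # sentinel marking a page as holding uncorrectable data


def fold(data_pages, block_size, failing_pages=frozenset()):
    """Fold data into a destination block without stopping on program failures.

    Returns (active_block, retired_block); retired_block is None when no page failed.
    """
    dest = [None] * block_size
    page = 0
    had_failure = False

    for data in data_pages:
        while True:
            if page >= block_size:
                raise RuntimeError("destination block is full")
            if page in failing_pages:
                # Mark the failed page as UECC and keep folding on the next page.
                dest[page] = UECC
                had_failure = True
                page += 1
                continue
            dest[page] = data
            page += 1
            break

    if not had_failure:
        return dest, None

    # After the fold completes, swap in a spare block and retire the failed one.
    # Assumption: the valid pages are copied into the spare.
    spare = [p for p in dest if p is not None and p is not UECC]
    return spare, dest
```

For example, `fold(["a", "b", "c"], 6, failing_pages={1})` writes "a" to page 0, marks page 1 as UECC, and continues with "b" and "c" on pages 2 and 3 before the block is replaced.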