Fault tolerant distributed tasks using distributed file systems

    公开(公告)号:US10379956B2

    公开(公告)日:2019-08-13

    申请号:US15595732

    申请日:2017-05-15

    Abstract: Data files in a distributed system sometimes becomes unavailable. A method for fault tolerance without data loss in a distributed file system includes allocating data nodes of the distributed file system among a plurality of compute groups, replicating a data file among a subset of the plurality of the compute groups such that the data file is located in at least two compute zones, wherein the first compute zone is isolated from the second compute zone, monitoring the accessibility of the data files, and causing a distributed task requiring data in the data file to be executed by a compute instance in the subset of the plurality of the compute groups. Upon detecting a failure in the accessibility of a data node with the data file, the task management node may redistribute the distributed task among other compute instances with access to any replica of the data file.

    Fault tolerance in a distributed file system

    公开(公告)号:US10133646B1

    公开(公告)日:2018-11-20

    申请号:US15468708

    申请日:2017-03-24

    Abstract: A method for providing fault tolerance in a distributed file system of a service provider may include launching at least one data storage node on at least a first virtual machine instance (VMI) running on one or more servers of the service provider and storing file data. At least one data management node may be launched on at least a second VMI running on the one or more servers of the service provider. The at least second VMI may be associated with a dedicated IP address and the at least one data management node may store metadata information associated with the file data in a network storage attached to the at least second VMI. Upon detecting a failure of the at least second VMI, the at least one data management node may be re-launched on at least a third VMI running on the one or more servers.

Patent Agency Ranking