Abstract:
One embodiment of the present invention provides a fault-management system. During operation, the system identifies a failure at a remote location associated with a communication service. The system then determines a local port used for the communication service, and suspends the local port, thereby allowing the failure to be detected by a device coupled to the local port.
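The failure-propagation step can be illustrated with a minimal sketch. The `Port` and `FaultManager` names and the service-to-port mapping are hypothetical, introduced only for illustration; the abstract does not specify how the local port is looked up or suspended.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    enabled: bool = True


@dataclass
class FaultManager:
    # Hypothetical mapping: service id -> local port carrying that service.
    service_ports: dict = field(default_factory=dict)

    def on_remote_failure(self, service_id):
        """Suspend the local port used by the failed service.

        Returns the suspended port, or None if the service is unknown.
        """
        port = self.service_ports.get(service_id)
        if port is not None:
            # Taking the port down lets the attached device see the failure.
            port.enabled = False
        return port
```

In this sketch, ports belonging to unaffected services stay enabled, so only the device attached to the failed service's port observes a link-down condition.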
Abstract:
A system and computer program product for performing failover in a redundancy group, where the redundancy group comprises a plurality of routers including an active router and a standby router, the failover being characterized by zero black hole or significantly reduced black hole conditions versus a conventional failover system. The system comprises a memory storing a plurality of instructions and a processing unit connected to the memory and adapted to execute the instructions, which cause an information appliance to: receive an incoming message at a switch; send an identification request to the plurality of routers to identify a current active router, where the current active router represents a virtual router of the redundancy group; and in response to receiving a reply containing an identification from the current active router within a predetermined time, forward the incoming message to the current active router.
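The forwarding decision at the switch might look like the following sketch. The `Router` class, its simulated `reply_delay_s`, and the `forward` function are assumptions made for illustration; the abstract does not say what the switch does when no reply arrives in time, so the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    router_id: str
    active: bool = False
    reply_delay_s: float = 0.0  # simulated time taken to answer the request
    received: list = field(default_factory=list)

    def identify(self):
        """Only the current active router answers the identification request."""
        if self.active:
            return self.router_id, self.reply_delay_s
        return None


def forward(message, routers, timeout_s):
    """Forward the message to the router that identifies itself in time.

    Returns the id of the router that received the message, or None.
    """
    for router in routers:
        reply = router.identify()
        if reply is not None and reply[1] <= timeout_s:
            router.received.append(message)
            return reply[0]
    return None
```

Because the switch asks which router is active before each forward, a message that arrives just after a failover reaches the new active router rather than the failed one.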
Abstract:
This backup SIP server (BSS) comprises: means (LMM) for detecting whether an Internet protocol link is not working, and enabling the use of a backup SIP signaling link to the main site via a SIP gateway and a public telephone network when the Internet protocol link is not working; means for transferring SIP signaling information on this backup link; means for, when receiving a registration request from a terminal of the remote site while the Internet protocol link is not working, registering this terminal locally and forwarding the registration request to the main site via the backup link; means (PQM) for storing policies defining what services, supplied by the main SIP server, are compatible with said backup SIP signaling link, and for altering the content of at least one field in each SIP signaling message addressed to the main SIP server before transferring this SIP signaling message on the backup link, this content being altered according to said policies.
Abstract:
A method is provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted.
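A minimal sketch of the waiting-period logic follows, using discrete ticks in place of real time. The function name, the tick model, and the action names in the usage note are hypothetical. The abstract does not describe what happens once the waiting period ends without a signal, so the sketch only reports that outcome.

```python
def speculative_failover(all_actions, is_preparatory, waiting_ticks, activity_tick=None):
    """Run preparatory failover actions on the second node during the waiting period.

    all_actions:    ordered list of actions for a complete failover
    is_preparatory: predicate selecting the subset that can run concurrently
    waiting_ticks:  length of the waiting period, in ticks
    activity_tick:  tick at which the first node signals activity, or None
    Returns (outcome, actions_performed).
    """
    subset = [a for a in all_actions if is_preparatory(a)]
    performed = []
    for tick in range(waiting_ticks):
        # A sign of life from the first node aborts the concurrent operation.
        if activity_tick is not None and tick == activity_tick:
            return "aborted", performed
        # One subset action per tick, while the first node keeps serving.
        if tick < len(subset):
            performed.append(subset[tick])
    return "waiting_period_elapsed", performed
```

For example, with actions `["mount_storage", "start_app_standby", "acquire_ip", "redirect_clients"]`, the first two marked preparatory, and an activity signal at tick 1, the call returns `("aborted", ["mount_storage"])`.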
Abstract:
An example involves assigning a first logical circuit identifier to a logical failover circuit prior to a failure of a dedicated logical circuit. The dedicated logical circuit includes first variable communication paths to route data through a first local access and transport area (LATA), second variable communication paths to route the data through a second LATA, and fixed communication paths to route the data between the first LATA, the second LATA, and an inter-exchange carrier. The failure of the dedicated logical circuit is identified based on status information pertinent to the dedicated logical circuit. The logical failover circuit includes an alternate communication path for communicating the data. When the first logical circuit identifier does not match the second logical circuit identifier, the second logical circuit identifier is renamed to match the first logical circuit identifier. The data is rerouted to the logical failover circuit without manual intervention.
Abstract:
An example method involves generating, with a network management module, a data structure to store current reroute statistics based on rerouting of data from a logical circuit that has failed to a logical failover circuit in a network. The current reroute statistics include trap data corresponding to the logical circuit. The trap data includes a committed burst size. The logical circuit is identified by a first logical circuit identifier. The logical failover circuit is identified by a second logical circuit identifier. The first and second logical circuit identifiers are renamed until the logical circuit has been restored from failure. The data structure is updated with the network management module to store updated reroute statistics. The updated reroute statistics include updated trap data corresponding to the logical circuit. The updated reroute statistics are based on a change in status of the logical circuit resulting from the committed burst size having been exceeded.
Abstract:
The wireless sensor and actor network (WSAN) simultaneous-failure recovery method ranks each node based on the number of hops to a pre-designated root node in the network. The method identifies some nodes as cluster heads based on the number of their children in the recovery tree. The method assigns a recovery weight and a nearby cluster node to each node. Nearby cluster nodes serve as gateways to other nodes that belong to that cluster. The recovery weight is used to decide which node is better to move in order to achieve a lower recovery cost. The recovery method uses the same ongoing set of actors to restore connectivity. Simulation results have demonstrated that the recovery method can achieve a low recovery cost per failed node in both small and large networks. The results have also shown that clustering leads to a lower recovery cost if the sub-network needs to re-establish links with the rest of the network.
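The ranking and cluster-head steps can be sketched as below. The sketch assumes that the recovery tree is a breadth-first tree rooted at the pre-designated root and that a node becomes a cluster head once it has at least `min_children` children; both the tree construction and the threshold are illustrative assumptions, since the abstract does not define them. Recovery weights are not modeled.

```python
from collections import deque


def rank_and_cluster_heads(adjacency, root, min_children):
    """Rank nodes by hop count to the root and pick cluster heads.

    adjacency:    dict mapping every node to a list of its neighbours
    root:         the pre-designated root node
    min_children: hypothetical threshold for becoming a cluster head
    Returns (rank, heads): hop count per reachable node, and the set of heads.
    """
    rank = {root: 0}
    children = {node: [] for node in adjacency}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in rank:
                # First visit in BFS order gives the minimum hop count.
                rank[neighbour] = rank[node] + 1
                children[node].append(neighbour)
                queue.append(neighbour)
    heads = {node for node, kids in children.items() if len(kids) >= min_children}
    return rank, heads
```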
Abstract:
A fault tolerance method and system for applications running on virtual machines (VMs) in a cluster identifies a client state for each client session of those applications. The method replicates the client session onto a primary and a backup VM, and uses a network controller and orchestrator to direct network traffic to the primary VM and to periodically replicate the state onto the backup VM. In case of a VM failure, the method reroutes network traffic of states for which the failed VM serves as a primary to the corresponding backup, and replicates states without a backup after the failure onto another VM to create new backups. The method may be used as part of a method or system implementing the split/merge paradigm.
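The failure-handling step can be sketched as a pure function over a placement table. The table layout (state id mapped to a primary/backup pair) and the first-fit choice of the new backup VM are assumptions for illustration; the abstract does not say how the replacement VM is chosen.

```python
def handle_vm_failure(placement, failed_vm, all_vms):
    """Return a new placement after one VM fails.

    placement: dict mapping state id -> (primary_vm, backup_vm)
    failed_vm: the VM that failed
    all_vms:   list of all VMs in the cluster, including the failed one
    """
    survivors = [vm for vm in all_vms if vm != failed_vm]
    updated = {}
    for state_id, (primary, backup) in placement.items():
        if primary == failed_vm:
            # Traffic for this state is rerouted to the former backup.
            primary, backup = backup, None
        elif backup == failed_vm:
            backup = None
        if backup is None:
            # First-fit choice of a new backup (an assumption of this sketch).
            backup = next((vm for vm in survivors if vm != primary), None)
        updated[state_id] = (primary, backup)
    return updated
```

States that did not involve the failed VM keep their placement unchanged, so only the affected states incur a re-replication.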
Abstract:
A memory unit including a first data transferring/receiving unit suitable for transferring/receiving data through a first data bus for communication with a host, a second data transferring/receiving unit suitable for transferring/receiving data through a second data bus for a data backup, and a control unit suitable for controlling the first data transferring/receiving unit and the second data transferring/receiving unit to be activated or inactivated according to whether a power failure occurs in the host.
Abstract:
Systems and techniques are described for a path maximum transmission unit (MTU) discovery method that allows the sender of IP packets to discover the MTU of packets that it is sending over a conduit to a given destination. The MTU is the largest packet that can be sent through the network along a path without requiring fragmentation. The path MTU discovery method actively probes each sending path of each conduit with fragmentation enabled to determine a current MTU and accordingly increase or decrease the conduit MTU. The path MTU discovery process is resilient to errors and supports retransmission if packets are lost in the discovery process. The path MTU discovery process is dynamically adjusted at a periodic rate to adjust to varying network conditions.
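One way to picture active probing with retransmission is the sketch below. It uses a binary search over probe sizes, which is an assumption of this example and not necessarily the search strategy of the described method. The `probe` callback, the size bounds, and the retry count are likewise hypothetical.

```python
def discover_path_mtu(probe, low=576, high=1500, retries=3):
    """Find the largest packet size that a path delivers.

    probe(size) is a hypothetical callback returning True if a probe of that
    size was delivered, False if it was too large, and None if it was lost.
    Returns the discovered MTU, or None if even the smallest probe fails.
    """
    def send(size):
        # Retransmit lost probes; repeated loss is treated as a failure.
        for _ in range(retries):
            result = probe(size)
            if result is not None:
                return result
        return False

    if not send(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if send(mid):
            low = mid       # mid fits, so the MTU is at least mid
        else:
            high = mid - 1  # mid does not fit, so the MTU is below mid
    return low
```

Running the function periodically for each sending path, and raising or lowering the conduit MTU from the results, would correspond to the periodic adjustment the abstract mentions.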