Abstract:
Systems and methods for fault detection in large-scale networks are provided. Probing instructions are generated by a probing controller associated with a network having a plurality of nodes. The probing instructions include a specified source node and a specified destination node. Each probing instruction is transmitted to a probing agent coupled to the specified source node and a data packet is transmitted from the probing agent to the specified destination node. A fault detection module is informed of probing instructions associated with failed transmissions, identifies a plurality of nodes having a likelihood of being in a network path associated with failed transmissions, and processes the plurality of nodes having a likelihood of being in the network paths associated with failed transmissions to identify a set of likely failed nodes.
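The fault detection step described above can be illustrated with a minimal sketch. It is not the claimed method: it assumes a simple vote count in which each failed probe adds one vote to every node that might lie on one of its candidate paths, and the nodes with the most votes are reported as likely failed. The names `likely_failed_nodes`, `candidate_paths`, and `top_k` are hypothetical and do not come from the abstract.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence

Probe = tuple[str, str]  # (source node, destination node)


def likely_failed_nodes(
    failed_probes: Iterable[Probe],
    candidate_paths: Callable[[Probe], Iterable[Sequence[str]]],
    top_k: int = 1,
) -> list[str]:
    """Rank nodes by how many failed probes may have traversed them.

    Assumption: a plain vote count stands in for whatever processing the
    fault detection module actually performs.
    """
    votes: Counter[str] = Counter()
    for probe in failed_probes:
        # Collect every node that could be on a path for this probe,
        # counting each node at most once per failed probe.
        suspects: set[str] = set()
        for path in candidate_paths(probe):
            suspects.update(path)
        votes.update(suspects)
    # Sort by descending vote count, then by name so ties are deterministic.
    ranked = sorted(votes, key=lambda node: (-votes[node], node))
    return ranked[:top_k]
```

Here `candidate_paths` would be backed by whatever route knowledge is available, for example a table of possible paths per source and destination pair. Source and destination nodes are counted along with intermediate nodes, which is a simplification.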
Abstract:
Systems and methods for locating network errors are provided. The system includes a plurality of host nodes in a network of host nodes and intermediary nodes, and a database storing route data for each of a plurality of host node pairs. The system includes a controller configured to identify a subject intermediary node to investigate for network errors and select, using route data stored in the database, a set of target probe paths. Each target probe path includes a respective pair of host nodes separated by a network path including at least one target intermediary node, which is either the subject intermediary node or an intermediary node that is a next-hop neighbor of the subject intermediary node. The controller is configured to test each target probe path in the set of target probe paths and to determine, based on a result of the testing, an operational status of the subject intermediary node.
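The selection and testing of target probe paths can be sketched as follows. This is a hedged illustration, not the claimed controller: the route database is assumed to be a dictionary mapping each host pair to a single path, and the decision rule (report a likely failure when every path through the subject node fails while at least one neighbor-only path succeeds) is an assumption introduced for the example. The names `select_target_paths`, `subject_status`, `neighbors`, and `probe_ok` are hypothetical.

```python
from typing import Callable, Mapping, Sequence

Pair = tuple[str, str]  # (host node, host node)


def select_target_paths(
    subject: str,
    routes: Mapping[Pair, Sequence[str]],
    neighbors: Mapping[str, set[str]],
) -> list[Pair]:
    """Return host pairs whose path crosses the subject or a next-hop neighbor."""
    targets = {subject} | set(neighbors.get(subject, ()))
    # path[1:-1] skips the two host endpoints, leaving intermediary nodes.
    return [pair for pair, path in routes.items()
            if targets.intersection(path[1:-1])]


def subject_status(
    subject: str,
    routes: Mapping[Pair, Sequence[str]],
    neighbors: Mapping[str, set[str]],
    probe_ok: Callable[[Pair], bool],
) -> str:
    """Test each target probe path and classify the subject node.

    Assumption: this three-way rule is illustrative only.
    """
    through: list[bool] = []  # results for paths crossing the subject
    around: list[bool] = []   # results for paths crossing only a neighbor
    for pair in select_target_paths(subject, routes, neighbors):
        bucket = through if subject in routes[pair][1:-1] else around
        bucket.append(probe_ok(pair))
    if through and not any(through) and any(around):
        return "likely failed"
    if through and all(through):
        return "operational"
    return "inconclusive"
```

Probing paths that cross only a neighbor gives a point of comparison: if those probes succeed while the probes through the subject node fail, the fault is less likely to lie in the shared neighbor.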