Abstract:
Techniques are disclosed for validating the resiliency of a networked application made available using a distributed computing infrastructure. In one embodiment, a latency monitoring application observes each active application component and, at specified or unspecified intervals, selects one and introduces latency or error messages into one or more messages emanating from the selected active application component. The latency monitoring application then measures the effect of the latency or error messages on other active application components that depend on the affected active application component. By observing the effects of the affected component on the rest of the networked application, a provider can ensure that each component can tolerate unexpected latency or error conditions within the distributed computing infrastructure.
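The select, inject, and measure cycle described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Component` model, the per-component `timeout_ms`, the fault dictionary keys, and the function names `respond`, `pick_target`, and `inject_and_measure` are all hypothetical, and latency is simulated arithmetically rather than applied to real network messages.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical model of one active application component."""
    name: str
    depends_on: list = field(default_factory=list)
    base_latency_ms: float = 10.0
    timeout_ms: float = 100.0  # how long this component waits on a dependency


def respond(components, name, faults):
    """Simulate one request to `name`; return (latency_ms, ok)."""
    comp = components[name]
    latency, ok = comp.base_latency_ms, True
    for dep in comp.depends_on:
        dep_latency, dep_ok = respond(components, dep, faults)
        if dep_latency > comp.timeout_ms:
            # The caller gives up at its timeout and treats the call as failed.
            dep_latency, dep_ok = comp.timeout_ms, False
        latency += dep_latency
        ok = ok and dep_ok
    # Apply any fault injected into messages emanating from this component.
    fault = faults.get(name, {})
    latency += fault.get("extra_latency_ms", 0.0)
    if fault.get("error"):
        ok = False
    return latency, ok


def pick_target(components, rng):
    """Select one active component at random (assumed selection policy)."""
    return rng.choice(sorted(components))


def inject_and_measure(components, target, fault):
    """Inject `fault` into `target` and report the effect on direct dependents."""
    report = {}
    for name, comp in components.items():
        if target in comp.depends_on:
            baseline_ms, _ = respond(components, name, {})
            faulted_ms, ok = respond(components, name, {target: fault})
            report[name] = {
                "baseline_ms": baseline_ms,
                "faulted_ms": faulted_ms,
                "ok": ok,
            }
    return report


components = {
    "db": Component("db", base_latency_ms=5.0),
    "api": Component("api", depends_on=["db"]),
    "web": Component("web", depends_on=["api"], timeout_ms=1000.0),
}

target = pick_target(components, random.Random(0))
print(target, inject_and_measure(components, target, {"extra_latency_ms": 500.0}))
```

In this sketch a dependent counts as tolerating the fault only if its `ok` flag stays true; a real deployment would presumably compare richer signals such as error rates or fallback activation.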
Abstract:
An online distributed computer system with methodologies for distributed trace aggregation and targeted distributed tracing is disclosed. In one aspect, the disclosed distributed tracing technologies improve on existing distributed tracing technologies by giving application developers and site operations personnel a more holistic and comprehensive insight into the behavior of the online distributed computer system, in the form of computed span metric aggregates displayed in a graphical user interface, thereby making it easier for such personnel to diagnose problems in the system and to support and maintain it. In another aspect, the disclosed distributed tracing technologies improve on existing distributed tracing technologies by facilitating targeted tracing of initiator requests.
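As a rough illustration of the two aspects above, the sketch below computes span metric aggregates and applies a targeting rule to initiator requests. It is a sketch under stated assumptions: the abstract does not specify which metrics are aggregated or how targets are expressed, so the `Span` fields, the grouping by service and operation, the chosen metrics, and the attribute-matching rule in `should_trace` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; field names are assumptions."""
    trace_id: str
    service: str
    operation: str
    duration_ms: float
    error: bool = False


def aggregate_spans(spans):
    """Group spans by (service, operation) and compute summary metrics."""
    groups = defaultdict(list)
    for span in spans:
        groups[(span.service, span.operation)].append(span)
    return {
        key: {
            "count": len(group),
            "error_rate": sum(s.error for s in group) / len(group),
            "mean_ms": mean(s.duration_ms for s in group),
            "max_ms": max(s.duration_ms for s in group),
        }
        for key, group in groups.items()
    }


def should_trace(request_attrs, target):
    """Trace an initiator request only if it matches every target attribute."""
    return all(request_attrs.get(k) == v for k, v in target.items())


spans = [
    Span("t1", "auth", "login", 10.0),
    Span("t2", "auth", "login", 30.0, error=True),
    Span("t1", "db", "query", 5.0),
]
aggregates = aggregate_spans(spans)
```

The resulting dictionary is the kind of per-operation summary a graphical user interface could render as a table or chart, while `should_trace` shows one simple way a tracer might restrict collection to requests of interest.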