Problem description: An availability alert was triggered in one of a customer's monitoring tasks. Analysis of the report showed that the direct cause, shown in the figure below, was that a connection could not be established with the server.

Root cause analysis: Connection failures of this kind fall into three situations, depending on how far the Probe had got in the Trace when the connection failed:
1. In the Trace, the Probe has not left the public network.
2. In the Trace, the Probe has left the public network but has not reached the Host.
3. In the Trace, the Probe has left the public network and reached the Host.

The first type: the Probe has not left the public network. This is very rare. It usually involves a Probe in a computer room, that is, one of Tingyun's own Probes. If it happens, the computer room must be contacted and asked why our Probe is being restricted. That is the Probe Group's job; we only need to report the situation to the Probe Group.
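The three-way split above can be written as a small decision function. This is a minimal sketch, not part of any Tingyun tooling: the names `Situation` and `classify_connection_failure` and the two boolean inputs are assumptions introduced for illustration, and the flags are assumed to be read off the Trace by hand.

```python
from enum import Enum


class Situation(Enum):
    """The three failure types, named after where the Trace ends."""
    NOT_LEFT_PUBLIC_NETWORK = 1    # first type
    LEFT_BUT_HOST_NOT_REACHED = 2  # second type
    REACHED_HOST = 3               # third type


def classify_connection_failure(left_public_network: bool,
                                reached_host: bool) -> Situation:
    """Map two observations from the Trace onto one of the three types.

    Both flags are hypothetical inputs: the analyst decides them by
    looking at the Trace display.
    """
    if reached_host and not left_public_network:
        # The Host cannot be reached without leaving the public network,
        # so this combination means the Trace was misread.
        raise ValueError("inconsistent Trace reading")
    if not left_public_network:
        return Situation.NOT_LEFT_PUBLIC_NETWORK
    if not reached_host:
        return Situation.LEFT_BUT_HOST_NOT_REACHED
    return Situation.REACHED_HOST
```

For example, `classify_connection_failure(True, False)` returns `Situation.LEFT_BUT_HOST_NOT_REACHED`, the case that usually points to an operator problem.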
The second type: the Probe has left the public network but has not reached the Host. This is generally an operator (carrier) problem. Troubleshooting approach: rerun the test 3-4 times from the Probe that reported the error. If the Trace ends at the same IP or IP segment every time, look up that IP. In the example shown in the figure below, the IP lookup shows that the problem lies with Shanghai Unicom. The customer needs to contact the operator to resolve the problem.

The third type, shown in the figure below: the Probe has left the public network and reached the Host. The problem again needs to be reproduced with Instant Testing. If the network returns to normal, the failure was probably caused by network fluctuation; if it does not recover for a long time, the traffic may have been hijacked. Check with a packet capture, as shown in the figure below, and ask the customer whether 120.240.95.33 is one of their Hosts. If it is not, the traffic has been hijacked. If it is, the customer needs to investigate the problem on that Host.

Analysis ideas:
1. First, determine which situation applies from the Trace display, then analyze accordingly.
2. Promptly reproduce the problem with Instant Testing to confirm whether the error occurs every time.
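
The two checks described for the second and third types (whether repeated runs end at the same IP or IP segment, and whether the connected IP belongs to the customer) can be sketched as follows. This is an illustration under stated assumptions, not an existing tool: the function names are hypothetical, the last-hop IPs are assumed to be copied by hand from each rerun's Trace, an "IP segment" is assumed to mean an IPv4 /24, and the customer's Host list is assumed to come from the customer.

```python
import ipaddress
from typing import Iterable, List, Optional


def recurring_stop_point(last_hops: List[str],
                         min_runs: int = 3) -> Optional[str]:
    """Return the IP or /24 segment that every rerun ended at, else None.

    last_hops holds the last IP reached in each rerun (3-4 runs are
    suggested above). Treating a segment as a /24 is an assumption.
    """
    if len(last_hops) < min_runs:
        return None  # too few reruns to call the stop point recurring
    if len(set(last_hops)) == 1:
        return last_hops[0]  # every run ended at the same IP
    segments = {
        str(ipaddress.ip_network(f"{ip}/24", strict=False))
        for ip in last_hops
    }
    if len(segments) == 1:
        return next(iter(segments))  # same segment, different IPs
    return None


def is_possible_hijack(connected_ip: str,
                       customer_host_ips: Iterable[str]) -> bool:
    """True when the IP seen in the packet capture is not a customer Host.

    An empty Host list also returns True, so confirm the list with the
    customer before drawing a conclusion.
    """
    connected = ipaddress.ip_address(connected_ip)
    return all(connected != ipaddress.ip_address(ip)
               for ip in customer_host_ips)
```

With the address from the packet capture above, `is_possible_hijack("120.240.95.33", hosts)` is `True` whenever the customer's confirmed `hosts` list does not contain that address. A non-`None` result from `recurring_stop_point` is the IP or segment to look up when identifying the operator.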