The customer's Availability began to drop at around 5 a.m. on December 20, 2023, which triggered an Alert. The Error mainly responsible for the drop was "unable to resolve the name or address of the server". The analysis of the cause follows.
1. Check the error data for commonalities such as city, operator, or DNS. The data shows no commonality; the errors are relatively scattered.
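The commonality check can be sketched in a few lines of Python. This is only an illustration: the record fields (`city`, `operator`, `dns`) and the sample values are hypothetical placeholders, and real records would come from the monitoring export.

```python
from collections import Counter


def dominant_share(records, field):
    """Return (most common value, its share of all records) for one field."""
    if not records:
        raise ValueError("no error records to analyze")
    counts = Counter(record[field] for record in records)
    value, count = counts.most_common(1)[0]
    return value, count / len(records)


# Hypothetical error records, not real incident data.
errors = [
    {"city": "city-1", "operator": "operator-a", "dns": "192.0.2.1"},
    {"city": "city-2", "operator": "operator-b", "dns": "192.0.2.2"},
    {"city": "city-3", "operator": "operator-a", "dns": "192.0.2.3"},
    {"city": "city-1", "operator": "operator-c", "dns": "192.0.2.4"},
]

for field in ("city", "operator", "dns"):
    value, share = dominant_share(errors, field)
    print(f"{field}: most common value {value!r} covers {share:.0%} of errors")
```

A share close to 100% for one field would point to a commonality; low shares across all fields match the "relatively scattered" picture described above.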
2. Reproduce. Rerun the customer's task on the Probes that reported the error. The data is normal (the retest was run at 4:30 p.m. that day), indicating that the Error had recovered and occurred only for a period that morning.
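Outside the monitoring product, the same kind of retest can be approximated with a plain DNS lookup from the affected machine. The sketch below uses Python's standard `socket.getaddrinfo`; the function name `check_resolution` and the `resolver` parameter are assumptions made for illustration and are not part of the Probe tooling.

```python
import socket


def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Try to resolve hostname and report success or the resolver error."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror as exc:
        # This is the failure class behind "unable to resolve the name
        # or address of the server".
        return {"ok": False, "error": str(exc), "addresses": []}
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return {"ok": True, "error": None, "addresses": addresses}
```

Calling `check_resolution` with the customer's domain on the affected machine shows whether resolution currently works. A successful result hours later, as in the 4:30 p.m. retest, says only that the fault has cleared, not what caused it.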
Because the error could not be reproduced and no packets were captured for the monitoring data at that time, the Network information is unavailable, so the only option is to check the DNS server Log first. Our client must have sent a DNS request. Two things now need to be confirmed:
1. Pick a few of the scattered points and check the corresponding DNS server Log for the DNS request. If the request is there, it basically means that the DNS server did not respond.
2. For the same points, if the DNS request is not in the DNS server Log, the problem is on the operator's side. This is unlikely, because the errors appear across every operator.
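The two checks above amount to a simple decision rule, sketched here in Python. The DNS server Log format is not known, so the sketch assumes the log has already been parsed into `(client_ip, domain)` pairs. The function name, the pair layout, and the result strings are all hypothetical, and time-window matching is left out for brevity.

```python
def classify_failures(samples, logged_queries):
    """Classify each sampled failure by whether its DNS request was logged.

    samples: list of (client_ip, domain) pairs taken from the error data.
    logged_queries: set of (client_ip, domain) pairs found in the DNS server log.
    """
    results = {}
    for sample in samples:
        if sample in logged_queries:
            # The request reached the server, so the server likely did not answer.
            results[sample] = "dns_server_not_responding"
        else:
            # The request never arrived, so it was likely lost on the operator side.
            results[sample] = "operator_network_issue"
    return results
```

If most sampled points fall into the first class, the investigation stays with the DNS server; the second class would need to show up for many operators at once, which is why it is considered unlikely.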
Turn on Error packet capture in the task settings so that first-hand "on-site" data is available. Then, even if Instant Testing cannot reproduce the error in the future, there will still be data to support the analysis.