On March 17, 2023, the customer adjusted the backend Application, shutting it down at 23:29:19 and restarting it at 23:33:38. During this period, the corresponding tasks should have reported errors and triggered an Alert, but Tingyun did not trigger the response action. What is the reason?
When this problem occurs, troubleshoot as follows: (1) check whether there is an Alert Log for the corresponding time period; (2) check whether the task data is normal; (3) compare the data for that time period with the Alert condition to see whether the condition is met.
(1) Check whether there is an Alert Log for the corresponding time period. The customer's operation affects two tasks. Neither task has an Alert Log for that period, which indicates that the Alert condition was not met.
(2) Check whether the task data is normal. The customer's task frequency is 1 minute, and the Probe Group contains two Probes. During the Application shutdown (about 4 minutes), each task should have about 8 Probe runs, but in fact there were only two. The backend shows that this was caused by insufficient SLA.
(3) Compare the data for that time period with the Alert condition to see whether the condition is met. According to the task data, the OA task ran twice with one page Error, giving 50% Availability, which reached the threshold. However, the Alert condition requires two Error points in addition to the Availability threshold, so the task did not trigger an Alert. The portal task ran within the monitoring period with no errors in its task data; its Availability was 100%, so no Alert was triggered.
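The checks in steps (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and not part of Tingyun, the Availability threshold is assumed to be 50% and to be met when Availability is at or below it, and the rule of two Error points is taken from the case description.

```python
from datetime import datetime

# Values taken from the case above.
shutdown = datetime(2023, 3, 17, 23, 29, 19)
restart = datetime(2023, 3, 17, 23, 33, 38)
frequency_seconds = 60  # task frequency: 1 minute
probes_in_group = 2     # Probe Group size


def expected_runs(start, end, frequency_s, probes):
    """Rough count of Probe runs expected if every Probe runs in every full cycle."""
    full_cycles = int((end - start).total_seconds() // frequency_s)
    return full_cycles * probes


def should_alert(results, availability_threshold, min_error_points):
    """Hypothetical alert rule (an assumption, not Tingyun's actual logic).

    results is a list of booleans, True for a successful run.
    The rule fires only when Availability is at or below the threshold
    AND the number of Error points reaches min_error_points.
    """
    if not results:
        return False  # no data, nothing to evaluate
    errors = results.count(False)
    availability = 100.0 * (len(results) - errors) / len(results)
    return availability <= availability_threshold and errors >= min_error_points


print(expected_runs(shutdown, restart, frequency_seconds, probes_in_group))

oa_task = [True, False]      # two runs, one page Error: 50% Availability
portal_task = [True, True]   # two runs, no Error: 100% Availability
print(should_alert(oa_task, 50, 2))
print(should_alert(portal_task, 50, 2))
```

With only two runs per task, neither task can satisfy both parts of the assumed rule, which matches the empty Alert Log.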
A second issue came up in this case: for the period when the Application was shut down, the customer found that one task showed 50% Availability while the other reported no error at all. Reasons:
(1) The 50% Availability task: the customer's server shutdown time is accurate to the second. The successful point was collected before the server was shut down, and the Error point was collected within the shutdown period.
(2) The 100% Availability task: after communicating with the technical team, we learned that the customer's corresponding Application is more than Probe. Shutting down the server takes time, and the Application could still be accessed normally while it was shutting down, so no error was reported. After the shutdown completed, no access was made because of insufficient Probe SLA, so Availability remained 100%.
There are two ways to address this:
(1) Add Probes. For example, set the Probe Group to five Probes and set SLA to 3. This allows Probes to be fully scheduled and avoids insufficient SLA.
(2) Reduce the task frequency. When the frequency is too high, there are too many task runs for the Probes to complete. After the frequency is reduced, Probes can be fully scheduled.
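To make the explanation of the 50% task concrete, the sketch below classifies a run by where its timestamp falls relative to the shutdown window. The `classify` helper and the two sample timestamps are hypothetical and chosen only for illustration; they are not the real task data.

```python
from datetime import datetime

# Shutdown window from the case above.
SHUTDOWN = datetime(2023, 3, 17, 23, 29, 19)
RESTART = datetime(2023, 3, 17, 23, 33, 38)


def classify(sample_time, shutdown=SHUTDOWN, restart=RESTART):
    """Label a run by its position relative to the shutdown window."""
    if sample_time < shutdown:
        return "before shutdown"
    if sample_time <= restart:
        return "during shutdown"
    return "after restart"


# Hypothetical timestamps, for illustration only; not the real task data.
samples = [
    datetime(2023, 3, 17, 23, 29, 5),   # 14 seconds before the shutdown
    datetime(2023, 3, 17, 23, 30, 5),   # inside the shutdown window
]
for t in samples:
    print(t.time(), classify(t))
```

A run only seconds before 23:29:19 still succeeds, so a task with one run on each side of the shutdown time ends up at 50% Availability.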