Three tasks were affected by a problem at the customer's origin site between 19:00 on the 25th and 00:00 on the 26th. During that period, two of the tasks reported the error "HTTP/1.1 404 Not Found (page)", and these tasks recorded more than 5,000 data points. The third task recorded only a little over 100 data points and reported no 404 error.
First, we checked delivery and database ingestion for the task with the small amount of data. Ingestion had failed more than 9,000 times (counting both deliveries and resends). Checking the data in the backend showed that the reason for the ingestion failures was Error Type > 0. This field is used in streaming tasks to determine whether a data point is noise. When Error Type > 0, the data point is marked as noise, is filtered out directly in the dispatch center, and is not displayed in the report.
This raises a new question: of the three tasks, why does only one have less data? Viewing the data by status code shows that the response header of the tasks with data is 404, while the response header of the task with missing data is 200 OK.
In summary, the difference in data volume is caused by the different status codes returned. In a streaming media task, when the status code is 200 (200 OK) but stream playback fails, the result counts as a failure for streaming media and has little reference value, so it is filtered out.
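The noise rule described above can be illustrated with a minimal Python sketch. It is not the dispatch center's actual code: the `Sample` class, its field names, the task names, and the sample values are all hypothetical, and it assumes that the 404 data points carry Error Type 0 while the failed 200 OK playbacks carry Error Type > 0.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One data point from a streaming task (hypothetical structure)."""
    task: str
    status_code: int
    error_type: int  # assumed to mirror the "Error Type" field


def is_noise(sample: Sample) -> bool:
    # Rule taken from the analysis: Error Type > 0 marks the point as noise.
    return sample.error_type > 0


def filter_for_report(samples: list[Sample]) -> list[Sample]:
    # Noise points are dropped before they reach the report.
    return [s for s in samples if not is_noise(s)]


# Illustrative values only, not real monitoring data.
samples = [
    Sample("task_a", 404, 0),  # 404 response: assumed Error Type 0, kept
    Sample("task_c", 200, 3),  # 200 OK but playback failed: noise, dropped
    Sample("task_c", 200, 0),  # 200 OK and playback succeeded: kept
]

reported = filter_for_report(samples)
print(len(reported), "of", len(samples), "data points reach the report")
```

Under these assumptions, a task whose origin returns 200 OK while playback fails loses most of its data points to the filter, whereas a task that receives 404 keeps them and shows them as errors.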