FAQ: Days Are Missing from the Log Data


When I look at my statistics, I see that some days are missing. I know I had traffic on those days. Why aren't they shown?

Short Answer

Your ISP may be regularly deleting or rotating your log data. Ask them to keep all of your log data, or to rotate it over a longer interval. It's also possible that your log data is missing those days for some other reason.

Long Answer

To save disk space, many ISPs regularly delete or "rotate" (rename and/or compress) their server log data. For instance, instead of letting the log file grow forever, they may rename it every day, start a new one, and compress the old one; then, every week, they may delete the logs older than seven days. In more drastic cases, they may simply delete the log file every week or month and start a new one.

Although this saves disk space on the server, it causes serious problems for log analysis. When you rebuild the database, Sawmill processes all the log data that still exists and creates a new database from it. Any log data that has already been deleted will not appear in the statistics. So if the ISP deletes the logs every month and you rebuild your database, your statistics will go back one month at most.
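To make the rebuild limit concrete, here is a minimal Python sketch that lists the days a rebuild can still see. It assumes a hypothetical policy in which the log is rotated at midnight and any rotated file more than `keep_days` days old is deleted every day (a purge that runs only weekly would leave a few extra days between purges). `days_available` is an illustrative name, not part of Sawmill.

```python
from datetime import date, timedelta


def days_available(today: date, keep_days: int = 7) -> list[date]:
    """Days whose log data still exists on `today`, oldest first.

    Hypothetical policy: the log is rotated at midnight, and rotated
    files older than keep_days days are deleted daily.
    """
    # One rotated file for each of the previous keep_days days,
    # plus today's live log (n == 0).
    return [today - timedelta(days=n) for n in range(keep_days, -1, -1)]


print(days_available(date(2024, 3, 10)))
```

With `keep_days=7`, a rebuild on 10 March 2024 can report only 3 through 10 March: eight days, counting the live log. Everything earlier is gone, no matter how the database is rebuilt.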

Similarly, when you update the database, Sawmill adds to it any entries in the current log data that are not already in the database. So if the ISP deletes the log file on the 1st of every month, and you update your database only once a month, on the 15th, then the data from the 15th to the end of each month will never reach the database: no update ran after it was logged, and it was deleted on the 1st of the next month, before the next update.
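The timing argument can be checked with a small day-by-day simulation. This Python sketch is illustrative only: `lost_days` is a hypothetical helper, and it assumes that the ISP's deletion and your update both run at midnight (the deletion first, if they fall on the same day), and that an update imports everything still in the log.

```python
from datetime import date, timedelta
from typing import Callable


def lost_days(start: date, end: date,
              is_deletion_day: Callable[[date], bool],
              is_update_day: Callable[[date], bool]) -> list[date]:
    """Days in [start, end] whose entries were deleted before any update imported them."""
    lost: list[date] = []
    pending: list[date] = []  # days logged but not yet imported
    day = start
    while day <= end:
        # Midnight: the ISP's deletion runs first (an assumption of this model).
        if is_deletion_day(day):
            lost.extend(pending)
            pending.clear()
        # Then the scheduled update imports whatever is still in the log.
        if is_update_day(day):
            pending.clear()
        # The day's traffic is written to the live log.
        pending.append(day)
        day += timedelta(days=1)
    return lost


gaps = lost_days(date(2024, 1, 1), date(2024, 3, 1),
                 is_deletion_day=lambda d: d.day == 1,
                 is_update_day=lambda d: d.day == 15)
print(gaps[0], gaps[-1], len(gaps))  # 2024-01-15 2024-02-29 32
```

Over January and February 2024, the simulation loses 15 to 31 January and 15 to 29 February, 32 days in all, while the 1st through the 14th of each month are imported.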

The best solution is to convince your ISP to keep all of your log data and never delete any of it. If you can do that, there will be no problem: you'll always be able to rebuild or update your database and get all of the statistics. Since this requires more of your ISP's disk space, however, they may not be willing to do it, especially if you have a very large site, or they may charge extra for the service. Of course, if you own and manage your own server, you can do this yourself.

The second-best solution, if you can't convince the ISP to keep all log data, is to store your older log files on your own system. If your ISP rotates the data through several log files before deleting the oldest one, this is easy: just regularly download the logs you don't already have (you may be able to automate this with an FTP client). If they keep only one file, and regularly delete and restart it, then you'll need to download that file as close to the reset time as possible, to capture as much data as possible before it is deleted. This is not a reasonable way for ISPs to rotate logs, and you should try to convince them to rotate through several files before deleting the oldest one, but some do it this way anyway. You'll never get all of your log data if they use this technique, because the very last entries before each deletion will always be lost, but if you time it right, you can get pretty close.
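For the several-files case, the download step might be automated along these lines. This is a minimal sketch using Python's standard `ftplib`; the host, credentials, directories, and the file-name pattern are hypothetical placeholders. It also assumes rotated files get unique, date-stamped names, because a scheme that renames `access_log.1` to `access_log.2` each day would make "already downloaded" impossible to judge by name alone.

```python
import fnmatch
import ftplib
import os


def files_to_fetch(remote_names, local_names, pattern):
    """Rotated logs on the server that match `pattern` and are not yet stored locally."""
    have = set(local_names)
    return sorted(n for n in remote_names
                  if fnmatch.fnmatch(n, pattern) and n not in have)


def fetch_new_logs(host, user, password, remote_dir, local_dir,
                   pattern="access_log.*.gz"):  # hypothetical naming scheme
    """Download every rotated log in remote_dir that local_dir does not have yet."""
    os.makedirs(local_dir, exist_ok=True)
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in files_to_fetch(ftp.nlst(), os.listdir(local_dir), pattern):
            # Download under a temporary name, so an interrupted transfer
            # is not mistaken for a complete log on the next run.
            tmp_path = os.path.join(local_dir, name + ".part")
            with open(tmp_path, "wb") as f:
                ftp.retrbinary("RETR " + name, f.write)
            os.replace(tmp_path, os.path.join(local_dir, name))
```

Run it on a schedule (for example, from cron) more often than the ISP purges old files. The pattern deliberately skips the live log, so this sketch does not cover the single-file scheme.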

Once you have the logs on your system, you can analyze them at your leisure, without worrying about them being deleted. In this situation, you'll probably want to run Sawmill on the system where you keep the archived logs.

If log rotation is not the issue, your log data may be missing those days for another reason. Perhaps the server was down for a period, or the log data was lost in a disk failure, or it was corrupted. Look at the log data yourself, using a text editor, to make sure it really does contain the days you expect. If the data isn't in your logs, Sawmill cannot report statistics on it.
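For large logs, a short script can do the same check as a text editor. This Python sketch assumes the log uses the Common Log Format timestamp (for example `[10/Mar/2024:13:55:36 -0700]`); other formats need a different pattern. It reports the calendar days between the first and last entry that have no entries at all, using the server's local date and ignoring the time-zone offset. `missing_days` is a hypothetical helper, not a Sawmill feature.

```python
import re
from datetime import date, timedelta

# Date part of a Common Log Format timestamp, e.g. [10/Mar/2024:13:55:36 -0700]
CLF_DATE = re.compile(r"\[(\d{2})/([A-Z][a-z]{2})/(\d{4}):")
MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def missing_days(lines):
    """Dates between the first and last logged day that have no log entries."""
    seen = set()
    for line in lines:
        m = CLF_DATE.search(line)
        if m and m.group(2) in MONTHS:
            day, mon, year = m.groups()
            seen.add(date(int(year), MONTHS[mon], int(day)))
    if not seen:
        return []
    first, last = min(seen), max(seen)
    all_days = (first + timedelta(days=n) for n in range((last - first).days + 1))
    return [d for d in all_days if d not in seen]
```

Pass it an open file, for example `missing_days(open("access_log", errors="replace"))`. Gaps it finds at the start of the period usually point to rotation, while gaps in the middle point to downtime or lost data.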