FAQ: Years are wrong in the statistics


The statistics show the wrong years: when I analyze data from previous years, it appears in this year, or data from this year appears in last year. Why?

Short Answer

Your log format does not include year information, so Sawmill has to guess the year. Use a different log format if possible (one which includes year information). See the long answer for a way of manually setting the year for blocks of log data.

Long Answer

Most log formats include the year as part of the date on every line, but a few (in particular, Unix Syslog format) include only month and day. In this situation, Sawmill has no way of knowing which year a particular event occurred in, so it has to guess. Recent versions of Sawmill will always guess that the event occurred in the current year; previous versions may have a particular year hard-coded in the default_log_date_year option in the profile, and will put all events in that year.

The best solution, if possible, is to switch to a log format that includes year information. Sawmill will then always categorize events in the correct year.

If that's not an option, then you will need to help Sawmill to know which data belongs in which year. There are several options, but the easiest one, if you are using Unix Syslog format, is to rename your log files so they end in yyyy.log, where yyyy is the year the log data is from. If some logs span multiple years, you will need to split those logs into files which do not cross year boundaries. For instance, if you have mail.log which contains data from 2004, 2005, and 2006, you can split it into three files, mail_2004.log, mail_2005.log, and mail_2006.log. The Unix Syslog plug-in automatically recognizes filenames which end with yyyy.log, and uses that value as the year when no year is available in the log data.
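If you need to split a multi-year syslog file, the following Python sketch shows one way to do it. It is not part of Sawmill; split_by_year and write_split_files are hypothetical helper names. It assumes that each line starts with a three-letter English month abbreviation (as in standard Unix Syslog timestamps), that lines are in chronological order, that you know the year of the first line, and that the file has no gap of a year or more. Whenever the month number drops (for example Dec followed by Jan), it assumes a new year has started, so a few out-of-order lines around a month boundary could confuse it.

```python
from collections import defaultdict

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def split_by_year(lines, start_year):
    """Group syslog lines by inferred year.

    Assumption: lines are chronological, so a drop in the month number
    (e.g. Dec -> Jan) means the next year has begun.
    """
    by_year = defaultdict(list)
    year = start_year
    last_month = None
    for line in lines:
        month = MONTHS.get(line[:3])
        if month is not None:
            if last_month is not None and month < last_month:
                year += 1
            last_month = month
        # Lines without a recognizable month stay with the current year.
        by_year[year].append(line)
    return dict(by_year)


def write_split_files(source_path, start_year):
    """Write e.g. mail.log out as mail_2004.log, mail_2005.log, ..."""
    base = source_path[:-4] if source_path.endswith(".log") else source_path
    with open(source_path, encoding="utf-8", errors="surrogateescape") as f:
        groups = split_by_year(f, start_year)
    for year, group in groups.items():
        out_path = f"{base}_{year}.log"
        with open(out_path, "w", encoding="utf-8",
                  errors="surrogateescape") as out:
            out.writelines(group)
```

For the example above, calling write_split_files("mail.log", 2004) would produce mail_2004.log, mail_2005.log, and mail_2006.log next to the original file, assuming the first line of mail.log is from 2004.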

Another option, also for logs written by Unix Syslog, is available if the message part of each log line contains a full date, including the year. For instance, some logging devices include "date=2006-02-01" in the log data, indicating the date of the event. In this case, even though the syslog format itself may not have the year, the device plug-in can extract the year from the message. This is usually a simple modification of the plug-in, but not all plug-ins have been modified to support this yet. If your log data contains year information in the message, but the reports still show data in the wrong year, please contact support@sawmill.net and we will add extraction of years from the message of your format (include a small sample of log data, as a compressed attachment).
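Before contacting support, you may want to check whether your log lines actually carry a full date in the message. The Python sketch below is only a checking aid, not how Sawmill plug-ins are implemented; year_from_message is a hypothetical helper, and it assumes the device writes the date as a "date=YYYY-MM-DD" field like the example above.

```python
import re

# Assumption: the device embeds the event date as date=YYYY-MM-DD.
DATE_FIELD = re.compile(r"\bdate=(\d{4})-(\d{2})-(\d{2})\b")


def year_from_message(line):
    """Return the year found in a date=YYYY-MM-DD field, or None."""
    match = DATE_FIELD.search(line)
    return int(match.group(1)) if match else None
```

If this returns a year for a sample of your lines, the message does contain year information and the plug-in can, in principle, be modified to use it.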

Another option is to put the data in directories by year; e.g. put all your 2005 data in a directory called /logs/2005, and all your 2006 log data in a directory called /logs/2006, and then process the data in stages using the following command lines:

sawmill -p profilename -a bd log.source.0.pathname /logs/2005 log.processing.default_log_date_year 2005
sawmill -p profilename -a ud log.source.0.pathname /logs/2006 log.processing.default_log_date_year 2006

The first command builds a database from all the 2005 data, using 2005 as the year. The second command processes all the 2006 data, adding it to the existing database, using 2006 as the year. The final result is a database which has 2005 data in 2005 and 2006 data in 2006. From then on, you can update your database normally, and the new log data (from the most recent day) will be correctly categorized in the current year. If new data continues to be added in the wrong year, make sure that the default_log_date_year option is set to thisyear in your profile .cfg file (in LogAnalysisInfo/profiles), and in LogAnalysisInfo/default_profile.cfg.
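If you have more than two years of data, the same staged approach can be generated rather than typed by hand. The Python sketch below only assembles command lines in the same shape as the two shown above (build the database for the earliest year, then update for each later year); it does not run Sawmill. staged_commands is a hypothetical helper, and it assumes one directory per year under a common root such as /logs.

```python
def staged_commands(profile, years_root, years):
    """Build one sawmill command line per year, earliest year first.

    The first command uses "-a bd" to build the database; every later
    one uses "-a ud" to update it, as in the two-year example above.
    """
    commands = []
    for index, year in enumerate(sorted(years)):
        action = "bd" if index == 0 else "ud"
        commands.append([
            "sawmill", "-p", profile, "-a", action,
            "log.source.0.pathname", f"{years_root}/{year}",
            "log.processing.default_log_date_year", str(year),
        ])
    return commands


# Print the commands for review before running them yourself.
for command in staged_commands("profilename", "/logs", [2005, 2006]):
    print(" ".join(command))
```

Printing the commands first lets you review them before running anything against your database.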