FAQ: Sawmill uses too much memory for builds/updates, and is slow to view


When I build or update my database with Sawmill, it uses a huge amount of memory. Then, when I view statistics, it's very slow. What can I do about that?

Short Answer

Decrease the complexity of the database.

Long Answer

The main portion of the database that uses memory is the "item lists". There is one list for each database field, and each list contains all the unique values for that field. If one of the fields in your database has many unique values (millions), it can require a very large amount of memory to track. Simplifying that field can save memory.

To check which database field is the main culprit, look at the sizes of the files in the "items" subdirectory of the database directory (in the Databases directory of the LogAnalysisInfo directory). For instance, if the "location" directory is the largest, at 500 MB, then you know that the "location" database field is responsible for the largest part of the memory usage.
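
For example, a size-sorted listing of that directory might look like this (the profile name "myprofile" and all the sizes shown are purely illustrative):

    $ du -sh LogAnalysisInfo/Databases/myprofile/items/* | sort -rh
    500M    LogAnalysisInfo/Databases/myprofile/items/location
     85M    LogAnalysisInfo/Databases/myprofile/items/page
     40M    LogAnalysisInfo/Databases/myprofile/items/hostname

Here "location" dwarfs the other fields, so it is the one to simplify first.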

When you've found the culprit, you need to reduce its memory usage. This is where you'll have to make compromises and cuts. The simplest solution is to delete the database field, and stop tracking and reporting on it. If that's not an option, you'll need to simplify the field in some way. The key point is that you are trying to reduce the number of unique field values that Sawmill sees and tracks. The pool file, which is usually the largest one, contains a back-to-back list of all the field values used in the database; if you can reduce the number of possible field values Sawmill uses, you will reduce the size of that file.

If the field is hierarchical (like a pathname, hostname, date/time, or URL), you can simplify it by tracking fewer levels, by adjusting the suppress_top and suppress_bottom values in the database.fields section of the profile .cfg file (in the profiles folder of the LogAnalysisInfo folder). For instance, the page field of web logs is tracked nine directories deep by default; you can simplify it by tracking only the top three directory levels. If your date/time field is set to track information to the level of minutes, you can change it back to tracking hours or days only. You will usually also want to turn off the "bottom level items" checkbox for the field, since it's the bottom level that typically holds all the detail.
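
For example, the page field's entry in the database.fields section of the profile .cfg file might be edited along these lines (a hypothetical sketch; the exact option names and values in your profile may differ, so compare against your own .cfg file before editing):

    database = {
      fields = {
        page = {
          label = "page"
          suppress_top = 0
          suppress_bottom = 3  # was 9; track only the top three directory levels
        } # page
      } # fields
    } # database

Because this changes the structure of the database, you will generally need to rebuild the database for the change to take effect.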

Another possibility is to use a Log Filter to simplify the field. The default filter for web logs, which replaces everything after "?" with "(parameters)", is an example of this. By replacing all the various parameterized versions of a URL with a single version, this filter dramatically decreases the number of different page field values that Sawmill sees, and therefore the memory usage of the "page" field. Similarly, if you have a very complex section of your directory structure, but you don't really need to know all the details, you can use a Log Filter to strip the detail from the field, collapsing the entire structure into a few items.
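
For instance, a filter to collapse a complex directory section might look like the sketch below, written in the style of Sawmill's log filter language. The directory name is invented, and the exact function names and field references should be checked against the filters already present in your profile:

    # Collapse everything under /cgi-bin/reports/ into a single page item.
    if (starts_with(page, '/cgi-bin/reports/')) then
      page = '/cgi-bin/reports/(all reports)';

After a filter like this, the thousands of distinct pages under that directory count as a single item in the page field's item list.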

A common source of high memory usage is a fully-tracked hostname/IP field. By default, Sawmill tracks only the first two levels of hostnames for web and proxy logs; i.e., it will tell you that a hit came from .sawmill.net, but not that it came from some.machine.sawmill.net. Because of the tremendous number of IP addresses that appear in large log files, this field can be a problem if it's set to track individual IPs (there's a checkbox that lets you do this when you create the profile). If this is happening, consider tracking only a few levels of the hostname hierarchy, instead of the full IP address.
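
One compromise, if you need some IP-level information, is a log filter that collapses each address into its /24 subnet, so that at most one item is tracked per 256 addresses. This is a rough sketch in the style of Sawmill's log filter language; treat the field name, function name, and regular-expression escaping as assumptions to verify against your own profile:

    # Replace the last octet of an IP address with "*",
    # e.g. 10.1.2.3 and 10.1.2.4 both become 10.1.2.*.
    if (matches_regular_expression(hostname, '^([0-9]+\\.[0-9]+\\.[0-9]+)\\.[0-9]+$')) then
      hostname = $1 . '.*';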

Of course, sometimes you really need the full detail you're tracking in a very large field. If you can't reduce the detail, and you can't reduce the amount of log data, then the only solution is to get enough memory and processing power to efficiently handle the data you're asking Sawmill to track.