Databases


Sawmill uses a database on the disk to store information about log data. The database contains a compact version of the log data in the "main table", and a series of secondary tables which provide hierarchy information and improve performance of some queries. Every time a new log entry is read, the information contained in that entry is added to the database. Every time a statistics page is generated, the information needed is read from the database.

Reports can query data from the database based on multiple filters. For instance, in a virus log it is possible to filter to show only the source IPs for a particular virus, and in a web log it is possible to see the pages hit by a particular visitor. In general, any combination of filters can be used; it is possible to create complex and/or/not expressions to zoom in on any part of the dataset.
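The idea of combining filters can be sketched in a few lines of Python. This is only an illustration of and/or/not filtering over rows, not Sawmill's filter syntax or implementation; the field names (virus, source_ip, action), the sample rows, and all helper functions are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Filter = Callable[[Row], bool]

def field_is(field: str, value: str) -> Filter:
    """Filter that keeps rows whose field equals value."""
    return lambda row: row.get(field) == value

def all_of(*filters: Filter) -> Filter:
    """Logical AND of several filters."""
    return lambda row: all(f(row) for f in filters)

def any_of(*filters: Filter) -> Filter:
    """Logical OR of several filters."""
    return lambda row: any(f(row) for f in filters)

def negate(f: Filter) -> Filter:
    """Logical NOT of a filter."""
    return lambda row: not f(row)

def distinct_values(rows: List[Row], field: str, keep: Filter) -> List[str]:
    """Sorted distinct values of field among the rows that pass the filter."""
    return sorted({row[field] for row in rows if keep(row)})

# Hypothetical virus-log rows, standing in for the main table.
rows = [
    {"virus": "W32.Example", "source_ip": "10.0.0.1", "action": "blocked"},
    {"virus": "W32.Example", "source_ip": "10.0.0.2", "action": "cleaned"},
    {"virus": "Other.Sample", "source_ip": "10.0.0.3", "action": "blocked"},
    {"virus": "W32.Example", "source_ip": "10.0.0.1", "action": "blocked"},
]

only_example = field_is("virus", "W32.Example")
example_not_blocked = all_of(only_example, negate(field_is("action", "blocked")))

# Source IPs for one particular virus.
source_ips = distinct_values(rows, "source_ip", only_example)
# Source IPs for that virus, excluding entries that were blocked.
unblocked_ips = distinct_values(rows, "source_ip", example_not_blocked)
```

Because each filter is just a function from a row to true or false, the combinators nest freely, which is the property that lets arbitrary and/or/not expressions narrow the dataset.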

For large datasets, it can be slow to query data directly from the main table. Query performance for some types of tables can be improved using cross-reference tables, which "roll up" data for certain fields into smaller, fast-access tables. For instance, for a web log, you can create a cross-reference table containing page, hit, and page view information; the table will pre-compute the number of hits and page views for each page, so the standard Pages report can be generated very quickly. See Cross-Referencing and Simultaneous Filters for more information.
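To make the "roll up" idea concrete, here is a minimal Python sketch of pre-computing per-page totals from a main table. It is a conceptual illustration only and does not reflect Sawmill's on-disk format; the row layout, the sample data, and the build_xref function are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical main table: one row per log entry.
main_table = [
    {"page": "/index.html", "hits": 1, "page_views": 1},
    {"page": "/logo.png", "hits": 1, "page_views": 0},
    {"page": "/index.html", "hits": 1, "page_views": 1},
    {"page": "/about.html", "hits": 1, "page_views": 1},
    {"page": "/logo.png", "hits": 1, "page_views": 0},
]

def build_xref(rows, key_field, numeric_fields):
    """Roll up numeric fields per distinct value of key_field."""
    xref = defaultdict(lambda: dict.fromkeys(numeric_fields, 0))
    for row in rows:
        totals = xref[row[key_field]]
        for name in numeric_fields:
            totals[name] += row[name]
    return dict(xref)

# Built once, at database build or update time.
pages_xref = build_xref(main_table, "page", ["hits", "page_views"])

# A Pages-style report now reads one small entry per page
# instead of scanning every row of the main table.
index_totals = pages_xref["/index.html"]
```

The trade-off is the one described elsewhere on this page: the roll-up costs extra time and space when the database is built, in exchange for faster reports on the fields it covers.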

The Database directory option specifies the location of the database on disk; if the option is blank, Sawmill stores the database in the Databases directory, in the LogAnalysisInfo directory, using the name of the profile as the name of the database directory.
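The default-location rule above can be expressed as a small Python sketch. The function name, its parameters, and the example paths are hypothetical; the sketch also assumes the profile name is used unchanged as the directory name, which may not match Sawmill's exact behavior.

```python
from pathlib import PurePosixPath

def resolve_database_dir(database_directory, log_analysis_info, profile_name):
    """Return the directory where the database would be stored.

    database_directory: value of the Database directory option ("" if blank).
    log_analysis_info:  path of the LogAnalysisInfo directory.
    profile_name:       name of the profile (assumed to be used as-is).
    """
    if database_directory.strip():
        # An explicit option value takes precedence.
        return PurePosixPath(database_directory)
    # Blank option: LogAnalysisInfo/Databases/<profile name>
    return PurePosixPath(log_analysis_info) / "Databases" / profile_name

default_dir = resolve_database_dir("", "/opt/sawmill/LogAnalysisInfo", "mysite")
custom_dir = resolve_database_dir("/data/sawmill-db", "/opt/sawmill/LogAnalysisInfo", "mysite")
```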

New log data can be added to the database at any time. This allows a database to be updated quickly and incrementally, for instance every day with that day's new log entries. This can be done from the web browser interface by using the Update Database option in The Config Page. A command line (see The Command Line) that accomplishes the same thing is:

 sawmill -p config-file -a ud
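For scheduled updates, the same command line can be wrapped in a small script. The following Python sketch only assembles and runs the command shown above; the function names are hypothetical, and it assumes the sawmill executable is on the PATH and that the profile name passed in is valid for your installation.

```python
import subprocess

def build_update_command(profile, executable="sawmill"):
    """Build the argument list for an incremental database update.

    Uses only the options shown above: -p <profile> and -a ud.
    """
    return [executable, "-p", profile, "-a", "ud"]

def run_update(profile, executable="sawmill"):
    """Run the update and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_update_command(profile, executable), check=True)

# Example: the command that would be run for a profile named "config-file".
command = build_update_command("config-file")
```

Calling run_update from a daily scheduler (cron, Task Scheduler, or similar) would add each day's new log entries to the existing database.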

If your log files are very large, or if your database is extensively cross-referenced, building a database can take a long time and use a lot of memory and disk space. See Memory, Disk, and Time Usage for information on limiting memory and disk usage and increasing database build speed.

A number of advanced options exist to fine-tune database performance. To get the most out of the database feature, you may want to adjust the values of the database parameters.