Using Database Filters


Note: Database filters are an advanced feature of Sawmill, and are usually used only by plug-in developers. Back up your profile before considering editing or creating database filters.

Database filters provide a way to scan, and potentially edit, the main table of the database, after log data has been parsed and imported. Their uses are many, but they are especially used to populate custom fields, especially in cases where the calculation must be done in a particular order.

Each database filter is a subnode of database.filters in the profile. Each database filter has a "conditions" subnode and an "expression" subnode. The conditions node lists conditions which must be met before the filter can be run; the expression subnode's value is a Salang expression which is executed for each row of the main table.

The conditions node

The conditions node lists all conditions which must be met before running the database filter. Each condition has a "type" subnode which specifies the type of the condition. Possible types are:

Type Description
sort
The main table must be sorted in a particular order when this filter is run.
variable
A particular variable, which will be used as a local variable in the filter, must be defined before running the filter.
lookback
Access to the main table during filtering must always allow access to a specified number of rows prior to the current row.
preceded_by_database_filter
A particular database filter must be run on the current line, before this filter is run.

These conditions are described in more detail below.

Condition: sort

A condition of type "sort" ensures that the main table is sorted into a particular order, for instance chronologically, before the filter is run. The condition node contains a "fields" node which contains one or more fields to sort on; sorting is done by the first listed field, then by the next listed field, etc. Each field node contains a "column_name" node which is either the name of the database field to sort on, or is a string of the form "variable:varname"; the latter format means that the variable varname should be used to get the sort field name (a condition node of type "variable" with variable name varname must exist; and its initial_value node will be used as the sort field name; see "Condition: variable" below). The "variable:" syntax is useful for database filters in generic snapons, where the field name is not known when the database filter is created, but is provided when attaching the snapon.

Condition: variable

A condition of type "variable" defines a local variable which can be used from the Salang filter expression. The condition node contains a "variable_name" node whose value is the variable name. The condition node also contains a "variable_type" node whose value is the variable type (a Salang type like "string" or "int"). The condition node also contains an "initial_value" node whose value is the initial value that should be assigned to the variable before filtering begins. The condition node may optionally also contains a "constant" node which, if true, specifies that this "variable" is actually a constant which will not be modified by the database filter expression; this allows some types of performance optimization when using the variable in the filter expression.

The condition node may optionally also contain a "parameter" node which, if true, specifies that this is a parameter whose initial value be supplied when the database filter is attached as part of a snapon, by providing it on the command line as a v.varname variable. For instance, if the condition has variable_name=session_user, and parameter=true, then the snap-on command line must include "-v.session_user hostname" (replacing "hostname" with the desired value of the session_user field).

During database update, the database filter will be run against the new data in the table. The value of the variables in this case will be set to the values they had at the end of the previous rebuild or update.

Condition: lookback

A condition of type "lookback" ensures that a certain number of consecutive rows of the main table will be accessible, for reading or writing, by the filter expression. The condition node contains a "rows" node whose value is the number of rows which may be accessed. For instance, a value of 1 means that in addition to the single row always available, one additional row may also need to be read or written by the database filter expression; i.e., this specifies that two consecutive rows should be accessible to the filter. Accesses by the filter expression to rows before the number specified by lookback may result in errors which terminate the filtering action; it is the responsibility of the database filter designer to ensure that the database filter accesses rows within the range specified.

One way of thinking of this is that the database filter can access the current row, and "lookback" rows prior to it. But it is actually a little more flexible than that, because the window of rows available can also move forward of the current row, as long as accesses are never done outside the window. For instance, if lookback is 10, and the current row is 15, the database filter could access row 20; that will cause the window to move forward to rows 10-20, and any rows in that range can then be accessed. Access to row 9 at this point, however, will cause an error--the window cannot move backward once it has advanced. So regardless of what the current row is, the real rule is: once you access row N in a filter, you may never after that access any row greater than N-lookback.

Condition: preceded_by_database_filter

A condition of type "preceded_by_database_filter " ensures that a particular database filter be run against the current row, before this database filter is run against the current row. The condition node contains a node "database_filter" whose value is the node name of the database filter which must proceed this one. This does not ensure that the specified database filter is completely done with all rows before this one runs; it only ensure that it has run against the current row, before this one is run against the current row. This is useful if this database filter uses the database field values set by the specified database filter, as its input in computing its own database fields.

The expression node

The "expression" node in a database filter contains the Salang expression which is executed for each row of the main table. In additional to having all standard Salang syntax available, table filter expressions can also use the following extended syntax:

  1. The variable number_of_rows is the number of rows in the table. During database build, this is the total number of rows we've just imported (the total number of rows in the main table); during database update it will also be the total number of rows in the main table (which includes not just the number of rows we've just imported, but also the number of rows from all previous imports).
  2. The expression thisrow{dbfieldname} returns the value of the specified database field in the current row in the table. If the database field uses itemnums, this returns the denormalized (string) value of the field. This can also be used as the target of an assignment, to change the value of a field; if the database field uses itemnums, this sets the field in the name table to the normalized value (integer itemnum) of the string value assigned to it.
  3. The variable first_rownum is the row number of the first row we're analyzing of the main table. During database build, this will be 0; during database update, this will be the row number of the first row we added in the update.
  4. The variable thisrownum is the row number of the current row in the main table. During database build, this will start at 0 and end at number_of_rows-1; during database build, this will start at first_rownum and end at number_of_rows-1.
  5. Database field names can be used as variable names to get or set the value of the fields in the current row, e.g. "page_views = 1" is equivalent to "thisrows{'page_views'}"
  6. The expression thistable[fMemos:rownum]{dbfieldname} returns the value of the specified database field in the specified row of the table. This can also be used as the target of an assignment (l-value), to change the value of a cell. It works like thisrow with itemnums.
  7. The variants thistableraw and thisrowraw work the same as thistable and thisrow, but do not do itemnum conversion, so they can be used to set or use itemnums directly. The values read and written will not be automatically normalized on read or denormalized on write; they are always itemnums (denormalized values).

Example

Here is a simple example of a database filter, which populates a field called double_hits, with twice the value of a field called hits:

  double_hits = {
    expression = `double_hits = 2 * hits`
  } # double_hits

This simple filter has no condition, which means it defines no variables and causes no special sorting of the main table; it will simply be run against the main table in whatever order it happens to be sorted already. The database filter iterates through all rows of the main table, gets the value of "hits" for each row, multiplies it by 2, and sets the value of the "double_hits" field in that row to the doubled value.

For more advanced examples, including examples using variables and sort conditions, see the "snapons" directory in the LogAnalysisInfo directory. Snapons often add database filters to a profile, and the database filter will be included in the CFG file for the snapon. The sessions.cfg snapon is particularly sophisticated: it implements a web server style session analysis by causing the main table to be sorted on session events (e.g., all page_views=1 events grouped together, with page_views=0 events ignored in a separate group), then hostname, then date time; and the it iterates through the sorted table (now conveniently grouped by hostname and sorted chronologically in each group), computing and setting the various session fields including session duration, entry pages, exit pages, session IDs, etc.