Log Fields


Log fields are containers which hold particular values in the log data, or which act as variables to hold other values. In general, each log entry contains multiple fields; Sawmill extracts the field values from the log data and puts them into the log fields. The log fields are then processed by log filters, and if the entry is accepted by the filters, the values are copied to database fields to be included in the database. The database fields are then used to generate reports.

For instance, if a log file contains three comma-separated fields per line (date, time, and page), then the log fields would be date, time, and page, plus any derived fields, like hour_of_day; see below. Log fields can have any names, and the log fields in a profile depend on the log format.
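The flow described above (extract log fields, run a filter, copy accepted entries to the database) can be sketched in a few lines of Python. This is only an illustration, not Sawmill's implementation: the sample line, the date and time formats, and the accept() filter are all assumptions made for the example.

```python
def parse_entry(line):
    """Split one comma-separated log line into named log fields."""
    date, time, page = [part.strip() for part in line.split(",")]
    fields = {"date": date, "time": time, "page": page}
    # Derived field: taken from the time field, assuming an HH:MM:SS format.
    fields["hour_of_day"] = int(time.split(":")[0])
    return fields


def accept(fields):
    """Hypothetical log filter: reject hits on GIF images, accept the rest."""
    return not fields["page"].lower().endswith(".gif")


# Hypothetical log line in the three-field format described above.
entry = parse_entry("2004-03-01,14:05:09,/index.html")

# Only accepted entries are copied on to the (here, in-memory) database.
database_rows = [entry] if accept(entry) else []
```

Here entry holds the three actual fields plus hour_of_day, and database_rows receives the entry because the filter accepts it.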

A log field may be either an actual field or a derived field. An actual field is present in each log entry of the file; for instance, the "page" field and the "hostname" field are actual fields. A derived field is not present in log entries, but is derived from the fields which are present, and from other information. Derived fields include fields like "domain description," which is a textual description of the host domain (e.g. "France" for .fr) and is derived from the hostname field; the "day of week" field, which is derived from the date/time field; and the "operating system" field, which is derived from the user-agent field. Derived fields are present when the fields they get their value from are present; they are created automatically when their source fields are created. See below for more information about specific derived fields.
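As a rough illustration of how a derived field depends on its source field, the following Python sketch computes a domain description from a hostname and a day of week from a date. The lookup table, the field names, and the ISO date format are assumptions made for the example; Sawmill's own tables and algorithms are not shown here.

```python
from datetime import date

# Tiny illustrative lookup table; a real table would cover every domain.
DOMAIN_DESCRIPTIONS = {"fr": "France", "de": "Germany"}

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def derive_fields(fields):
    """Return the derived fields whose source fields are present."""
    derived = {}
    if "hostname" in fields:
        # The top-level domain is the part after the last dot.
        tld = fields["hostname"].rsplit(".", 1)[-1].lower()
        derived["domain_description"] = DOMAIN_DESCRIPTIONS.get(tld, "(unknown)")
    if "date" in fields:
        # Assumes an ISO date (YYYY-MM-DD); weekday() returns 0 for Monday.
        day = date.fromisoformat(fields["date"])
        derived["day_of_week"] = DAY_NAMES[day.weekday()]
    return derived


derived = derive_fields({"hostname": "www.example.fr", "date": "2004-03-01"})
```

An entry without a hostname or date yields no derived fields at all, which mirrors the rule that a derived field exists only when its source field does.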

For actual (non-derived) fields, you can specify a number of parameters. Here is an example node describing a "page" field of a web log.

      page = {
        type = "page"
        index = "5"
        subindex = "2"
        hierarchy_dividers = "/"
        leading_divider = true
        left_to_right = true
        label = "$lang_stats.field_labels.page"
      } # page
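To make the index and subindex values in this node concrete, here is a minimal Python sketch of how index 5 and subindex 2 could select the page from a web log line. The tokenizing rule (quoted and bracketed groups each count as one field) and the sample line are assumptions made for illustration; they are not a description of Sawmill's parser.

```python
import re

# Assumed tokenizer: a bracketed group, a quoted group, or a run of
# non-space characters each count as one field.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def extract_field(line, index, subindex=0):
    """Return the field at a 1-based index, narrowed by a 1-based subindex."""
    tokens = TOKEN.findall(line)
    # Remove the surrounding quotes or brackets from the selected field.
    value = tokens[index - 1].strip('"[]')
    if subindex:
        # The subindex counts space-separated subfields inside the field.
        value = value.split(" ")[subindex - 1]
    return value


# Hypothetical log line; the fifth field is the quoted request.
line = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
        '"GET /one/sample/page.html HTTP/1.0" 200 2326')

page = extract_field(line, index=5, subindex=2)
```

With this line, index 5 selects the quoted request and subindex 2 selects its second space-separated part, the page.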

The possible parameters (subnodes) are as follows:

  1. name: The name of this field. This is the name of the node which describes the field; i.e., it is the part before the first equal sign (=). In the example above, this is page. This is the internal name, which is used in command lines and advanced expressions.

  2. label: The label of this field. This is used to refer to the field in the web interface, in reports, the Log Filter Editor, and more.

  3. type: The type of the log field. The type describes the format of the field, and sometimes also describes the purpose of the field. Allowable field types are:

    • page: The "page hit" field of a web log, or any field in a /-divided pathname format; e.g. /mypages/dir/file.html. This also acts like a "hierarchical" field.

    • host: The browsing hostname, or any field in hostname format; e.g. my.host.com. This also acts like a "hierarchical" field. The domain description, country, region, city, location, and organization fields are derived from this field.

    • url: Any field in URL format (e.g. http://hostname/page.html). The search phrase field is derived from this field.

    • date_time: A combined date/time field. The format of this field depends on the settings of Date format and Time format. If this field is not present in the log, but date and time are, this field will be derived from those fields. The day of week, hour of day, week of year, and day of year fields are derived from this field.

    • date: A date field, containing a year, month, and day. The format of this field depends on the setting of Date format.

    • time: A time of day: hour, minute, and second. The format of this field depends on the setting of Time format.

    • agent: An agent (browser) field. The browser OS, browser type, and browser version fields are derived from this field. This also acts like a "flat" field.

    • size: The size field, e.g. bytes transferred. This field will be used to compute bandwidth information. This also acts like an "integer" field. The size range field is derived from this field.

    • integer: Any field whose value is an integer (e.g. 67). This also acts like a "flat" field.

    • response: The "server response" field, containing the numeric HTTP server response code (e.g. 200, 404).

    • hierarchical: Any field which is multi-level hierarchical. The hierarchy divider and other parameters can be specified below. See Hierarchies and Fields.

    • flat: Any field which is hierarchically flat; all items are directly below the root of the hierarchy. See Hierarchies and Fields.

  4. index: This specifies the index of this log field in the log data. For instance, if this is the first log field in the entry, this should be 1. If this is the fifth log field in the line, it should be 5. This can be left 0 if the log field is being filled in by a parsing regular expression or by parsing filters.

  5. subindex: This specifies the subindex of this log field in the log data. This can usually be left at 0. A subindex is required only when the log field is contained inside another quoted field. In that case, the position of the quoted field is specified using the index, and the subindex indicates the position within the quoted field, counting space-separated subfields. This can be left 0 if the log field is being filled in by a parsing regular expression or by parsing filters.

  6. hierarchy_dividers: This specifies the character(s) which divide hierarchy levels in this field, if this field is hierarchical. Up to three characters may be specified. For instance, in a standard "page" field (e.g. /one/sample/page.html), the divider would be / or /? or /?& . Include the ? if you want the page field to be split on the URL parameters divider; include the & if you want it split between parameters. See Hierarchies and Fields.

  7. left_to_right: This specifies whether the hierarchy is left-to-right, i.e. with the higher hierarchy levels at the left, like /one/sample/page.html, or right-to-left, i.e. with the higher hierarchy level at the right, like some.hostname.com. See Hierarchies and Fields. The value is true or false.

  8. leading_divider: This specifies whether the field has a hierarchy divider at its highest end, e.g. /one/sample/page.html, which starts with /, or not, e.g. one/sample/page.html or some.hostname.com, which have hierarchy dividers only inside the field. The value is true or false.

  9. case_sensitive: This controls whether the field is case-sensitive. When this is false, Sawmill treats items as the same if they differ only in case (uppercase/lowercase); for instance, index.html and Index.html are treated as the same page. In that case, the case of the item as it appears in statistics is determined by the case of the first item Sawmill sees while processing the log data, so if the first hit is on index.html and the second is on Index.html, it will appear in the statistics as two hits on index.html. When this option is true, items that differ only in case are treated as different items; in the example above, the statistics would list one hit on index.html, and one hit on Index.html. The value is true or false.
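The hierarchy and case parameters above can be illustrated with a short Python sketch. The function names are hypothetical, and the code only mimics the behavior described in this list; it does not reproduce how Sawmill stores or labels hierarchy levels.

```python
import re


def split_levels(value, dividers="/", left_to_right=True, leading_divider=False):
    """Split a field value into parts, ordered from the highest level down."""
    if not dividers:
        return [value]  # a flat field has a single level
    if leading_divider and value:
        # Drop the divider at the highest end of the field, if present.
        if left_to_right and value[0] in dividers:
            value = value[1:]
        elif not left_to_right and value[-1] in dividers:
            value = value[:-1]
    # Any one of the divider characters separates two levels.
    parts = re.split("[" + re.escape(dividers) + "]", value)
    return parts if left_to_right else parts[::-1]


def count_items(values, case_sensitive):
    """Count hits per item, merging case variants when not case-sensitive."""
    counts = {}
    first_seen = {}  # lowercased item -> first spelling encountered
    for value in values:
        key = value if case_sensitive else first_seen.setdefault(value.lower(), value)
        counts[key] = counts.get(key, 0) + 1
    return counts


page_levels = split_levels("/one/sample/page.html", "/", True, True)
host_levels = split_levels("some.hostname.com", ".", False, False)
merged = count_items(["index.html", "Index.html"], case_sensitive=False)
separate = count_items(["index.html", "Index.html"], case_sensitive=True)
```

In this sketch the page splits into one, sample, page.html, while the right-to-left hostname yields com, hostname, some. With case_sensitive false the two hits merge under the first spelling seen; with it true they stay separate.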

Derived Fields

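Derived fields are log fields whose values Sawmill computes from actual log fields. Like actual log fields, they can then be copied to database fields. The derived fields mentioned above, listed by the type of their source field, are:

    • host: domain description, country, region, city, location, and organization.

    • url: search phrase.

    • date_time: day of week, hour of day, week of year, and day of year. The date_time field is itself derived from the date and time fields when it is not present in the log.

    • agent: browser OS (operating system), browser type, and browser version.

    • size: size range.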

Editing Log Fields

To edit the log fields, open the profile .cfg file, within LogAnalysisInfo/profiles, using a text editor, and search for "log = {"; then search forward from there for "fields = {". Each log field is a separate bracketed group under the fields group of the log group, and each log field lists the parameters described above. Edit the field, and save the file, and your changes will take effect on the next database rebuild.
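The search described above can also be scripted. The following Python sketch finds the line where the fields group of the log group begins, assuming the layout shown in the sample text; the sample profile, its group names, and the function name are hypothetical, and a real profile will contain many more nodes.

```python
def find_log_fields_line(cfg_text):
    """Return the 1-based line number of "fields = {" inside "log = {"."""
    in_log = False
    for number, line in enumerate(cfg_text.splitlines(), start=1):
        stripped = line.strip()
        if not in_log and stripped.startswith("log = {"):
            in_log = True  # from here on, look for the fields group
        elif in_log and stripped.startswith("fields = {"):
            return number
    return None  # no log fields group found


# Hypothetical, heavily shortened profile text for illustration only.
sample = """\
myprofile = {
  database = {
    fields = {
    } # fields
  } # database
  log = {
    fields = {
      page = {
        type = "page"
      } # page
    } # fields
  } # log
} # myprofile
"""

line_number = find_log_fields_line(sample)
```

The function skips any earlier "fields = {" group, such as one under a database group, because it starts matching only after it has seen "log = {".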