FAQ: Session Computation


How does Sawmill compute session information, like total sessions, repeat visitors, paths through the site, entry pages, exit pages, time spent per page, etc.?

Short Answer

Sawmill uses the visitor id field to identify unique visitors. It decides that a new session has begun if a visitor has been idle for 30 minutes (the default session timeout, which can be customized).

Long Answer

Sawmill computes session information by tracking the page, date/time, and visitor id (which is usually the originating hostname) for each page view in the log data. When a session view is requested, it processes all of these page views at the time of the request, ignoring those that are filtered out by filters on the page or date/time fields. All other hits are included; filters on other fields are ignored in session information.
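As a rough illustration of that selection rule, the sketch below applies only the filters on the page and date/time fields and ignores filters on any other field. The record layout, the field names (page, date_time, browser), the filters-as-predicates representation, and the function select_page_views are all assumptions made for this example; they are not Sawmill's actual data model or API.

```python
from datetime import datetime

# Assumed field names for the two fields that session views honor filters on.
SESSION_FILTER_FIELDS = {"page", "date_time"}

def select_page_views(page_views, filters):
    """Keep page views that pass the page and date/time filters only.

    `filters` maps a field name to a predicate (hypothetical representation).
    Filters on any other field are deliberately ignored.
    """
    active = [(field, keep) for field, keep in filters.items()
              if field in SESSION_FILTER_FIELDS]
    return [view for view in page_views
            if all(keep(view[field]) for field, keep in active)]

# Hypothetical page views.
page_views = [
    {"visitor_id": "host-a", "date_time": datetime(2024, 1, 1, 9, 0),
     "page": "/index", "browser": "Firefox"},
    {"visitor_id": "host-a", "date_time": datetime(2024, 1, 1, 9, 5),
     "page": "/admin", "browser": "Firefox"},
    {"visitor_id": "host-b", "date_time": datetime(2024, 1, 1, 9, 1),
     "page": "/index", "browser": "Safari"},
]

filters = {
    "page": lambda page: page != "/admin",          # applied
    "browser": lambda browser: browser == "Firefox",  # ignored for sessions
}

selected = select_page_views(page_views, filters)
```

In this example the /admin view is dropped by the page filter, while the Safari view is kept because the browser filter does not apply to session information.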

Sawmill groups the hits into initial sessions based on the visitor id; it starts by assuming that each visitor contributed one session. It sorts each visitor's hits by date/time so it has a click-by-click record of the movement of each visitor.
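The grouping step can be pictured with a minimal sketch like the following. The tuple layout (visitor id, timestamp, page), the sample data, and the function group_by_visitor are hypothetical and chosen only for illustration; Sawmill's internal representation is not described in this FAQ.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical page views: (visitor_id, timestamp, page), in log order.
hits = [
    ("host-a", datetime(2024, 1, 1, 9, 5), "/products"),
    ("host-b", datetime(2024, 1, 1, 9, 1), "/index"),
    ("host-a", datetime(2024, 1, 1, 9, 0), "/index"),
]

def group_by_visitor(hits):
    """Return {visitor_id: [(timestamp, page), ...]}, each list sorted by time.

    At this stage every visitor has exactly one (initial) session.
    """
    grouped = defaultdict(list)
    for visitor_id, timestamp, page in hits:
        grouped[visitor_id].append((timestamp, page))
    for views in grouped.values():
        views.sort(key=lambda view: view[0])  # chronological click order
    return dict(grouped)

initial_sessions = group_by_visitor(hits)
```

After this step, initial_sessions["host-a"] lists /index followed by /products, even though the log recorded them in the opposite order.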

Then it splits the sessions, using the customizable session timeout interval (30 minutes by default). Since there is no real "log out" operation in HTTP, there is no way for Sawmill to know the real time that a user leaves the site; it can only guess by assuming that if they didn't click anything for the length of the timeout interval, they must have left and come back. The split step, then, increases the number of sessions, possibly resulting in more than one session per visitor.
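A minimal sketch of the split step, assuming one visitor's page views are already sorted by time. The function split_sessions and the sample data are hypothetical, and treating a gap of exactly 30 minutes as a new session is an assumption; the FAQ does not say how the boundary case is handled.

```python
from datetime import datetime, timedelta

def split_sessions(views, timeout=timedelta(minutes=30)):
    """Split one visitor's time-sorted (timestamp, page) list into sessions.

    A new session starts whenever the gap since the previous page view
    reaches the timeout (boundary handling is an assumption).
    """
    sessions = []
    for timestamp, page in views:
        if not sessions or timestamp - sessions[-1][-1][0] >= timeout:
            sessions.append([])  # first view, or the visitor was idle too long
        sessions[-1].append((timestamp, page))
    return sessions

# Hypothetical visitor: two clicks, a 50-minute pause, then two more clicks.
views = [
    (datetime(2024, 1, 1, 9, 0), "/index"),
    (datetime(2024, 1, 1, 9, 10), "/products"),
    (datetime(2024, 1, 1, 10, 0), "/index"),
    (datetime(2024, 1, 1, 10, 5), "/contact"),
]

sessions = split_sessions(views)
```

Here the 50-minute pause between 9:10 and 10:00 exceeds the timeout, so this one visitor ends up with two sessions of two page views each.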

Finally, Sawmill discards sessions based on the Session Filters (which you can set in the Session Filters bar at the top of the statistics). The session filters can be set to discard all sessions except those from a particular visitor, or they can be set to discard all sessions except those which go through a particular page.
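The two kinds of session filter described above could be sketched as follows. The session layout (a dict with visitor_id and pages), the sample data, and the function filter_sessions are hypothetical; allowing both criteria to be combined in one call is a convenience of this sketch, not a statement about what the Session Filters bar supports.

```python
def filter_sessions(sessions, visitor_id=None, through_page=None):
    """Discard sessions that do not match the given session filters.

    visitor_id:   keep only sessions from this visitor (if given).
    through_page: keep only sessions that include this page (if given).
    """
    kept = []
    for session in sessions:
        if visitor_id is not None and session["visitor_id"] != visitor_id:
            continue
        if through_page is not None and through_page not in session["pages"]:
            continue
        kept.append(session)
    return kept

# Hypothetical sessions, already split by the timeout.
all_sessions = [
    {"visitor_id": "host-a", "pages": ["/index", "/products"]},
    {"visitor_id": "host-a", "pages": ["/index", "/contact"]},
    {"visitor_id": "host-b", "pages": ["/products", "/checkout"]},
]

from_host_a = filter_sessions(all_sessions, visitor_id="host-a")
through_products = filter_sessions(all_sessions, through_page="/products")
```

With this data, from_host_a keeps the first two sessions, and through_products keeps the first and third.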

After that, Sawmill is ready to generate the statistics reports. The "Sessions Overview" report is generated by examining the sessions in various ways (for instance, the repeat visitors number is the number of visitors who have more than one session, i.e. those whose sessions were "split" by the timeout interval). The "entry pages" and "exit pages" reports are generated by tabulating the first and last pages of every session. The "session pages" report is generated by finding every occurrence of each page in any session, computing the time from that page view until the next page view in the same session (exit pages are considered to have zero time spent), and tabulating the results for all pages to compute time per page and other statistics. The "paths (clickstreams)" report shows all the sessions in a single expandable view.
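To make these tabulations concrete, here is a minimal sketch that derives the repeat visitor count, entry and exit page counts, and total time per page from a list of sessions. The session layout, the sample data, and the function summarize are hypothetical, and the real reports contain more statistics than this example computes.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sessions: (visitor_id, [(timestamp, page), ...]), time-sorted.
sessions = [
    ("host-a", [(datetime(2024, 1, 1, 9, 0), "/index"),
                (datetime(2024, 1, 1, 9, 4), "/products")]),
    ("host-a", [(datetime(2024, 1, 1, 11, 0), "/products")]),
    ("host-b", [(datetime(2024, 1, 1, 9, 1), "/index"),
                (datetime(2024, 1, 1, 9, 3), "/contact")]),
]

def summarize(sessions):
    """Tabulate a few session statistics from a list of sessions."""
    sessions_per_visitor = Counter(visitor for visitor, _ in sessions)
    # Repeat visitors: visitors with more than one session.
    repeat_visitors = sum(1 for n in sessions_per_visitor.values() if n > 1)
    # Entry and exit pages: first and last page of every session.
    entry_pages = Counter(views[0][1] for _, views in sessions)
    exit_pages = Counter(views[-1][1] for _, views in sessions)
    # Time on a page: time until the next page view in the same session.
    seconds_on_page = defaultdict(float)
    for _, views in sessions:
        for (start, page), (end, _next_page) in zip(views, views[1:]):
            seconds_on_page[page] += (end - start).total_seconds()
        seconds_on_page[views[-1][1]] += 0.0  # exit pages count as zero time
    return {
        "repeat_visitors": repeat_visitors,
        "entry_pages": entry_pages,
        "exit_pages": exit_pages,
        "seconds_on_page": dict(seconds_on_page),
    }

report = summarize(sessions)
```

In this data only host-a has more than one session, so it is the single repeat visitor. /products shows zero time because it is the exit page in both sessions where it appears.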