FAQ: Definitions of Numerical Fields
In web server analyses, what are "hits," "page views," "bandwidth" (or "bytes"), "visitors," and "sessions"? In media analyses, what are "stream duration," "play duration," "pause duration," "session duration," "events," "streams," "concurrent connections," and "successful accesses"?
Short Answer
Hits are accesses to the server; page views are accesses to HTML pages; visitors are unique visitors to the site; and sessions are visits to the site. Play duration is the most useful measure of time actually spent playing; pause duration is time spent paused; stream and session duration are the time spent connected; events is the total number of log lines; streams is the number of unique streams accessed; successful accesses is the number of non-error streaming events.
Long Answer
Web Server Numerical Fields
Sawmill can count web log traffic in several ways. Each way is counted independently of the others, and each has its own advantages in analyzing your traffic. The different types are:
-
Hits. Hits are accepted log entries. So if there are 5000 entries in your log file, and there are no log filters, and all the entries are valid (i.e. none of them have corrupt dates), then Sawmill will report 5000 hits for the file. If there are log filters that reject certain log entries, then those entries will not appear as hits. Log entries that are accepted by the log filters will count toward the hits totals. Because there are no default filters that reject, you will generally have nearly as many reported hits as you have log entries. You can view and edit the log filters by opening your profile from the Administrative Menu, clicking Profile Options, and then clicking the Log Filters tab. See also Using Log Filters.
-
Page views. Page views correspond to hits on pages. For instance, if you're analyzing a web log, and a hit on /index.html is followed by 100 hits on 100 images, style sheets, and JavaScript files that appear on that page, then it will count as a single page view -- the secondary files do not add to the total. This is implemented in the log filters -- page views are defined as log entries that are accepted by the log filters, and that have a page_view value set to 1 by the log filters. Log entries that are accepted by the filters, but have a page_view of 0 set by the filters, do not contribute to the page views total. Therefore, you have complete control over which files are "real" page views and which are not -- if Sawmill's default filters do not capture your preferred definition of page views, you can edit them until they do. By default, page views are all hits on files other than GIF, JPEG, PNG, CSS, JS, and a few other types. See Hits, above, for more information on log filters.
-
Visitors. Visitors correspond roughly to the total number of people who visited the site. If a single person visits the site and looks at 100 pages, that will count as 100 page views, but only one visitor. By default, Sawmill defines visitors to be "unique hosts" -- a hit is assumed to come from a different visitor if it comes from a different hostname. This can be inaccurate due to the effects of web caches and proxies. Some servers can track visitors using cookies, and if your web logs contain this information, Sawmill can use it instead of hostnames -- just change the log_field value for the visitors database field to point to the cookie field, rather than the hostname field.
-
Bandwidth. Bandwidth is the total number of bytes transferred. It is available only in log formats that track bytes transferred. Bandwidth is tracked for every log entry that is accepted, whether it is accepted "as a hit" or "as a page view". For log formats which track both inbound and outbound bandwidth, Sawmill can report both simultaneously.
-
Sessions. Several of Sawmill's reports deal with "session" information, including the "sessions overview" and the "paths (clickstreams)" report. Sessions are similar to visitors, except that they can "time out." When a visitor visits the site, and then leaves, and comes back later, it will count as two sessions, even though it's only one visitor.
-
Session events. A page view which occurs during a session is a session event. For web server logs, this number is similar to page views, but may be smaller, because it does not include page views which are not in any session. That can occur if the page view is a reload (two consecutive hits on the same page), or if the page view is a part of a session which has been discarded because it is too long.
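
The relationship between these counts can be illustrated with a minimal Python sketch. This is not Sawmill's implementation: the function name `summarize`, the extension list, the 30-minute session timeout, and the choice to start sessions only on page views are all assumptions made for illustration, and the input is assumed to contain only entries already accepted by the log filters.

```python
from datetime import datetime, timedelta

# Assumed list of non-page extensions; the text above names GIF, JPEG, PNG,
# CSS, JS "and a few others", so this list is illustrative, not exhaustive.
NON_PAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")

# Assumed session timeout, chosen only for this example.
SESSION_TIMEOUT = timedelta(minutes=30)


def summarize(entries):
    """Count hits, page views, visitors, sessions, and bandwidth.

    entries: iterable of (timestamp, host, path, bytes) tuples that have
    already been accepted by the log filters.
    """
    hits = page_views = bandwidth = sessions = 0
    hosts = set()          # visitors are approximated as unique hosts
    last_page_view = {}    # host -> timestamp of that host's last page view

    for timestamp, host, path, size in sorted(entries, key=lambda e: e[0]):
        hits += 1                      # every accepted entry is a hit
        bandwidth += size              # bytes are counted for every hit
        hosts.add(host)
        if path.lower().endswith(NON_PAGE_EXTENSIONS):
            continue                   # equivalent to page_view = 0
        page_views += 1                # equivalent to page_view = 1
        previous = last_page_view.get(host)
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            sessions += 1              # first visit, or return after timeout
        last_page_view[host] = timestamp

    return {
        "hits": hits,
        "page_views": page_views,
        "visitors": len(hosts),
        "sessions": sessions,
        "bandwidth": bandwidth,
    }


t0 = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    (t0, "a.example", "/index.html", 1000),
    (t0 + timedelta(seconds=1), "a.example", "/logo.png", 500),
    (t0 + timedelta(seconds=2), "a.example", "/style.css", 200),
    (t0 + timedelta(minutes=5), "b.example", "/index.html", 1000),
    (t0 + timedelta(hours=2), "a.example", "/about.html", 800),
]
result = summarize(sample)
print(result)
```

In this sample, host a.example returns two hours after its first visit, so it contributes one visitor but two sessions, while its image and style sheet requests add to hits and bandwidth without adding page views.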
Media Server Numerical Fields
Media servers have their own distinct numerical fields. Some of these are directly from the log data; others are computed by the Sawmill plug-in. Different plug-ins report different fields; fields may include:
-
Stream Duration. Available in Flash, Limelight, Wowza, and other analyses, this field reports the total amount of time spent streaming, including time spent playing, paused, or seeking.
-
Play Duration. Available in Wowza analysis, this field reports the total amount of time spent playing/viewing the video/audio stream. Time spent paused, or time spent seeking, is not included in this number. This is usually the most useful of the Duration fields.
-
Pause Duration. Available in Wowza analysis, this field reports the total amount of time spent paused during a streaming session.
-
Session Duration. Available in Flash and other analyses, this field reports the total amount of time the session was active, from the first 'session start' event in the log to the 'session end' event, which includes the stream duration and also the time after the session starts and before streaming begins, and the time after streaming and before the end of the session.
-
Publish Duration. Available in Wowza analysis, this field reports the total amount of time the stream was published, from the first 'publish' event in the log to the 'unpublish' event, which includes the stream duration and also the time after connection and before streaming begins, and the time after streaming and before disconnection.
-
Events. Available in Microsoft Media Server, Flash, Wowza, and other analyses, this reports the total number of server events, i.e., the total number of lines in the log data, regardless of the event type. This includes events like connect, disconnect, and errors, in addition to successful stream events.
-
Streams. Available in Wowza, this counts the number of unique values of the x-stream-id field, i.e., the number of unique streams viewed.
-
Successful Accesses. Available in Microsoft Media Server, this reports the number of stream attempts which did not result in an error, i.e., those events which had a status code in the 200s, or 408. Broken links and other errors are not counted in this number.
-
Concurrent Connections. Available in Wowza, Flash, Microsoft Media Server, and any other media format which uses the Media Reports snapon, this reports the number of concurrent connections to the server.
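
As a rough illustration of the three counting fields above (Events, Streams, and Successful Accesses), the following Python sketch computes them from a handful of hypothetical rows. The row layout, the `status` key, and the function names are assumptions; real plug-ins expose different fields, and the sample deliberately mixes fields from different plug-ins in one place for brevity.

```python
def is_successful(status):
    """Status codes in the 200s, or 408, count as successful accesses."""
    return 200 <= status <= 299 or status == 408


def media_counts(rows):
    """rows: list of dicts with hypothetical keys 'x-stream-id' and 'status'."""
    # Events: every log line counts, regardless of event type.
    events = len(rows)
    # Streams: number of unique, non-empty x-stream-id values.
    streams = len({r["x-stream-id"] for r in rows if r.get("x-stream-id")})
    # Successful Accesses: events whose status is not an error.
    successful = sum(1 for r in rows if is_successful(r["status"]))
    return {"events": events, "streams": streams, "successful": successful}


rows = [
    {"x-stream-id": "1", "status": 200},
    {"x-stream-id": "1", "status": 200},
    {"x-stream-id": "2", "status": 404},   # broken link: not successful
    {"x-stream-id": "2", "status": 408},   # 408 is counted as successful
    {"x-stream-id": None, "status": 200},  # e.g. a connect event, no stream
]
counts = media_counts(rows)
print(counts)
```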
How Concurrent Connections are Calculated and Reported
Sawmill uses a database filter to sort the events chronologically, and then tags each event with a concurrent connection value representing the number of connections at the time of that event, as computed by examining the start and end of each event and counting the overlapping connections at each logged timestamp. For non-aggregating reports like Log Detail, this value is shown directly in the report. For aggregating reports (most reports are aggregating reports), the column shows the maximum number of concurrent connections for all aggregated events (e.g., for all events contributing to a particular line of a table report).
This is a global number, computed globally across the entire dataset, so it is generally only useful for global reports like Overview, and for date/time reports like Year, Months, Days, etc. Adding this column to other reports will give results which are correct according to the algorithm described above, but probably are not what is expected. For instance, adding it to Countries will not give the number of simultaneous connections from each country; it will give the maximum number of simultaneous connections which existed to the whole server (from all countries) at a time when each country accessed the server. Similarly, this number does not show the maximum concurrent connections per stream name or per publishing point, when seen in the Streams report or similar.
If you want to know the number of connections to a particular stream (rather than the number of connections to the server when the stream was accessed, which is what this global value would show), you can attach another Concurrent Connections snapon, and choose the stream name as the "resource" field; this will count concurrent connections separately, not globally but for each value of that field, which will give the expected results in a report of that field only.
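
The difference between the global value and a per-resource value can be sketched in Python as follows. This is an illustrative approximation of the approach described above, not Sawmill's actual database filter: the event layout, the function names `tag_concurrency` and `max_by`, and the rule that a connection is counted as open when `start <= t < end` are all assumptions.

```python
def tag_concurrency(events, resource=None):
    """Tag each event with the number of connections open at its start time.

    events: list of dicts with 'start' and 'end' (seconds) plus other fields.
    resource: if given, only connections sharing this field's value are
    counted, mimicking a per-resource count instead of a global one.
    """
    tagged = []
    for ev in sorted(events, key=lambda e: e["start"]):
        count = sum(
            1
            for other in events
            if other["start"] <= ev["start"] < other["end"]
            and (resource is None or other[resource] == ev[resource])
        )
        tagged.append({**ev, "concurrent": count})
    return tagged


def max_by(tagged, field):
    """Aggregate like a table report: maximum tagged value per field value."""
    result = {}
    for ev in tagged:
        key = ev[field]
        result[key] = max(result.get(key, 0), ev["concurrent"])
    return result


events = [
    {"start": 0, "end": 100, "country": "US", "stream": "news"},
    {"start": 10, "end": 50, "country": "US", "stream": "sports"},
    {"start": 20, "end": 30, "country": "DE", "stream": "news"},
    {"start": 200, "end": 210, "country": "DE", "stream": "sports"},
]

global_tags = tag_concurrency(events)
by_country = max_by(global_tags, "country")        # global value per country
by_stream_global = max_by(global_tags, "stream")   # global value per stream
by_stream = max_by(tag_concurrency(events, resource="stream"), "stream")
print(by_country, by_stream_global, by_stream)
```

In this sample, DE never has more than one connection open at a time, yet the global value grouped by country reports 3 for DE, because three connections to the whole server were open when a DE event occurred. Counting per stream gives 2 for "news" and 1 for "sports", which is the figure a per-stream report would be expected to show.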