Newsletters



Sawmill Newsletter

  August 15, 2008



Welcome to the Sawmill Newsletter!

You’re receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of “UNSUBSCRIBE” to newsletter@sawmill.net .


News

Come see us at Streaming Media West!

We will be at Streaming Media West in San Jose, CA, September 23-25.  This year's event is The Business & Technology of Online Video.
See our newest version of Sawmill, we will be giving away t-shirts and have special offers during the show. Hope to see you there!

Sawmill 7.2.15 shipped on May 16, 2008. This is a minor "bug fix" release, and it is free to existing Sawmill 7 users.  It is recommended for anyone who is experiencing problems with Sawmill 7.2.14 or earlier. You can download it from http://sawmill.net/download.html .

Sawmill 8 is well into "alpha" stage. That means that we have completed development of all features, and Sawmill 8 will ship (8.0.0) when all known bugs are fixed. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based authentication control. A "beta" version will be publicly available when all major bugs are fixed, probably in the next few weeks. Watch sawmill.net for the beta announcement!

This issue of the Sawmill Newsletter describes using Sawmill to import data into a MySQL database, and to query it directly with external SQL queries.


Get the Most out of Sawmill with Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Service Experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices; work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements and stay focused on what your business needs are. We will show you areas of Sawmill you may not even be aware of, demonstrating these methods will provide you with many streamlined methods to get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross section of devices and business sectors. Our promise is to very quickly come up with a cost effective solution that fits your business, and greatly expand your ROI with only a few hours of fee based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional services expert contact consulting@flowerfire.com.



Tips & Techniques:
Using Sawmill To Query Log Data With SQL

Sawmill Enterprise supports the use of a MySQL server as the back-end database (this is available only with Enterprise--Sawmill Professional and Sawmill Lite cannot use MySQL). [Note: Sawmill has been certified against the commercial version of MySQL server, and it is recommended that Sawmill users purchase the full MySQL Server, rather than using the "community" database server.] Sawmill's own built-in "internal" database is faster than MySQL, and is therefore generally a better choice if reports are to be generated by Sawmill. But MySQL has several major advantages over the internal database, including the ability to run SQL queries. This newsletter discusses the use of SQL to extract information from a MySQL database which was created by Sawmill. The techniques described in this newsletter can be used to run arbitrary SQL queries against log data in any of the 800+ log formats Sawmill supports; other log formats can be supported through custom log format plug-ins.


Creating A MySQL Profile And Database

To create a Sawmill profile which uses MySQL as its back-end database, create a profile just as you would for the internal database, but on the Database Options page of the Create Profile Wizard, select "Use MySQL database" (again, this is available only in Sawmill Enterprise). Then enter the hostname, username, and password of your MySQL database server. If you're using a socket to communicate with MySQL, and the socket is not in the default location, you can also enter its pathname on this page. Finally, if you want the name of the database (schema) to be different from the name of the profile, you can enter a different database name on this page:

mysql
Then continue with creating the profile, and at the end, let Sawmill build the database. It is not necessary to view reports through Sawmill now (or ever), if you intend to use the database only through external SQL queries--the "build database" operation will populate all tables in the MySQL database. When the "build database" completes, Sawmill has parsed, normalized, and inserted all log data into the MySQL database, and built the associated itemnum tables (see below).


Querying The Main Table And The Itemnum Tables

The main table of the database is called "logfile," and contains one row for each event in the log data. In this example, we are analyzing a small 5000-line Apache log file, so each line of log data is a separate event; so the resulting logfile contains 5000 rows (all examples below are captures from the "mysql" command-line program. If your mail client does not support the HTML tags used to render this in a mono-spaced font, these may format poorly; changing the font to a mono-spaced font like Courier or Monaco will show them better):


  mysql> select count(*) from logfile;
  +----------+
  | count(*) |
  +----------+
  |     5000 |
  +----------+


Let's look at one row from logfile:


  mysql> select * from logfile limit 1;
+-----------+---------------------+-----------------+-------------+-------------+----------+------+-----------+------+-------------------+--------------+----------+--------------------+----------+--------------+------+--------+----------+----------------------+---------------+---------------+-------------+------------------+--------+---------------+--------------------+-----------------+------+------------+---------+-------+--------+--------------+------------------+----------+------+
| loadorder | date_time           | bottomleveldate | day_of_week | hour_of_day | hit_type | page | file_type | worm | screen_dimensions | screen_depth | hostname | domain_description | location | organization | isp  | domain | referrer | referrer_description | search_engine | search_phrase | web_browser | operating_system | spider | server_domain | authenticated_user | server_response | hits | page_views | spiders | worms | errors | broken_links | screen_info_hits | visitors | size |
+-----------+---------------------+-----------------+-------------+-------------+----------+------+-----------+------+-------------------+--------------+----------+--------------------+----------+--------------+------+--------+----------+----------------------+---------------+---------------+-------------+------------------+--------+---------------+--------------------+-----------------+------+------------+---------+-------+--------+--------------+------------------+----------+------+
|         1 | 1998-04-07 16:53:06 |               2 |           2 |           2 |        2 |    2 |         2 |    2 |                 2 |            2 |        2 |                  2 |        2 |            2 |    2 |      2 |        2 |                    2 |             2 |             2 |           2 |                2 |      2 |             2 |                  2 |               2 |    1 |          1 |       0 |     0 |      0 |            0 |                0 |        2 |  734 |
+-----------+---------------------+-----------------+-------------+-------------+----------+------+-----------+------+-------------------+--------------+----------+--------------------+----------+--------------+------+--------+----------+----------------------+---------------+---------------+-------------+------------------+--------+---------------+--------------------+-----------------+------+------------+---------+-------+--------+--------------+------------------+----------+------+


This corresponds to the first line of the log data, which is this:

  140.177.203.25 - - [07/Apr/1998:16:53:06 -0500] "GET / HTTP/1.0" 200 734 "-" "Mozilla/4.04 [en] (X11; I; SunOS 5.6 sun4u)"

The date_time clearly matches the log data, and the 734 in the size column matches the 734 in the log data, but the rest of the columns are not so clear. That's because all non-aggregating (non-numerical) fields are normalized in logfile; instead of being directly included in logfile, their values are included in auxiliary tables (the itemnum tables), and references to those values are included in logfile. So for instance, hostname is 2 in logfile, which corresponds to 140.177.203.25 in the hostname itemnum table (the table called hostnameitemnum, which has columns called itemnum and hostname, and is used to map itemnums to hostname, and vice versa). Querying the hostnameitemnum table with SQL, and selecting only the row where itemnum=2, shows the correllation:


  mysql> select * from hostnameitemnum where itemnum = 2;
  +---------+----------------+
  | itemnum | hostname       |
  +---------+----------------+
  |       2 | 140.177.203.25 |
  +---------+----------------+



Joining The Main Table (logfile) To The Itemnums Tables

By joining one or more itemnum tables to the main table, you can get results similar to those shown in Sawmill's own reports. For instance, let's generate a "top 10 hostnames" report. We can do that by selecting/summing logfile, grouping on hostname, joining in hostnameitemnum to get real hostnames in the result (rather than normalized hostname itemnums), ordering by hits descending, and limiting to the top 10:


  mysql> select i.hostname, sum(hits) as hits, sum(page_views) as page_views, sum(size) as size from logfile l left join hostnameitemnum i on l.hostname = i.itemnum group by hostname order by hits desc limit 10;
  +-----------------------------+------+------------+----------+
  | hostname                    | hits | page_views | size     |
  +-----------------------------+------+------------+----------+
  | lipowitz.isdn.uiuc.edu      |  692 |        236 |  6348398 |
  | 192.17.19.150               |  466 |         46 |  6363739 |
  | 192.17.19.148               |  317 |        136 | 24279573 |
  | pale.kai.com                |  308 |         70 |  5289858 |
  | flowerfire.isdn.uiuc.edu    |  242 |         52 |  2337963 |
  | spider.unh.edu              |  171 |         19 |   433709 |
  | 195.101.37.244              |   87 |         21 |   575602 |
  | 206.148.222.50              |   79 |         25 |   876475 |
  | isdn-5.nii.enterconnect.net |   79 |         16 |   579010 |
  | gli2302.ctea.com            |   75 |          9 |   436318 |
  +-----------------------------+------+------------+----------+


Any of Sawmill's standard table reports can be generated similarly using SQL.


Filtering By Itemnum

Now suppose we want to filter the result. Any type of filter is possible using a WHERE clause in the SQL query. In this case, we'll filter the report above to show only .com hostnames. By including "where i.hostname like '%.com'", the query now selects only those rows from logfile which contain hostname values ending with .com, when de-normalized. So, the resulting list contains all .com addresses:


  mysql> select i.hostname, sum(hits) as hits, sum(page_views) as page_views, sum(size) as size from logfile l left join hostnameitemnum i on l.hostname = i.itemnum where i.hostname like '%.com' group by hostname order by hits desc limit 10;
  +--------------------------------+------+------------+---------+
  | hostname                       | hits | page_views | size    |
  +--------------------------------+------+------------+---------+
  | pale.kai.com                   |  308 |         70 | 5289858 |
  | gli2302.ctea.com               |   75 |          9 |  436318 |
  | wat.thedj.com                  |   57 |         22 |  767059 |
  | tosainu.trimark.com            |   45 |         12 |  341875 |
  | gianduia.compecon.com          |   39 |          6 |  206124 |
  | ip252.ts4.phx.inficad.com      |   34 |          4 |  160689 |
  | h-205-217-240-156.netscape.com |   33 |          4 |  173806 |
  | clarendon.weblogic.com         |   32 |          4 |  151206 |
  | gw.vixel.com                   |   30 |          4 |  115755 |
  | cx51617-a.dnpt1.occa.home.com  |   30 |          4 |  150578 |
  +--------------------------------+------+------------+---------+

The filter doesn't have to work on only the primary field of the query (hostname); it can work on any fields in logfile, or the joined fields of any other table. MySQL supports many joins in a single query, so we can join in one table for the main column (hostname), and also join additional tables for filtering. For instance, here are the hostnames which accessed GIF images (the hits column shows how many GIF accesses each hostname had; the size column shows how many GIF bytes each hostname transferred):


  mysql> select i.hostname, sum(hits) as hits, sum(page_views) as page_views, sum(size) as size from logfile l left join hostnameitemnum i on l.hostname = i.itemnum left join file_typeitemnum fi on l.file_type = fi.itemnum where fi.file_type = 'GIF' group by hostname order by hits desc limit 10;
  +-----------------------------+------+------------+--------+
  | hostname                    | hits | page_views | size   |
  +-----------------------------+------+------------+--------+
  | lipowitz.isdn.uiuc.edu      |  432 |          0 | 447331 |
  | 192.17.19.150               |  420 |          0 | 209925 |
  | pale.kai.com                |  235 |          0 | 144966 |
  | 192.17.19.148               |  161 |          0 | 322118 |
  | spider.unh.edu              |  152 |          0 | 129566 |
  | flowerfire.isdn.uiuc.edu    |  151 |          0 | 433550 |
  | gli2302.ctea.com            |   66 |          0 |  56660 |
  | 195.101.37.244              |   65 |          0 |  87507 |
  | isdn-5.nii.enterconnect.net |   63 |          0 |  24508 |
  | 206.148.222.50              |   52 |          0 |  74631 |
  +-----------------------------+------+------------+--------+
  10 rows in set (0.03 sec)



Conclusion

This newsletter describes the simple process for creating a MySQL profile in Sawmill, and using it to import log data into a SQL database. This process can be used to import any log data into a MySQL database. Sawmill already supports all common log formats (800+ different formats as of this writing), and if a particular format isn't on Sawmill's list, it can be added by creating a log format plug-in. This makes it possible to use Sawmill to run arbitrary SQL queries against any textual log data, by (1) creating a Sawmill profile from the log data with MySQL as the back-end database, (2) building the database in Sawmill, and (3) running SQL queries against the resulting database.

If you would like assistance in creating a plug-in for a log format you would like to query with SQL, or if you would like assistance creating SQL queries or scripts to extract the information you need from a Sawmill MySQL database, you can also use Sawmill Professional Services. Our experts have a thorough knowledge of Sawmill, log format plug-ins, and MySQL. Contact sales@sawmill.net for more information.


[Article revision v1.0]
[ClientID: ]