Sawmill Newsletter

  February 15, 2010



Welcome to the Sawmill Newsletter!

You’re receiving this newsletter because during the downloading or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email, with the subject line of “UNSUBSCRIBE” to newsletter@sawmill.net (please include the entire message, as the identifying information is at the bottom).


News

Sawmill 8.1.3 shipped on February 12, 2010. This is a bug-fix release, and it is free to existing Sawmill 8 users. It is recommended for anyone who is experiencing problems with Sawmill 8.1.2 or earlier. Sawmill 8.1.2 has a known issue with some Windows installations: it requires the Microsoft Visual Studio 2008 redistributable package, but does not install it. Most Windows systems have this package already, but on those that don't, Sawmill will not run. If you are experiencing this problem, upgrade to 8.1.3, or install the redistributable package (x86 or x64). You can download Sawmill 8.1.3 from http://sawmill.net/download.html .

Sawmill 7 users can upgrade to Sawmill 8 for half of the license price; or if you have Premium Support, the upgrade is free. Major features of Sawmill 8 include support for Oracle and Microsoft SQL Server databases, real-time reporting, a completely redesigned web interface, better multi-processor and multi-core support, and role-based authentication control.

This issue of the Sawmill Newsletter describes the process of updating the databases of multiple profiles regularly, and performance considerations.


Get The Most Out Of Sawmill With Professional Services

Looking to get more out of your statistics from Sawmill? Running short on time, but need the information now to make critical business decisions? Our Professional Services experts are available for just this situation and many others. We will assist in the initial installation of Sawmill using best practices, and work with you to integrate and configure Sawmill to generate reports in the shortest possible time. We will tailor Sawmill to your environment, create a customized solution, be sensitive to your requirements, and stay focused on what your business needs. We will show you areas of Sawmill you may not even be aware of, and demonstrate streamlined methods that get you the information more quickly. Often you'll find that Sawmill's deep analysis can even provide you with information you've been after but never knew how to reach, or possibly never realized was readily available in reports. Sawmill is an extremely powerful tool for your business, and most users only exercise a fraction of this power. That's where our experts really can make the difference. Our Sawmill experts have many years of experience with Sawmill and with a large cross section of devices and business sectors. Our promise is to very quickly come up with a cost-effective solution that fits your business, and greatly expand your ROI with only a few hours of fee-based Sawmill Professional Services. For more information, a quote, or to speak directly with a Professional Services expert, contact consulting@flowerfire.com.



Tips & Techniques: Updating Multiple Profiles


A typical installation of Sawmill involves multiple profiles; some installations have hundreds or thousands of profiles. As new information appears in the log files, the database must be updated to include the newest log data in the reports. In the simplest, typical case, this is done in Admin -> Scheduler by creating a new Schedule with a single task every night:

[Screenshot: Scheduler: Update All Profiles]


In the simplest case, that's all there is to it. If your log files aren't too large, a scheduled task like that, run every night at midnight, will ensure that in the morning all your profiles have been updated with the latest data as of midnight.

However, if you have very large log data or a very large number of profiles, the time required to update all profiles sequentially may exceed the time you have available. For instance, if you need reports by 8:00 AM and the update begins at midnight, an update that takes more than 8 hours is a problem. The remainder of this newsletter is devoted to discussing what can be done if the simple update approach described above takes too long.
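A quick way to check whether the sequential update fits the overnight window is to add up the per-profile update times. The sketch below is illustrative only: the function name and the figures (300 profiles at 2 minutes each) are hypothetical, not measurements from Sawmill.

```python
from datetime import timedelta


def fits_window(minutes_per_profile, window_hours):
    """Return (total_time, fits) for a sequential update of all profiles."""
    total = timedelta(minutes=sum(minutes_per_profile))
    return total, total <= timedelta(hours=window_hours)


# Hypothetical figures: 300 profiles at 2 minutes each,
# with reports due 8 hours after the midnight start.
total, ok = fits_window([2] * 300, window_hours=8)
print("sequential update time:", total, "- fits the window:", ok)
```

If the total exceeds the window, one or more of the solutions below is needed.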


Solution 1: Process Only The New Log Data

If the profiles are pointing to a growing log source, e.g., a single file to which new lines are added, then the database updates may have to scan through all historical log data before finding what's new. This can be slow, and can be avoided through log rotation and scripting. For best performance, the log source should contain only the new log data; i.e., after every successful update, the log data should be rotated or archived somewhere else, so the next update will find only what's new.

The "Skip previously seen files" option can also help with this; even if the historical log files are still in the log source, this option will skip them quickly (without having to look through the old data), as long as the filenames of the logs remain the same. This option will work only if files in the log source are not growing, for instance if only daily timestamped archived "ZIP" versions of the logs are processed by Sawmill, and the current log file (uncompressed) is growing but is not included in the log source. So an ideal situation is to (1) write new log entries to an uncompressed file which is growing all the time (e.g., today.log), (2) periodically compress that file to a daily timestamped logfile (e.g., 2010-02-15.zip), and (3) process these timestamped logs (*.zip) with Sawmill, using the "Skip previously seen files" option to skip all previously processed archived logs.
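The three-step arrangement above can be sketched as a small rotation script, run once a day before the Sawmill update. This is a minimal sketch under stated assumptions: the function name, the directory layout, and the truncate-in-place approach are illustrative choices, not part of Sawmill. It refuses to overwrite an existing archive, because already-processed archives must not change once they have been written.

```python
import tempfile
import zipfile
from datetime import date
from pathlib import Path


def rotate_log(live_log: Path, archive_dir: Path, day: date) -> Path:
    """Compress live_log into archive_dir/YYYY-MM-DD.zip, then empty live_log."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive = archive_dir / f"{day.isoformat()}.zip"
    if archive.exists():
        # Archives that were already processed must never change.
        raise FileExistsError(f"{archive} already exists; refusing to overwrite")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(live_log, arcname=f"{day.isoformat()}.log")
    # Assumption: the server tolerates its log being truncated in place.
    # Lines written between the compress and the truncate would be lost,
    # so a production script should pause or signal the server first.
    live_log.write_text("")
    return archive


# Demo in a scratch directory, so no real logs are touched.
with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "today.log"
    live.write_text("GET /index.html 200\n")
    archive = rotate_log(live, Path(tmp) / "archive", date(2010, 2, 15))
    archive_name = archive.name
    with zipfile.ZipFile(archive) as zf:
        archived_members = zf.namelist()
    live_size_after = live.stat().st_size

print(archive_name, archived_members, live_size_after)
```

The profile's log source would then point at the archive directory (*.zip) only, never at today.log.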


Solution 2: Simplify The Profiles

Another way to speed updates is to simplify the profiles, or their databases. The simplest way to do this is to turn off some or all cross-reference groups or indices, or to remove session information. If each profile has only a small amount of data, the performance benefits of indices and cross-references are not needed, and removing them will greatly improve the speed of the database update. Session information (the Session reports, and columns like session duration) is not always needed--it depends on what type of information is needed in reports--and removing it will also significantly speed database updates. Other ways to simplify the database include removing database fields, and filtering out more events using log filters. All of these approaches have been discussed in past newsletters. Simpler databases build and update faster, so if all profiles are simplified in this way, the entire nightly update will take less time.


Solution 3: Use A Faster Disk

Much of the time spent in database updates is spent reading from or writing to the disk. So, update performance can usually be improved by using a faster disk.

Solution 4: Use Multiple Processors

If you have multiple processors or cores on your Sawmill server, you can take advantage of them in several ways:

Solution 4a: Use Sawmill Enterprise

Sawmill Enterprise automatically splits the processing of log data between processors. If you have eight processors, log parsing/importing will be nearly eight times faster. The remaining steps in the update--updating xrefs, updating indices, updating sessions--remain single-processor tasks, however; so if you have many xrefs or indices, the benefit of multiple processors will be less. See Solution 2: Simplify The Profiles--if you remove xrefs, indices, and/or sessions, the benefit of multiple processors under Enterprise will be increased.
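The effect of the single-processor steps can be estimated with Amdahl's law. The sketch below is an idealized upper bound, assuming the parsing step scales perfectly across cores; the parsing fractions are hypothetical and should be replaced with timings measured on your own profiles.

```python
def update_speedup(parse_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only log parsing runs in parallel.

    parse_fraction is the share of single-core update time spent parsing;
    the rest (xrefs, indices, sessions) is assumed to stay on one core.
    """
    if not 0.0 <= parse_fraction <= 1.0:
        raise ValueError("parse_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parse_fraction
    return 1.0 / (serial_fraction + parse_fraction / cores)


# Hypothetical parsing fractions on an eight-core server.
for fraction in (0.5, 0.9, 1.0):
    speedup = update_speedup(fraction, cores=8)
    print(f"{fraction:.0%} parsing, 8 cores: {speedup:.2f}x")
```

Under these assumptions, a profile that spends half its update time on xrefs, indices, and sessions gains less than 2x from eight cores, which is why simplifying the profile increases the benefit.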

Solution 4b: Schedule Simultaneous Batches Of Updates

In all examples below, four cores are assumed; this works just as well (better, really) if you have more. If you have four cores, you can schedule four simultaneous batches of updates to run at midnight. For instance, you might name your profiles starting with 1, 2, 3, or 4, and then use the "pattern" option in the Scheduler to create four schedules, all running at midnight, to update the 1* profiles, the 2* profiles, the 3* profiles, and the 4* profiles. This will run those four batches simultaneously, which theoretically will let them run four times faster than if they ran in sequence. But, there are some caveats.
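Before creating the four schedules, it can help to check that every profile name falls into exactly one batch. The sketch below assumes the Scheduler's pattern option behaves like a shell-style wildcard, which should be verified against the Sawmill documentation; the profile names are hypothetical.

```python
from fnmatch import fnmatchcase


def group_by_pattern(profiles, patterns):
    """Assign each profile to the first pattern it matches.

    Returns (batches, unmatched): batches maps pattern -> profile names,
    and unmatched lists profiles that no pattern would update.
    """
    batches = {pattern: [] for pattern in patterns}
    unmatched = []
    for name in profiles:
        for pattern in patterns:
            if fnmatchcase(name, pattern):
                batches[pattern].append(name)
                break
        else:
            unmatched.append(name)
    return batches, unmatched


# Hypothetical profile names.
profiles = ["1_store_east", "1_store_west", "2_intranet", "3_proxy", "4_mail", "blog"]
batches, unmatched = group_by_pattern(profiles, ["1*", "2*", "3*", "4*"])
for pattern, names in batches.items():
    print(pattern, names)
print("not covered by any schedule:", unmatched)
```

Any profile reported as not covered would be silently skipped by the nightly update until it is renamed or given its own schedule.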

Caveat 1: don't use multi-processor builds in each profile if you do this! Sawmill Enterprise automatically splits the log processing step of each profile to use all cores, so if you run four simultaneous updates, each using four cores, you'll have 16 tasks running simultaneously on a four-core system, which could seriously bog it down. So, set each profile to use a single processor, in Config -> Log Processing -> Distributed Processing:

[Screenshot: One Processor]

to ensure that each profile uses only a single core.

Note on Solution 4b: Reducing Disk Contention Between Batches

Solution 4b, using simultaneous batches of updates, may not give a performance improvement if disk I/O is the bottleneck. If one profile is already using 100% of the disk bandwidth available, then adding three more profiles to the mix will just slow things down, as they compete for the already fully-used disk. Using a faster disk (Solution 3) will help with this. Another solution is to have each batch use a separate disk for its databases. For instance, all the 1* profiles could have their databases on disk 1 (changing the Database Directory in Config -> Database -> Server), and all the 2* profiles could have their databases on disk 2, etc. This would eliminate all disk contention between batches, potentially allowing them to run at full speed without competing.

Solution 4c: Use multisawmill.pl

Simultaneous batches (Solution 4b) work well if the batches are of the same size, but when they are irregularly sized, batch 1 might take much longer than batches 2, 3, or 4, resulting in a long delay before the whole process is complete. A Perl script included in the Extras folder of Sawmill, multisawmill.pl, provides a solution for this. multisawmill.pl, which must be modified before it can be used (in particular, $options must be set to "-a ud" for it to do updates), will run database updates (or other tasks) on every profile, four at a time (or more; edit it to set the number to the number of cores you have). As each task completes, it starts another one, until all profiles have been processed. This effectively creates one queue per processor, and feeds profiles into each queue, ensuring they are all full, until all profiles are done. In the case of irregularly sized batches, this is faster, because if a few profiles are very large, they may end up taking almost all the time of one queue, while hundreds of small profiles are sent through another queue. So, it generally ensures that all processors are working all the time. However, use of this script requires a Perl installation, and someone with at least minimal expertise in editing and running Perl scripts.
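The scheduling idea can be illustrated with a short sketch. This is not multisawmill.pl or a port of it; it is a minimal Python illustration of the same "start another task as each one completes" pattern. The profile names and the stand-in command are hypothetical. A real run would build the actual Sawmill command line for each profile, using the "-a ud" action mentioned above plus whatever profile-selection arguments your installation requires.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(profiles, build_command, workers=4):
    """Run one command per profile, at most `workers` at a time.

    As soon as a worker finishes one profile it picks up the next, so
    large and small profiles are balanced automatically.
    Returns {profile: exit_code}.
    """
    def run_one(profile):
        completed = subprocess.run(build_command(profile))
        return profile, completed.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, profiles))


def demo_command(profile):
    # Stand-in command so the sketch runs anywhere: it only prints a line.
    # Replace this with the real Sawmill command line for your installation.
    return [sys.executable, "-c", f"print('updating', {profile!r})"]


# Hypothetical profile names.
results = run_all(["alpha", "beta", "gamma", "delta", "epsilon"], demo_command)
failed = [name for name, code in results.items() if code != 0]
print("failed profiles:", failed)
```

Set workers to the number of cores, and keep each profile on a single processor as described in Caveat 1, so that the total number of running tasks matches the hardware.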


Solution 5: Use Multiple Servers/Installations

If the solutions above are still not sufficient to update all profiles in the time required, it may be necessary to use multiple servers. Continuing the batch approach above, it is possible to create four separate servers (separate physical computers), probably with multiple cores per server, each with a separate Sawmill installation (with a separate Sawmill license). With one server per batch, batch 1 can be run on server 1, batch 2 can be run on server 2, etc. All of the solutions above can be applied per server; for instance server 1 can use multiple processors, or multiple sub-batches, or multisawmill.pl, etc. This approach provides high scalability--by adding servers and splitting profiles among servers, a multi-profile installation can scale indefinitely, limited only by the time required to update the largest single profile.
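One way to split profiles among servers is a greedy "largest first" assignment: sort profiles by expected update time and place each on the server with the least work so far. This is a common scheduling heuristic offered here as an illustration, not something Sawmill does for you, and the update times below are hypothetical.

```python
import heapq


def assign_to_servers(profile_minutes, servers):
    """Greedy largest-first assignment of profiles to servers.

    Returns (assignment, loads): assignment[i] lists the profiles for
    server i, and loads[i] is that server's total update time in minutes.
    """
    heap = [(0, i) for i in range(servers)]  # (current load, server index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(servers)]
    largest_first = sorted(profile_minutes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, minutes in largest_first:
        load, i = heapq.heappop(heap)  # least-loaded server so far
        assignment[i].append(name)
        heapq.heappush(heap, (load + minutes, i))
    loads = [load for load, _ in sorted(heap, key=lambda entry: entry[1])]
    return assignment, loads


# Hypothetical update times in minutes.
times = {"big": 300, "a": 60, "b": 60, "c": 60, "d": 45, "e": 45, "f": 30}
for count in (2, 4):
    assignment, loads = assign_to_servers(times, count)
    print(count, "servers:", loads, "-> finished after", max(loads), "minutes")
```

With these figures, going from two servers to four does not shorten the overall update, because the largest single profile already sets the finishing time.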


Professional Services

This newsletter describes tuning and optimization of multiple profile database updates. If you need assistance with tuning the performance of Sawmill, with configuring and using multisawmill.pl, or with any other Sawmill tasks, our Sawmill Experts can help. Contact sales@sawmill.net for more information.


