You’re receiving this newsletter because, when you downloaded or
purchased Sawmill, you checked the box to join our mailing list. If you
wish to be removed from this list, please send an email with the subject
line “UNSUBSCRIBE” to newsletter@sawmill.net
(please include the entire message, as the identifying information is
at the bottom).
News
Sawmill 8.1.3 shipped on February 12, 2010. This is a bug-fix
release, and it is free to existing Sawmill 8 users. It is recommended
for anyone who is experiencing problems with Sawmill 8.1.2 or earlier.
Sawmill 8.1.2 has a known issue with some Windows installations. It
requires the Microsoft Visual Studio 2008 redistributable package, but
does not install it. Most Windows systems have this package already,
but for those that don't, Sawmill will not run. If you are experiencing
this problem, upgrade to 8.1.3, or install the redistributable package (x86
or x64).
You can download Sawmill 8.1.3 from http://sawmill.net/download.html.
Sawmill 7 users can upgrade to Sawmill 8 for half of the license price,
or for free with Premium Support. Major features of Sawmill 8 include
support for Oracle and Microsoft SQL Server databases, real-time
reporting, a completely redesigned web interface, better
multi-processor and multi-core support, and role-based authentication
control.
This issue of the Sawmill Newsletter describes the process of
updating the databases of multiple profiles regularly, and performance
considerations.
Get The Most Out Of Sawmill With Professional Services
Looking to get more out of your statistics from Sawmill? Running short
on time, but need the information now to make critical business
decisions? Our Professional Service Experts are available for just this
situation and many others. We will assist in the initial installation
of Sawmill using best practices, and work with you to integrate and
configure Sawmill to generate reports in the shortest possible time. We
will tailor Sawmill to your environment, create a customized solution,
be sensitive to your requirements, and stay focused on what your
business needs. We will show you areas of Sawmill you may not even
be aware of, and demonstrate streamlined methods for getting the
information you need more quickly. Often you'll find that Sawmill's deep
analysis can even provide you with information you've been after but
never knew how to reach, or possibly never realized was readily
available in reports. Sawmill is an
extremely powerful tool for your business, and most users only exercise
a fraction of this power. That's where our experts really can make the
difference. Our Sawmill experts have many years of experience with
Sawmill and with a large cross section of devices and business sectors.
Our promise is to very quickly come up with a cost-effective solution
that fits your business, and greatly increase your ROI with only a few
hours of fee-based Sawmill Professional Services. For more information,
a quote, or to speak directly with a Professional Services expert,
contact consulting@flowerfire.com.
Tips & Techniques: Updating Multiple Profiles
A typical installation of Sawmill involves multiple profiles; some
installations have hundreds or thousands of profiles. As new
information appears in the log files, the database must be updated to
include the newest log data in the reports. In the simplest and most
typical case, this is done in Admin -> Scheduler by creating a new
Schedule with a single task that runs every night:
Scheduler: Update All Profiles
In the simplest case, that's all there is to it. If your log files
aren't too large, a scheduled task like that, run every night at
midnight, will ensure that by morning all your profiles have been
updated with the latest data as of midnight.
However, if you have very large log data or a very large number of
profiles, the time required to sequentially update all profiles may
exceed the time you have available. For instance, if you need reports
by 8:00 AM and the update begins at midnight, but it takes more than
eight hours, the reports will be late. The remainder of this newsletter is
devoted to discussing what can be done if the simple update approach
described above takes too long.
Solution 1: Process Only The New Log Data
If the profiles are pointing to a growing log source, e.g., a
single file to which new lines are added, then the database updates may
have to scan through all historical log data before finding what's new.
This can be slow, and can be avoided through log rotation and
scripting. For best performance, the log source should contain only the
new log data; i.e., after every successful update, the log data
should be rotated or archived somewhere else, so the next update will
find only what's new.
The "Skip previously seen files" option can also help with this; even
if the historical log files are still in the log source, this option
will skip them quickly (without having to look through the old data),
as long as the filenames of the logs remain the same. This
option will work only if files in the log source are not growing, for
instance if only daily timestamped archived "ZIP" versions of the logs
are processed by Sawmill, and the current log file (uncompressed) is
growing but is not included in the log source. So an ideal situation is
to (1) write new log entries to an uncompressed file which is growing
all the time (e.g., today.log), (2) periodically compress that file to
a daily timestamped logfile (e.g., 2010-02-15.zip), and (3) process
these timestamped logs (*.zip) with Sawmill, using the "Skip previously
seen files" option to skip all previously processed archived logs.
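The three-step arrangement above can be scripted in many ways. What follows is only a minimal Python sketch of step (2), not something shipped with Sawmill. The file names today.log and YYYY-MM-DD.zip come from the example above; the function name rotate_log is made up for illustration, and the sketch assumes that whatever writes today.log will create a fresh file after the old one is renamed.

```python
import datetime
import zipfile
from pathlib import Path


def rotate_log(log_dir, today=None, current_name="today.log"):
    """Compress the growing log into a date-stamped ZIP archive.

    Returns the path of the new archive, or None if there was nothing to
    rotate. Assumes the logging application re-creates `current_name`
    after it has been renamed (an assumption, not Sawmill behavior).
    """
    log_dir = Path(log_dir)
    current = log_dir / current_name
    if not current.exists() or current.stat().st_size == 0:
        return None

    today = today or datetime.date.today()
    archive = log_dir / f"{today.isoformat()}.zip"
    if archive.exists():
        # Archived logs should never change once written, so that
        # previously seen files can be skipped safely.
        raise FileExistsError(f"{archive} already exists")

    # Rename first, so new log lines no longer land in the file being zipped.
    pending = log_dir / f"{today.isoformat()}.log"
    current.rename(pending)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(pending, arcname=pending.name)
    pending.unlink()
    return archive
```

Run once a day (from cron or the Windows Task Scheduler, for example) before the Sawmill update, this leaves only fixed, date-stamped archives for the *.zip log source to pick up.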
Solution 2: Simplify The Profiles
Another way to speed updates is to simplify the profiles, or their
databases. The simplest way to do this is to turn off some or all
cross-reference groups or indices, or to remove session information. If
each profile has only a small amount of data, the performance benefits
of indices and cross-references are not needed, and removing them will
greatly improve the speed of the database update. Session information
(the Session reports, and columns like session duration) is not always
needed--it depends on what type of information is needed in
reports--and removing it will also significantly speed database
updates. Other ways to simplify the database include removing database
fields, and filtering out more events using log filters. All of these
approaches have been discussed in past newsletters. Simpler databases
build and update faster, so if all profiles are simplified in this way,
the entire nightly update will take less time.
Solution 3: Use A Faster Disk
Much of the time spent in database updates is reading or writing from
the disk. So, update performance can usually be improved by using a
faster disk.
Get a faster hard drive: if possible, use a faster physical disk.
For instance, if you're using a 7200RPM disk, you may see twice the
performance if you go to a 15000RPM disk.
Get a faster disk controller: if the disk is capable of higher
performance than your controller can deliver, upgrading the controller
will improve update performance.
Use RAID: Using RAID 10 (or RAID 0) can greatly improve disk
performance by splitting accesses across multiple disks.
Don't use RAID 5/6: Even if you can't use striping (RAID 10), at
least don't use RAID 5 or RAID 6. These provide redundancy on the
cheap, but are slow to write; database access can be twice as slow with
RAID 5/6. Use RAID 10 instead (striping and mirroring) for best
performance.
Solution 4: Use Multiple Processors
If you have multiple processors or cores on your Sawmill server, you
can take advantage of them in several ways:
Solution 4a: Use Sawmill Enterprise
Sawmill Enterprise automatically splits the processing of log data
between processors. If you have eight processors, log
parsing/importing will be nearly eight times faster. The remaining
steps in the update--updating xrefs, updating indices, updating
sessions--remain single-processor tasks, however; so if you have many
xrefs or indices, the benefit of multiple processors will be less. See
Solution 2: Simplify The Profiles--if you remove xrefs, indices, and/or
sessions, the benefit of multiple processors under Enterprise will be
increased.
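To get a rough feel for how much the single-processor steps limit the overall gain, you can apply Amdahl's law. The Python sketch below is an illustration only: the function name and the example fractions are made up, and it assumes the parsing/importing step scales perfectly across cores, which is slightly optimistic.

```python
def estimated_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work is parallel.

    parallel_fraction is the share of single-core update time spent in the
    step that is split across cores (log parsing/importing); the rest
    (xrefs, indices, sessions) is treated as single-processor work.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical fractions, chosen only to show the trend.
    for fraction in (0.5, 0.8, 0.95):
        speedup = estimated_speedup(fraction, 8)
        print(f"{fraction:.0%} of time in parsing: {speedup:.2f}x on 8 cores")
```

If parsing accounts for only half of the update time, eight cores give less than a 2x overall speedup under this model, which is why removing xrefs, indices, or sessions makes the extra cores pay off.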
Solution 4b: Schedule Simultaneous Batches Of Updates
In all examples below, four cores are assumed; this works just as well
(better, really) if you have more. If you have four cores, you can
schedule four simultaneous batches of updates to run at midnight. For
instance, you might name your profiles starting with 1, 2, 3, or 4, and
then use the "pattern" option in the Scheduler to create four
schedules, all running at midnight, to update the 1* profiles, the 2*
profiles, the 3* profiles, and the 4* profiles. This will run those
four batches simultaneously, which in theory lets them finish up to four
times faster than if they ran in sequence. But there are some caveats.
Caveat 1: don't use multi-processor builds in each profile if you do
this! Sawmill Enterprise automatically splits the log processing step
of each profile to use all cores, so if you run four simultaneous
updates, each using four cores, you'll have 16 tasks running
simultaneously on a four-core system, which could seriously bog it
down. So, set each profile to use a single processor, in Config ->
Log Processing -> Distributed Processing, to ensure that each profile
uses only a single core.
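Before creating the four schedules, it can be worth checking that the name patterns really divide your profiles the way you expect. The Python sketch below is only an illustration: it assumes the Scheduler's "pattern" option behaves like ordinary shell-style wildcards (as the 1*, 2*, 3*, 4* examples suggest), and the function name assign_batches is made up.

```python
from fnmatch import fnmatchcase


def assign_batches(profile_names, patterns):
    """Group profile names by the wildcard pattern they match.

    Returns (batches, unmatched): a dict mapping each pattern to its
    profiles, and a list of profiles that no pattern covers. Raises
    ValueError if a profile matches more than one pattern, since it
    would then be updated by two schedules at once.
    """
    batches = {pattern: [] for pattern in patterns}
    unmatched = []
    for name in profile_names:
        matches = [p for p in patterns if fnmatchcase(name, p)]
        if len(matches) > 1:
            raise ValueError(f"{name!r} matches several patterns: {matches}")
        if matches:
            batches[matches[0]].append(name)
        else:
            unmatched.append(name)
    return batches, unmatched
```

Any profile that ends up in the unmatched list would not be updated by any of the four schedules, so rename it or add a schedule for it.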
Note on Solution 4b: Reducing Disk Contention Between Batches
Solution 4b, using simultaneous batches of updates, may not
give a performance improvement if disk I/O is the bottleneck. If one
profile is already using 100% of the disk bandwidth available, then
adding three more profiles to the mix will just slow things down, as
they compete for the already fully-used disk. Using a faster disk
(Solution 3) will help with this. Another solution is to have each
batch use a separate disk for its databases. For instance, all
the 1* profiles could have their databases on disk 1 (changing the
Database Directory in Config -> Database -> Server), and all the
2* profiles could have their databases on disk 2, etc. This would
eliminate all disk contention between batches, potentially allowing
them to run at full speed without competing.
Solution 4c: Use multisawmill.pl
Simultaneous batches (Solution 4b) work well if the batches are of the
same size, but when they are irregularly sized, batch 1 might take
much longer than batches 2, 3, or 4, resulting in a long delay before
the whole process is complete. A Perl script included in the Extras
folder of Sawmill, multisawmill.pl, provides a solution for this.
multisawmill.pl, which must be modified before it can be used (in
particular, $options must be set to "-a ud" for it to do updates), will
run database updates (or other tasks) on every profile, four at a time
(or more; edit it to set the number to the number of cores you have).
As each task completes, it starts another one, until all profiles have
been processed. This effectively creates one queue per processor, and
feeds profiles into each queue, ensuring they are all full, until all
profiles are done. In the case of irregularly sized batches, this is
faster, because if a few profiles are very large, they may end up
taking almost all the time of one queue, while hundreds of small
profiles are sent through another queue. So, it generally ensures that
all processors are working all the time. However, use of this script
requires a Perl installation, and someone with at least minimal
expertise in editing and running Perl scripts.
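For readers who want to see the idea behind this approach without reading the Perl, here is a minimal Python sketch of the same "start another task as soon as one finishes" technique. It is not multisawmill.pl and does not reproduce its options. The SAWMILL path is a placeholder, the "-a ud" action comes from the text above, and the "-p" profile option is an assumption to check against your installation's command-line documentation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder path; point this at your own Sawmill executable.
SAWMILL = "/path/to/sawmill"


def update_profile(profile):
    """Run one database update and return the process exit code.

    "-a ud" is the update action named in the newsletter; "-p" as the
    profile option is an assumption, so verify it before use.
    """
    result = subprocess.run([SAWMILL, "-p", profile, "-a", "ud"])
    return result.returncode


def update_all(profiles, workers=4, run=update_profile):
    """Run `run` on every profile, at most `workers` at a time.

    Each worker picks up the next waiting profile as soon as it finishes
    its current one, so no core sits idle while profiles remain.
    Returns a dict mapping profile name to the value `run` returned.
    """
    profiles = list(profiles)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(run, profiles))
    return dict(zip(profiles, codes))
```

A call such as update_all(["1-alpha", "2-beta"], workers=4) returns the exit code for each profile, so a wrapper script can report which updates failed. Threads are sufficient here because the real work happens in the separate Sawmill processes.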
Solution 5: Use Multiple Servers/Installations
If the solutions above are still not sufficient to update all profiles
in the time required, it may be necessary to use multiple servers.
Continuing the batch approach above, it is possible to create four
separate servers (separate physical computers), probably with multiple
cores per server, each with a separate Sawmill installation (with a
separate Sawmill license). With one server per batch, batch 1 can be
run on server 1, batch 2 can be run on server 2, etc. All of the
solutions above can be applied per server; for instance server 1 can
use multiple processors, or multiple sub-batches, or multisawmill.pl,
etc. This approach provides high scalability--by adding servers and
splitting profiles among servers, a multi-profile installation can
scale indefinitely, limited only by the time required to update the
largest single profile.
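How you split the profiles among servers matters when their sizes differ. One simple heuristic, shown in the Python sketch below, is to take profiles from largest to smallest and hand each one to whichever server currently has the least work. This is an illustration only: the function name is made up, and it assumes you have rough per-profile update times, for instance measured on an earlier night.

```python
import heapq


def split_across_servers(estimated_minutes, servers):
    """Assign profiles to servers, largest first, least-loaded server next.

    estimated_minutes maps profile name to an estimated update time.
    Returns (assignment, loads): a list of profile lists, one per server,
    and the total estimated minutes on each server.
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    heap = [(0.0, index) for index in range(servers)]  # (load, server index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(servers)]
    by_size = sorted(estimated_minutes.items(),
                     key=lambda item: item[1], reverse=True)
    for name, minutes in by_size:
        load, index = heapq.heappop(heap)  # least-loaded server so far
        assignment[index].append(name)
        heapq.heappush(heap, (load + minutes, index))
    loads = [0.0] * servers
    for load, index in heap:
        loads[index] = load
    return assignment, loads
```

With one 300-minute profile and five 60-minute profiles, the busiest server carries 300 minutes of work whether you use two servers or four, which illustrates the limit described above: past a certain point, only speeding up the largest profile itself will shorten the night's update.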
Professional Services
This newsletter describes tuning and optimization of multiple-profile
database updates. If you need assistance with tuning the performance of
Sawmill, with configuring and using multisawmill.pl, or with any other
Sawmill tasks, our Sawmill Experts can help. Contact sales@sawmill.net
for more information.