You’re receiving this newsletter because, during the download or purchase of Sawmill, you checked the box to join our mailing list. If you wish to be removed from this list, please send an email with the subject line “UNSUBSCRIBE” to newsletter@sawmill.net.
News
Sawmill 7.2.10 shipped on August 4, 2007. This is a minor "bug fix"
release, and it is free to existing Sawmill 7 users. It is not a
critical update, but it does fix a number of bugs, adds support for
many new log formats, and adds a few small features. It is recommended
for anyone who is experiencing problems with Sawmill 7.2.9 or earlier.
You can download it from http://sawmill.net/download.html.
This issue of the Sawmill Newsletter describes using database
merges to improve database build performance.
Get the Most out of Sawmill with Professional Services
Looking to get more out of your statistics from Sawmill? Running short
on time, but need the information now to make critical business
decisions? Our Professional Service Experts are available for just this
situation and many others. We will assist in the initial installation
of Sawmill using best practices; work with you to integrate and
configure Sawmill to generate reports in the shortest possible time. We
will tailor Sawmill to your environment, create a customized solution,
be sensitive to your requirements and stay focused on what your
business needs are. We will show you areas of Sawmill you may not even
be aware of, demonstrating streamlined methods to get you the
information more quickly. Often you'll find that Sawmill's deep
analysis can even provide you with information you've been after but
never knew how to reach, or possibly never realized was readily
available in reports. Sawmill is an
extremely powerful tool for your business, and most users only exercise
a fraction of this power. That's where our experts really can make the
difference. Our Sawmill experts have many years of experience with
Sawmill
and with a large cross section of devices and business sectors. Our
promise is to very quickly come up with a cost-effective solution that
fits your business, and greatly expand your ROI with only a few
hours of fee-based Sawmill Professional Services. For more information,
a quote, or to speak directly with a Professional Services expert,
contact consulting@flowerfire.com.
Tips & Techniques: Using Database Merges
Note: Database merge is available only with the internal database;
it is not available for profiles that use a MySQL database.
A default profile created in Sawmill uses a single processor (single
core) to parse log data and build the database. This is a good choice
for shared environments, where using all processors can bog down the
system, but for best performance, set "log processing threads" to the
number of processors, in the Log Processing options in the Config page
of the profile. That will split log processing across multiple
processors, improving the performance of database builds and updates
by using all processors on the system. This feature is available only
with Sawmill Enterprise; non-Enterprise versions of Sawmill can use
only one processor.
If the dataset is too large to process in an acceptable time on a
single computer, even with multiple processors, it is possible to split
the processing across multiple machines. This is accomplished by
building a separate database on each system, and then merging them to
form a single large database. For instance, this command line adds the
data from the database for profile2 to the database for profile1:
sawmill -p profile1 -a md -mdd Databases/profile2/main
or on Windows:
SawmillCL -p profile1 -a md -mdd Databases\profile2\main
After this command completes, profile1 will show the data it
showed before the command, and the data that profile2
showed before the command (profile2 will be unchanged).
This makes it possible to build a database twice as fast using this
sequence:
sawmill -p profile1 -a bd
sawmill -p profile2 -a bd
sawmill -p profile1 -a md -mdd Databases/profile2/main
(On Windows, use SawmillCL and backslashes, as shown above.)
The critical piece is that the first two commands must run
simultaneously; if you run them one after the other, they will take as
long as building the whole database in one pass. But on a
two-processor system, each build can use a full CPU, fully using both
CPUs and running nearly twice as fast as a single build. The merge
then takes some extra time, but overall this is still faster than a
single-process build.
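This sequence can be scripted. Here is a minimal Python sketch of the idea (it assumes the sawmill binary is on the PATH and that the named profiles exist): the builds run concurrently in a thread pool, and the merge runs only after both finish. The command runner is injectable, so the plan can be inspected without Sawmill installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run(cmd):
    # Run one Sawmill command line; raise if it exits non-zero.
    subprocess.run(cmd, check=True)

def parallel_build_then_merge(profiles, target, runner=run):
    """Build every profile's database concurrently, then merge each of
    the other profiles' databases into the target profile."""
    builds = [["sawmill", "-p", p, "-a", "bd"] for p in profiles]
    with ThreadPoolExecutor(max_workers=len(builds)) as pool:
        list(pool.map(runner, builds))  # block until all builds finish
    for p in profiles:
        if p != target:
            runner(["sawmill", "-p", target, "-a", "md",
                    "-mdd", f"Databases/{p}/main"])

# Example: the two-profile sequence from the text.
# parallel_build_then_merge(["profile1", "profile2"], "profile1")
```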
Running a series of builds simultaneously can be done by opening
multiple windows and running a separate build in each window, or by
"backgrounding" each command before starting the next (available on
UNIX and similar systems). But for a fully automated environment, this
is best done with a script. The attached Perl script, multisawmill.pl,
can be used to build multiple databases simultaneously. You will need
to modify the top of the script to match your environment, and set the
number of threads; then when you run it, it will spawn many database
builds simultaneously (the number you specified), and as each
completes, it will start another one. This script is provided as-is,
with no warranty, as a proof-of-concept of a
multiple-simultaneous-build script.
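In the same spirit (this is a sketch, not the attached Perl script itself), a bounded worker pool takes only a few lines of Python: at most max_parallel builds run at once, and as each completes, the pool starts the next.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_all(profiles, max_parallel, runner=None):
    """Run 'sawmill -p <profile> -a bd' for every profile, keeping at
    most max_parallel builds in flight at any moment."""
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    cmds = [["sawmill", "-p", p, "-a", "bd"] for p in profiles]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() blocks until every build has finished.
        for _ in pool.map(runner, cmds):
            pass

# Example: build 365 daily profiles, 8 at a time.
# build_all([f"day{n}" for n in range(1, 366)], max_parallel=8)
```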
Using the attached script, or something like it, you can apply this
approach to much larger datasets, for instance to build a year of data:
1. Create a profile for each day in the year (it is probably easiest
to use Create Many Profiles to do this; see Setting
Up Multiple Users in the Sawmill documentation).
2. Build all profiles, 8 at a time (or however many cores you have
available). If you have multiple machines available, you can use
multiple installations of Sawmill, by partitioning the profiles into
multiple systems. For instance, if you have two 8-core nodes in the
Sawmill cluster, you could build 16 databases at a time; with four
4-core nodes, likewise 16 at a time. This portion of the build can
give a linear speedup: nearly 32x faster log processing than a single
process, using an 8-core 4-node cluster (32 cores in all).
3. Merge all the databases. The simplest way to do this, in a 365-day
example, is to run 364 merges, adding each day into the final one-year
database.
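The 364 sequential merge commands can be generated rather than typed. In this sketch, the profile names day1 through day365 are illustrative (they would come from however Create Many Profiles named your profiles); day1's database serves as the base that the other 364 are added into.

```python
def sequential_merge_commands(source_profiles, target):
    """One 'md' merge command per source database, each adding that
    database into the target profile's database."""
    return [["sawmill", "-p", target, "-a", "md",
             "-mdd", f"Databases/{p}/main"] for p in source_profiles]

# Example: merge day2..day365 into day1, which becomes the
# one-year database (364 merges, run one after another).
plan = sequential_merge_commands([f"day{n}" for n in range(2, 366)], "day1")
```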
When the merge is done, the one-year database will function as though
it had been built in a single "build database" step--but it will have
taken much less time to build.
Advanced Topic: Using Binary Merges
The example described above uses "sequential merges" for step 3--it
runs 364 separate merge steps, one after another, to create the final
database. Each of these merges uses only a single processor of a single
node, so this portion of the build does not use the cluster
efficiently; and this can cause step 3 to take longer than step 2: the
merge can be slower than the processing and building of data. To
improve this, a more sophisticated merge method can be scripted, using
a "binary tree" of merges to build the final database. Roughly, each
core on each node is assigned two one-day databases, which it merges,
forming two-day databases. Then each core of each node is assigned two
two-day databases, which they merge to form a four-day database. This
continues until a final merge combines two half-year databases into a
one-year database. The number of merge stages is much less than the
number of merges required if done sequentially.
For simplicity, let's assume we're merging 16 days, on a 4-core
cluster. On a 4-core cluster, we can do 4 merges at a time.
Step 1, core 1: Merge day1 with day2, creating day[1,2].
Step 1, core 2: Merge day3 with day4, creating day[3,4].
Step 1, core 3: Merge day5 with day6, creating day[5,6].
Step 1, core 4: Merge day7 with day8, creating day[7,8].
When those are complete, we would continue:
Step 2, core 1: Merge day9 with day10, creating day[9,10].
Step 2, core 2: Merge day11 with day12, creating day[11,12].
Step 2, core 3: Merge day13 with day14, creating day[13,14].
Step 2, core 4: Merge day15 with day16, creating day[15,16].
Now we have taken 16 databases and merged them in two steps into 8
databases. Now we merge them into four databases:
Step 3, core 1: Merge day[1,2] with day[3,4], creating day[1,2,3,4].
Step 3, core 2: Merge day[5,6] with day[7,8], creating day[5,6,7,8].
Step 3, core 3: Merge day[9,10] with day[11,12], creating
day[9,10,11,12].
Step 3, core 4: Merge day[13,14] with day[15,16], creating
day[13,14,15,16].
Now we merge into two databases:
Step 4, core 1: Merge day[1,2,3,4] with day[5,6,7,8], creating
day[1,2,3,4,5,6,7,8].
Step 4, core 2: Merge day[9,10,11,12] with day[13,14,15,16], creating
day[9,10,11,12,13,14,15,16].
And finally:
Step 5, core 1: Merge day[1,2,3,4,5,6,7,8] with
day[9,10,11,12,13,14,15,16], creating
day[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16].
So in 5 steps, we have built what would have required 15 steps using
sequential merges: a 16-day database. This approach can be used to
speed up much larger merges even more.
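This schedule can be computed mechanically. The sketch below (the day1 names and the "[a+b]" result labels are illustrative, not Sawmill database names) pairs databases round by round and splits each round into steps of at most `cores` concurrent merges; for 16 databases on 4 cores it reproduces the 5-step, 15-merge plan above.

```python
def binary_merge_plan(dbs, cores):
    """Return a list of steps; each step is a list of (left, right,
    result) merges that can run concurrently on the available cores."""
    steps = []
    while len(dbs) > 1:
        # Pair up adjacent databases; an odd one out waits a round.
        pairs = [(dbs[i], dbs[i + 1]) for i in range(0, len(dbs) - 1, 2)]
        leftover = [dbs[-1]] if len(dbs) % 2 else []
        merged = []
        # Run at most 'cores' of this round's merges per step.
        for i in range(0, len(pairs), cores):
            step = [(a, b, f"[{a}+{b}]") for a, b in pairs[i:i + cores]]
            steps.append(step)
            merged += [result for _, _, result in step]
        dbs = merged + leftover
    return steps

# Example: 16 one-day databases on a 4-core cluster.
plan = binary_merge_plan([f"day{n}" for n in range(1, 17)], 4)
```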
Advanced Topic: Re-using One-Day Databases
In the approach above, the one-day databases are not destroyed by the
merge, which reads data from them but does not write to them. This
makes it possible to keep the one-day databases for fast access to
reports from a particular day. By leaving the one-day databases in
place after the merge is complete, users will be able to select a
particular database from the Profiles list, to see fast reports for
just that day (a one-day database generates reports much faster than a
365-day database).
Advanced Topic: Using Different Merge Units
In the discussion above, we used one day as the unit of merge, but any
unit can be used. In particular, if you are generating a database
showing reports from 1000 sites, you could use a site as the
unit. After building the databases from 1000 sites, you could then
merge all 1000 databases to create an all-sites profile for
administrative overview, leaving each of the 1000 one-site profiles to
be accessed by its users.
Questions or suggestions? Contact support@sawmill.net. If you would
like a Sawmill Professional Services expert to implement this, or
another customization, contact consulting@sawmill.net.