SAWMILL SYSTEM REQUIREMENTS
This document describes best practices for setting up your Sawmill server.
Operating System and CPU Platform
Sawmill will run on any platform, but we recommend x64 Linux. You can use any distribution,
but Red Hat Enterprise Linux is a good choice for maximum speed, compatibility, and stability.
On other distributions, it may be necessary to build Sawmill from source code.
Other x64 operating systems are also reasonable, including x64 Windows, x64 Mac OS, x64 FreeBSD,
and x64 Solaris. CPU architectures other than x64 will work, but we see better performance
with x64 than with SPARC and other RISC architectures. 32-bit architectures are not recommended
for any large dataset. The address space limitations of 32-bit operating systems can cause
errors in Sawmill when processing large datasets.
We consider a Sawmill installation to be large if its dataset is 10 GB or more. For
installations of this size, we recommend a 64-bit system and, ideally, a dedicated
Sawmill server.
Sawmill usually runs fine on a virtual system, but performance can be considerably slower.
We occasionally see errors that appear to be caused by the virtual environment, so physical
hardware is recommended.
Disk Space and Speed
You will need between 200% and 400% of the size of your uncompressed log data to store the
Sawmill database. Databases tend toward the high side (400%) on 64-bit systems,
especially when tracking a very large number of numerical fields (more than ten or so).
For instance, if you want to report on 1 terabyte (TB) of log data in a single profile,
you would need up to 4 TB of disk space for the database.
This is the total data in the database, not the daily data added:
if you have 1 TB of log data per day and want to track 30 days, that is a 30 TB dataset,
which requires between 60 TB and 120 TB of disk space.
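The disk space rule of thumb above reduces to a multiplication. The following Python sketch uses a hypothetical helper name; it assumes sizes are given in TB of uncompressed log data and that the full dataset is the daily volume times the number of days retained. It reproduces the 30-day example.

```python
def database_disk_range_tb(daily_log_tb: float, days_retained: int) -> tuple:
    """Return (low, high) disk space estimates in TB for the Sawmill database.

    Hypothetical helper: applies the 200%-400% rule of thumb to the total
    dataset (daily uncompressed log volume x days retained).
    """
    dataset_tb = daily_log_tb * days_retained  # total data held in the database
    low = dataset_tb * 2.0    # 200% of the uncompressed log data
    high = dataset_tb * 4.0   # 400%, typical on 64-bit systems with many numerical fields
    return low, high


if __name__ == "__main__":
    low, high = database_disk_range_tb(daily_log_tb=1.0, days_retained=30)
    print(f"{low:.0f} TB to {high:.0f} TB")  # prints "60 TB to 120 TB"
```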
If you are using a separate SQL database server, the disk space above is needed on the
database server, since that is where the database is stored. The remainder of Sawmill fits
in much less space, so 1 GB should be sufficient on the Sawmill server.
Sawmill uses the disk intensively during database building and report generation, so for
best performance use a fast disk, ideally a RAID 10 array of fast disks.
RAID 5 and RAID 6 hurt performance significantly (database builds are about 2x slower than
on RAID 10) and are not recommended. If possible, turn on write buffering on the RAID
controller; it roughly doubles database build performance.
Network mounts usually work for storing the Sawmill database, but they are not
recommended for performance reasons. We have also seen errors that appear to be caused by
locking and synchronization issues on network mounts.
Memory
On the Sawmill server, we recommend 2GB of RAM per core for large datasets.
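As a quick check, the memory guideline can be expressed as a small Python function. This is a sketch only: the function and constant names are hypothetical, and it returns None for datasets under 10 GB because this document gives a per-core figure only for large datasets.

```python
from typing import Optional

LARGE_DATASET_GB = 10  # this document treats 10 GB or more as a large dataset
RAM_GB_PER_CORE = 2    # recommended RAM per core for large datasets


def recommended_ram_gb(cores: int, dataset_gb: float) -> Optional[int]:
    """Return the recommended RAM in GB, or None if the guideline does not apply."""
    if dataset_gb < LARGE_DATASET_GB:
        return None  # no per-core figure is given for smaller datasets
    return cores * RAM_GB_PER_CORE


if __name__ == "__main__":
    print(recommended_ram_gb(cores=8, dataset_gb=500))  # prints 16
```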
Processor(s)
To estimate the amount of processing power you need, start with the
assumption that Sawmill processes 2000 log lines per second, per
processor core for Intel or AMD processors; or 1000 lines per second for SPARC
or other processors.
Note: This is a conservative assumption; Sawmill can be much faster on some datasets,
reaching 10,000-20,000 lines per second per core in some cases. However, for
sizing your processors, it is best to use a conservative estimate to ensure that the
specified system is sufficient.
Compute the number of lines in your daily dataset by dividing its size in bytes by the
average line length; 200 bytes per line is a good estimate. Divide the number of lines by
the per-core rate above (2000 lines per second) to get the number of seconds Sawmill will
require to build the database on a single processor core. Convert that to hours; if it is
more than six hours, you will need more than one processor. You should have enough
processors that when you divide the number of hours by the number of processors, the
result is less than 6.
For example:
- 50 Gigabytes (GB) of uncompressed log data per day
- divide by 200 -> ~268 million lines of log data
- divide by 2000 -> ~134 thousand seconds
- divide by 3600 -> ~37 hours
- divide by 6 -> ~6.2, so 7 processors
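The sizing steps and the worked example can be checked with a short Python sketch. It assumes the conservative 2000 lines per second per core, 200 bytes per line, and a six-hour build window, and it takes 1 GB as 2^30 bytes, which is what the ~268 million line figure implies. The function name is hypothetical.

```python
import math

BYTES_PER_LINE = 200            # rule-of-thumb average log line length
LINES_PER_SEC_PER_CORE = 2000   # conservative rate for Intel or AMD cores
BUILD_WINDOW_HOURS = 6          # target nightly build window


def processors_needed(daily_bytes: float,
                      bytes_per_line: float = BYTES_PER_LINE,
                      lines_per_sec: float = LINES_PER_SEC_PER_CORE,
                      window_hours: float = BUILD_WINDOW_HOURS):
    """Return (lines, single-core hours, processors) for one day of log data."""
    lines = daily_bytes / bytes_per_line
    single_core_seconds = lines / lines_per_sec
    single_core_hours = single_core_seconds / 3600
    # Smallest whole number of processors for which hours / processors
    # is strictly less than the build window.
    processors = math.floor(single_core_hours / window_hours) + 1
    return lines, single_core_hours, processors


if __name__ == "__main__":
    lines, hours, procs = processors_needed(50 * 2**30)  # 50 GB per day
    # prints "268 million lines, 37.3 hours, 7 processors"
    print(f"{lines / 1e6:.0f} million lines, {hours:.1f} hours, {procs} processors")
```

If you instead take 1 GB as 10^9 bytes, the same inputs give 250 million lines, about 34.7 hours, and 6 processors, so it is worth deciding which convention your log sizes use.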
The six-hour figure is based on the assumption that you don't want to spend more
than six hours per night updating your database with the latest data. A six-hour
nightly build time is a good starting point: it leaves room to modify or tune the
database and log filters, changes that can slow down processing, while still staying
within the processing time available each day. The dataset above could be processed in
about 9 hours on four processors, if a 9-hour nightly build time is acceptable.
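To compare build times for a given number of processors, the calculation can be run in the other direction. This Python sketch uses a hypothetical function name and assumes throughput scales linearly with the number of cores, which is an idealization. It uses the 1 GB = 2^30 bytes convention implied by the worked example's ~268 million lines, and also shows how much the estimate changes at the faster 10,000 lines per second that some datasets reach.

```python
def build_hours(daily_bytes: float, processors: int,
                lines_per_sec_per_core: float = 2000,
                bytes_per_line: float = 200) -> float:
    """Estimated nightly build time in hours, assuming linear scaling across cores."""
    lines = daily_bytes / bytes_per_line
    return lines / (lines_per_sec_per_core * processors) / 3600


if __name__ == "__main__":
    daily = 50 * 2**30  # 50 GB of uncompressed log data per day
    for procs in (4, 6, 7):
        # prints 9.3, 6.2 and 5.3 hours respectively
        print(f"{procs} processors: {build_hours(daily, procs):.1f} hours")
    # Optimistic case: 10,000 lines per second per core.
    print(f"4 processors at 10,000 lines/s: {build_hours(daily, 4, 10_000):.1f} hours")
```

Because real builds rarely scale perfectly across cores, treat these figures as lower bounds on build time.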
Database Server
Sawmill includes its own proprietary database server, embedded in the software. With Enterprise licensing, you can instead use an external database server: MySQL (version 4.0 or later), Oracle (version 11), or Microsoft SQL Server (2005 or later).