Server Setup For Best Performance
This document describes best practices for setting up your server.
Operating System and CPU Platform System requirements
Sawmill will run on any platform, but we recommend x64 (64-bit) Linux. You can use any distribution, but Red Hat Enterprise Linux is a good choice for maximum speed, compatibility, and stability. On other distributions, it may be necessary to build Sawmill from source code. Other x64 operating systems are also reasonable choices, including x64 Windows, x64 Mac OS, x64 FreeBSD, and x64 Solaris. CPU architectures other than x64 will work, but overall we see better performance with x64 than with SPARC and other RISC architectures. 32-bit architectures are not recommended for any large dataset. The address space limitations of 32-bit operating systems can cause errors in Sawmill when processing large datasets.
We consider a large installation of Sawmill to be a dataset of 10 GB or more. For datasets larger than 10 GB, we strongly recommend using a 64-bit system, and it is best to have a dedicated Sawmill server.
Sawmill can run on a virtual system, but performance can be considerably slower than running Sawmill on a physical server. We occasionally see errors with Sawmill running in a virtual environment, so physical hardware is always recommended, especially for larger datasets.
Disk and Space
You will need between 200% and 400% of the size of your uncompressed log data to store the Sawmill database. Databases tend towards the high side (400%) on 64-bit systems, especially when tracking a very large number of numerical fields (more than ten or so). For instance, if you want to report on 1 terabyte (TB) of log data in a single profile, you would need up to 4 TB of disk space for the database. This is the total data in the database, not the daily data added; if you have 1 TB of log data per day, and want to track 30 days, then that is a 30 TB dataset, and requires between 60 TB and 120 TB of disk space.
IMPORTANT: If your profile uses a database filter which sorts the main table, e.g., a "sessions" analysis (most web server logs do this by default), there is a stage of database build where the entire main table of the database is copied and sorted. During this period, the database will use up to twice the disk space; after the build, the disk space will drop back down to the final size. So for the example above, if there is a "sessions" analysis or other sorting database filter (another example is a "concurrent streams" analysis), it will temporarily use between 4 TB and 8 TB of space per day, before returning to 2 TB - 4 TB at the end of the build.
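The disk space figures above can be turned into a quick estimate. The following is a minimal sketch, not part of Sawmill: the function name and parameters are hypothetical, and it assumes that a sorting database filter doubles the space of the whole database during the build, which is the upper bound implied by "up to twice the disk space".

```python
def estimate_database_disk_tb(log_tb_per_day, days_retained=1, sorting_filter=False):
    """Estimate Sawmill database disk space in TB (hypothetical helper).

    Uses the 200% to 400% rule from the text. If a sorting database filter
    (e.g., a "sessions" analysis) is used, the peak during the build is
    assumed to be twice the final size (an upper bound).
    """
    # Total uncompressed log data held in the database, not just one day's worth.
    dataset_tb = log_tb_per_day * days_retained

    # Final database size: between 200% and 400% of the log data.
    final_low = 2 * dataset_tb
    final_high = 4 * dataset_tb

    # Temporary peak while the main table is copied and sorted.
    peak_factor = 2 if sorting_filter else 1

    return {
        "dataset_tb": dataset_tb,
        "final_tb": (final_low, final_high),
        "peak_tb": (final_low * peak_factor, final_high * peak_factor),
    }


# 1 TB of log data per day, tracked for 30 days: final_tb == (60, 120)
thirty_days = estimate_database_disk_tb(1, days_retained=30)

# 1 TB of log data with a "sessions" analysis: peak_tb == (4, 8)
with_sessions = estimate_database_disk_tb(1, sorting_filter=True)
```

When planning capacity, it is safer to size the disk for the peak figure than for the final figure, because the build will fail if the disk fills during the sort.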
If you are using a separate SQL database server, the disk space described above is needed on that server, because that is where the databases are stored. The remainder of Sawmill fits in a much smaller space, so 1 GB should be sufficient for it.
Sawmill uses the disk intensively during database building and report generation, so for best performance, use a fast disk. Ideally, use a RAID 10 array of fast disks. RAID 5 or RAID 6 will hurt performance significantly (about 2× slower than RAID 10 for database builds) and is not recommended. Write buffering on the RAID controller should be turned on if possible, as it provides an additional 2× performance for database builds.
Network mounts will usually work for storage of the Sawmill database but are not recommended for performance reasons. We sometimes see errors apparently due to locking and synchronization issues with network mounts.
On the Sawmill server, we recommend 2 GB of RAM per core for large datasets.
To estimate the amount of processing power you need, start with the assumption that Sawmill Enterprise processes 2000 log lines per second, per processor core for Intel or AMD processors; or 1000 lines per second for SPARC or other processors.
Note: This is a conservative assumption; Sawmill can be much faster than this on some datasets, reaching speeds of 10,000-20,000 lines per second per core in some cases. However, for sizing your processor, it is best to use a conservative estimate to ensure that the specified system is sufficient.
Compute the number of lines in your daily dataset; 200 bytes per line is a good estimate. Dividing the number of lines by the processing rate per second tells you how many seconds Sawmill will require to build the database. Convert that to hours; if it is more than six hours, you will need more than one processor. You should have enough processors that when you divide the number of hours by the number of processors, the result is less than 6.
Example for 50 Gigabytes (GB) of uncompressed log data per day
Calculate the number of log lines to process per day:
53,687,091,200 bytes / 200 bytes per line = ~268 million lines of log data
Calculate the processing time in seconds:
268,000,000 log lines / 2000 log lines per second = 134,000 seconds
Calculate the processing time in hours:
134,000 seconds / 3600 = 37.2 hours
Calculate the number of processors, assuming 6 hours of processing time per day (see note 1):
37.2 hours / 6 hours = 6.2 processors
Note 1: The use of six hours is based upon the assumption that you don't want to spend more than six hours per night updating your database to add the latest data. A six-hour nightly build time is a good starting point. It leaves some room to modify or tune the database and filters, which can slow down processing, while still keeping within the processing time available each day. The dataset above could be processed in about 9 hours using four processors, if a 9-hour nightly build time is acceptable.
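The sizing steps above can be sketched as a short calculation. This is only an illustration under the assumptions stated in this document (200 bytes per line, 2000 lines per second per core, a six-hour nightly build, 2 GB of RAM per core); the function name and parameters are hypothetical, and rounding the processor count up to a whole number is our assumption.

```python
import math


def size_processors(log_bytes_per_day,
                    bytes_per_line=200,
                    lines_per_sec_per_core=2000,
                    build_window_hours=6,
                    ram_gb_per_core=2):
    """Estimate processor cores and RAM for a nightly build (hypothetical helper)."""
    # Step 1: number of log lines to process per day.
    lines = log_bytes_per_day / bytes_per_line

    # Step 2: single-core processing time in seconds, then hours.
    seconds = lines / lines_per_sec_per_core
    hours = seconds / 3600

    # Step 3: cores needed to fit the build window, rounded up (assumption).
    cores = max(1, math.ceil(hours / build_window_hours))

    return {
        "lines": lines,
        "single_core_hours": hours,
        "cores": cores,
        "ram_gb": cores * ram_gb_per_core,
    }


# 50 GB of uncompressed log data per day, as in the example above.
result = size_processors(50 * 1024**3)
# result["cores"] == 7 and result["ram_gb"] == 14
```

Without the intermediate rounding used in the worked example, the single-core time comes out at about 37.3 hours rather than 37.2, which does not change the result. For SPARC or other processors, pass lines_per_sec_per_core=1000.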
Pre-Sales support is strongly recommended for large datasets
If you are considering Sawmill for a very large dataset (more than 200 million lines of data), it is recommended that you contact Sawmill technical support in advance, to get expert guidance with your profile configuration. There is no charge for pre-sales technical consultations, and it is very likely to improve your initial experience of Sawmill.
Contact firstname.lastname@example.org for pre-sales support.