The "internal" database is not nearly as efficient in its use of memory as MySQL. Or, to put it another way, the "internal" database aims for performance above all, and mostly achieves it by keeping everything in memory. This means that it does not scale as well to extremely large datasets, or extremely large reports.
This isn't a particularly large dataset, but it may well be an extremely large report. There are probably about 10 million lines in that dataset, and if it's tracking the full URLs in the "page" field, there could easily be several million unique URLs in that file. Let's say the final table is 2 million rows. Sawmill's internal representation of the table (which it keeps in memory) uses about 200 bytes per cell. If there are five columns and 2 million rows, then that's about 2GB of memory used to represent that table, which will exceed the capabilities of a 32-bit system (usually, the per-process memory limit is 2GB, even on a system with 4GB of RAM). The vast majority of tables don't have this problem because they aren't so huge, but this one is.
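To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The 200 bytes per cell, five columns, and 2 million rows are the estimates from above, not measured values, and the function name is just for illustration:

```python
# Rough estimate of the in-memory table size, using the figures quoted above.
BYTES_PER_CELL = 200        # approximate per-cell cost (estimate, not measured)
COLUMNS = 5                 # assumed number of columns in the report table
ROWS = 2_000_000            # assumed number of rows (unique URLs)

def table_bytes(rows, columns, bytes_per_cell=BYTES_PER_CELL):
    """Return the estimated memory footprint of a table, in bytes."""
    return rows * columns * bytes_per_cell

estimate = table_bytes(ROWS, COLUMNS)
limit_32bit = 2 * 1024**3   # typical 2GB per-process limit on a 32-bit system

print(f"estimated table size: {estimate / 1e9:.1f} GB")
print(f"share of the 2GB per-process limit: {estimate / limit_32bit:.0%}")
```

The table alone takes up nearly the whole 2GB address space, so once you add everything else the process needs, it runs out of room.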
MySQL keeps its tables on disk, so it doesn't use much memory at all. Even with MySQL, Sawmill keeps a representation of the report table in memory, so it's still going to use a lot of RAM, but most of the data will be offloaded to the SQL server's disk, which makes it *much* more scalable than using the internal database.
We will improve memory usage in the next major release of Sawmill (which we're calling 7.2 internally, though it may be 8.0 when it ships). The current build uses disk for almost everything, instead of memory, which greatly reduces the memory usage of the internal database, and somewhat reduces memory usage with MySQL. There is a 7.2 pre-release available now from http://sawmill.net/prerelease.html, but it's very much under development, has a number of known bugs, and I wouldn't trust it a bit. You might try it in a couple of weeks, though (maybe ask here first).
For now, use MySQL, or use a 64-bit system with a lot of RAM (the 2GB per-process memory restriction is lifted on 64-bit platforms).
Alternatively, disable that report and use Pages/directories instead. That report is hierarchical, so it won't show all 2 million items; it will usually show just a few hundred or thousand per page, and you can zoom in to see more of them. So all the information is there, but it's split into a directory structure, which keeps memory usage down.
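As a rough illustration of why a hierarchical report stays small, here is a minimal sketch of grouping URLs by directory one level at a time. This is not how Sawmill implements it; the `children` function and the sample URLs are made up for the example:

```python
from collections import defaultdict

def children(urls, prefix="/"):
    """Count hits per immediate child (page or subdirectory) of `prefix`."""
    counts = defaultdict(int)
    for url in urls:
        if not url.startswith(prefix):
            continue                      # outside the directory being viewed
        rest = url[len(prefix):]
        head, sep, _ = rest.partition("/")
        # A trailing "/" marks a subdirectory that can be zoomed into.
        counts[prefix + head + sep] += 1
    return dict(counts)

# Hypothetical sample data
urls = ["/docs/a.html", "/docs/b.html", "/docs/img/x.png", "/index.html"]

top = children(urls)              # one row per top-level page or directory
docs = children(urls, "/docs/")   # "zooming in" on /docs/
print(top)
print(docs)
```

Each view only holds the rows for one directory level, so the table in memory is much smaller than a flat list of every unique URL.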
-
Greg Ferrar, Sawmill Product Manager
support@sawmill.net