db = { parenthesizedomitted = { label = "Parenthesized Items Omitted" question = "There's a line above some of the tables in the statistics that says, \"parenthesized items omitted.\" What does that mean?" short_answer = "It means that some items (probably useless ones) have been omitted from the table to make the information more useful--you can show them by choosing \"show parenthesized items\" from the Options menu." long_answer = "

$PRODUCT_NAME omits parenthesized items (i.e., any item that starts with \"(\" and ends with \")\") from some tables to make the information more useful. For instance, most hits on a web site do not come directly from a search engine (some come from links on other pages of the site, and others come from links on web sites that are not search engines), so the largest item in the search engines table would usually be the one called \"(no search engine).\" Because hits from non-search-engines are not important in the search engines table, and because they dominate the numbers, making it difficult to compare \"real\" search engines, this item is omitted from the table by default. $PRODUCT_NAME omits it by omitting all parenthesized items. Other examples of parenthesized items include the \"(no search terms)\" item in the search terms table, and the \"(internal referrer)\" item in the referrers table.

If you want to see all the hits in these tables, you can turn on parenthesized items in the Table Options page.

" } no_referrer_reports = { label = "Referrer Reports Missing" question = "My log data contains referrer information, but I don't see referrer reports, or search engines, or search phrases. Why not?" short_answer = "$PRODUCT_NAME includes referrer reports if the beginning of the log data includes referrers. If your log data starts without referrers, and adds it later, you won't see referrer reports. Create a new profile from the latest log file (with referrers), and change the log source to include all log data." long_answer = "

When a profile is created, $PRODUCT_NAME looks at the first few lines of the log data to determine which fields are present, and which reports to generate. If it sees a referrer field there, it will create a Referrer report, Search Engines and Search Phrases reports, and other referrer-related reports.

This can be a problem if the log data does not contain referrer data at the beginning of the dataset. For instance, IIS often defaults to minimal logging (without referrers), and Apache often defaults to logging in the Common Log Format (without referrers). If you later reconfigure the server to log referrers, $PRODUCT_NAME still won't know that, because the beginning of the log data does not contain referrers, and that's where it looks. So a profile created from the whole dataset will not report referrers, even though the later data contains referrer information.

The solution is to recreate the profile, and when it asks you where the log data is, point it to the most recent file. That file will certainly have referrer information at the beginning, so the referrer reports will be set up properly. After creating the profile, and before viewing reports or rebuilding the database, go to the Config for the profile and change the Log Source to include all your log data. Then view reports, and referrer reports will be included.

" } internalreferrers = { label = "Eliminating Internal Referrers" question = "Most of the referrers listed in the \"Top referrers\" view are from my own site. Why is that, and how can I eliminate referrers from my own site from the statistics?" short_answer = "These are \"internal referrers\"; they represent visitors going from one page of your site to another page of your site. You can eliminate them by modifying the default \"(internal referrer)\" log filter, changing http://www.mydomain.com/ in that filter to your web site URL." long_answer = "

Referrers show which page a hit came from -- i.e. they show what page a visitor was on when they clicked the link that took them to your page. For most web sites, visitors arrive and then click through several pages before leaving, so most web log data has a lot of referrers that are pages on the site being analyzed. For instance, if someone visits http://www.yoursite.com/index.html, and then clicks a link pointing to http://www.yoursite.com/page2.html, the second hit will show up in the log data (and in the statistics) with a referrer of http://www.yoursite.com/index.html. These referrers are called \"internal referrers,\" and under normal circumstances, you don't really care about them -- what you really want to know is which referrers brought visitors to your site, not which pages they moved between once they got there.

$PRODUCT_NAME can't distinguish internal referrers from external referrers because it doesn't know your site's URL. So it doesn't know if a referral from http://www.yoursite.com/index.html is internal (which it is if your site is yoursite.com), or external (which it is if your site is anything else). To help $PRODUCT_NAME identify and hide internal referrers, you need to modify a log filter that $PRODUCT_NAME creates for you. Here's how:

  1. Go to the Config section of your profile.

  2. Click Log Filters.

  3. Edit the log filter which sets referrers from \"yoursite.com\" to \"(internal referrer)\".

  4. In that log filter, replace \"yoursite.com\" with your actual site name.

  5. Rebuild the database.

Once you've done that, the internal referrers will be suppressed in the \"Top referrers\" view (or they will appear as \"(internal referrer)\" if you've turned on parenthesized items).
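
For illustration, here is a sketch of what the edited filter expression might look like, written in the advanced expression syntax shown in the spider-filtering entry below (the starts_with function name and exact wording are assumptions -- the default filter shipped with your version may be worded differently):

  if (starts_with(referrer, \"http://www.yoursite.com/\")) then referrer = \"(internal referrer)\";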

" } trialdifference = { label = "Difference Between Trial and Full" question = "What's the difference between the full version of $PRODUCT_NAME and the Trial version?" short_answer = "The Trial version is identical to the full version, except that it expires after 30 days." long_answer = "

$PRODUCT_NAME Trial is a free trial version, intended to let you evaluate the program without having to buy it. It is identical to the full version, except that it expires 30 days after it is first used. After the trial period is over, the trial version will no longer work, but it can be unlocked by purchasing a license, and all settings, profiles, and databases will remain intact." } unlocktrial = { label = "Unlocking a Trial Installation" question = "When I purchase, do I have to download a new version of $PRODUCT_NAME, or can I \"unlock\" my existing trial installation?" short_answer = "You can unlock your trial installation by entering your license key in the Licensing page." long_answer = "You don't have to download again. When you purchase, you get a license key by email. You can enter that key into the Licensing page (which you can get to by clicking Licensing on the Administrative menu) to unlock a trial installation, converting it into a fully licensed installation." } logfiles = { label = "What is a Log File?" question = "What is a log file?" short_answer = "Log files are text files created by your server, recording each hit on your site. $PRODUCT_NAME generates its statistics by analyzing log files." long_answer = "

Log files are large, ugly text files generated by web servers, proxy servers, FTP servers, and just about every other kind of server. Every time something happens on the server (it serves a file, delivers a message, someone logs in, or something else), the server logs that information to the file, which continues to grow as new events occur. Log files are not particularly human-readable, and do not generally contain summarizing information, which is why $PRODUCT_NAME exists -- $PRODUCT_NAME processes your log files, summarizes and analyzes them in many ways, and reports the results back to you in a much friendlier format -- graphs, tables, etc.

You need to have access to your log files to use $(PRODUCT_NAME). If you don't have log files, $PRODUCT_NAME can't do anything for you. If you don't know where your log files are, ask your server administrator (hint: they are often stored in a directory called \"logs\"). In some cases, servers are configured so they do not keep log files, or the logs are hidden from users; in these situations, you will not be able to use $(PRODUCT_NAME). Again, your server administrator can help you find your log files, or they can tell you why they're not available. If you're trying to analyze a web site, and your ISP does not provide logs for you, you may want to consider switching to one that does.
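
If you have shell access to a UNIX-like server and want to hunt for likely log files yourself, one quick (if blunt) approach is a find command along these lines (the name pattern is just a guess to adapt to your server):

  find / -name \"*access*log*\" 2>/dev/null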

" } platforms = { label = "Available Platforms" question = "What platforms does $PRODUCT_NAME run on?" short_answer = "Microsoft Windows 7/8/Vista/XP/2003/2008/2012, Mac OS X, most versions and variants of UNIX." long_answer = "$PRODUCT_NAME runs on Microsoft Windows Server 2003, Windows Server 2008, Windows 2012, Windows XP, Windows Vista, Windows 7, Windows 8, Mac OS X and most popular flavors of UNIX (Linux, Solaris, FreeBSD, OpenBSD, NetBSD, BSD/OS, Tru64 UNIX (Digital Unix), IRIX, HP/UX, AIX, OS/2, and BeOS). It is expected to also remain compatible with future versions of Windows and Mac OS X. Binary versions are available for the most popular platforms; on less common platforms, it may be necessary to build $PRODUCT_NAME yourself from the source code (which is available for download in encrypted/obfuscated format).

That's just the server; once you have the server running, you can configure $PRODUCT_NAME, generate reports, and browse reports from any computer, using a normal web browser.

" } systemrequirements = { label = "System Requirements" question = "How much memory, CPU power, and disk space do I need to run $PRODUCT_NAME?" short_answer = "At least 2GB RAM, 4 GB preferred; 500 MB disk space for an average database; and as much CPU power as you can get." long_answer = "

$PRODUCT_NAME is a heavy-duty number crunching program, and can use large amounts of memory, CPU, and disk. You have some control over how much it uses of each, but it still requires a reasonably powerful computer to operate properly.

$PRODUCT_NAME uses around 100 MB of memory when it processes a small to medium size log file, and it can use considerably more for very large log files. The main memory usage factors are the \"item lists\", which are tables containing all the values for a particular field. If a field in your data is very complex, with many unique values (the URL query field of web log data is a common example), the item list can be very large, requiring hundreds of megabytes of memory. This memory is mapped to disk to minimize physical RAM usage, but it still contributes to $PRODUCT_NAME's total virtual memory usage. So for databases with very complex fields, large amounts of RAM will be required. For large datasets, it is possible for $PRODUCT_NAME to use more than 2 GB of address space, exceeding the capabilities of a 32-bit system; in this situation, it is necessary to use a 64-bit system, or a MySQL database, or both (see {=docs_faq_link('dbmemory')=} and {=docs_faq_link('memoryusage')=}). This typically will not occur with a dataset smaller than 10 GB, and it is often possible to process a much larger dataset on a 32-bit system with 2 GB of RAM; a dataset over 20 GB, however, will often run into this issue, so a 64-bit system is recommended for very large datasets. For datasets larger than 10 GB of log data, a multi-core 64-bit CPU coupled with a 64-bit operating system and at least 2 GB of RAM per core (e.g., 8 GB for a 4-core system) is highly recommended, if not required. If your system cannot support the RAM usage required by your dataset, you may need to use log filters to simplify the complex database fields.

The $PRODUCT_NAME installation itself takes less than 50 MB of disk space, but the database it creates can take much more. A small database may be only a couple of megabytes, but if you process a large amount of log data, or turn on a lot of cross-references and ask for a lot of detail, there's no limit to how large the database can get. In general, the database will be somewhere on the order of 200% to 300% of the size of the uncompressed log data in it, perhaps as much as 400% in some cases. So if you're processing 100 GB of log data, you should have 200 GB to 400 GB of disk space free on your reporting system to hold the database. If you use an external (e.g. SQL) database, the database information will take very little space on the reporting system, but will take a comparable amount of space on the database server.

Disk speed is something else to consider when designing a system to run $(PRODUCT_NAME). During log processing, $PRODUCT_NAME makes frequent use of the disk, and during statistics viewing it uses it even more. Many large memory buffers are mapped to disk, so disk speed can have a very large impact on database performance, both for processing log data and for querying the database. A fast disk will improve $PRODUCT_NAME's log processing speed, and the responsiveness of the statistics. SCSI is better than IDE, and SCSI RAID is best of all.

During log processing, especially while building cross-reference tables, the CPU is usually the bottleneck -- $PRODUCT_NAME's number crunching takes more time than any other aspect of log processing, so the rest of the system ends up waiting on the CPU most of the time. This means that any improvement in CPU speed will result in a direct improvement in log processing speed. $PRODUCT_NAME can run on any system, but the more CPU power you can give it, the better. Large CPU caches also significantly boost $PRODUCT_NAME's performance, by a factor of 2x or 3x in some cases.

" } dbmemory = { label = "Database Memory Usage" question = "I get an error 'Unable to allocate N bytes of memory' while building a database, and $PRODUCT_NAME seems to have used all my available memory. What can I do about it?" short_answer = "Use a 64-bit computer and operating system with sufficient RAM, and/or simplify your database" long_answer = `

This error means that $PRODUCT_NAME tried to allocate another chunk of memory (N additional bytes, on top of whatever it was already using), and the operating system told it that there was no more memory available for it to use. This error is usually not a bug; it almost always indicates that $PRODUCT_NAME really has exhausted all available memory. It typically happens when using the "internal" database with a very large dataset.

The "internal" database is optimized for performance above all, and tends to keep some key data structures in memory. On 32-bit systems, when processing large datasets, the amount of memory required may exceed the available address space. Typically, the internal database will work well up to about 10 GB of uncompressed log data on a 32-bit system. Above that, scalability may become an issue. On 64-bit systems, the address space is not a concern, but if there is not sufficient physical RAM, this error can still occur.

Itemnum tables, especially, can result in heavy memory usage for large datasets. In its "itemnum" tables (also called normalization tables), $PRODUCT_NAME keeps a list of all values seen for each field -- e.g., a list of all IP addresses which appear in a particular field, or a list of all URLs which appear in another field. These tables are kept in memory, or at least mapped to memory, so they consume available address space. In the case of an IP address field (for instance, the source IP address of a web server log), each value is about ten bytes long; if there are 10 million unique IPs accessing the site, the table is 100 million bytes, or 100 MB. Similarly, for a proxy log analysis, if each unique URL is 100 bytes long and there are 10 million unique URLs in the log data, the table will be 1 GB. Tables this large can easily exceed the capabilities of a 32-bit system, which typically allows only 2 GB of memory to be used per process.

One solution is to use a 64-bit system and operating system, with sufficient RAM; with a 64-bit processor, $PRODUCT_NAME will be able to allocate as much RAM as it needs, provided the RAM is available on the system (and it can use virtual memory if it isn't). This is the most complete solution; with a large amount of RAM on a 64-bit system, it should be possible to build extraordinarily huge databases without running out of memory.

Another option is to simplify the dataset; see {=docs_chapter_link('resources')=} for suggestions. In particular, adding a log filter to simplify or eliminate very complex database fields can not only reduce memory usage, but also improve performance.

For an estimate of how much RAM you may need, see {=docs_chapter_link('server_sizing')=}.

` } winsock2 = { label = "Winsock 2" question = "When I run $PRODUCT_NAME on Windows, I get an error: \"A required DLL is missing: WS2_32.DLL.\" What's going on?" short_answer = "You need Winsock 2." long_answer = "To run on Windows 95, and some early versions of Windows 98, $PRODUCT_NAME requires Winsock 2, a networking component available as a free download from Microsoft.

Winsock 2 is already part of Windows 98 (newer versions), Windows NT 4.0, and Windows 2000, so you do not need to download this component unless you are using Windows 95 or an old version of Windows 98.

" } libstdcppmissing = { label = "libstdc++ missing" question = "When I run $PRODUCT_NAME, I get an error: './sawmill: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory'. What's going on?" short_answer = "$PRODUCT_NAME requires the libstdc++ library. This is available by default on many platforms, and is included in the $PRODUCT_NAME distribution on others (including Solaris)" long_answer = "

$PRODUCT_NAME requires the libstdc++ library. This is available by default on many platforms, but it is not available on some older platforms, and it is often not available on Solaris. There are several ways of making it available; for instance, you can install a libstdc++ package using your system's package manager, or, on platforms where the $PRODUCT_NAME distribution includes a copy of the library (such as Solaris), you can point the runtime linker at that copy (e.g., with the LD_LIBRARY_PATH environment variable).

" } oleaccdll = { label = "Missing DLL: OLEACC.DLL" question = "When I run $PRODUCT_NAME on Windows 98, I get an error: \"A required DLL is missing: OLEACC.DLL.\" What's going on?" short_answer = "You need to download and install the latest Service Pack for Windows 98." long_answer = "$PRODUCT_NAME requires a DLL called OLEACC.DLL. This DLL is part of recent versions of Windows 98, but it is not part of older versions of Windows 98. If you're running an older Windows 98, you'll need to install the latest Service Pack before you can run $(PRODUCT_NAME). The service pack is a free download from Microsoft.

" } urlmondll = { label = "Missing DLL: URLMON.DLL" question = "When I run $PRODUCT_NAME on Windows, I get an error: \"A required DLL is missing: URLMON.DLL.\" What's going on?" short_answer = "Install the latest Internet Explorer, and the problem should go away." long_answer = "

This DLL is part of Microsoft Internet Explorer. It is also included in many recent versions of Windows. If you see this error, download and install the latest Internet Explorer, and the problem should go away.

" } dnsproblems = { label = "Problems With DNS Lookup" question = "$PRODUCT_NAME only shows me the IP addresses of my visitors, even when I turn on DNS lookup. Why?" short_answer = "Try deleting the IPNumbersCache file in LogAnalysisInfo -- see the long answer for other solutions." long_answer = "

(See {=docs_faq_link('dnslookup')=} for information about reverse DNS lookup.)

Usually, this occurs because the DNS server can't resolve the IPs. The DNS server you're using needs to know about the IPs you're resolving. For instance, you can't use an external DNS server to resolve internal IP addresses, unless the external DNS server knows about them. Try using an internal DNS server, or another DNS server, if the first DNS server you try can't resolve the IPs. It's useful to manually query the DNS server to see if it can resolve a particular IP; on most operating systems, this can be done with the \"nslookup\" command.
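
For example, to check whether your DNS server can resolve a particular address (the IP here is just a placeholder):

  nslookup 12.34.56.78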

" } noimagescgi = { label = "No Images in CGI Mode" question = "I run $PRODUCT_NAME in CGI mode, and all the images in the menus and the reports are missing or broken. Why?" short_answer = "You may have set the \"temporary $lang_stats.directory\" incorrectly during installation. Try deleting the preferences.cfg file in LogAnalysisInfo, and access $PRODUCT_NAME to try again." long_answer = "

When $PRODUCT_NAME runs as a CGI program, it includes images in its pages by creating them in a temporary $lang_stats.directory in the web server $lang_stats.directory, and then embedding links in the HTML so that the images it created are served by the web server. This is done by selecting a \"temporary $lang_stats.directory\" and \"temporary $lang_stats.directory URL\" which point to a $lang_stats.directory inside the web server's root $(lang_stats.directory). They both point at the same $lang_stats.directory, but one of them is the pathname of the $lang_stats.directory, and one of them is the URL of the $(lang_stats.directory). These two must point at the same $lang_stats.directory for images to appear in the pages generated by $PRODUCT_NAME in CGI mode. If images are not appearing, it is usually because this is set incorrectly.
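
For example (hypothetical paths), if your web server's root $lang_stats.directory is /var/www/html, a matching pair would be a temporary $lang_stats.directory of /var/www/html/sawmilltemp and a temporary $lang_stats.directory URL of http://www.yoursite.com/sawmilltemp/ -- both referring to the same $(lang_stats.directory).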

To correct the temporary $lang_stats.directory, delete the preferences.cfg file in the LogAnalysisInfo folder, and access $(PRODUCT_NAME). You will be prompted to enter the pathname and URL of the temporary $(lang_stats.directory). Make sure you see the logo on the page after you enter the temporary $lang_stats.directory -- if the logo does not appear, click your browser's Back button and try again until you see it; until the logo appears, no other images in the $PRODUCT_NAME interface will appear either.

" } cantaccessserver = { label = "Can't Access the Server" question = "When I run $PRODUCT_NAME, it tells me that the server is started (it shows me the URL), but when I try to access that URL, the browser says it's not available. How can I fix this?" short_answer = "You may be using a proxy server which prevents you from accessing a server running on your own machine. Try reconfiguring the proxy to allow it, or try running $PRODUCT_NAME on IP 127.0.0.1 (the loopback interface)." long_answer = "

If you're running Windows 2003 and using Internet Explorer, look at {=docs_faq_link('w2003_ie_lockdown')=} first, and return here if that doesn't help.

When you first start $PRODUCT_NAME in web server mode, it tries to start a web server, running on the local machine, using port 8988. If this fails, it should give you an error message; if it succeeds, it should give you a URL. If you're seeing a URL when you start $PRODUCT_NAME, it generally means that the $PRODUCT_NAME server started successfully, and is ready to answer web browser requests.

Sometimes, though, when you actually try to access that URL, you may find that the server doesn't answer. Your browser may tell you that there's a DNS error, or that it couldn't contact the server, or that there's some other kind of error. If $PRODUCT_NAME displayed a URL, the server itself is probably working fine-- the problem is not with the server, but with the network connection to the server. This can happen, for instance, if you're using a web server proxy or cache server, and it doesn't know about the IP address of your own machine. When you contact the cache and ask to connect to your own machine, it gets confused, because normal web requests come from inside machines contacting outside machines, and this one is an inside machine contacting another inside machine (itself). A well-configured proxy server can handle this, but one that is not configured to handle internal requests may attempt to get the URL from the outside, and may give an error when it doesn't find it there. Some proxies/caches/firewalls will also refuse to let through traffic on port 8988 ($PRODUCT_NAME's default port), regardless of other settings.

There are several solutions. One choice is to reconfigure the proxy or cache server to allow HTTP connections from internal machines to other internal machines, on port 8988. Then $PRODUCT_NAME will be able to operate in its preferred mode, on port 8988 of the machine's first IP address.

If that's not an option, you may be able to get $PRODUCT_NAME to work by running it on the loopback interface (IP 127.0.0.1), or on port 80 (the standard web server port). The easiest way to find a working solution is to use the command-line interface to $PRODUCT_NAME, at least until you have it working; you can go back to using the graphical version later. From the command line, run $PRODUCT_NAME like this:

  $PRODUCT_EXECUTABLE -ws t -sh 127.0.0.1 -wsp 80

This will attempt to start $PRODUCT_NAME's web server on IP 127.0.0.1 (the loopback interface), using port 80. This will only work if there is not a web server already running on the system-- only one server can use port 80 at a time. If you already have a web server running, use port 8988 instead. Try the command above with different IP addresses (127.0.0.1, and any IP addresses you know belong to your computer), and different ports (try 8988 first, then 80). With a little luck one of the choices will start a server that you can connect to. Once you've got the $PRODUCT_NAME interface working in your web browser, you can set it to use that IP and port permanently in the Preferences, from the Administrative Menu. Once you've set the IP and port in the Preferences, you can quit the command-line $PRODUCT_NAME, and start using the graphical version, if you prefer.

If that still doesn't work, check if there is a firewall on your system or on your network, which is blocking traffic from your machine to itself, on port 8988. If there is, try disabling the firewall temporarily (or reconfigure it to allow the traffic), and see if it works then. If it works with the firewall disabled, and doesn't work with the firewall enabled, then the firewall is probably blocking the necessary traffic. You'll probably want to reconfigure the firewall to let the network traffic through on 8988.

If none of these work, and you have a web server running on your system, there is always CGI mode. $PRODUCT_NAME can run as a CGI program under any running web server; if you can connect to that web server, you'll be able to use $PRODUCT_NAME by running it under your local server as a CGI program.

Finally, if you can't get $PRODUCT_NAME to work to your satisfaction, please contact $SUPPORT_EMAIL.

" } loginloop = { label = "Login Loops Back to Login" question = "When I try to log in to $PRODUCT_NAME, I get to the Admin page, but the next thing I click takes me back to the login page. Why?" short_answer = "Your browser isn't storing the cookie $PRODUCT_NAME needs to maintain the login, or something is blocking the browser from sending the cookie. Make sure cookies are on in the browser, firewalls aren't blocking cookies, and don't use Safari 1.2.1 or earlier as your browser." long_answer = `

$PRODUCT_NAME uses web browser cookies to store your login information, which keeps you logged in. If the browser isn't passing the cookie back to $PRODUCT_NAME properly, $PRODUCT_NAME won't know you're logged in, and you'll keep getting the login screen.

To keep this from happening, make sure cookies are enabled in your web browser. If you want to be selective about who gets cookies, at least make sure that the hostname or IP where $PRODUCT_NAME is running is allowed to get cookies. If your browser differentiates "session cookies" from other cookies, all you need is session cookies.

Use an approved browser -- some browsers don't handle cookies quite right. Approved browsers are Internet Explorer 6, Safari 1.2.2 or later, and Firefox. Others may work, but have not been verified. In particular, Safari 1.2.1 and earlier do not handle cookies properly -- this is fixed in 1.2.2 and later.

` } commandlinelogsource = { label = "Using a Command-line Log Source" question = "Can $PRODUCT_NAME use scp, or sftp, or ssh, or https, to download log data? Can it uncompress tar, or arc, or sea, or hqx, etc.?" short_answer = "Not directly, but you can do it by using a command-line log source to run a command line, script, or program that does whatever is necessary to fetch the data, and prints it to $PRODUCT_NAME." long_answer = "

$PRODUCT_NAME supports many different methods of acquiring log data, including direct access to local files, and FTP or HTTP access to remote files; it can also decompress the major compression formats on the fly, including zip, gzip, and bzip2. If you need to use a different method to fetch the log data, like scp, sftp, or ssh, or if you need to read the log data from a database, or if you need to uncompress, decode, or decrypt a format that is not directly supported by $PRODUCT_NAME, you can do it using a command-line log source.

Command-line log sources are very simple in concept. You give $PRODUCT_NAME a command line; it runs the command line whenever it needs to get the log data; the command, script, or program you specify \"prints\" the log data (i.e. generates it to stdout, the standard command-line output stream); and $PRODUCT_NAME reads the output of the command to get the log data. This provides you with unlimited flexibility in how you feed your data to $PRODUCT_NAME.

For instance, suppose $PRODUCT_NAME didn't support the gzip format (it does). Then you could use the following (UNIX) command-line log source: /bin/gunzip -c /logs/mylog.gz. Since the -c flag tells gunzip to dump the output to stdout, $PRODUCT_NAME will read the log data directly from this command, without needing to use its built-in gunzipper. More usefully, any decompression utility with a similar flag can be used to let $PRODUCT_NAME read a compressed, archived, or encrypted log directly, even if it doesn't know anything about the format.

Even if you don't have a program that will dump the data to stdout, you can still use this approach by writing a tiny script. Consider the following (UNIX) shell script, which scps a file from a remote server and feeds it to $PRODUCT_NAME:

  scp user@host:/logs/mylog.txt /tmp/templog
  cat /tmp/templog
  rm /tmp/templog

This script copies a log file from a remote machine (securely, using scp), prints it to stdout using \"cat\", and deletes it when it's done. The same script, with slight modifications, could copy multiple files, or use a different method than scp to fetch the files (such as sftp).

A simpler (and better) example which does the same thing is this command:

  scp -qC user@host:/logs/mylog.txt > /dev/stdout

This explicitly scps the files to stdout, which sends them straight into $PRODUCT_NAME without the intermediate step of being stored on the disk or deleted. Since it's just one line, there's no need to use a script at all; this single line can be the command for the log source.
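
A similar hypothetical variant uses ssh to run \"cat\" on the remote machine, streaming the file directly with no temporary copy (the host and path are placeholders):

  ssh user@host cat /logs/mylog.txt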

" } memoryusage = { label = "$PRODUCT_NAME uses too much memory for builds/updates, and is slow to view" question = "When I build or update my database with $PRODUCT_NAME, it uses a huge amount of memory. Then, when I view statistics, it's very slow. What can I do about that?" short_answer = "Decrease the complexity of the database." long_answer = "

The main portion of the database that uses memory is the \"item lists\". There is one list for each database field, and each list contains all the unique values for that field. If one of the fields in your database has many unique values (millions), it can require a very large amount of memory to track. Simplifying the field can save memory.

To check which database field is the main culprit, look at the sizes of the files in the \"items\" sub$lang_stats.directory, in the database $lang_stats.directory (in the Databases $lang_stats.directory of the LogAnalysisInfo $lang_stats.directory). For instance, if the location $lang_stats.directory is the largest, at 500 MB, then you know that the \"location\" database field is responsible for the largest part of the memory usage.

When you've found the culprit, you need to reduce its memory usage. This is where you'll have to make compromises and cuts. The simplest solution is to delete the database field, and stop tracking and reporting on it. If that's not an option, you'll need to simplify the field in some way. The key point here is that you are trying to reduce the number of unique field values that $PRODUCT_NAME sees and tracks. The pool file, which is usually the largest one, contains a back-to-back list of all field values that are used in the database; if you can reduce the number of possible field values used by $PRODUCT_NAME, you will reduce the size of the file.

If the field is hierarchical (like a pathname, hostname, date/time, or URL), you can simplify it by tracking fewer levels, by adjusting the suppress_top and suppress_bottom values in the database.fields section of the profile .cfg file (in the profiles folder of the LogAnalysisInfo folder). For instance, the page field of web logs is tracked nine directories deep by default; you can simplify it by tracking only the top three levels of directories. If your date/time field is set to track information to the level of minutes, you can change it to track hours or days only. Usually, you will also want to turn off the bottom-level items checkbox for the field, since it's usually the bottom level that has all the detail.
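
For illustration, here is a hypothetical fragment of a profile .cfg making such a change to the page field (the surrounding structure is abbreviated, and the exact option names and values should be checked against your own profile file):

  database = {
    fields = {
      page = {
        suppress_bottom = 3
      }
    }
  }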

Another possibility is to use a Log Filter to simplify the field. The default filter for web logs which replaces everything after ? with \"(parameters)\" is an example of this. By replacing all the various parameterized versions of a URL with a single version, this filter dramatically decreases the number of different page field values that $PRODUCT_NAME sees, therefore dramatically decreasing the memory usage of the \"page\" field. Similarly, if you have a very complex section of your directory structure, but you don't really need to know all the details, you can use a Log Filter to delete the details from your field, collapsing the entire structure into a few items.

A common source of high memory usage is a fully-tracked hostname/IP field. By default, $PRODUCT_NAME tracks only the first two levels of hostnames for web and proxy logs; i.e. it will tell you that a hit came from .sawmill.net, but not that it came from some.machine.sawmill.net. Because of the tremendous number of IP addresses that appear in large log files, this field can be a problem if it's set to track individual IPs (there's a checkbox that lets you do this when you create the profile). If this is happening, consider tracking only a few levels of the hostname hierarchy, instead of the full IP address.

Of course, sometimes you really need the full detail you're tracking in a very large field. If you can't reduce the detail, and you can't reduce the amount of log data, then the only solution is to get enough memory and processing power to efficiently handle the data you're asking $PRODUCT_NAME to track.

" } iiscgitimeout = { label = "IIS CGI Timeout" question = "When I run $PRODUCT_NAME as a CGI program under IIS, I get an error message \"CGI Timeout: The specified CGI application exceeded the allowed time for processing. The server has deleted the process.\" What can I do about that?" short_answer = "Set the IIS CGI timeout to a high value, like 999999." long_answer = "

Microsoft Internet Information Server (IIS) automatically terminates CGI programs that run for more than five minutes. Unfortunately, $PRODUCT_NAME can easily run longer than that when building a database, and if IIS terminates it, it may leave the database partly built and unusable. The solution is to reconfigure the IIS server to increase the CGI timeout to a much larger value. Here's how (instructions are for Windows 2000 Server; other Windows variants may be slightly different):

  1. In the Start Menu, go to the Settings menu, and choose Control Panel.

  2. Open the Administrative Tools control panel.

  3. Open the Internet Services Manager item.

  4. Right-click on the computer icon in the left panel and choose Properties from the menu that appears.

  5. Click \"Edit...\" next to \"WWW Services\".

  6. Click the \"Home Directory\" tab.

  7. Click the \"Profile...\" button.

  8. Click the \"Process Options\" tab.

  9. Enter a large value in the CGI script timeout field, perhaps 999999.

" } differentip = { label = "Running on a Different IP" question = "I'm running $PRODUCT_NAME on Windows, and it automatically starts itself up on IP 127.0.0.1 and port 8988. How can I tell it to use another IP address and port?" short_answer = "Set the Server Hostname option and the Web Server Port option in the Network section of the Preferences." long_answer = "

By default, $PRODUCT_NAME binds to all available IPs, so if there's an IP address where it is allowed to listen on port 8988, it already is (it's also listening on 127.0.0.1).

If you want it to listen only on the IP you specify, you can do it from the Preferences. Go to the Preferences, click on the Network category, change the \"Server hostname\" option to the IP address you want to use, and change the \"Web server port\" option to the port number you want to use. The next time you start $PRODUCT_NAME, it will automatically bind to the IP address you specified.

If you're using the command-line version of $PRODUCT_NAME ($PRODUCT_EXECUTABLE_DOCS), you can either do the same as above, or you can give $PRODUCT_NAME command line options to tell it which IP number and port to use:

  $PRODUCT_EXECUTABLE_DOCS -ws t -sh 128.129.130.131 -wsp 8888

When you use these options, $PRODUCT_NAME will immediately start up its web server on the port you specify.

" } windowsservice = { label = "Running $PRODUCT_NAME as a Service" question = "Can I run $PRODUCT_NAME as a Service on Windows? Can I run $PRODUCT_NAME while I'm logged out?" short_answer = "As of version 8, $PRODUCT_NAME is installed as a service when you run the normal installer." long_answer = "Earlier versions of $PRODUCT_NAME required extra steps to run them as a service, but this is no longer a problem-- the normal Windows installer automatically installs $PRODUCT_NAME as a service when you run it." } remoteadmin = { label = "Remote Administration" question = "My web site is hosted in another state. Does $PRODUCT_NAME provide browser based admin tools I can use to configure it and retrieve reports?" short_answer = "Yes, $PRODUCT_NAME's interface is entirely browser based." long_answer = "

$PRODUCT_NAME's interface is entirely web browser based. $PRODUCT_NAME runs either as a stand-alone program (in which case it uses its own built-in web server to serve its interface), or as a CGI program (in which case it uses the normal web server on the machine). In either case, $PRODUCT_NAME is configured by running a web browser on any machine you choose, and accessing $PRODUCT_NAME as though it were a web site. Statistics are also served through a web browser interface. You do not need to be physically present at the server to configure it or to view reports; all you need is a web browser.

" } resettrial = { label = "Resetting the Trial Period" question = "My 30-day trial has expired, and I haven't finished evaluating $PRODUCT_NAME yet. How can I get a new trial?" short_answer = "Go to the Licensing page, delete your expired license, and click \"Try $PRODUCT_NAME For 30 Days.\"" long_answer = "

$PRODUCT_NAME's trial license allows you to use it for evaluation purposes only. However, if after 30 days you still have not had a chance to fully evaluate $PRODUCT_NAME, you can extend your trial for another 30 days by doing the following:

  1. Go to the Licensing page.

  2. Delete your current trial license.

  3. Click the \"Try $PRODUCT_NAME for 30 Days\" button.

This will work only once -- after that, you will need to contact us at $SUPPORT_EMAIL if you want to extend your trial period further.

" } resetpassword = { label = "Resetting the Administrative Password" question = "I've forgotten the password I chose for $PRODUCT_NAME when I first installed; how can I reset it?" short_answer = "As of version 8.0.2, there is a custom action reset_root_admin." long_answer = "

For security reasons, $PRODUCT_NAME requires an administrative username and password whenever you use it (otherwise, anyone could use it to access your computer, since $PRODUCT_NAME is normally accessible by anyone on your network). You choose this username and password when you first run $PRODUCT_NAME, and it asks you for it whenever you run it again.

In version 7, we simply deleted users.cfg and prompted for a new root admin username and password. However, this is very insecure in a multi-user environment: if the Root Admin deletes users.cfg but waits hours or days to enter a new username and password, every other user who accesses $PRODUCT_NAME in the meantime is prompted to enter a new root admin username and password, and gains root admin access by doing so.

In version 8, as of 8.0.2, there is now a custom action, reset_root_admin. This is run from the command line like this:

$PRODUCT_EXECUTABLE_DOCS -a rra -u username -pw password

This command changes the root username and password to the values specified for username and password.

E.g., on Windows, from the Command Prompt:

c:\\
cd c:\\Program Files{='\\\\' . expand('$PRODUCT_NAME')=} 8
$PRODUCT_EXECUTABLE_DOCS_WIN32 -a rra -u jane -pw mypassword

or on Macintosh or Linux/Unix, from the terminal (assuming $PRODUCT_NAME is installed in /Applications/$PRODUCT_NAME):

cd '/Applications/$PRODUCT_NAME'
./$PRODUCT_EXECUTABLE_DOCS -a rra -u jane -pw mypassword

This is even more secure than using a default/default users.cfg, because there is no longer even the possibility of an attacker repeatedly trying default/default in the hope of catching $PRODUCT_NAME between the deletion of users.cfg and the creation of the new root admin account. The custom action approach also solves the problem of losing other users (and the root admin language), because nothing is changed in users.cfg other than the root admin username and password.

This action exists only in 8.0.2 or later. If you are running 8.0.0 and have forgotten the username or password you originally chose, you can still reset it, but you must contact $PRODUCT_NAME support, who will give you a replacement users.cfg file to place in the LogAnalysisInfo $lang_stats.directory. This will delete all users from $PRODUCT_NAME. Once you have installed the new users.cfg, access $PRODUCT_NAME again through a web browser, and you will be prompted to choose a new administrative username and password.

" } commandlinebuild = { label = "Building a Database from the Command Line" question = "How do I build a database from the command line?" short_answer = "Run \"executable -p profilename -a bd\" from the command line window of your operating system." long_answer = `

It is not necessary to use the web interface to build a database; you can use the command line. This is useful for debugging problems with profiles, or for building when the web interface is not available, e.g. from scripts. The exact method, and the exact command, depends on the platform; see below. See also Additional Notes For All Platforms.

Windows

To build a database from the command line, first open a command prompt window. One way to open a command prompt window (sometimes called a DOS window) is to click "Start" in the Windows task bar, then click "Run", enter "cmd" in the text box, and hit return.

You will get a new window that will display something like this:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\\Documents and Settings\\username>

In the command prompt window you will need to move to the $PRODUCT_NAME installation directory using the "cd" command. $PRODUCT_NAME is installed by default to "C:\\Program Files{='\\\\' . expand('$PRODUCT_NAME')=} 8"; to move to this directory, type cd "C:\\Program Files{='\\\\' . expand('$PRODUCT_NAME')=} 8" (or whatever path you specified during installation).

C:\\Documents and Settings\\username>cd "C:\\Program Files{='\\\\' . expand('$PRODUCT_NAME')=} 8"

C:\\Program Files{='\\\\' . expand('$PRODUCT_NAME')=} 8>

To get a list of internal profile names type the command "$PRODUCT_EXECUTABLE_DOCS_WIN32 -a lp" at the command prompt. This will display a list of the internal profile names from which you can select the profile you want to build.

C:\\Program Files{='\\\\' . expand('$PRODUCT_NAME')=} 8>$PRODUCT_EXECUTABLE_DOCS_WIN32 -a lp
$PRODUCT_NAME 8.0.0; Copyright (c) 2008 Flowerfire
myprofile

To build, run $PRODUCT_NAME with the "-p profilename -a bd" options, replacing profilename with the internal name of your profile from the list of internal profile names. The build command and related output are shown below. If you want to update your database instead, run $PRODUCT_NAME with the "-p profilename -a ud" options.

C:\\Program Files{='\\\\' . expand('$PRODUCT_NAME')=} 8>$PRODUCT_EXECUTABLE_DOCS_WIN32 -p myprofile -a bd
$PRODUCT_NAME 8.0.0; Copyright (c) 2008 Flowerfire
 Reading log file: C:\\Apache       [                    ] 0.00%  00:00
 Reading log file: C:\\Apache       [-                   ] 3.16%  00:01
 Reading log file: C:\\Apache       [######-             ] 33.33% 5000e 00:02
 Building cross-reference table 4 (worm)             [#############       ] 66.67%  00:03
 Building cross-reference table 12 (search_engine)  [##############=     ] 73.68% 00:04
 Building cross-reference table 18 (server_response) [####################] 100.00%  00:05

Mac OS X

To build a database from the command line, first open a terminal window. On Mac, you do this by selecting the Finder, navigating to the Applications folder, then Utilities, and double-clicking the Terminal application.

You will get a new window that will display something like this:

Last login: Mon Sep 1 10:46:44 on ttyp1
Welcome to Darwin!
[host:~] user%

In the terminal window you will need to move to the $PRODUCT_NAME installation directory using the "cd" command. Typically $PRODUCT_NAME is located in "/Applications/$PRODUCT_NAME". If you installed $PRODUCT_NAME somewhere else, change the directory name in the command to match. To move to this directory type "cd /Applications/$PRODUCT_NAME":

[host:~] user% cd /Applications/$PRODUCT_NAME
[host:/Applications/$PRODUCT_NAME] user%

To get a list of internal profile names type the command "./$PRODUCT_EXECUTABLE_DOCS -a lp" at the command prompt. This will display a list of the internal profile names from which you can select the profile you want to build.

[host:/Applications/$PRODUCT_NAME] user% ./$PRODUCT_EXECUTABLE_DOCS -a lp
$PRODUCT_NAME 8.0.0; Copyright (c) 2008 Flowerfire
myprofile

To build, run $PRODUCT_NAME with the "-p profilename -a bd" options, replacing profilename with the internal name of your profile from the list of internal profile names. The build command and related output are shown below. If you want to update your database instead, run $PRODUCT_NAME with the "-p profilename -a ud" options.

[host:/Applications/$PRODUCT_NAME] user% ./$PRODUCT_EXECUTABLE_DOCS -p myprofile -a bd
$PRODUCT_NAME 8.0.0; Copyright (c) 2008 Flowerfire
 Reading log file: /logs/Apache       [                    ] 0.00%  00:00
 Reading log file: /logs/Apache       [-                   ] 3.16%  00:01
 Reading log file: /logs/Apache       [######-             ] 33.33% 5000e 00:02
 Building cross-reference table 4 (worm)             [#############       ] 66.67%  00:03
 Building cross-reference table 12 (search_engine)  [##############=     ] 73.68% 00:04
 Building cross-reference table 18 (server_response) [####################] 100.00%  00:05

Linux/UNIX

Follow the Mac OS X instructions, which are basically UNIX instructions (since Mac OS X is basically UNIX); change the directories to match the location where you installed $PRODUCT_NAME. The executable file usually ends with the version number on Linux/UNIX platforms, so you'll need to change references from "./$PRODUCT_EXECUTABLE_DOCS" to "./$PRODUCT_EXECUTABLE_DOCS-8.0.0" (or whatever the version is).

Additional Notes For All Platforms

When the command completes, the database will be built. If there is an error, it will be displayed in the command line window.

To get debugging output from the build (not usually useful), you can set the SAWMILL_DEBUG environment variable to 1, before rebuilding the database with the command above. On Windows, you can set this variable with "set SAWMILL_DEBUG=1". On Mac or other operating systems, you can run "export SAWMILL_DEBUG=1" (if you're using the bash shell), or "setenv SAWMILL_DEBUG 1" (if you're using csh). If you're not sure which shell you're running, type them both; one will work (it will not give any response), and one will give an error, which you can ignore.
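
For example, on Windows, a hypothetical debug-enabled build (using the example profile name from above) would be:

set SAWMILL_DEBUG=1
$PRODUCT_EXECUTABLE_DOCS_WIN32 -p myprofile -a bd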

You can also use the -v option to get "verbose" output from the build. There are many -v options available, documented in the "Command-line output types" page of the technical manual ( http://www.sawmill.net/cgi-bin/sawmill8/docs/sawmill.cgi?dp=docs.option&option_name=command_line.verbose ). For very high detail (too slow for any significant build), add "-v egblpfdD" to the command line. If you add much debugging output, you may also want to add "| more" to the end of the command line to pipe the output to a pager, or to add "> out.txt" to the end of the command line to redirect the output to a file.
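
For instance, a hypothetical high-detail build on Mac OS X or Linux/UNIX, redirecting the output to a file for later examination:

./$PRODUCT_EXECUTABLE_DOCS -p myprofile -a bd -v egblpfdD > out.txt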

For more examples of command-line usage, run $PRODUCT_NAME from the command line with the --help option.

` } typicalsetup = { label = "Typical Usage Patterns" question = "How does a typical company use $PRODUCT_NAME; what does a typical $PRODUCT_NAME setup look like?" short_answer = "Installations vary from customer to customer--$PRODUCT_NAME provides enough flexibility to let you choose the model that works best for you." long_answer = "

There are quite a lot of different \"models\" that different customers use. For web server analysis, it is common to have $PRODUCT_NAME running on the active web server, either in stand-alone web server mode or as a CGI program, accessing the growing log files directly; this works well as long as the dataset is not too large and the server is not too heavily loaded. For very large datasets, however, many customers have dedicated $PRODUCT_NAME machines, which pull the logs over the network from the server(s). Databases are generally updated regularly; it's common to have them updated in the middle of the night, every night, using the $PRODUCT_NAME Scheduler or an external scheduler like cron.

In terms of database layout, a number of different models are common.

There are a lot of options, and there's no single best solution. You can try out different methods, and change them if they're not working for you. $PRODUCT_NAME provides you the flexibility to choose whatever's best for you.

" } competitivecomparison = { label = "$PRODUCT_NAME vs. The Competition" question = "How is $PRODUCT_NAME different from other log analysis tools?" short_answer = "Among other things, $PRODUCT_NAME does not generate static reports -- it generates dynamic, interlined reports." long_answer = "

There are many areas in which $PRODUCT_NAME beats the competition, but one major one is that $PRODUCT_NAME's statistics are dynamic, and its statistics pages are interlinked. Most other log analysis programs are report-based -- you specify certain criteria (like, \"give me all hits on my web site on January 14, broken down by page\") and it generates a single report, and it's done. If you want more detail about something, it's not available, or it's only available if you reprocess the log data with different settings.

$PRODUCT_NAME generates HTML reports on the fly, and it supports zooming, filtering, and many other dynamic features. You can zoom in on a certain directory, for instance, and then see the events for that directory broken down by date, or by IP, or by weekday, or in any other way you like. You can create arbitrary filters, for instance to zoom in on the events for a particular address on a particular day, or to see the search terms that were used from a particular search engine on a particular day, which found a particular page. $PRODUCT_NAME lets you navigate naturally and quickly through hierarchies like URLs, pages/directories, day/month/years, machines/subnets, and others.

Of course, there are many other features that set $PRODUCT_NAME apart from the competition-- see our web site for a complete list.

" } multiplesites = { label = "Statistics for Multiple Sites" question = "Can $PRODUCT_NAME generate separate analyses for all the web sites hosted on my server?" short_answer = "Yes, $PRODUCT_NAME includes a number of features for just this purpose." long_answer = "

Absolutely. This is one of our core design goals -- to make $PRODUCT_NAME a good choice for web hosting providers, ISPs, and others who serve multiple sites from a single server. $PRODUCT_NAME's profiles provide an excellent mechanism for generating different statistics for each customer or web site. If each site has its own log file(s), this is trivial; you can just make a profile that analyzes the appropriate log file. If all sites share a single log file, it's not much harder -- $PRODUCT_NAME's advanced filtering mechanism lets you easily ignore all log entries except those of interest to a particular web site.

The technique you use depends on your situation. In general, you will need to have a separate profile for each user (you can quickly create all of your profiles using the Create/Update Many Profiles feature). For maximum flexibility, each profile can have its own database, and each profile can be password-protected or secured in some other way, to prevent unauthorized users from accessing it. See {=docs_chapter_link('security')=} for a discussion of some of the ways profiles can be secured. If each profile has its own database, then the log filters can be used to filter out all statistics except those belonging to the user.

If you don't care if users can access each others' statistics, you can use a single profile with a single database, and give each user a bookmark URL pointing to their statistics in the database; this is the simplest approach, but it makes it possible for one user to see another's statistics, which is usually undesirable.

Each approach has tradeoffs. A single database fetches and processes the shared log data only once, which matters most when the data must be retrieved over a slow or metered link; multiple databases keep each site's statistics separate, so each profile can be secured, rebuilt, and managed independently.

In summary, you'll usually want to use multiple databases for multiple servers or sites. The main situation where you'd want a single database is if you're using FTP over a metered line to fetch the data; a single database will fetch it just once. Even then, though, you could set up an external script to fetch the log data to the local disk once, and then process it locally with $(PRODUCT_NAME).

" } filteringdomain = { label = "Filtering All but One Domain" question = "Can $PRODUCT_NAME generate statistics on just one domain, from a log file containing log data from many domains?" short_answer = "Yes. Add a log filter that rejects hits from all other domains." long_answer = "

Yes. This can be done easily using a log filter. To do this, click Show Config in the profiles list, click Log Filters, and create a new log filter with a value like this sketch (written in the advanced expression syntax; see also the spider-filtering entry below):
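
  if (server_domain ne \"mydomain.com\") then \"reject\";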

Replace mydomain.com with the actual domain, and replace server_domain with the name of the log field which reports the server domain in your log data. Sometimes, this field is called cs_host. If there is no such field in your log data, then you'll need to use a different log format in order to filter by domain.

The next time you rebuild the database, all log entries from domains other than the one you entered will be rejected, leaving only statistics from the one domain.

" } filteringdirectory = { label = "Excluding a File or {=capitalize(lang_stats.directory)=}" question = "How can I remove a particular file or directory from the statistics?" short_answer = "Use a Log Filter to reject all hits on that file or directory." long_answer = "

Create a new Log Filter to reject all hits on that file or directory. To do this, click Show Config in the profiles list, click Log Filters, and create a new log filter with a value like the following sketch (written in the advanced expression syntax; the field and function names are assumptions to adapt to your log format and version):
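
  if (page eq \"/robots.txt\") then \"reject\";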

The filter above rejects hits on the /robots.txt file. Or use something like this (assuming a starts_with function is available):
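
  if (starts_with(page, \"/somedir/\")) then \"reject\";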

The filter above rejects all hits on the /somedir/ directory.

The next time you rebuild the database, all hits on that page or directory will be rejected, so they will not appear in the statistics.

By the way, the same technique can be used to filter hits based on any field -- for instance, all hits from a particular host or domain, all hits from a particular referrer, or all hits from a particular authenticated user.

" } rejectspiders = { label = "Discarding hits from spiders" question = "How can I throw away all the spider hits, so I only see statistics on non-spider hits?" short_answer = "Use a Log Filter to reject all hits from spiders (and worms)." long_answer = `

Create a new log filter to reject all hits from spiders. The easiest way to create log filters is in the Log Filter Editor, in the Log Filters section of the Config. To get to the Log Filters editor, click Show Config in the Profiles list (or click Config in the reports), then click Log Data in the menu on the left, then click Log Filters. To create the filter:

You can also use the Advanced Expression Syntax option from the Filter Type drop down list (on the Filter tab), and type in this filter expression into the value field:

if (spider ne \"(not a spider)\") then \"reject\";

For log formats which use the User Agent Analysis snap-on to compute spider values, the spider field is called uaa_spider, and the expression above must be changed to refer to that field name instead.
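
For instance, the modified expression would look like this (same logic as above, with the uaa_spider field name substituted):

  if (uaa_spider ne \"(not a spider)\") then \"reject\";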

Then rebuild your database, and all hits from spiders will be discarded.

For more details on Filters see {=docs_chapter_link('filters')=}.

` } gzippeddata = { label = "Processing zipped, gzipped, or bzipped Log Data" question = "Can $PRODUCT_NAME process ZIPped, gzipped, or bzipped log data?" short_answer = "Yes, all three." long_answer = "

Yes. Any files that end with .gz, .zip, .bz, or .bz2 will be treated as compressed files by $(PRODUCT_NAME). It will uncompress them \"on the fly\" (not modifying the original file and not creating any new files), and process their uncompressed data the same way it reads normal log files.

" } upgradingpreserving = { label = "Upgrading Without Losing Data" question = "How can I upgrade to a new version of $PRODUCT_NAME without losing my profiles, databases, and other data?" short_answer = "When upgrading 8.6.x to a newer 8.6.x on Windows, just install the new version on top of the old. When upgrading from an older 8.1.x version to a newer 8.6.x version on Windows, just install the new version on top of the old; when you next view the profiles, it will ask you to convert the older-format profiles and databases to the new. When upgrading 8.x to a newer 8.x on non-Windows, install the new and copy profiles, databases, etc. from the old LogAnalysisInfo to the new; if it's from 8.1.x to 8.6.x, it will prompt for conversion. When upgrading 7 to 8.6.x, use the Import link in the Admin menu." long_answer = "

Upgrading an older $PRODUCT_NAME 8.x to $PRODUCT_NAME 8.6.x (Windows)

$PRODUCT_NAME 8.6.x can be installed directly on top of 8.1.x, or on top of an older 8.6.x. On Windows, just run the installer; it will install what's necessary, and will not overwrite or remove your existing profiles, databases, or any user configuration data, so installation will not result in data loss. Once the install is complete, you can continue using $PRODUCT_NAME. If you're upgrading from 8.1.x to 8.6.x, $PRODUCT_NAME will detect your 8.1.x profiles, and will prompt you to convert them when you next view the Profiles list. Databases are converted at the same time as profiles, so back up your databases before you run the conversion.

Upgrading an older $PRODUCT_NAME 8.x to $PRODUCT_NAME 8.6.x (non-Windows)

If you're upgrading from an older 8.x to a newer 8.x on a non-Windows installation, start by installing/unpacking the new installation. Don't run it yet, though. In order to preserve profiles, settings, databases, and more, you need to copy them from the old LogAnalysisInfo $lang_stats.directory. Here are the parts you may want to copy:

  1. Profiles. Copy the entire profiles folder from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  2. Preferences. Copy preferences.cfg from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  3. Databases. Copy the Databases $lang_stats.directory from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  4. Schedules. Copy the file schedules.cfg from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  5. Users. Copy the file users.cfg from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  6. Licenses. Copy the file licenses.cfg from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  7. Startup Data. Copy the file system.cfg from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  8. Roles. Copy the files roles_enterprise.cfg and roles_standard.cfg from your existing LogAnalysisInfo $lang_stats.directory to the new one.

  9. User Settings. Some per-user settings, including temporary modifications to reports (e.g., showing more rows) and report filters, are in the users_cache $lang_stats.directory in LogAnalysisInfo. Copy this $lang_stats.directory from the old installation to the new, to preserve these settings.

If you edited the graph colors file (LogAnalysisInfo/graph_colors.cfg) or the field categories file (LogAnalysisInfo/field_categories.cfg), copy those from the old LogAnalysisInfo to the new one as well.
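
For reference, here is a sketch of the equivalent manual copies on a UNIX-like system. The /old and /new paths and the Databases folder name are assumptions; adjust them to match your actual installation locations, and skip any files you did not customize:

  cp -R /old/LogAnalysisInfo/profiles /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/preferences.cfg /new/LogAnalysisInfo/
  cp -R /old/LogAnalysisInfo/Databases /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/schedules.cfg /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/users.cfg /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/licenses.cfg /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/system.cfg /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/roles_enterprise.cfg /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/roles_standard.cfg /new/LogAnalysisInfo/
  cp -R /old/LogAnalysisInfo/users_cache /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/graph_colors.cfg /new/LogAnalysisInfo/
  cp /old/LogAnalysisInfo/field_categories.cfg /new/LogAnalysisInfo/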

There is a perl script in Extras, update.pl, which does all these copies in a single step.

After these files are copied, you can start the server in the new installation, and all your data should be preserved. If you're upgrading from 8.1.x to 8.6.x, $PRODUCT_NAME will detect your 8.1.x profiles, and will prompt you to convert them when you next view the Profiles list. Databases are converted at the same time as profiles, so you should copy the databases before you run the conversion.

Upgrading from $PRODUCT_NAME 7

To upgrade a $PRODUCT_NAME 7 installation to $PRODUCT_NAME 8, install $PRODUCT_NAME 8 in a different location (don't install it over $PRODUCT_NAME 7!), and then in the Admin menu of $PRODUCT_NAME 8, choose Import. Choose the location of the $PRODUCT_NAME 7 LogAnalysisInfo folder (in the installation directory), and you will be prompted to import profiles, databases, and users from the $PRODUCT_NAME 7 installation. After the upgrade is complete and you have verified that all components were upgraded successfully, you can delete the old installation.

WARNING: Regardless of the upgrade version path, back up your existing installation before upgrading. The upgrade process is complex, and if it fails for any reason, it can result in the corruption of the profiles, databases, etc. Be sure you have a backup before upgrading.

" } exiturls = { label = "Tracking Exit URLs" question = "How can I tell where visitors went when they left the site?" short_answer = "Normally, you can't. However, you can set up \"reflector\" pages if you need this information." long_answer = "

$PRODUCT_NAME can show you the last page visitors hit before they exited the site, but it cannot usually show you where they went. The reason is that when they click a link on your site leading to another site, their web browser contacts the other site (not your site) for the new page--your web server is not contacted at all when someone clicks a link to leave your site. So the hit appears in the remote site's log files, not yours, and $PRODUCT_NAME cannot report on it because it's not in your log files.

Nevertheless, you can track exits from your site if you're willing to set up \"reflector\" pages. A reflector page is a page whose sole purpose is to reflect a visitor to another page. This can be done with a trivial HTML page containing only a META Refresh tag in the HEAD section. For instance, the following simple HTML page will cause a visitor to be immediately redirected to http://www.flowerfire.com:

  <html>
  <head>
  <meta http-equiv=\"Refresh\" content=\"0; URL=http://www.flowerfire.com/\">
  </head>
  </html>

By creating a page like this for every exit link on your site, and changing your links to point to the reflector page rather than the actual destination page, you can track exit link usage. When a visitor clicks the exit link, they will be taken to the reflector page, and then immediately reflected to the actual destination. This will happen quickly enough that they will not notice the reflection happening--it will seem to them that they went straight to the destination page. But your log data will include a hit on the reflector page, so you will be able to see which exit links are being taken. In the \"exit pages\" view, the reflector links will show which links were taken when leaving the site.

A more sophisticated way of doing this is to create a CGI script (or other type of script) which generates the reflector page on the fly, given a URL parameter. If you do it that way, you won't need to create a separate reflector page for each link; you can just use the same script for all your external links.
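
As an illustration, here is a minimal CGI reflector sketched in Python. The script name (reflect.py) and the url parameter are hypothetical, and a real deployment should validate the url value against a list of known destinations, to avoid becoming an open redirect:

  #!/usr/bin/env python3
  # reflect.py -- emits a reflector page for the destination given in ?url=...
  import os
  import urllib.parse

  # Read the destination URL from the CGI query string.
  query = urllib.parse.parse_qs(os.environ.get('QUERY_STRING', ''))
  url = query.get('url', ['/'])[0]

  # CGI output: headers, a blank line, then the reflector HTML body.
  print('Content-Type: text/html')
  print()
  print('<html><head>')
  print('<meta http-equiv=\"Refresh\" content=\"0; URL=%s\">' % url)
  print('</head></html>')

With this in place, an external link would point to something like /cgi-bin/reflect.py?url=http://www.flowerfire.com/, and each use of the link would be logged as a hit on the script.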

" } visitorsums = { label = "Visitor Totals Don't Add Up" question = "When I add up the number of visitors on each day of the month, and I compare it to the total visitors for the month, they're not equal. Why not? Also, why doesn't the sum of visitors on subpages/subdirectories add up to the total for the directory, and why doesn't the sum of visitors on subdomains add up to the total for the domain, etc.? Why are there dashes (-) for the visitor totals?" short_answer = "Because \"visitors\" is the number of unique visitors, a visitor who visits every day will show up as a single visitor in each day's visitors count, but also as a single visitor for the whole month -- not 30 visitors! Therefore, simple summation of visitor numbers gives meaningless results." long_answer = "

We get this a lot as a bug report, but $PRODUCT_NAME is not counting visitors wrong. \"Visitors\" in $PRODUCT_NAME's terminology refers to unique visitors (see {=docs_faq_link('datatypes') =}). So:

Here's why. Suppose you have a web site where only one person ever visits it, but that person visits it every day. For every day of the month, you will have a single visitor. For the entire month, too, you will have a single visitor, because visitors are unique visitors, and there was only one visitor in the entire month, even though that visitor came back again and again. But in a 30-day month, the sum of the daily visitor counts will be 30 (one visitor for each day). So though $PRODUCT_NAME will correctly report one visitor for the month, it will also correctly report one visitor per day.

If what you're really looking for is \"visits\" rather than \"visitors\" (so each visit will count once, even if it's the same visitor coming back over and over), then that's what $PRODUCT_NAME calls \"sessions,\" and you can get information about them in the Sessions Summary and other session-related views (paths through the site, entry pages, exit pages, time spent per page).

In table reports, the total row is calculated by summing all other rows. Because visitors cannot be summed in this way, the visitors column in the total row will always be a dash (-).

" } dnslookup = { label = "Resolving IP Numbers" question = "When I look at the top hosts and top domains, all I see are numbers (IP addresses). How do I get the domain information?" short_answer = "Turn on reverse DNS lookup in the Network options (or in your web server), or use $PRODUCT_NAME's \"look up IP numbers using DNS\" feature." long_answer = "

Your web server is tracking the IP numbers of visitors, but not their hostnames or domains. If you need hostname or domain information, you need to tell $PRODUCT_NAME (or your web server) to look up the IP addresses using DNS (domain name service). One way to do this is to turn on DNS lookup in your web server; that will slow down your server, but then $PRODUCT_NAME will report hostnames and domains without any performance penalty during log data processing.

If you're not willing to take the performance hit on your server, or if you want to analyze log data that has already been generated with IP addresses, you can turn on $PRODUCT_NAME's reverse DNS feature like this:

  1. Log in to $PRODUCT_NAME.

  2. Click \"Config Options\" for the profile you want to modify.

  3. Click \"DNS Lookup, Support & Action Email\" in the menu.

  4. Check the box labeled \"Look up IP numbers using domain nameserver (DNS)\".

  5. Enter the hostnames or IP addresses of one or two DNS servers in the DNS server fields. You can get this information from your network administrator, or your ISP.

  6. Click \"Save Changes\".

  7. Rebuild the database (e.g. choose \"Build Database\" from the menu at the top).

Processing log data will be slower with reverse DNS turned on, but you will get full hostname and domain information.

If you have problems getting the DNS feature to resolve IP addresses, see {=docs_faq_link('dnsproblems') =}.

A third option is to use a separate DNS resolving program to resolve the IP addresses in your log files after the server is done writing them, and before $PRODUCT_NAME analyzes them. Examples include logresolve, which is included with the popular Apache web server, and DNSTran, which runs on several platforms including Macintosh, Linux, Solaris, and IRIX.

If you're using UNIX or MacOS X, another good option is adns, an asynchronous DNS lookup library that includes some command-line tools for looking up IP addresses, including adnslogres (for Common Access Log format and Apache Combined format files) and adnsresfilter (for other types of log files). For instance, you can use the command \"adnsresfilter < /path/to/my/log.file\" as your log source command to use adns. adns is faster than logresolve, but more difficult to configure initially.

You can plug any command-line DNS resolver directly into $PRODUCT_NAME by using a command log source, and entering a UNIX command that resolves the IPs in the log file and dumps the resolved log data to the standard output stream; for instance:

  logresolve < /path/to/my/log.file

Once you've done that, $PRODUCT_NAME will automatically run logresolve when you process your log data, and it will resolve the data before feeding it to $(PRODUCT_NAME).

" } mappeddrivewithservice = { label = "Can't See Network Drives with $PRODUCT_NAME as Service" question = "Why can't $PRODUCT_NAME see my mapped drive, share, directory, or mount points when I run it as a Windows Service?" short_answer = "The Service must run with the same privileged user account that has the mapped drive, share, directory, or mount point privilege." long_answer = "

Access to a mapped drive, share, directory, or mount point is a permissions issue, and therefore a security issue. It is necessary to run the service using the same privileged account from which the drive was originally mapped, or an account which has permission to access the share. If the service does not connect as a user with that privilege, the network resource will not be available.

Here is a step-by-step walkthrough on how to change the service logon permission:

  1. Go to Control Panel

  2. Open up Services (location varies slightly with particular OS version)

  3. Find the $PRODUCT_NAME entry (or the entry for the service being used to run $PRODUCT_NAME) and right-click it

  4. Select Properties

  5. Under the 'Log On' tab, deselect the 'Local System Account' radio button by selecting 'This account', then click the Browse button

  6. In the 'Select User' dialog box, you may type in the privileged user's UserID, or you may browse for it. Once you have selected the correct user, click the OK button, and the 'This account' field will be populated by a period, then a backslash (\), then the user's ID

  7. Enter the privileged user's password twice. The password will display as asterisks; this is by design, for security reasons

  8. Back in the Services list, right-click the $PRODUCT_NAME entry and select the 'Restart' option

  9. When you next run $PRODUCT_NAME, access to the mapped drive, share, directory, or mount point will be available

" } # mappeddrivewithservice mappeddrive2003 = { label = "Can't See Network Drives in Windows 2003" question = "Why can't $PRODUCT_NAME see my mapped drive, share, directory, or mount points when I run it under Windows 2003?" short_answer = `Windows 2003 has a strict security policy which prevents access to network drives from $PRODUCT_NAME. To make it work, you need to let "everyone" permissions apply to anonymous, and remove the restriction on anonymous access to named pipes and shares (in Administrative Tools).` long_answer = `

The Windows 2003 security policies prevent programs like $PRODUCT_NAME from accessing network drives (mapped or UNC). In order to enable access to these drives, you need to do this:

  1. Go to Control Panel

  2. Open Administrative Tools

  3. Click Local Security Policy

  4. Click the Local Policies folder

  5. Click the Security Options folder

  6. Under Network Access, turn on "Let Everyone permissions apply to anonymous users."

  7. Under Network Access, turn off "Restrict anonymous access to named pipes and shares."

Now Windows 2003 will let $PRODUCT_NAME see and access network drives.

` } # mappeddrive2003 w2003_ie_lockdown = { label = "Can't access server with Windows 2003 and IE" question = "On Windows 2003, I can't access the $PRODUCT_NAME server using Internet Explorer. Why not?" short_answer = "The \"Internet Explorer Enhanced Security Configuration\" may be enabled, blocking access; uninstall it or add 127.0.0.1:8988 to the trusted sites." long_answer = "

Windows 2003 starts up with Internet Explorer \"locked down\" in a highly secure mode where only certain sites are accessible. In particular, $PRODUCT_NAME's default URL cannot be accessed by Internet Explorer.

To enable access to $PRODUCT_NAME from Internet Explorer, do this:

  1. Go to Internet Explorer.
  2. Go to the Tools menu.
  3. Choose Internet Options.
  4. Click the Security tab.
  5. Click the Trusted Sites icon.
  6. Click the Sites button.
  7. Add 127.0.0.1:8988 to the list.

Now you should be able to access $PRODUCT_NAME with Internet Explorer.

Alternatively, use a different browser which does not restrict access.

Alternatively, go to the Add/Remove Programs control panel and uninstall \"Internet Explorer Enhanced Security Configuration\".

" } # w2003_ie_lockdown pageparameters = { label = "Page Parameters" question = "I use parameters on my pages (e.g. index.html?param1+param2), but $PRODUCT_NAME just shows \"index.html?(parameters).\" How can I see my page parameters?" short_answer = "Delete the Log Filter that converts the parameters to \"(parameters).\"" long_answer = "

By default, $PRODUCT_NAME creates a log filter to convert everything after the ? in the page field to \"(parameters)\". In most cases that's best, because it reduces the size of the database significantly. But if you need the parameter information, it's easy to get it back--just delete that filter. You can do that like this:

  1. Go to the Config section of your profile.

  2. Click Log Filters.

  3. If your log format is Apache or similar, find the log filter which replaces everything after \"?\" with \"(parameters)\", and delete or disable that log filter.

  4. If your log format is IIS or similar, find the log filter which appends the cs_uri_query field to the cs_uri_stem field, and enable that log filter.

  5. Rebuild the database.

When you view the reports, you'll see that \"(parameters)\" has now been replaced by actual parameters.

" } combinereferrers = { label = "Combining Referring Domains" question = "How can I combine referrers, so hits from http://search.yahoo.com, http://dir.yahoo.com, and http://google.yahoo.com are combined into a single entry?" short_answer = "Create a log filter converting all the hostnames to the same hostname." long_answer = "

You can do this by converting all of the hostnames to a single hostname, so for instance they all appear as http://yahoo.com referrers. To do this, you need to convert all occurrences of /search.yahoo.com/, /dir.yahoo.com/, or /www.yahoo.com/ into /yahoo.com/, in the referrer field. The easiest way is to make three log filters, in the Log Filters section of the Config part of your profile:
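
A sketch of the three filters, in Advanced Expression Syntax (this assumes a replace_all() string function is available in the filter language; the exact function name may differ in your version):

  referrer = replace_all(referrer, \"/search.yahoo.com/\", \"/yahoo.com/\");
  referrer = replace_all(referrer, \"/dir.yahoo.com/\", \"/yahoo.com/\");
  referrer = replace_all(referrer, \"/www.yahoo.com/\", \"/yahoo.com/\");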

Then rebuild the database; the resulting statistics will combine all three referrers in a single /yahoo.com/ referrer.

A more sophisticated filter is necessary if you need to preserve some parts of the URL while converting others. In that case, you can use a regular expression filter:
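
A hedged reconstruction of such a filter, based on the description below (the matches_regular_expression() function name and \$1 capture syntax are assumptions):

  if (matches_regular_expression(referrer, \"^http://us\\.f[0-9]*\\.mail\\.yahoo\\.com/ym/(.*)\$\")) then
    referrer = \"http://us.f*.mail.yahoo.com/\$1\";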

This works by matching any referrer starting with http://us.fN.mail.yahoo.com/ym/ (where N is any integer), and, while matching, extracting everything after the /ym/ into the variable \$(1). The leading ^ ensures that the referrer starts with http://, the trailing \$ ensures that the parenthesized .* section contains all of the remainder after /ym/, [0-9]* matches any integer, and \\. matches a single period (see {=docs_chapter_link('regexp')=} for more information about regular expressions). If it matches, it sets the referrer field to http://us.f*.mail.yahoo.com/\$1, where \$1 is the value extracted from the original URL. This allows you to collapse all http://us.fN.mail.yahoo.com/ URLs into a single one without losing the extra data beyond /ym/. If you don't care about the data beyond /ym/, you can use a somewhat simpler (or at least easier-to-understand) filter:
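
A sketch of that simpler version (the matches_wildcard_expression() function name is an assumption):

  if (matches_wildcard_expression(referrer, \"http://us.f*.mail.yahoo.com/*\")) then
    referrer = \"http://us.f*.mail.yahoo.com/\";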

This one uses a wildcard comparison (if matches wildcard expression) rather than a regular expression, which allows the use of * in the expression in its more generally used meaning of \"match anything\". Note also that in the first line, * appears twice and each time matches anything, but in the second line it appears only once, and is a literal *, not a \"match-anything\" character.

" } clusteredservers = { label = "Clustered Servers" question = "Can $PRODUCT_NAME combine the logs from multiple clustered or load balanced web servers, so that the user has one view of the data? Can it report separately on the different servers?" short_answer = "Yes." long_answer = "

$PRODUCT_NAME can read any number of log files, from any number of servers, into a single database to show a single aggregate set of reports of all the data. If the logs also contain information about which server handled each request (or if each server has a separate log file, or a set of separate log files), then $PRODUCT_NAME can also show per-server statistics, if desired. Unlike many log analysis tools, $PRODUCT_NAME does not care if the files are in order, or if their date ranges overlap -- any combinations of any number of files with data in any order are possible.

To see per-server statistics, look in the reports for a report which breaks down the overall events by server. This might be called \"Server domains\" or \"Server hosts\" or \"Server IPs\" or something else, depending on the log data. Click on a particular server in that report; that zooms you in on that server. Now choose any other report from the \"Default report on zoom\" dropdown menu, to see a breakdown of the statistics for that server only. Alternatively, you can use the global filters to zoom \"permanently\" on a particular server, and then all reports will automatically show numbers for that server only.

If you don't have a field that tracks the server, you may still be able to get per-server statistics, by using the current_log_pathname() function to detect which server each hit came from. You'll need to create a custom field in that case, with a log field to track the server, a filter to compute the field from the log pathname, and a database field and report for the field. For information on creating custom fields, see {=docs_faq_link('custom_fields') =}.

" } logentryorder = { label = "Log Entry Ordering" question = "Does the log data I feed to $PRODUCT_NAME need to be in chronological order?" short_answer = "It depends on the format, but in most cases, the log data can be in any order." long_answer = `

$PRODUCT_NAME usually doesn't care what order the log data is in. For most common formats, which have one event per line of log data, $PRODUCT_NAME will just read the log data in any order, and if a reordering is required for some analysis (like web server sessions), it will automatically sort it before doing the analysis. Similarly, when using multiprocessor parsing, $PRODUCT_NAME will split the log data into chunks to distribute to each parsing server, and may parse or import them out of chronological order, without this causing any problems for the reports.

However, there are exceptions. If a log format has dependencies between lines of log data, e.g., if a line of data refers to previous lines of data in any way, then it may be necessary to process the logs in order, to get consistent results. Otherwise, at the boundaries between blocks of log data, it may not be possible to interpret the meaning of the first few lines, which depend on lines from other blocks, which might not have been processed yet, or might be analyzed simultaneously in other threads.

Examples of this type of dependency are Postfix logs, and many other mail server logs, which log "recipient" events on separate lines from "sender" events; Wowza and Flash media server logs, which report incremental bandwidth on each line which must be compared to previous lines to determine actual bandwidth usage by that event; and any log format plug-in which logs events across multiple lines (there are many, but they tend to be less frequently analyzed formats). Examples of log formats not affected are most common formats, including all web servers, all firewall or proxy or gateway servers, and all media servers except Flash and Wowza.

This "boundary problem" is unavoidable to some degree, since every log dataset has at least two boundaries, at the first line of log data and the last one. But it is exacerbated by out-of-order log file processing, and multiprocessor parsing, both of which introduce additional boundaries into the analysis.

A typical analysis will have a small number of boundaries, relative to the number of "good" lines of log data, so this issue can usually be ignored. However, it may result in slight differences in reported numbers from one build to the next, of the same dataset, when using multiprocessor parsing. In rare cases, the differences can be large.

If the boundary problem needs to be eliminated in a profile, it can be mostly resolved by turning off multiprocessor parsing (with {=docs_option_link('log.processing.distributed.method')=}); this will eliminate all boundaries except those between files. If the intra-file boundaries are an issue (which can happen if the profile uses log filters to keep information from previous lines, and apply it to current lines), logs can be manually imported in chronological order, for instance by concatenating them to a single file and importing that file.

Database filters also provide a way of solving this problem in some cases. Since database filters, unlike log filters, operate on the database after it has been imported, and since they can sort the data before they operate, it is usually possible to process data in the required order, regardless of the log data order. The Sessions snap-on uses this technique to analyze the data chronologically, and in order of IP, without requiring the imported log data to be in any special order.

` } create_many_profiles = { label = "Creating many profiles in a batch" question = "How can I create many profiles in a batch, from a template?" short_answer = "Use the create_many_profiles command-line option." long_answer = `

To create many profiles in a batch, all based on a particular "template" profile, you can use the create_many_profiles command-line feature. To do that, start by editing the file LogAnalysisInfo/miscellaneous/create_many_profiles.cfg, using a text editor. Do the following:

` } # create_many_profiles selinux = { label = "Configuring $PRODUCT_NAME to work with Security Enhanced Linux, in CGI mode" question = "$PRODUCT_NAME doesn't work in CGI mode with SELinux enabled; how do I get it to work?" short_answer = "Use semodule to allow the operations that $PRODUCT_NAME uses; see the long answer." long_answer = `

Security Enhanced Linux (SELinux) restricts what programs can do, to prevent them from misbehaving. The default behavior for an unrecognized program blocks certain operations that $PRODUCT_NAME needs to function, resulting in a blank screen when running $PRODUCT_NAME in CGI mode. This article describes how to lower the restrictions to allow $PRODUCT_NAME to work.

Start by creating a file called sawmill.te, with the following contents:

module sawmill 1.0;

require {
       class appletalk_socket create;
       class dir getattr;
       class dir read;
       class dir search;
       class dir { getattr read };
       class dir { read search };
       class file getattr;
       class file read;
       class netlink_route_socket bind;
       class netlink_route_socket create;
       class netlink_route_socket getattr;
       class netlink_route_socket nlmsg_read;
       class netlink_route_socket read;
       class netlink_route_socket write;
       class socket create;
       class socket ioctl;
       class udp_socket create;
       class udp_socket ioctl;
       class unix_dgram_socket create;
       role system_r;
       type apmd_log_t;
       type autofs_t;
       type boot_t;
       type faillog_t;
       type file_t;
       type httpd_log_t;
       type httpd_sys_script_t;
       type lastlog_t;
       type mnt_t;
       type net_conf_t;
       type proc_net_t;
       type rpm_log_t;
       type samba_log_t;
       type sendmail_log_t;
       type squid_log_t;
       type sysctl_net_t;
       type sysfs_t;
       type var_log_t;
       type var_t;
       type wtmp_t;
};

allow httpd_sys_script_t apmd_log_t:file getattr;
allow httpd_sys_script_t autofs_t:dir getattr;
allow httpd_sys_script_t boot_t:dir getattr;
allow httpd_sys_script_t faillog_t:file getattr;
allow httpd_sys_script_t file_t:dir getattr;
allow httpd_sys_script_t httpd_log_t:dir getattr;
allow httpd_sys_script_t httpd_log_t:dir read;
allow httpd_sys_script_t httpd_log_t:file read;
allow httpd_sys_script_t lastlog_t:file getattr;
allow httpd_sys_script_t mnt_t:dir getattr;
allow httpd_sys_script_t net_conf_t:file getattr;
allow httpd_sys_script_t net_conf_t:file read;
allow httpd_sys_script_t proc_net_t:dir { read search };
allow httpd_sys_script_t proc_net_t:file getattr;
allow httpd_sys_script_t proc_net_t:file read;
allow httpd_sys_script_t rpm_log_t:file getattr;
allow httpd_sys_script_t samba_log_t:dir getattr;
allow httpd_sys_script_t self:appletalk_socket create;
allow httpd_sys_script_t self:netlink_route_socket bind;
allow httpd_sys_script_t self:netlink_route_socket create;
allow httpd_sys_script_t self:netlink_route_socket getattr;
allow httpd_sys_script_t self:netlink_route_socket nlmsg_read;
allow httpd_sys_script_t self:netlink_route_socket read;
allow httpd_sys_script_t self:netlink_route_socket write;
allow httpd_sys_script_t self:socket create;
allow httpd_sys_script_t self:socket ioctl;
allow httpd_sys_script_t self:udp_socket create;
allow httpd_sys_script_t self:udp_socket ioctl;
allow httpd_sys_script_t self:unix_dgram_socket create;
allow httpd_sys_script_t sendmail_log_t:dir getattr;
allow httpd_sys_script_t squid_log_t:dir getattr;
allow httpd_sys_script_t sysctl_net_t:dir search;
allow httpd_sys_script_t sysfs_t:dir getattr;
allow httpd_sys_script_t var_log_t:dir read;
allow httpd_sys_script_t var_log_t:file getattr;
allow httpd_sys_script_t var_t:dir read;
allow httpd_sys_script_t wtmp_t:file getattr;

Then run the following commands, as root:

 checkmodule -M -m -o sawmill.mod sawmill.te
 semodule_package -o sawmill.pp -m sawmill.mod
 semodule -i sawmill.pp

These commands package up and install a SE module which allows $PRODUCT_NAME to perform all of its operations. Once you have run these commands, $PRODUCT_NAME should function as a CGI program.

` } # selinux removing_database_fields = { label = "Removing Database Fields" question = "How do I remove fields from the database to save space?" short_answer = "Delete the database.fields entry from the profile .cfg file, and delete any xref groups and reports that use it." long_answer = `

Deleting database fields reduces the size of the database, and reduces the time required to build the database. Here's how you can delete a database field:

  1. Using a text editor, edit the .cfg file for your profile, in LogAnalysisInfo/profiles.

  2. Search for "database = {" and then search forward from there for "fields = {" to find the database fields section. Comment out the field you don't want (or delete it). For instance, to remove the screen_dimensions field, change this section:

          screen_dimensions = {
            label = "DOLLARlang_stats.field_labels.screen_dimensions"
            type = "string"
            log_field = "screen_dimensions"
            suppress_top = "0"
            suppress_bottom = "2"
            always_include_leaves = "false"
          } # screen_dimensions
    

    to this:

    #      screen_dimensions = {
    #        label = "DOLLARlang_stats.field_labels.screen_dimensions"
    #        type = "string"
    #        log_field = "screen_dimensions"
    #        suppress_top = "0"
    #        suppress_bottom = "2"
    #        always_include_leaves = "false"
    #      } # screen_dimensions
    
  3. Now that the database field is gone, you will still need to remove any references to the field from other places in the profile. Typically, there is an xref group for this field, so this needs to be removed as well. Search from the top for cross_reference_groups, and comment out the group associated with the field or delete it. For instance, for screen_dimensions field, change this section:

          screen_dimensions = {
            date_time = ""
            screen_dimensions = ""
            hits = ""
            page_views = ""
          } # screen_dimensions
    

    to this:

    #      screen_dimensions = {
    #        date_time = ""
    #        screen_dimensions = ""
    #        hits = ""
    #        page_views = ""
    #      } # screen_dimensions
    
  4. By default, there will also be a report for the field, which has to be removed. Search for "reports = {", then search forward for the appropriate report name, which is the same as the database field name. Comment it out or delete it. For instance, search for "screen_dimensions = {", and then comment it out, replacing this:

          screen_dimensions = {
            report_elements = {
              screen_dimensions = {
                label = "%7B=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
                type = "table"
                database_field_name = "screen_dimensions"
                sort_by = "hits"
                sort_direction = "descending"
                show_omitted_items_row = "true"
                omit_parenthesized_items = "true"
                show_totals_row = "true"
                starting_row = "1"
                ending_row = "10"
                only_bottom_level_items = "false"
                columns = {
                  0 = {
                    type = "string"
                    visible = "true"
                    field_name = "screen_dimensions"
                    data_type = "string"
                    header_label = "%7B=capitalize(database.fields.screen_dimensions.label)=}"
                    display_format_type = "string"
                    main_column = "true"
                  } # 0
                  1 = {
                    header_label = "%7B=capitalize(database.fields.hits.label)=}"
                    type = "number"
                    show_number_column = "true"
                    show_percent_column = "true"
                    show_bar_column = "true"
                    visible = "true"
                    field_name = "hits"
                    data_type = "int"
                    display_format_type = "integer"
                  } # 1
                  2 = {
                    header_label = "%7B=capitalize(database.fields.page_views.label)=}"
                    type = "number"
                    show_number_column = "true"
                    show_percent_column = "false"
                    show_bar_column = "false"
                    visible = "true"
                    field_name = "page_views"
                    data_type = "int"
                    display_format_type = "integer"
                  } # 2
                } # columns
              } # screen_dimensions
            } # report_elements
            label = "%7B=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
          } # screen_dimensions
    

    to this:

    #      screen_dimensions = {
    #        report_elements = {
    #          screen_dimensions = {
    #            label = "%7B=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
    #            type = "table"
    #            database_field_name = "screen_dimensions"
    #            sort_by = "hits"
    #            sort_direction = "descending"
    #            show_omitted_items_row = "true"
    #            omit_parenthesized_items = "true"
    #            show_totals_row = "true"
    #            starting_row = "1"
    #            ending_row = "10"
    #            only_bottom_level_items = "false"
    #            columns = {
    #              0 = {
    #                type = "string"
    #                visible = "true"
    #                field_name = "screen_dimensions"
    #                data_type = "string"
    #                header_label = "%7B=capitalize(database.fields.screen_dimensions.label)=}"
    #                display_format_type = "string"
    #                main_column = "true"
    #              } # 0
    #              1 = {
    #                header_label = "%7B=capitalize(database.fields.hits.label)=}"
    #                type = "number"
    #                show_number_column = "true"
    #                show_percent_column = "true"
    #                show_bar_column = "true"
    #                visible = "true"
    #                field_name = "hits"
    #                data_type = "int"
    #                display_format_type = "integer"
    #              } # 1
    #              2 = {
    #                header_label = "%7B=capitalize(database.fields.page_views.label)=}"
    #                type = "number"
    #                show_number_column = "true"
    #                show_percent_column = "false"
    #                show_bar_column = "false"
    #                visible = "true"
    #                field_name = "page_views"
    #                data_type = "int"
    #                display_format_type = "integer"
    #              } # 2
    #            } # columns
    #          } # screen_dimensions
    #        } # report_elements
    #        label = "%7B=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
    #      } # screen_dimensions
    
  5. Now you need to remove the report element from the single_page_summary report. Search for single_page_summary, then search forward for the field name (e.g., search for "screen_dimensions = {"). Again, comment out the whole report element or delete it, replacing this:

              screen_dimensions = {
                label = "%7B=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
                type = "table"
                database_field_name = "screen_dimensions"
                sort_by = "hits"
                sort_direction = "descending"
                show_omitted_items_row = "true"
                omit_parenthesized_items = "true"
                show_totals_row = "true"
                starting_row = "1"
                ending_row = "10"
                only_bottom_level_items = "false"
                columns = {
                  0 = {
                    type = "string"
                    visible = "true"
                    field_name = "screen_dimensions"
                    data_type = "string"
                    header_label = "%7B=capitalize(database.fields.screen_dimensions.label)=}"
                    display_format_type = "string"
                    main_column = "true"
                  } # 0
                  1 = {
                    header_label = "%7B=capitalize(database.fields.hits.label)=}"
                    type = "number"
                    show_number_column = "true"
                    show_percent_column = "true"
                    show_bar_column = "true"
                    visible = "true"
                    field_name = "hits"
                    data_type = "int"
                    display_format_type = "integer"
                  } # 1
                  2 = {
                    header_label = "%7B=capitalize(database.fields.page_views.label)=}"
                    type = "number"
                    show_number_column = "true"
                    show_percent_column = "false"
                    show_bar_column = "false"
                    visible = "true"
                    field_name = "page_views"
                    data_type = "int"
                    display_format_type = "integer"
                  } # 2
                } # columns
              } # screen_dimensions
    

    with this:

    #          screen_dimensions = {
    #            label = "%7B=capitalize(pluralize(print(database.fields.screen_dimensions.label)))=}"
    #            type = "table"
    #            database_field_name = "screen_dimensions"
    #            sort_by = "hits"
    #            sort_direction = "descending"
    #            show_omitted_items_row = "true"
    #            omit_parenthesized_items = "true"
    #            show_totals_row = "true"
    #            starting_row = "1"
    #            ending_row = "10"
    #            only_bottom_level_items = "false"
    #            columns = {
    #              0 = {
    #                type = "string"
    #                visible = "true"
    #                field_name = "screen_dimensions"
    #                data_type = "string"
    #                header_label = "%7B=capitalize(database.fields.screen_dimensions.label)=}"
    #                display_format_type = "string"
    #                main_column = "true"
    #              } # 0
    #              1 = {
    #                header_label = "%7B=capitalize(database.fields.hits.label)=}"
    #                type = "number"
    #                show_number_column = "true"
    #                show_percent_column = "true"
    #                show_bar_column = "true"
    #                visible = "true"
    #                field_name = "hits"
    #                data_type = "int"
    #                display_format_type = "integer"
    #              } # 1
    #              2 = {
    #                header_label = "%7B=capitalize(database.fields.page_views.label)=}"
    #                type = "number"
    #                show_number_column = "true"
    #                show_percent_column = "false"
    #                show_bar_column = "false"
    #                visible = "true"
    #                field_name = "page_views"
    #                data_type = "int"
    #                display_format_type = "integer"
    #              } # 2
    #            } # columns
    #          } # screen_dimensions
    
  6. Finally rebuild the database.

` } # removing_database_fields default_page = { label = "Default Page Hits" question = "In my reports, I see entries for /somedir/, and /somedir, and /somedir/ (default page). What's the difference? I seem to have two hits for each hit because of this; one on /somedir and then one on /somedir/; what can I do to show that as one hit?" short_answer = "/somedir/ is the total hits on a directory and all its contents; /somedir is an attempt to hit that directory which was redirected because it did not have the trailing slash; and the (default page) entries indicate the number of hits on the directory itself (e.g., on the default page of the directory)." long_answer = `

To understand why there are hits shown on both /somedir/ and /somedir, where "somedir" is the name of a directory (folder) in the web site, it is necessary to understand what happens when a browser tries to access http://hostname/somedir . That URL is incorrect (or at best, inefficient), because it lacks the trailing slash, and so implies that somedir is a file. Here's what happens in this case:

  1. The web browser asks for a file named /somedir .

  2. The server checks, and finds that there is no file by that name (because it's a directory). It responds with a 302 redirect to /somedir/, which basically means, "no such file, but there is a directory; maybe that's what you meant?"

  3. The browser accepts the redirect, so now it requests a directory named /somedir/

  4. The server notes that there is a directory by that name, and that it contains an index or default file. It responds with a 200 response, and the contents of the index file.

This looks like this in the web logs:
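
(The exact lines vary by server and log format; these two hypothetical Apache-style entries, with made-up IP, timestamps, and sizes, illustrate the pattern.)

  12.34.56.78 - - [01/Jan/2010:12:34:56 -0800] "GET /somedir HTTP/1.1" 302 -
  12.34.56.78 - - [01/Jan/2010:12:34:57 -0800] "GET /somedir/ HTTP/1.1" 200 12345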

$PRODUCT_NAME reports this as two hits, because it is two hits (two lines of log data). $PRODUCT_NAME differentiates the aggregate traffic within a directory from traffic which directly hits a directory, by using /somedir/ to represent aggregation of traffic in the directory, and using "/somedir/ (default page)" in graphical reports to represent hits on the directory itself (i.e., hits which resulted in the display of the default page, e.g., index.html or default.asp). So in graphical reports, the second hit above appears as a hit on "/somedir/ (default page)".

A good solution to this is to make sure that all links refer to directories with the trailing slash; otherwise the server and browser have to do the elaborate dance above, which slows everything down and doubles the stats.

Another option is to reject all hits where server response starts with 3, using a log filter like this one:
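
For example, a sketch in Advanced Expression Syntax (the matches_regular_expression() function name, and the server_response field name, are assumptions which may vary by log format):

  if (matches_regular_expression(server_response, "^3")) then "reject";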

This discards the first hit of the two, leaving only the "real" (corrected) one.

In summary, hits on /somedir/ in reports represent the total number of hits on a directory, including hits on the index page of the directory, any other files in that directory, and any other files in any subdirectory of that directory, etc. Hits on /somedir in reports represent the 302 redirects caused by URLs which lack the final /. Hits on "/somedir/ (default page)" represent hits on the default page of the directory.

For information about selecting the default page using a report filter, see {=docs_chapter_link('report_filters')=}.

` } sessions_with_username = { label = "Tracking Sessions with Usernames instead of IPs" question = "$PRODUCT_NAME shows IP addresses, or hostnames, in the Sessions reports, but I want it to show usernames instead. How can I do that?" short_answer = "Edit the profile .cfg, and change sessions_visitor_id_field to the username field." long_answer = `

$PRODUCT_NAME calls this the "session user" field, or the "session visitor ID" field. This is the field which differentiates users; if the value in this field is different, for two events, $PRODUCT_NAME assumes that those events are from two different users, and therefore are not part of the same session.

By default, $PRODUCT_NAME uses the "client IP" field (or "hostname", or "source IP", or others, depending on the log format) to differentiate users. But if you have username information in your logs, it is sometimes better to use the username to differentiate sessions, because it better identifies an individual, especially in environments where individuals may use multiple IP addresses.

To do this, edit the profile .cfg file, which is in the LogAnalysisInfo/profiles $lang_stats.directory, using a text editor. Search for this line (its full location is log.field_options.sessions_visitor_id_field):

  sessions_visitor_id_field = "hostname"

and change "hostname" to "username" (or "cs_username", or "x_username", or "user", or whatever the field is called in your log data; you can see a list of field names by running $PRODUCT_NAME from the command line with "$PRODUCT_EXECUTABLE_DOCS -p {profilename} -a ldf"). For example change it to this, if your username field is called "username":

  sessions_visitor_id_field = "username"

Then, rebuild the database (or delete the LogAnalysisInfo/ReportCache $lang_stats.directory), and view a session report, and $PRODUCT_NAME will recompute your session reports using the user field.

` } hipaa = { label = "Support for HIPAA and Sarbanes-Oxley Compliance" question = "Does $PRODUCT_NAME produce reports for HIPAA and Sarbanes-Oxley (SOX) compliance?" short_answer = "Yes, run the Single-Page Summary report." long_answer = "

$PRODUCT_NAME produces reports that track network usage and network security, and give a comprehensive view of who is accessing your website at any given date or time. The Single-Page Summary report gives the network detection and audit history reporting needed to be compliant with both HIPAA and SOX.

" } geolite = { label = "GeoIP database in $PRODUCT_NAME is not as accurate as the one on the Maxmind site" question = "Some of the IP addresses in my data are not resolved properly to country/region/city by $PRODUCT_NAME. I know that $PRODUCT_NAME uses the MaxMind GeoIP database, and when I go to the MaxMind site, their demo resolves these IPs properly. Why isn't $PRODUCT_NAME doing the same as the online GeoIP demo?" short_answer = "$PRODUCT_NAME uses the GeoLite City database, a less accurate (and less expensive) version of the GeoIP City database. To get full accuracy, buy GeoIP City from MaxMind." long_answer = `

MaxMind provides two tiers for their City database: GeoIP City and GeoLite City. They do not provide GeoIP City for bundling with products like $PRODUCT_NAME, so $PRODUCT_NAME includes the GeoLite City database. GeoLite City is less accurate than GeoIP City, so the results you get from $PRODUCT_NAME using its default GeoLite City database will be less accurate than using GeoIP City. Since the web demo of GeoIP on the MaxMind site uses GeoIP City, there will be some cases where $PRODUCT_NAME cannot place an IP, but the web demo can.

The solution is to upgrade to the full GeoIP City database, which you can do directly through MaxMind. That database is a drop-in replacement for GeoLite City, so once you have purchased it, you can drop it in on top of the GeoIP-532.dat file in the LogAnalysisInfo $lang_stats.directory in your $PRODUCT_NAME installation, and rebuild your databases, and you will get a more accurate geographical location.

` } format_durations_for_excel = { label = "Formatting Durations for Excel" question = "When I export CSV, durations appear as numbers, which Excel doesn't understand. How can I format durations to work with Excel?" short_answer = "Add an extra column to the spreadsheet to convert them to fractional days; or use a custom database field in the report element." long_answer = `

Excel represents durations in days, so "1" is one day, and "1/24" is one hour. But $PRODUCT_NAME represents them as seconds for some log formats, milliseconds for others, and microseconds for a few. To format them as durations in Excel, they must be converted. This can be done either after the export, in Excel, or before the export, in $PRODUCT_NAME.

Formatting After the Export

The easiest way, in most cases, is to add a new column in the exported spreadsheet, to convert between the units. For instance, if column E is the "time taken" field in milliseconds, create a new column with formula "=En/(1000*24*60*60)" where n is the row number, and fill down to populate the whole column. This will create a column whose values are "time taken" in days. Then format the cells of that column to use any "time" format, and it will be formatted as a time, in hour, minutes, seconds, etc.

Formatting as Part of the Export

If formatting after the export is not possible, or not efficient, you can do the conversion in $PRODUCT_NAME, but it's considerably more involved.

For this example, we'll assume we're dealing with the "time-taken" field in IIS web logs, called time_taken in $PRODUCT_NAME.

1. Create a database field with a custom expression.

This custom expression formats the time-taken value in the standard duration_milliseconds format of $PRODUCT_NAME. Do this by editing the profile CFG file (in LogAnalysisInfo/profiles) with a text editor and finding the time_taken database field (search for "database = {"; then search downward from there for "fields = {"; then search downward from there for "time_taken = {"), then duplicating it, adding a time_taken_excel_format database field underneath the time_taken database field:

      time_taken_excel_format = {
        label = "time taken (Excel format)"
        type = "string"
        log_field = "time_taken"
        display_format_type = "duration_milliseconds"
        expression = \`format(cell_by_name(row_number, 'time_taken'), 'duration_milliseconds')\`
      } # time_taken_excel_format

2. Add this as a column to the report you'll be exporting. For instance, if the report is the hour_of_day report, find its column in the CFG file by searching from the top for "statistics = {", then searching down from there for "reports = {", then searching down from there for "file_type = {"; then searching down from there for "columns = {". Copy the time_taken column, and edit the duplicate to look like this:

              time_taken_excel_format = {
                header_label = "time taken (Excel format)"
                type = "string"
                show_number_column = "true"
                show_percent_column = "false"
                show_bar_column = "false"
                visible = "true"
                field_name = "time_taken_excel_format"
                data_type = "string"
                display_format_type = "duration_milliseconds"
              } # time_taken_excel_format

3. Rebuild the database; then when you export this report, it will include a new "time taken (Excel format)" column, with standard $PRODUCT_NAME duration formatting ("Y years, D days, HH:MM:SS.MMM").

` } # format_durations_for_excel mysql_locks_exceeded = { label = "Error with MySQL: \"The total number of locks exceeds the lock table size\"" question = 'When I try to build a database, or view reports, I get an error, "The total number of locks exceeds the lock table size". How can I fix this?' short_answer = "Increase the innodb_buffer_pool_size in my.cnf (my.ini), to 256M." long_answer = `

This occurs when MySQL runs out of locks, which for an InnoDB database happens when the buffer pool is full. You can fix this by increasing the size of the buffer pool: edit my.cnf (my.ini) to set innodb_buffer_pool_size to a number higher than the default (which is typically 8M); for instance:

  innodb_buffer_pool_size = 256M

Then, restart MySQL, and try the $PRODUCT_NAME operation again.

` } # mysql_locks_exceeded oraclerror = { label = "Error with Oracle: \"ORA-01000: maximum open cursors exceeded\"" question = 'When building a database with Oracle, I get an error, "ORA-01000: maximum open cursors exceeded." What can I do to fix this?' short_answer = "Increase open_cursors to 1000 in your Oracle server." long_answer = `

Though $PRODUCT_NAME does not directly use cursors, some ODBC drivers use several hundred cursors when $PRODUCT_NAME builds a database through them. This can cause an Oracle error if the maximum number of permitted cursors is insufficient.

You can monitor the number of open cursors by running this query against your Oracle database:

 SELECT v.value as numopencursors ,s.machine ,s.osuser,s.username FROM V\\$SESSTAT v, V\\$SESSION s WHERE v.statistic# = 3 and v.sid = s.sid;

To fix the problem, increase the maximum number of cursors with this command:

  ALTER SYSTEM SET open_cursors = 1000 SCOPE=BOTH;

It is not necessary to restart the database server after running this command--it will affect the running instance immediately.

` } # oraclerror no_result = { label = "The background process terminated unexpectedly" question = "$PRODUCT_NAME displays the following error: \"The background process terminated unexpectedly, without returning a result.\" What does that mean, and how can I fix it?" short_answer = "$PRODUCT_NAME has probably crashed, so this could be a bug in $PRODUCT_NAME. See the long answer for suggestions." long_answer = `

This error message means that $PRODUCT_NAME tried to do a long task, like a report generation or a database build, and while it was trying to display progress for the task, it noticed that the task was no longer running, but had not properly computed and stored its result. A task always returns a result, so this means that something has gone wrong internally in $PRODUCT_NAME. The most likely cause is a crash: the background task crashed, so it will never be able to complete and return the result.

A crash is often due to a bug in $PRODUCT_NAME, but it can also occur if $PRODUCT_NAME runs out of memory. Make sure there is enough memory available: if you watch the memory usage while you repeat the task, does it seem to reach a high level, near the maximum memory of the system, before failing? If so, you may need more memory in your system to perform that task.

If it's not memory, try running the task from the command line. If it's a database build, you can run it from the command line using this: {=docs_faq_link('commandlinebuild')=}. If it's a crash during the report generation, you can run it from the command line similarly to a database build, but using "-a grf -rn reportname -ghtd report" instead of "-a bd", where reportname is the internal name of the report. Run $PRODUCT_NAME from the command line with "-p profilename -a lr" to get a list of reports. For instance,

  sawmill -p myprofile -a grf -rn single_page_summary -ghtd report

will generate the single-page summary to a $lang_stats.directory called "report". If this report fails, it may give a better error message about what happened to it.
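
For reference, the command to list the internal report names (the "-a lr" action mentioned above) looks like this:

  sawmill -p myprofile -a lr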

Whether it fails or succeeds, email $SUPPORT_EMAIL with the outcome of your test. If possible, include the profile, and enough log data to reproduce the error (up to 10 MB, compressed). Report that you are seeing a crash on report generation (or database build, or whatever), and we will attempt to reproduce it on our own systems, determine the cause, and fix it, or help you resolve it, if it's not a bug.

` } urlsinpix = { label = "Tracking URLs in Cisco PIX log format" question = "How can I track full URLs, or HTTP domains, or resolved hostnames, when analyzing PIX log data?" short_answer = "You can't track full URLs or HTTP domains, because PIX doesn't log them; but you can turn on DNS lookup in the PIX or in $PRODUCT_NAME to report resolved hostnames." long_answer = "

The Cisco PIX log format can be configured to log hostnames as well as IPs; if it does, the PIX plug-in will report the hostnames. This is the preferred way to get hostname information from PIX. If that's not an option, $PRODUCT_NAME can be configured to look up IP addresses using the DNS Lookup section of the Config page. In this case, the IP address field value will be replaced by the resolved hostname, so this resolved hostname will appear in the IPs reports. PIX does not log URLs, however, so it is not possible for $PRODUCT_NAME to report domains accessed. PIX reports lines like this:

Accessed URL 12.34.56.78:/some/file/test.html

This shows the source IP, which we have from another line, and the URL stem, which is slightly useful, but it does not show the domain; and resolving the IP just gives the resolved hostname, not the domain from the URL. Still, it's better than nothing; resolving the hostname might give something like server156.microsoft.com, which at least tells you it's microsoft.com traffic, even if you can't tell whether it was msdn.microsoft.com or www.microsoft.com.

PIX can also be configured to log hostnames in the Accessed URL lines, which looks something like this:

Accessed URL 12.34.56.78 (server156.microsoft.com):/some/file/test.html

But this has the same problem; it shows the hostname, not the HTTP domain. It seems that the HTTP domain is not available from PIX log data.

The reasons we recommend doing DNS lookup in PIX, rather than in $PRODUCT_NAME, are twofold:

1. DNS lookup after the fact may give a different hostname than a lookup performed at the time of the event would have given, and the hostname at the time is more accurate.

2. DNS lookup in $PRODUCT_NAME replaces the IP address with the hostname, so the IP is not available in the reports. DNS lookup in PIX *adds* the hostname as a separate field, so both are available in the reports.

" } mysqlmacx64 = { label = "MySQL and x64 MacOS" question = `I installed $PRODUCT_NAME on a 64-bit (x64) Mac, and now it says, "This profile uses a MySQL database, but MySQL is not enabled in this build." Why?` short_answer = "MySQL does not currently work on x64 MacOS." long_answer = "Because there is not a current version of MySQL available for x64 MacOS, it is not possible to build or use MySQL databases on x64 MacOS with $PRODUCT_NAME. When a x64 MacOS version of MySQL becomes available (from the makers of MySQL), we will add support in $PRODUCT_NAME. For now, use the x86 version of $PRODUCT_NAME, which will run on x64 MacOS, and can use MySQL." } # mysqlmacx64 restore = { label = "Backup and Restore" question = "How do I backup and restore my $PRODUCT_NAME installation, or a particular profile and its database?" short_answer = "Backup and restore the LogAnalysisInfo folder when no update or build is running, or for one profile. For MySQL also backup and restore the MySQL database." long_answer = "

If you're using the internal database, you can back up the LogAnalysisInfo $lang_stats.directory in your $PRODUCT_NAME installation $lang_stats.directory, to back up the entire installation; and you can restore it to restore the entire installation. This will back up profiles, databases, users, preferences, scheduled tasks, and more. The backup and restore must occur when there is no database update or rebuild in progress; it is fine if there is a report generation in progress.


If you're using a MySQL database, you can do the backup/restore as described above, and you will also need to back up the MySQL database for each profile. By default, the MySQL database's name is the same as the internal name of the profile, but it can be overridden in Database Options, in the Config section of the profile. Consult the MySQL documentation for information on backing up and restoring a database.
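
For example, a backup with the standard mysqldump tool might look like this (a sketch, assuming the database has the default name, i.e. the internal profile name, here \"myprofile\"):

  mysqldump -u username -p myprofile > myprofile_backup.sql

and the corresponding restore would feed that file back through the mysql client:

  mysql -u username -p myprofile < myprofile_backup.sql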


To backup or restore a particular profile, backup or restore the profile file from LogAnalysisInfo/profiles, and the database folder from LogAnalysisInfo/Databases, and if you're using MySQL, backup and restore the MySQL database for the profile.

" } # restore permission_denied = { label = "Permission Denied Errors" question = `On Windows, I sometimes get "permission denied" errors, or "volume externally altered" errors, or "file does not exist" error when building a database. But sometimes, it works. What can cause this sort of sporadic file error?` short_answer = "An anti-virus or anti-malware software, which is actively scanning your $PRODUCT_NAME installation folder, can cause this. Disable scanning of $PRODUCT_NAME's data folders, in the anti-virus product." long_answer = `

Some anti-virus and anti-malware software actively scans the entire disk, looking for viruses or other malware. This sort of scanning can interfere with $PRODUCT_NAME's operation if the software scans $PRODUCT_NAME's data files or database. The anti-malware software interferes in two ways: (1) it opens $PRODUCT_NAME's data files and holds them open while $PRODUCT_NAME is trying to write to them during database builds, which causes Windows to refuse $PRODUCT_NAME write access to its own internal database files; and (2) if the anti-malware software detects a virus signature in one of $PRODUCT_NAME's database files, it may delete or modify that file, corrupting the database. The second scenario can occur even if no actual virus is present: $PRODUCT_NAME's database files are binary files, which can potentially contain any possible virus signature through random permutations of the data; worse, $PRODUCT_NAME is often used to analyze web logs, mail logs, and even anti-virus logs, which naturally contain the signatures of viruses encountered by the logging devices or servers.

Even when anti-virus scanning does not cause errors in $PRODUCT_NAME, it can greatly reduce the performance of $PRODUCT_NAME, as both fight for access to the same files. The performance impact can be 20 times or greater--a database which might normally take 1 hour to build might take 20 hours or more.

The solution is to disable scanning of $PRODUCT_NAME's directories. Anti-malware software should not be completely turned off--it is important to the security of your system--but most products can be selectively disabled, so they will not scan particular folders. In a default installation, $PRODUCT_NAME is found in the Program Files folder of the C: drive, so disabling scanning of the $PRODUCT_NAME folder there will greatly improve the performance and reliability of $PRODUCT_NAME.

` } # permission_denied dynamiccasterror = { label = "Relocation error: __dynamic_cast_2" question = 'When I try to run $PRODUCT_NAME, I get an error "relocation error: $PRODUCT_EXECUTABLE_DOCS: undefined symbol: __dynamic_cast_2". How can I fix this?' short_answer = "This is a GNU library incompatibility; build $PRODUCT_NAME from source instead of using the binary distribution." long_answer = `

This occurs on UNIX systems, and is due to $PRODUCT_NAME being built expecting a different version of the GNU libraries than the one you have on your system (libstdc++). In other words, this is an operating system incompatibility -- we're building on a different version than you're running on.

The best solution is to use the "encrypted source" version of $PRODUCT_NAME, rather than the binary distribution for your platform; i.e., choose "encrypted source" as the "operating system" when you're downloading $PRODUCT_NAME. This version requires that you have a C/C++ compiler installed on your system. Follow the instructions to build $PRODUCT_NAME from source -- it's easy. The resulting binary will run properly on your system.

If you don't have a compiler installed, please contact $SUPPORT_EMAIL.

` } # dynamiccasterror ftplogsource = { label = "Downloading Log Data by FTP" question = "Can $PRODUCT_NAME be configured to automatically FTP log files from multiple servers, and add them daily to a database?" short_answer = "Yes." long_answer = "Yes; just select one of the FTP log sources when $PRODUCT_NAME asks you where your data is. $PRODUCT_NAME can FTP one or more log files from any FTP server, anonymously or with a username/password." } clientsecurity = { label = "Protecting Clients' Statistics" question = "Can $PRODUCT_NAME be configured to limit access to statistics, so that a customer can only see the statistics associated with their section of my web site?" short_answer = "Yes, you can password protect statistics in several ways." long_answer = "

Yes. $PRODUCT_NAME provides several ways to do this. In general, you will create a separate user for each client, and a separate profile for each client. Then you will configure their user to be non-administrative, and to have permission to access only their own profile. Finally, you will set up their profile to show only their data, either by pointing it only at their files, or (if their data is interleaved with other clients' data), by using log filters to discard all events from the log which don't belong to them.
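
For example, if a client's content all lives under /clients/client1/, a log filter along these lines (a sketch, in the same expression syntax as the other log filter examples in this document; the page field name varies by log format) would discard everything else:

  if (!matches_wildcard_expression(page, '/clients/client1/*')) then 'reject'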

" } scheduling = { label = Scheduling question = "Can $PRODUCT_NAME be configured to automatically analyze the access log for my site on a shared server once a day at a given time?" short_answer = "Yes, if you run it stand-alone, or if your server has a scheduling program." long_answer = "

It depends on your web server. If you run $PRODUCT_NAME as a stand-alone program (rather than as a CGI program) on your server, then you can use $PRODUCT_NAME's built-in Scheduler to do this. If you can't run it stand-alone or don't want to, then you can still set up automatic database builds if your server has its own scheduling program (like cron or Windows Scheduler).
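
For example, on a UNIX server, a crontab entry like this (a sketch; it assumes the $PRODUCT_NAME binary is at /usr/local/sawmill/sawmill and the profile is named myprofile) would rebuild the database every night at 2am, using the \"-a bd\" database build action described elsewhere in this FAQ:

  0 2 * * * /usr/local/sawmill/sawmill -p myprofile -a bd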

" } excludeip = { label = "Excluding an IP Address or Domain" question = "How can I exclude hits from my own IP address, or from my organization's domain?" short_answer = "Add a Log Filter to exclude those hits." long_answer = "

One way to do this is to use a global filter in the statistics, and use \"!(hostname within '123.124.125.126')\", and this is often the first thing people try, but it's not the best choice. The speed of a statistics filter depends on the number of items checked, so if there are 100,000 IP addresses in your log file, and you check all 100,000, then $PRODUCT_NAME will take up to 100,000 times longer to generate each page. That is probably not what you had in mind. A much better option is to use the Log Filters.

Log filters are used to filter out or modify log data as it is being read (rather than filtering database data as it is being browsed, like the statistics filters). You can get to the Log Filters by clicking Show Config in the profiles list, and clicking the Log Filters category.

You want to create a filter that will reject any log entries whose hostname field is your IP address. If your IP address is 128.128.128.128, the filter you want is this:
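
  if (hostname eq '128.128.128.128') then 'reject'

(a sketch, written in the same if/then expression syntax as the cookie-extraction log filter example later in this document; eq tests the field for an exact match)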

The name of the field (\"hostname\" here) depends on your log data -- use the name that your log data uses. For instance, IIS W3C format calls the field c_ip, so for IIS you would use this:
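
  if (c_ip eq '128.128.128.128') then 'reject'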

You can get a list of the fields in your profile by running $PRODUCT_NAME from the command line with \"-p profilename -a llf\".
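
For instance (assuming the executable is named sawmill, as in the other command-line examples in this FAQ):

  sawmill -p myprofile -a llf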

The next time you rebuild the database, hits from your IP address will be rejected, and will not appear in the statistics.

Rejecting all hits from a particular domain is very similar; if your domain is mydomain.com, and your server is set to look up IP addresses, then you can use this filter:
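
  if (matches_wildcard_expression(hostname, '*.mydomain.com')) then 'reject'

(again a sketch; this form assumes your server logs resolved hostnames ending with your domain)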

If your server logs hostnames as IP addresses (and does not resolve them to hostnames with DNS), you can use the subnet for your domain instead; for instance, if all hits from mydomain.com will come from the subnet 128.128.128, then you can use this filter:
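
  if (matches_wildcard_expression(hostname, '128.128.128.*')) then 'reject'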

" } clickstream = { label = "Clickstreams (Paths Through the Site)" question = "Can $PRODUCT_NAME show me the paths visitors took through my web site?" short_answer = "Yes; its \"session paths (clickstreams)\" report is very powerful." long_answer = "

Yes, very well. Most statistics packages will only show you the \"top paths\" or maybe the entry and exit pages; $PRODUCT_NAME shows you all the paths visitors took through the site, in an easily navigated hierarchical report. You get complete data about every path that every visitor took through your site, click-by-click. For even more detail, you can zoom in on a particular session in the \"individual sessions\" report, to see the full log data of each click in the session.

" } resources = { label = "Resource Usage" question = "How much memory/disk space/time does $PRODUCT_NAME use?" short_answer = "It depends on how much detail you ask for in the database. It uses very little if you use the default detail levels." long_answer = "

Memory usage depends mostly on the complexity of your data set (not the size). If your database has fields with millions of unique values, it will use many megabytes for each of those fields. It's uncommon for any particular field to require more than 100M, but in extreme cases, fields can use over 1G.

Disk usage is roughly 200% to 300% of the size of your uncompressed log data. In some cases, you may need as much as 400% of the size of your uncompressed data. So if you're processing 500 GB of log data, you'll need about 1500 GB of disk space to hold the database.

The time to process a dataset is roughly proportional to the size of the dataset. As of 2004, on a moderately fast single-CPU system, $PRODUCT_NAME typically processes between 5,000 and 10,000 lines of log data per second.

" } largelogs = { label = "Processing Large Log Files" question = "How large of a log file can $PRODUCT_NAME process?" short_answer = "There are no limits, except those imposed by the limitations of your server." long_answer = "

There is no fundamental limit -- given enough memory, disk space, and time, you can process the world. We've processed log files terabytes in size, billions of lines long, and been able to browse their statistics at full complexity in real time, with no troubles.

" } logformats = { label = "Supported Log Formats" question = "What sorts of log files can $PRODUCT_NAME process?" short_answer = "$PRODUCT_NAME can handle all major log formats and many minor formats, and you can create your own custom formats." long_answer = "

$PRODUCT_NAME is not just for web server logs, though it's well suited to that task. $PRODUCT_NAME also supports firewall logs, proxy logs, mail logs, antivirus logs, network logs, FTP logs, and much more.

Click here for the full list of {=docs_chapter_link('logformats')=}.

It automatically detects all the formats it supports, and chooses appropriate settings for the format.

We're continually adding new log formats, so the list above will keep growing. However, due to the large number of format requests, we cannot add all the formats that are requested. If your log format is not recognized by $PRODUCT_NAME, and you need support for a format, we can add it to $PRODUCT_NAME for a fee; contact $SUPPORT_EMAIL for details.

If you want to analyze a log in a different format, $PRODUCT_NAME also lets you create your own format description file; once you've done that, your format becomes one of the supported ones--$PRODUCT_NAME will autodetect it and choose good options for it, just like any built-in format.

$PRODUCT_NAME's format description files are very flexible; almost any possible format can be described. If you have an unsupported format and you'd like help writing a format file, please contact $SUPPORT_EMAIL, and we'll write a format file for you, at no charge.

" } peakperiods = { label = "Peak Period Reports" question = "Does $PRODUCT_NAME do \"peak period\" reports (by weekday, or hour)?" short_answer = "Yes." long_answer = "

Yes. $PRODUCT_NAME lets you break your statistics down by any of a large number of criteria, and by more than one at a time. Among these criteria are \"day of week\" and \"hour of day,\" so you can see weekday or hour information just by adding the appropriate field to your database.

" } weeklystatistics = { label = "Weekly Statistics" question = "Can I see the number of hits per week? Can I see a \"top weeks\" report?" short_answer = "Yes, by using the Calendar, and/or creating a database field and a report tracking \"weeks of the year.\"" long_answer = "

The date/time field in $PRODUCT_NAME tracks years, months, days, hours, minutes, and seconds. Each of these units fits evenly into the larger unit (24 hours in a day, 12 months in a year, etc.). Because weeks do not fit evenly into months, $PRODUCT_NAME cannot easily fit weeks into the date/time hierarchy. Still, there are several ways to see weekly statistics.

One way is to use the Calendar. In the Calendar, each week is represented as a link called \"week\"-- clicking the link applies a filter to the date/time field that shows the hits on those seven days. This lets you zoom in on a particular week, so you can see the statistics for that week, or you can switch to other views to learn more about the activity for that week. However, if you do it that way you can't see a list or graph of weeks, with the hits for each week, the way you can for days in the \"Days\" report.

If you need a weekly graph or table, you need to track the \"week of the year\" log field in your database. The week of the year is a number between 1 and 52 that represents the week of the year (e.g. week 1 is January 1 through January 7). You can track the week of the year field like this:

  1. Open the profile file ($PRODUCT_NAME/LogAnalysisInfo/profiles/profilename.cfg) you want to add week_of_year reports to, in your favorite text editor (e.g. Notepad).

  2. Search for \"database = {\", then search for \"fields = {\" and scroll down until you see \"day_of_week = {\"

  3. Copy this line and all lines until the line \"} # day_of_week\" and paste it all just underneath.

  4. In the new section, change day_of_week to week_of_year wherever it appears (except in the display_format_type line, where the value should be \"string\"), so it becomes:

          day_of_week = {
            label = \"$lang_stats.field_labels.day_of_week\"
            type = \"string\"
            log_field = \"day_of_week\"
            display_format_type = \"day_of_week\"
            suppress_top = \"0\"
            suppress_bottom = \"2\"
            always_include_leaves = \"false\"
          } # day_of_week
          week_of_year = {
            label = \"$lang_stats.field_labels.week_of_year\"
            type = \"string\"
            log_field = \"week_of_year\"
            display_format_type = \"string\"
            suppress_top = \"0\"
            suppress_bottom = \"2\"
            always_include_leaves = \"false\"
          } # week_of_year
      

  5. Then search for \"reports = {\" and duplicate (by copy/paste, as above) an existing report (the Day of week report is a good choice); again, change day_of_week to week_of_year throughout the new section (and again use \"string\" for display_format_type).

  6. Then search for \"reports_menu = {\", then \"date_time_group = {\", and duplicate (by copy/paste, as above) an existing report menu item (the Day of week item is a good choice); again, change day_of_week to week_of_year throughout the new section.

  7. Save the changes you have made.

  8. Rebuild the database.

The new report will show you traffic for each week of the year.

" } timeofday = { label = "Time of Day Statistics" question = "Does $PRODUCT_NAME do time of day?" short_answer = "Yes." long_answer = "

Yes, $PRODUCT_NAME can pinpoint your hits to the second. By default, it also breaks down hits by hour, so you can detect peak usage and other hourly information. The Log Detail report shows complete information about each event, down to the second, so you can zoom in on any part of your statistics, and then zoom down to the level of the log data to see, event by event and second by second, what occurred.

" } uniquevisitors = { label = "Unique Visitors" question = "Can $PRODUCT_NAME count unique visitors?" short_answer = "Yes, using unique hostname or using cookies." long_answer = "

Yes; $PRODUCT_NAME can tell you the number of unique visitors for any item in the database, including the number of visitors for a particular day, the number of visitors from a particular domain, the number of visitors who hit any particular page or directory, or any other type of data $PRODUCT_NAME can display.

By default, $PRODUCT_NAME uses the hostname field of your log data to compute visitors based on unique hosts. That works for all log files, but it's a somewhat inaccurate count due to the effect of proxies and caches. If your log data tracks visitors using cookies, you can easily configure $PRODUCT_NAME to use the cookie information instead, by changing the \"visitors\" database field so it is based on the cookie log field (in the Log Filters section of the profile Config). See also {=docs_faq_link('visitorcookies')=}.

" } visitorcookies = { label = "Counting Visitors With Cookies" question = "Can $PRODUCT_NAME count visitors using cookies, rather than unique hostnames?" short_answer = "Yes -- it includes a built-in log format to do this for Apache, and other servers can be set up manually." long_answer = "

Yes. The reason you'd want to do this is that using unique browsing hostnames (or IPs) to count visitors is an imprecise method: the same actual visitor may appear to come from several hostnames -- the same person may dial up and receive random IP addresses, or in some extreme cases, their ISP may be set up so that they have a different IP address for each hit; conversely, several actual visitors may appear as one hostname if they're all using the same proxy. The solution to this problem is to set your web server to use cookies to keep track of visitors. Apache and IIS can be configured to do this, and in both cases, $PRODUCT_NAME can be configured to use the cookie log field, instead of the hostname, as the basis for its \"visitor\" field. To do this, edit your profile (in LogAnalysisInfo/profiles) with a text editor, find the \"visitors\" database field (look for \"database = {\", then \"fields = {\", then \"visitors = {\"), and change the log_field value to your cookie field; for instance, if your cookie field is cs_cookie, change it to log_field = \"cs_cookie\". Note that this will only work if your entire cookie field tracks the visitor cookie, and does not track any other cookies; if you have multiple cookies, you can't use the whole cookie field as your visitor ID, and you need to use the approach described below to create a visitor_id field, use a regular expression to extract your visitor cookie into it, and then change log_field to visitor_id.

Installing the cookie tracking JavaScript

If your server or environment already tracks visitors by cookie, you can skip this section. If not, you need to add a bit of JavaScript to each of your web pages, to assign cookies to each visitor. To do this, copy the log_analysis_info.js file, from the Extras folder of your $PRODUCT_NAME installation, into a folder called js, in your web server root directory, and add this to every possible entry page (best to add it to every page):
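
  <script type=\"text/javascript\" src=\"/js/log_analysis_info.js\"></script>

(a sketch of the include tag, assuming the js folder is at the root of your site, as described above)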

Using Cookie-based Visitors IDs in Apache

In the case of Apache, it's even easier, because $PRODUCT_NAME includes a log format descriptor for a special \"combined format plus visitor cookie\" log format. The format is just normal combined format, with the visitor ID stuck at the front of each log entry. You can log in this format by adding the following lines to your httpd.conf file:

  CookieTracking on
  CookieExpires \"2 weeks\"
  CustomLog /var/log/httpd/cookie.log \"%{cookie}n %h %l %u %t \\\"%r\\\" %>s %b \\\"%{Referer}i\\\" \\\"%{User-Agent}i\\\"\"

(replace /var/log/httpd/cookie.log above with the pathname of the log you want to create). When you point $PRODUCT_NAME at this log file, it will recognize it as an \"Apache Combined With Visitor Cookies\" log, and it will set up the log filter described above for you, so you don't have to do any manual profile configuration at all.

Using Cookie-based Visitors IDs in IIS

IIS has built-in support for visitor cookies -- just turn on logging of the Cookie field (extended property), or tell IIS to use \"W3C Extended Log File Format\" for logging, and you'll get cookies in your log data. Once you've done that, you'll need to create a \"visitor_id\" log field to hold the cookie information, and use that field as the basis for your visitor database field.

An Example Filter For Extracting Cookies

If your cookie field contains more than just a visitor ID, you'll need to extract the visitor ID part of the field, and put it into a separate \"visitor id\" log field in $PRODUCT_NAME. This can be done using a regular expression filter with variable replacement. First, you'll need to create a visitor ID log field. You can do this by editing the profile .cfg file (in the profiles $lang_stats.directory of the LogAnalysisInfo $lang_stats.directory in your installation), and finding the log.fields group (search for \"log =\" and then forward from there for \"fields =\"). Add the following log field:

  visitor_id = {
    label = \"visitor ID\"
    type = \"flat\"
  }

Next, in the same .cfg file, change database.fields.visitors.log_field to visitor_id (i.e. search for \"database =\", then search for \"fields =\", then search for \"visitors =\", and then set the log_field value within visitors to visitor_id), so the visitors field uses the visitor_id to determine whether two events are from the same visitor.

Then, add a log filter (in the Log Filters section of the profile Config, or in the log.filters section of the .cfg file) to extract the visitor ID from the cookie. For example, suppose that the cookie field value looks like this:

  var1=value1&var2=value2&lavc=123456789&var3=value3

The lavc cookie (the visitor id, 123456789 in this case) is buried inside the field, surrounded by other cookie names and values. To extract it you need a filter that grabs the part after lavc= and before &. This can be done most easily with the following filter:

    if (matches_regular_expression(cookie, \"&lavc=([^&]*)&\")) then visitor_id = \\$1

(For IIS, the value in quotes would be ASPSESSIONID[A-Z]*=([^&]*).) This filter finds a section of the field starting with &lavc=, followed by a series of non-& characters, followed by a &, and sets the visitor id to the sequence of non-& characters it found (123456789, in this case).

Once you've added the visitor id log field, and the filter to set it, and modified the visitors database field to use the visitor id as its log field, rebuild the database. $PRODUCT_NAME is now using the lavc value from your cookie field as your visitor id, which should make your visitors counts more accurate.

" } robotstxt = { label = robots.txt question = "Why do I see hits on a file called \"robots.txt\" in my statistics?" short_answer = "robots.txt is a file that tells search engine spiders and robots what they can do, so a hit on robots.txt means that a spider visited your site." long_answer = "

robots.txt is a \"standard\" file that appears at the root level of many web sites to tell search engine robots what to do on the site. Robots, also known as spiders, are computer programs that attempt to systematically visit and catalog all the pages on the Web. robots.txt tells the robots what they can or can't do on the site (whether they can index the site, which pages they may not index, etc.). Any correctly written robot will hit that page first, and follow the instructions it finds there. So the hits you're seeing are from robots.
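
For reference, a minimal robots.txt looks like this (standard robots.txt syntax, nothing specific to $PRODUCT_NAME; the directory name is just an example):

  User-agent: *
  Disallow: /private/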

If you don't have a robots.txt file on your site, the robots don't actually get any information--they get a \"404 File Not Found\" error instead, which they generally interpret as \"index whatever you want.\"

" } # graph_options = { # label = "Changing graph options" # question = "How can I change the graphs to pie charts, or add a legend, or change which field is graphed?" # short_answer = "Edit the report element in the profile .cfg file -- see long answer for full instructions." # long_answer = "

To change graphing options, #you need to edit the profile .cfg file. The file is in the profiles folder of the LogAnalysisInfo folder #of your $PRODUCT_NAME installation.

#

First, open that file with a text editor. Then search for reports = { to find the #beginning of the section which describes reports. There will be one group within that section for each report #in the profile. Find the group corresponding to the report you want to edit. For instance, the file_type report #would start like this:

#
#      file_type = {
#        report_elements = {
#          file_type = {
#            label = \"%7B=capitalize(pluralize(print(database.fields.file_type.label)))=}\"
#            type = \"table\"
#            database_field_name = \"file_type\"
#            sort_by = \"hits\"
#            sort_direction = \"descending\"
#            show_omitted_items_row = \"true\"
#            omit_parenthesized_items = \"true\"
#            show_totals_row = \"true\"
#            starting_row = \"1\"
#            ending_row = \"10\"
#            only_bottom_level_items = \"false\"
#            show_graph = \"false\"
#            columns = {
#              0 = {
#                type = \"string\"
#      ...
#
#

In most cases, there is a single report element for each report, but if you're editing a report like #the single page summary, there may be multiple groups in the report_elements group; find the group corresponding #to the table you're editing.

#

If graphing is off, as in this example, you can first turn it on by changing the show_graph line to #show_graph = true and adding a graph section:

#
#      file_type = {
#        report_elements = {
#          file_type = {
#            label = \"%7B=capitalize(pluralize(print(database.fields.file_type.label)))=}\"
#            type = \"table\"
#            database_field_name = \"file_type\"
#            sort_by = \"hits\"
#            sort_direction = \"descending\"
#            show_omitted_items_row = \"true\"
#            omit_parenthesized_items = \"true\"
#            show_totals_row = \"true\"
#            starting_row = \"1\"
#            ending_row = \"10\"
#            only_bottom_level_items = \"false\"
#            show_graph = true
#            graph = {
#              numerical_fields = {
#                hits = true
#              }
#            }
#            columns = {
#              0 = {
#                type = \"string\"
#      ...
#
#

This will add a graph to the report; by default it will be a bar chart with colored bars and a legend. #If you prefer a pie chart, use these graphing options instead:

#
#            show_graph = true
#            graph = {
#              pie_chart = true
#              numerical_fields = {
#                hits = true
#              }
#            }
#
#

If you'd like to graph a field other than \"hits\", change \"hits\" to the internal name of a numerical #database field; e.g. \"page_views\" or \"events\" or \"messages\" or whatever the appropriate fields #are for your log format. You can get a list of available database field names by running $PRODUCT_NAME #from the command line with the options: -p profilename -a ldf.

#

You may also need to delete the ReportCache $lang_stats.directory in the LogAnalysisInfo $lang_stats.directory #for changes to take effect immediately; if you don't, reloading the report may load a cached version of the report #from before the changes.

. #" # } # report_filters = { # label = "Adding report filters" # question = "How can I add a filter which permanently applies to just one report, or report element?" # short_answer = "Add it in the profile .cfg file -- see long answer for full instructions." # long_answer = "

To add a filter to a single report, or a report element (table and graph) within a report, #you need to edit the profile .cfg file. The file is in the profiles folder of the LogAnalysisInfo folder #of your $PRODUCT_NAME installation.

#

First, open that file with a text editor. Then search for reports = { to find the #beginning of the section which describes reports. There will be one group within that section for each report #in the profile. Find the group corresponding to the report you want to edit. For instance, the file_type report #would start like this:

#
#  file_type = {
#    report_elements = {
#      file_type = {
#        label = \"%7B=capitalize(pluralize(print(database.fields.file_type.label)))=}\"
#        type = \"table\"
#        ...
#
#

To add a filter to the report, add an extra section within the report group, but outside the #report_elements group, like this:

#
#  file_type = {
#    filter = {
#      expression = \"(page within '/dir1/')\"
#    }
#    report_elements = {
#      file_type = {
#        label = \"%7B=capitalize(pluralize(print(database.fields.file_type.label)))=}\"
#        type = \"table\"
#        ...
#
#

This adds a filter to the report so it will show only events where the page field is inside the #directory /dir1/. Any filter expression is permitted here; here are some other examples:

#
#      expression = \"(date_time > '01/Jan/2004 00:00:00')\"
#      expression = \"(date_time > '01/Jan/2004 00:00:00') and (page within '/dir1/')\"
#      expression = \"(date_time > '01/Jan/2004 00:00:00') and !(page within '/dir1/')\"
#
#

The first shows only hits since the beginning of 2004; the second shows only hits since the beginning of 2004 where #the hit was in /dir1/; the third shows only hits since the beginning of 2004 where the hit was not in /dir1/.

#

If you want the filter to apply to a particular report element but not the whole report, you can add this same #type of \"filter =\" expression inside the report element, e.g.:

#
#  file_type = {
#    report_elements = {
#      file_type = {
#        label = \"%7B=capitalize(pluralize(print(database.fields.file_type.label)))=}\"
#        type = \"table\"
#        filter = {
#          expression = \"(page within '/dir1/')\"
#        }
#        ...
#
#

This creates a report where the filter applies to only that report element. Most reports have only one report #element, but a report like Single-page summary may have several, and each can have its own filters.

#" # } favicon = { label = "favicon.ico" question = "Why do I see a hits on a file called \"favicon.ico\" in my statistics?" short_answer = "favicon.ico is a special icon file that Internet Explorer looks for when it first visits the site." long_answer = "

Recent versions of Microsoft Internet Explorer, Safari, and other web browsers have a feature that lets web site owners define an icon for their site, which will appear in the address bar, the Favorites menu, and other places. If you create an icon file called favicon.ico in a directory of your web site, then any page in that directory that is bookmarked will appear in the Favorites menu with your custom icon. The browser checks for this file whenever a bookmark is created, so if you don't have the file, those requests will show up as 404 (file not found) errors in your log data. As a side note, this is a good way to see who is bookmarking your site.

" } multi_column_reports = { label = "Adding columns to report tables" question = "How can I add additional columns to report tables, e.g. to add a single report which reports source IP, destination IP, source port, and destination port?" short_answer = "Edit the report in the profile .cfg file to add a new item to the columns group." long_answer = "

Edit the profile .cfg file, which is in the profiles folder of the LogAnalysisInfo folder. Look for \"reports = {\" to find the reports list. Look down until you find a report which shows a table for one of the fields you want, e.g. in the source_ip/destination_ip/source_port/destination_port example, you would look for the destination_port report (the actual name of this report, and of field values, will vary depending on your log format). The report will look something like this:

      destination_port = {
        report_elements = {
          destination_port = {
            label = \"\\$lang_stats.destination_port.label\"
            type = \"table\"
            database_field_name = \"destination_port\"
            sort_by = \"events\"
            sort_direction = \"descending\"
            show_omitted_items_row = \"true\"
            omit_parenthesized_items = \"true\"
            show_totals_row = \"true\"
            starting_row = \"1\"
            ending_row = \"10\"
            only_bottom_level_items = \"false\"
            show_graph = \"false\"
            columns = {
              0 = {
                type = \"string\"
                visible = \"true\"
                field_name = \"destination_port\"
                data_type = \"string\"
                header_label = \"%7B=capitalize(database.fields.destination_port.label)=}\"
                display_format_type = \"string\"
                main_column = \"true\"
              } # 0
              1 = {
                header_label = \"%7B=capitalize(database.fields.events.label)=}\"
                type = \"events\"
                show_number_column = \"true\"
                show_percent_column = \"false\"
                show_bar_column = \"false\"
                visible = \"true\"
                field_name = \"events\"
                data_type = \"int\"
                display_format_type = \"integer\"
              } # 1
            } # columns
          } # destination_port
        } # report_elements
        label = \"Destination report\"
      } # destination_port

There may be other columns, but the two shown here are the minimum -- one for the destination port field, and one for the \"events\" field (which might be called \"packets\" or something else). This describes a report which has two columns: destination port and number of events.

To add a four-column source_ip/destination_ip/source_port/destination_port report, copy the entire thing and change the name to custom_report. Then duplicate the destination_port column three times, and edit the copies so they're source_ip, destination_ip, and source_port. The result:

      custom_report = {
        report_elements = {
          custom_report = {
            label = \"Custom Report\"
            type = \"table\"
            database_field_name = \"destination_port\"
            sort_by = \"events\"
            sort_direction = \"descending\"
            show_omitted_items_row = \"true\"
            omit_parenthesized_items = \"true\"
            show_totals_row = \"true\"
            starting_row = \"1\"
            ending_row = \"10\"
            only_bottom_level_items = \"false\"
            show_graph = \"false\"
            columns = {
              source_ip = {
                type = \"string\"
                visible = \"true\"
                field_name = \"source_ip\"
                data_type = \"string\"
                header_label = \"%7B=capitalize(database.fields. source_ip.label)=}\"
                display_format_type = \"string\"
                main_column = \"true\"
              } # source_ip
              destination_ip = {
                type = \"string\"
                visible = \"true\"
                field_name = \"destination_ip\"
                data_type = \"string\"
                header_label = \"%7B=capitalize(database.fields. destination_ip.label)=}\"
                display_format_type = \"string\"
                main_column = \"true\"
              } # destination_ip
              source_port = {
                type = \"string\"
                visible = \"true\"
                field_name = \"source_port\"
                data_type = \"string\"
                header_label = \"%7B=capitalize(database.fields. source_port.label)=}\"
                display_format_type = \"string\"
                main_column = \"true\"
              } # source_port
              destination_port = {
                type = \"string\"
                visible = \"true\"
                field_name = \"destination_port\"
                data_type = \"string\"
                header_label = \"%7B=capitalize(database.fields.destination_port.label)=}\"
                display_format_type = \"string\"
                main_column = \"true\"
              } # destination_port
              1 = {
                header_label = \"%7B=capitalize(database.fields.events.label)=}\"
                type = \"events\"
                show_number_column = \"true\"
                show_percent_column = \"false\"
                show_bar_column = \"false\"
                visible = \"true\"
                field_name = \"events\"
                data_type = \"int\"
                display_format_type = \"integer\"
              } # 1
            } # columns
          } # custom_report
        } # report_elements
        label = \"Custom report\"
      } # custom_report

Finally, add it to the reports_menu list (again, this is easiest to do by duplicating the existing reports_menu item for destination port), like this:

          custom_report = {
            type = \"view\"
            label = \"Custom Report\"
            view_name = \"custom_report\"
            visible = \"true\"
            visible_if_files = \"true\"
          } # custom_report

And you should have a Custom Report item in your reports menu, which links to the multi-column report.

If you're creating a two-column report, you can get an indented layout with subtables (rather than a \"spreadsheet\" layout) by adding the following section to the report group (e.g. right above the \"} # custom_report\" line, above):

            sub_table = {
              ending_row = \"10\"
              omit_parenthesized_items = \"true\"
              show_omitted_items_row = \"true\"
              show_averages_row = \"false\"
              show_totals_row = \"true\"
            } # sub_table

This sub_table node will work only for reports which have exactly two non-numerical columns (e.g. source_ip/destination_ip).

" } graph_field = { label = "Changing the graph field" question = "How do I change the field which is graphed, e.g. from page view to bandwidth?" short_answer = "Edit the profile .cfg file, and change the field name in the numerical_fields section of that report element." long_answer = `

If you want to change the field which is graphed, in the graph above a particular report table, do this:

  1. Open the profile .cfg file (in the profiles $lang_stats.directory of the LogAnalysisInfo $lang_stats.directory) in a text editor.

  2. Find the Reports section (Search for "reports = {")

  3. Scroll down until you see the report you want to change, for example "Days", so look for "days = {"

  4. A few lines below that find the line that says "graph = {". You should see this:

    numerical_fields = {
      hits = "true"
    } # numerical_fields
    
  5. Change this so that it reads:

    numerical_fields = {
      visitors = "true"
    } # numerical_fields
    

    You can substitute any numerical field name here, e.g. page_views, hits, visitors, bytes, etc. (you must use the internal name for the field, not the "display" label); see below this list for a way to get the available names.

  6. Refresh the browser to see the new graph.
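
You can get a list of the available database field names by running $PRODUCT_NAME from the command line with "-p profilename -a ldf"; for instance (assuming the executable is named sawmill, as in the other command-line examples in this FAQ):

  sawmill -p myprofile -a ldf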

NOTE: In some cases, just refreshing the browser may not actually show the new graph. Once these changes have been made, $PRODUCT_NAME will be producing the new graph; it is the browser's job to show it to you. You may need to empty your browser's cache to see the change.

` } saveasnew = { label = "Saving filters during Save as New Report" question = "When I save a report for the first time, what happens to my filters?" short_answer = "If you have no filters active, no filters will be saved with your report." long_answer = `

When you save a new report by selecting "Save as New Report" under the "Miscellaneous" button, and you have no date or general filters active, no filters will be saved with the report; those selections will be dimmed out in the dialogue box. If you want filters saved with the report, turn them on in the "Filters" menu and then save your report.

` } shortlongterm = { label = "Short- and Long-term Databases" question = "How do I get high detail for recent hits, and also long-term statistics, without using too much disk space?" short_answer = "Use two databases, one for high-detail short-term data, and one for low-detail long-term data." long_answer = "A common problem encountered by $PRODUCT_NAME users is the conflict between wanting to see all possible statistics, and wanting to be able to generate the database in a reasonable amount of time, memory, and disk space. With large web sites, it is often impossible to have both of these things--you probably don't have the computing power to fully analyze your multi-gigabyte logs and still have full detail on ten database fields. One easy solution to this problem is to have two databases (two profiles).

The first, the \"all data\" profile, represents your entire log but with significant limitations on the fields (perhaps only the top two levels of the hostname field, the top two levels of the referrer, date/time to the day level only, etc.).

The second, the \"recent data\" profile, includes a filter to discard all log entries older than a day, or a week, or a month, depending on how recent you want it and how much data you get in a day. The \"recent data\" profile, since it has fewer log entries in it, can have much more detailed information; the date/time can go to the second level, all the hostnames can be there (for full visitor information), etc.

Using this technique, you'll end up with two views of your log data. For long-term trends, you can use the \"all data\" profile. For recent access information, in detail, you can use the \"recent data\" profile. The combined size and processing time of the two profiles will be much lower than if they were combined into one with the duration of \"all data\" and the depth of \"recent data.\"" } zoomfarther = { label = "Zooming Further" question = "How do I see more levels of statistics (i.e. how can I zoom in further)?" short_answer = "Increase the \"suppress below\" level for this database field in the profile options." long_answer = "

$PRODUCT_NAME limits the number of levels you see by default to save memory, disk space, and time. You can increase the levels on any database field like this:

  1. Using a text editor, open the .cfg file for your profile, in the LogAnalysisInfo/profiles folder.

  2. Find the database = { section.

  3. Within that section, find the fields = { section.

  4. Within that section, find the database field you want to change.

  5. Increase the suppress_below value for that field (see the example below this list).

  6. Save the file.

  7. Rebuild the database.
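
For example, after step 5 the hostname field group might look something like this (a sketch only; the field name and the other lines vary by profile, and are left unchanged):

  hostname = {
    ...
    suppress_below = \"5\"
  } # hostname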

Then you'll be able to see as many levels as you chose. See also {=docs_chapter_link('resources')=}.

" } zoomsingle = { label = "Zooming on single files" question = "How do I see the number of downloads for a particular file (i.e. a newsletter PDF, or a template file PDF)?" short_answer = "Select PDF from the 'File Types' table and then use the Zoom Menu to Zoom to the URL's report, then Select the PDF you need to get an overview of that file. " long_answer = "

Click on the 'Content' report group in the left hand menu, then click on the 'File Types' report. When the File Types report loads, click on 'PDF' in the table; the table will re-load with just a PDF entry, and a menu listing all tables will appear above it.

From that drop-down ({=docs_user_chapter_link('user_report_zoom_to')=}), select the 'Pages' or 'URLs' option (it could be either); a page should then load containing only the pages/URLs whose file type is PDF. You can then select the PDF from that list, and you will next see an Overview for that file only.

This type of filtering uses the {=docs_user_chapter_link('user_filter_zoom')=}; they are temporary filters that are applied to the report(s) as you click about (zoom about) the report. Clicking any item in the left hand menu cancels them and returns you to that report's default view, where no filters are set (unless the default has a filter set via the Report Editor, in which case that filter set will be applied).

If you want to filter items in the report, have the filter apply to the whole report, and be able to turn it on when you need it, it is better to use the {=docs_user_chapter_link('user_filter_global')=} available from the Filter icon in the toolbar (just above the report). These can be created, enabled, and disabled as you need them; you only need to create them once, and they are stored under your username and the profile you are using, ready for the next time you need them. Zoom filters are not stored anywhere, and need re-applying each time you need the filter set.

" } datatypes = { label = "Definitions of Numerical Fields" question = "In web server analyses, what are \"hits,\" \"page views,\" \"bandwidth\" or \"bytes,\" \"visitors,\" or \"sessions\"? In media analyses, what are \"stream duration,\" \"play duration,\" \"pause duration,\" \"session duration,\" \"events,\" \"streams,\" or \"concurrent connection,\" or \"successful accesses\"?" short_answer = "Hits are accesses to the server; page views are accesses to HTML pages; visitors are unique visitors to the site, and sessions are visits to the site. Play duration is the most useful measure of time actually spent playing; pause duration is time spent paused; stream and session duration are the time spent connected; events is the total number of log lines; stream is the unique number of streams accessed; successful accesses are the number of non-error streaming events." long_answer = `

Web Server Numerical Fields

$PRODUCT_NAME can count web log traffic in several ways. Each way is counted independently of the others, and each has its own advantages in analyzing your traffic. The different types are:
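
Hits are accesses to the server (each log entry counts as one hit); page views are accesses to pages (e.g. HTML files), excluding images and other embedded files; bandwidth (or bytes) is the number of bytes transferred; visitors are unique visitors to the site, counted by unique hostname or by cookie; and sessions are visits to the site, i.e. sequences of events from a single visitor.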

Media Server Numerical Fields

Media servers have their own distinct numerical fields. Some of these are directly from the log data; others are computed by the $PRODUCT_NAME plug-in. Different plug-ins report different fields; fields may include: