{= include("docs.util"); start_docs_page(docs.technical_manual.page_titles.newsletters); =}
Sawmill Newsletter June 15, 2008
When using Sawmill to generate reports from a forward proxy server,
you will often want to know which person was responsible for a
particular bit of traffic. Almost all proxy servers log the IP address
of the internal computer, but some do not log the username, or actual
name, of the person using that IP address, at the time of the access.
This article discusses ways to determine the person responsible for the
traffic.
Method 1: Use a proxy server which logs the username
Ideally, the proxy server should simply log the username on each line, showing which user was responsible for the access. Technically, that can be accomplished by having the proxy server query a local LDAP or Open Directory server, to determine the username associated with the IP address at the time of the request. This is the optimal solution, because the proxy server is in the ideal position to do the query. All other solutions must be done after the fact, when the IP address may no longer correspond to the same username in the authentication server. So if your proxy server logs the username, you're done--Sawmill will report it. If your proxy server is not currently logging the username, see if it can be configured to compute and log the username at the time of the event. If it cannot, contact the proxy server vendor to see if such a function can be enabled, or can be added.
Method 2: Use a CFG file to map IP addresses to usernames
In the real world, your proxy server may not log usernames, and you
may not have any way to make it log usernames. A simple alternative,
though not usually as accurate, is to give Sawmill a CFG file which
includes a list of all IP addresses, and the usernames they map to (see
the December 2006 newsletter for a discussion of creating and using CFG
files). Once you have the CFG file, Sawmill will be able to tag every
line of log data with the username (based on the IP address in that
line), and you'll get a Usernames report and field, just as though the
device had logged the username. This is perfect if your IP-to-username
mappings never change, and can be automated using a simple script to
query the LDAP server and generate the CFG file.
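The static lookup described above can be illustrated with a short sketch. It is written in Python rather than Salang, purely to show the logic; it is not Sawmill's own log filter syntax. The mapping contents, the field names (client_ip, username), and the "(unknown)" fallback are all assumptions made for the example.

```python
# Hypothetical static mapping. In practice this would be generated from the
# authentication server and stored in the CFG file.
IP_TO_USERNAME = {
    "10.0.0.15": "alice",
    "10.0.0.16": "bob",
}


def tag_with_username(entry, mapping, unknown="(unknown)"):
    """Return a copy of a parsed log entry with a 'username' field added.

    The username is chosen purely from the entry's IP address, so the same
    IP always yields the same username regardless of when the access occurred.
    """
    tagged = dict(entry)
    tagged["username"] = mapping.get(entry["client_ip"], unknown)
    return tagged


# Illustrative log entries (field names are assumptions, not a real log format).
entries = [
    {"client_ip": "10.0.0.15", "url": "http://example.com/"},
    {"client_ip": "10.0.0.99", "url": "http://example.org/"},
]
tagged_entries = [tag_with_username(e, IP_TO_USERNAME) for e in entries]
```

An IP address that is missing from the mapping gets a placeholder username here; falling back to the IP address itself would be an equally reasonable choice.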
Method 3: Use a timestamped CFG file to map IP addresses to usernames based on time
Method 2 works if IP-to-username mappings never change, but in many
environments they do. You might have a DHCP environment where multiple
users share the same IP addresses; or you might have shared
workstations where one person might be sitting at a particular
workstation one day, and another person might be sitting there another
day. A simple IP-to-username map cannot capture this complexity, and
you would need to choose which single username corresponds to
each IP. If the environment is largely stable, that could still be
fairly effective, but it won't work well for environments where most of
the IPs are dynamic.
If the IPs are highly dynamic, you can use a CFG file which contains
timestamp information for each IP-to-username mapping. The CFG
file could include information about which time range a particular
mapping was valid, and the log filter which converts IPs to usernames
could use the timestamp of the log line to choose which username to
use. In the simplest form, the CFG file could just have multiple dumps
of the authentication server, each one under a timestamped subnode in
the CFG file, and the log filter could find the closest dump to the log
line's timestamp, and use the mappings from there. If the
authentication dumps are frequent enough, this can approach the
accuracy of Method 1.
However, the additional overhead of managing the timestamps and the
much larger CFG file can make log processing much slower with this
approach, versus Method 2 (and with Method 1, there is no overhead at
all).
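The "closest dump" selection in the simplest form of this method can be sketched as follows. This is an illustration in Python, not Salang, and it assumes the dumps have already been loaded into memory as a list sorted by timestamp. The dump contents, the use of Unix timestamps, and the tie-breaking rule (ties go to the earlier dump) are assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical authentication dumps: (unix_timestamp, {ip: username}),
# sorted by timestamp. The values are illustrative only.
DUMPS = [
    (1213488000, {"10.0.0.15": "alice"}),
    (1213491600, {"10.0.0.15": "bob"}),
    (1213495200, {"10.0.0.15": "carol"}),
]
DUMP_TIMES = [t for t, _ in DUMPS]


def closest_dump(timestamp):
    """Return the mapping from the dump whose timestamp is nearest to `timestamp`."""
    i = bisect_left(DUMP_TIMES, timestamp)
    if i == 0:
        return DUMPS[0][1]       # earlier than every dump: use the first
    if i == len(DUMPS):
        return DUMPS[-1][1]      # later than every dump: use the last
    before_t, after_t = DUMP_TIMES[i - 1], DUMP_TIMES[i]
    # Pick whichever neighbour is closer; ties go to the earlier dump.
    if timestamp - before_t <= after_t - timestamp:
        return DUMPS[i - 1][1]
    return DUMPS[i][1]


def username_at(ip, timestamp, unknown="(unknown)"):
    """Look up the username for `ip` using the dump closest to `timestamp`."""
    return closest_dump(timestamp).get(ip, unknown)
```

The binary search keeps each lookup cheap even with many dumps, but every dump still has to be held in memory, which is one source of the overhead mentioned above.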
More sophisticated layouts of the CFG file are also possible,
including methods which eliminate redundancy by using a hierarchical
structure. The precise details of the possible structures of the CFG
file, and the log filter which parses it, are beyond the scope of this
article, but if you need assistance implementing this approach, we can
help you; please contact consulting@flowerfire.com.
Method 4: Query the authentication server directly from the log filter
This method dispenses with the CFG file entirely, instead having the log
filter query the authentication server each time it needs to know the
username associated with a particular IP address. Salang (the language
of log filters) does not have direct support for querying
authentication servers, so this must be done by running an arbitrary
command (with exec()) to do the lookup. Furthermore, because it takes a
significant fraction of a second to exec() a script to do the lookup,
this cannot efficiently run for every line of log data, so an
additional level of caching (using an in-memory node) should be added
to make it fast.
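The caching idea can be sketched as follows, again in Python rather than Salang, to show the logic only. The lookup command here is a stand-in that simply echoes a made-up username so that the sketch runs on its own; a real command would query the authentication server. The names LOOKUP_COMMAND, run_lookup_command, and CachedLookup are hypothetical, and the sketch captures the command's standard output directly instead of passing the result through a file.

```python
import subprocess
import sys

# Stand-in for a real lookup script (an assumption for illustration): it takes
# the IP address as its only argument and prints a username on standard output.
LOOKUP_COMMAND = [
    sys.executable, "-c",
    "import sys; print('user-for-' + sys.argv[1])",
]


def run_lookup_command(ip):
    """Run the external lookup command for one IP and return its output."""
    result = subprocess.run(
        LOOKUP_COMMAND + [ip],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


class CachedLookup:
    """Wrap a slow lookup function with an in-memory cache keyed by IP."""

    def __init__(self, lookup):
        self._lookup = lookup
        self._cache = {}
        self.misses = 0  # number of times the slow lookup actually ran

    def __call__(self, ip):
        if ip not in self._cache:
            self.misses += 1
            self._cache[ip] = self._lookup(ip)
        return self._cache[ip]


# One cache per log processing run; the external command runs once per
# distinct IP address instead of once per log line.
username_for = CachedLookup(run_lookup_command)
```

Because the cache lives only for the duration of one processing run, a later run will query the authentication server again and may see different mappings.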
Method 4 suffers from a possible accuracy problem similar to
Method 2: it uses the IP-to-username mapping as of the log
processing time, which may not be the same as the mapping at the
time of the access. So it is effective for environments where users do
not typically move between IP addresses, but will give incorrect
usernames for some events in some cases, in environments where users
move.
The details of the authentication query script are beyond the scope
of this article (in a nutshell, it could accept the IP address on the
command line, query the authentication server, and write the username
to a file to be read and cached by the log filter with read_file()),
but if you need assistance implementing this approach, we can help you;
please contact consulting@flowerfire.com.
Summary
For perfect auditing (certain determination of which username was
responsible for an access), Method 1 is the best, as it determines the
username at the time of access. Method 3 can approach the accuracy of
Method 1, but is still not perfectly precise, due to the coarse
granularity of authentication server dumps; it can be made perfect if
dumps can be arranged to occur with every change in the
authentication database. Method 2 is a simple and fast approach,
suitable for environments where IPs are generally stable, or where
exact IP-to-username correlations are not required. Method 4 is similar
to Method 2, but does its lookups on the fly, at log processing
time--it eliminates the need for a CFG file, but introduces the need
for a querying script.