Getting Meaningful Statistics from Apache Log Files
Updated 25 April 2002
If you run a Web site, chances are that at some time, you’ve decided that you would really like to get a sense of who has been visiting it. The best way to do that is to analyze the log entries, and we’ve posted a couple of articles on Web log analysers (see here, and here).
The problem is, Apple ships Apache in a state that makes its logs, well, pretty darn useless. By default, Apple has their Performance Cache turned on, and their log file format set to:
CustomLog "/private/var/log/httpd/access_log" "%h %l %u %t \"%r\" %>s %b"
That line means that access_log lines will look like the following:
66.92.146.189 - - [11/Apr/2002:14:04:46 -0400] "GET /manual/images/index.gif HTTP/1.1" 200 1540
(For an explanation of the variables used in the log line format, see http://httpd.apache.org/docs/mod/mod_log_config.html.)
It’s not immediately apparent, but this line contains very little useful information. The “remote host” (%h) variable doesn’t get substituted with the actual IP address or hostname of the machine where the request originated—it gets replaced with the IP address of your Web server. There’s no mention of the referring URL or the browser type, either.
Much more valuable, from our perspective, is Apache’s “combined” log file format:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Look in your /etc/httpd/httpd.conf file for that line. The combined log file format is rich with information. Log file lines with that format look like this:
207.91.53.243 - - [11/Apr/2002:14:07:19 -0400] "GET /images/dotbar.png HTTP/1.0" 304 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) Gecko/2002 0311 via proxy gateway CERN-HTTPD/3.0 libwww/2.17"
As you can see, there’s much more information in a “combined” logfile style entry.
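To make the extra fields concrete, here is a minimal sketch in Python 3 that splits a combined-format line into named pieces. The regular expression and the names COMBINED_RE and parse_combined are illustrative assumptions, not part of Apache or of any log analyser; the sketch also assumes no field contains an escaped double quote.

```python
import re

# Assumed pattern for Apache's "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

# The combined-format sample line from the article.
sample = ('207.91.53.243 - - [11/Apr/2002:14:07:19 -0400] '
          '"GET /images/dotbar.png HTTP/1.0" 304 - "-" '
          '"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) '
          'Gecko/2002 0311 via proxy gateway CERN-HTTPD/3.0 libwww/2.17"')

fields = parse_combined(sample)
print(fields["host"], fields["status"], fields["referer"])
```

A line in Apple's default format has no referer or user-agent fields, so parse_combined returns None for it. That makes the function a quick way to tell the two formats apart.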
So how do you take advantage of this format? It’s quite simple, really. First, bring up Server Admin. Switch to the Internet tab, and wait for the Web service icon to stabilize. It should look like the image below.
Click on the globe, and select Configure Web Service. Click the Sites tab. Now, for each site you host, you will need to select the site and click Edit... In the window that pops up, deselect “Enable performance cache” and click Save. Ignore the alert about changes not taking effect. You must turn performance cache off for every site you host; if it is left on for even one, it will still screw up the log files. Once you’ve committed all the changes, you can quit Server Admin.
Now for the fun stuff! Open up a Terminal session, and assume super-user rights (if you don’t have the root account enabled, just prefix all the following commands with “sudo”; e.g., “sudo emacs /etc/httpd/httpd.conf”).
First, we need to make sure that the combined log file format is enabled. Edit /etc/httpd/httpd.conf in your favorite editor, and search for the line
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Make sure that there’s no hash mark (octothorpe, “#”) at the start of the line. Save that file.
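If you would rather check this programmatically than by eye, the following Python 3 sketch looks for an uncommented LogFormat line whose nickname is “combined”. The function name has_active_combined_format is made up for illustration, and the check is deliberately simple: it looks only at the first and last words of each line, not at the format string itself.

```python
def has_active_combined_format(conf_text):
    """Return True if conf_text has an uncommented 'LogFormat ... combined' line."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # Apache directive names are case-insensitive.
        if parts[0].lower() == "logformat" and parts[-1] == "combined":
            return True
    return False

active = r'LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined'
commented = "#" + active

print(has_active_combined_format(active))
print(has_active_combined_format(commented))
```

To run it against the real file, pass in the contents of /etc/httpd/httpd.conf, for example via open("/etc/httpd/httpd.conf").read().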
Bring /etc/httpd/httpd_macosxserver.conf up in your favorite editor. This is the file that Server Admin maintains, and we’re going to completely disregard Apple’s warnings and make changes to it. If you are faint of heart, make a backup copy before you make any changes to this file.
We’re going to change the CustomLog lines for all the sites we manage. The easiest way to do that is to do a global replacement, searching on
CustomLog "/private/var/log/httpd/access_log" "%h %l %u %t \"%r\" %>s %b"
and replacing it with
CustomLog "/private/var/log/httpd/access_log" combined
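If you prefer to script the replacement rather than do it in an editor, a minimal Python 3 sketch might look like the following. The names OLD, NEW and switch_to_combined are hypothetical, and the sketch works on a string rather than on the file itself, so reading, backing up and writing /etc/httpd/httpd_macosxserver.conf is left to you.

```python
# Exact search and replacement strings from the article.
OLD = r'''CustomLog "/private/var/log/httpd/access_log" "%h %l %u %t \"%r\" %>s %b"'''
NEW = 'CustomLog "/private/var/log/httpd/access_log" combined'

def switch_to_combined(conf_text):
    """Return (updated_text, number_of_replacements) for the given config text."""
    count = conf_text.count(OLD)
    return conf_text.replace(OLD, NEW), count

# A made-up fragment for illustration; real files contain much more.
demo = "# illustrative fragment only\n" + OLD + "\n"

updated, replaced = switch_to_combined(demo)
print(replaced)
print(updated)
```

Because this is an exact string match, any CustomLog line that differs from the search string (a different log path, say) is left untouched. The returned count is therefore worth comparing against the number of sites you host.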
Once that’s done, save the file and issue the command
apachectl configtest
This command asks Apache to examine its configuration files and report any problems. You should see output like this:
[darius:/etc/httpd] root% apachectl configtest
[Thu Apr 11 15:17:19 2002] [warn] module mod_hfs_apple.c is already added, skipping
[Thu Apr 11 15:17:19 2002] [warn] module mod_redirectacgi_apple.c is already added, skipping
Syntax OK
The two warning lines (there may be more, depending on your site configuration) are new to Mac OS X and the April 2002 Security Update, and can be ignored as long as the last line is “Syntax OK”. Start Apache with the command
apachectl graceful
and you should see output like this:
[darius:/etc/httpd] root% apachectl graceful
/usr/sbin/apachectl graceful: httpd not running, trying to start
[Thu Apr 11 15:18:55 2002] [warn] module mod_hfs_apple.c is already added, skipping
[Thu Apr 11 15:18:55 2002] [warn] module mod_redirectacgi_apple.c is already added, skipping
/usr/sbin/apachectl graceful: httpd started
Check your log file format by running the command tail -f /var/log/httpd/access_log and going to your Web site. You should see the new, fancier, and much more useful log entry format!
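For a check that goes beyond eyeballing the tail output, here is a rough Python 3 sketch. It counts how many log lines end with the two quoted fields that the combined format adds, and collects the distinct values in the first (%h) column. The names COMBINED_TAIL and summarize are assumptions for illustration, and the pattern assumes the referer and user-agent contain no double quotes.

```python
import re

# Assumed shape of the end of a combined line: "request" status size "referer" "agent"
COMBINED_TAIL = re.compile(r'" \d{3} \S+ "[^"]*" "[^"]*"\s*$')

def summarize(lines):
    """Count combined vs. other lines and collect the distinct %h values."""
    combined = 0
    other = 0
    hosts = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        hosts.add(line.split(" ", 1)[0])  # first field is %h
        if COMBINED_TAIL.search(line):
            combined += 1
        else:
            other += 1
    return {"combined": combined, "other": other, "hosts": hosts}

# The two sample lines from the article: old format first, combined second.
lines = [
    '66.92.146.189 - - [11/Apr/2002:14:04:46 -0400] "GET /manual/images/index.gif HTTP/1.1" 200 1540',
    '207.91.53.243 - - [11/Apr/2002:14:07:19 -0400] "GET /images/dotbar.png HTTP/1.0" 304 - "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) Gecko/2002 0311 '
    'via proxy gateway CERN-HTTPD/3.0 libwww/2.17"',
]

result = summarize(lines)
print(result["combined"], result["other"], sorted(result["hosts"]))
```

On a real log you would pass in the lines of /var/log/httpd/access_log. If, after some traffic from different machines, the hosts set still holds only your server's own address, the performance cache is probably still enabled for at least one site.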
25 April 2002: Astute reader Mark Edwards points out that changing the “Preferences” site information in /etc/httpd/httpd_macosxserver.conf does not make new sites created with Server Admin follow the changed settings, despite the verbiage in the file itself. That section of the article has been removed.