Chapter 17. WWW Reports

Table of Contents

Supported Log Format
Common Log Format
Combined Log Format
CLF With mod_gzip Extensions
Referer Log Format
Logs With Virtual Host Information
W3C Extended Log Format
Report Descriptions and Configuration
Bytes By Period WWW Report
Bytes Per Directory WWW Report
Bytes By HTTP Result By Period WWW Report
Bytes By HTTP Result WWW Report
Bytes Per Request WWW Report
Client Hosts By Period WWW Report
Search Engines with Keywords Report
Requests By Browser WWW Report
Number of Requests By Period WWW Report
Requests By Browser Language WWW Report
Requests By HTTP Method WWW Report
Requests By OS WWW Report
Requests By Result By Period WWW Report
Requests By HTTP Result WWW Report
Requests By Gzip Result WWW Report
Requests By Robot Report
Requests By Top Level Domain Report
Requests By Attack Report
Requests By Keywords Report
Requests By User Agent WWW Report
Requests By Search Engines Report
Number of Requests By Size WWW Report
Number of Requests By Timeslot WWW Report
Requests By HTTP Protocol Version WWW Report
Average Compression By File Type WWW Report
Most Averaged Compressed Requested File WWW Report
Top Client By HTTP Result WWW Report
Top Client by Size WWW Report
Top Client WWW Report
Last Pages By Session WWW Report
First Pages By Session WWW Report
Most Travelled Referer -> Page Connections WWW Report
Top Referring Pages WWW Report
Top Referring Pages By Requested Page WWW Report
Top Referring Sites WWW Report
Most Requested Pages WWW Report
Top Traversals WWW Report
Top URLs By HTTP Result WWW Report
Most Requested URLs By Client Host WWW Report
User Sessions By Period WWW Report
Recurring Visitors WWW Report
Visit times User Session WWW Report
Page Counts User Session WWW Report
Filter Descriptions and Configuration
Select URL Filter
Select Sessions by Page Filter
Select Client Host Filter
Exclude URL Filter
Exclude Sessions by Page Filter
Exclude Client Host Filter
Exclude Referer Filter

Supported Log Format

The WWW superservice supports several log file formats, which lets it process logs from a wide range of web servers such as Apache™, IIS, or Boa™.

Common Log Format

Common Log Format (CLF) is a standard log format that was first implemented in the CERN httpd web server and is now supported by most web servers. Apache™, IIS, and Boa™ can all be configured to log in this format.

Each line in Common Log Format has the following layout:

remotehost rfc931 authuser [date] "request" status bytes
	    

where the fields have the following meaning:

remotehost

The host that made the request. This can be given as an IP address or a hostname.

rfc931

The result of an ident lookup on the host. This field is rarely used and is usually logged as -.

authuser

The authenticated username.

date

The timestamp of the request.

request

The first line of the HTTP request. It is usually of the form "method file protocol".

status

The HTTP status code of the response, e.g. 200, 301, 404, or 500.

bytes

The size of the response sent back to the client.
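
As a rough illustration of how these fields can be pulled apart, the following shell sketch splits each line on the double quotes around the request and then on whitespace. The function name clf_summary and the sample line are made up for this example, awk is assumed to be available, and the sketch assumes the request itself contains no double quote. It is not how Lire™ parses its logs.

```shell
# clf_summary (hypothetical helper): read Common Log Format lines on stdin
# and print "remotehost authuser status bytes request" for each of them.
clf_summary() {
    awk -F'"' '{
        split($1, head, " ")    # remotehost rfc931 authuser [date zone]
        split($3, tail, " ")    # status bytes
        print head[1], head[3], tail[1], tail[2], $2
    }'
}

# Made-up sample line, for illustration only.
printf '%s\n' \
  '192.0.2.10 - francis [11/Mar/2001:12:14:01 -0400] "GET /secret/ HTTP/1.0" 200 1256' \
  | clf_summary
```

Splitting on the quotes first keeps the request in one piece even though it contains spaces.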

Example of log lines in Common Log Format:

127.0.0.1 - - [11/Mar/2001:12:12:01 -0400] "GET / HTTP/1.0" 200 513
dsl1.myprovider.com - francis [11/Mar/2001:12:14:01 -0400] \
"GET /secret/ HTTP/1.0" 200 1256
	    

Combined Log Format

The combined log format is an extension to the Common Log Format that adds information about the user agent and the referer. It is also known as the extended common log format. It was first implemented in the NCSA httpd web server but is now supported by many web servers. Apache™ can be configured to use this log format.

Two fields are added at the end of the common log lines:

"referer" "useragent"

referer

The content of the Referer header of the request. This usually reflects the page the user visited before this request.

useragent

The content of the User-Agent header of the request. This usually reflects the browser that the user is using.
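
Because both added fields are quoted, they can be extracted by splitting the line on double quotes. The sketch below is only an illustration: the function name combined_extras, the sample line, and the browser name in it are invented, and it assumes that none of the quoted fields contains a double quote.

```shell
# combined_extras (hypothetical helper): print the referer and user agent
# of each Combined Log Format line on stdin, separated by a tab.
combined_extras() {
    # Splitting on double quotes: $2 is the request, $4 the referer,
    # $6 the user agent.
    awk -F'"' '{ printf "%s\t%s\n", $4, $6 }'
}

# Made-up sample line, for illustration only.
printf '%s\n' \
  '192.0.2.10 - - [11/Mar/2001:12:12:01 -0400] "GET /page.html HTTP/1.0" 200 513 "http://www.example.com/" "ExampleBrowser/1.0"' \
  | combined_extras
```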

CLF With mod_gzip Extensions

This format is another extension to the Common Log Format. It is produced by mod_gzip, an Apache™ module that compresses responses before sending them to the client.

mod_gzip is a module developed by RemoteCommunications, Inc. Source code is freely available from http://www.RemoteCommunications.com/apache/mod_gzip/mod_gzip. More information can be found in their FAQ.

mod_gzip can log information about the compression of pages. To enable this, one can configure Apache™ to log using the 'gzip' format which can be defined as follows:

LogFormat "%h %l %u %t \"%r\" %>s %b %{mod_gzip_result}n \
          %{mod_gzip_compression_ratio}n" gzip
	    

This adds two fields at the end of each common log line:

gzip_result compression_ratio

gzip_result

The gzip result code. Usually OK.

compression_ratio

The ratio by which the content was compressed. A number from 0 to 100.
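
Since the two fields sit at the end of the line, they are easy to reach by counting from the last field. As a sketch of what can be done with them, the following averages the compression ratio over the lines whose result is OK. The function name gzip_average_ratio and the sample lines are made up, and the non-OK result value shown is only a placeholder.

```shell
# gzip_average_ratio (hypothetical helper): average the compression_ratio
# field over the lines whose gzip_result field is OK.
gzip_average_ratio() {
    awk '$(NF - 1) == "OK" { sum += $NF; n++ }
         END { if (n > 0) printf "%.1f\n", sum / n }'
}

# Made-up sample lines; the non-OK result value is only a placeholder.
gzip_average_ratio <<'EOF'
192.0.2.10 - - [11/Mar/2001:12:12:01 -0400] "GET / HTTP/1.0" 200 513 OK 78
192.0.2.10 - - [11/Mar/2001:12:12:05 -0400] "GET /a.html HTTP/1.0" 200 900 OK 60
192.0.2.10 - - [11/Mar/2001:12:12:09 -0400] "GET /b.png HTTP/1.0" 200 4096 DECLINED 0
EOF
```

Nothing is printed when no line has an OK result, which avoids a division by zero.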

Referer Log Format

The Referer log format is an old format that was implemented in the NCSA httpd server. It was used to log information about the request's referer in a separate log file. The combined log format has made this log format obsolete.

Referer log files have the following format:

uri -> document

uri

The referring URI. This is the content of the Referer header of the request which usually reflects the page where the user was before that request.

document

The local document that was referenced by that URI. This is the requested file without any query string.
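
A small sketch of reading such a file follows. It assumes the two fields are separated by " -> " as in NCSA-style referer logs, the function name referer_pairs and the sample line are invented for this example, and lines that do not match that layout are skipped.

```shell
# referer_pairs (hypothetical helper): read referer log lines on stdin and
# print "document<TAB>uri", so the output can be sorted by local document.
referer_pairs() {
    awk -F' -> ' 'NF == 2 { printf "%s\t%s\n", $2, $1 }'
}

# Made-up sample line, for illustration only.
printf '%s\n' 'http://www.example.com/links.html -> /index.html' | referer_pairs
```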

Logs With Virtual Host Information

You may encounter log files in which each line begins with a field naming the virtual host the request was for. The rest of the line is usually in the common or combined log format. This kind of logging is typically seen on web servers hosting several virtual servers.

Example of such a line:

www.example.com 1.7.2.21 - - [13/Oct/2000:10:30:16 +0200] \
    "GET / HTTP/1.0" 200 83
	    

Although Lire™ doesn't directly support such logs, it is easy to split those logs into many log files in the common or combined log format which can subsequently be processed by Lire™.

Example doing this in a shell:

$  mkdir apache-common.log
$  (while read -r virt rest; do printf '%s\n' "$rest" >> \
 "apache-common.log/$virt"; done) < /var/log/apache/common.log
$  for f in apache-common.log/*; do \
 lr_log2mail -s "$f" common joe@example.com < "$f"; done
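
When only one virtual host is of interest, a lighter variant is to keep just its lines and strip the leading field, without creating a file per host. The sketch below assumes awk is available; the function name vhost_lines and the second sample line are made up for illustration.

```shell
# vhost_lines (hypothetical helper): print the lines logged for one virtual
# host, with the leading virtual host field removed.
vhost_lines() {
    awk -v vhost="$1" '$1 == vhost { sub(/^[^ \t]+[ \t]+/, ""); print }'
}

# Sample input: two virtual hosts, only the first one is kept.
printf '%s\n' \
  'www.example.com 1.7.2.21 - - [13/Oct/2000:10:30:16 +0200] "GET / HTTP/1.0" 200 83' \
  'other.example.com 1.7.2.22 - - [13/Oct/2000:10:31:02 +0200] "GET / HTTP/1.0" 200 83' \
  | vhost_lines www.example.com
```

The output is in the common log format and could be fed to lr_log2mail in the same way as the per-host files above.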