Analyse Apache traffic with asql log interface

Get meaningful stats/data from your apache log files easily

Posted by on

tl;dr

The asql command allows you to load up your apache logs into a temporary SQLite database and query the log files as you would any normal SQL database. It provides a both interactive prompt and the ability to be used as a script interface.

Installation

If you’re running Ubuntu you can install asql with the following command:

sudo apt-get install asql

Getting started with ASQL

You can start up asql interactive mode by simply typing asql

asql v1.6 - type 'help' for help.
asql>

From the prompt type help for more info

asql> help
asql v1.6
The following commands are available within this shell:

     alias - Define, or view, persistent aliases.
     alter - Run an ALTER query against the database.
    create - Run a CREATE query against the database.
    delete - Run a DELETE query against the database.
      drop - Run a DROP query against the database.
      exit - Exit the shell.
      help - Show general, or command-specific, help information.
    insert - Run an INSERT query against the database.
      load - Load an Apache logfile.
      quit - Exit this shell.
   restore - Load a previously save'd temporary database.
      save - Save the temporary database.
    select - Run a SELECT query against the database.
      show - Show the structure of the database.
    update - Run an UPDATE query against the database.

For command-specific help run "help command".

First up we want to load a log file. To do this we can run load /path/to/my.log Lets try this now.

asql> load /var/log/apache2/access.log
Loading: /var/log/apache2/access.log

You can also use wildcards like /var/log/apache2/access.log* asql will also automatically load any files compressed with *.gz and bzip2.

It will load all the log files into a table called logs

Now lets try our first query

asql> SELECT COUNT(*) FROM logs;
22456

You can see the fields available to you by running show

show

   The table 'logs' table has the following columns:

  id      - ID of the request
  source  - IP, or hostname, which made the request.
  request - The HTTP request
  status  - The HTTP status-code returned
  size    - The size of the response served, in bytes.
  method  - The HTTP method invoked (GET, PUT, POST etc).
  referer - The HTTP referer (sic).
  agent   - The User-Agent which made the request.
  version - The HTTP version used by this client.
  date    - The date and time at which the request was made.
  label   - Any label applied when the logfile was read.
  user    - The remote (authenticated) user, if any.

Lets look at some more practical queries.

Useful queries

This request is in the docs, it shows all the clients connecting to your webserver and the size of files/requests that they have downloaded in total.

SELECT source, SUM(size) AS number FROM logs GROUP BY source ORDER BY number DESC, source

Find the top 10 client addresses by hits

SELECT source, COUNT(id) hits FROM logs GROUP BY source ORDER BY hits DESC LIMIT 10;

Find the largest 20 HTML files requested

SELECT DISTINCT(request), size FROM logs WHERE SUBSTR(request, LENGTH(request)-3) = "html" ORDER BY size DESC LIMIT 20;

Show average hits per hour

SELECT hour, ROUND(AVG(number), 0) FROM (SELECT STRFTIME("%Y-%m-%d %H", date) date_hour, STRFTIME("%H", date) hour, COUNT(id) number FROM logs GROUP BY date_hour) GROUP BY hour;

You get the idea.

Feel free to experiment, and don’t forget to look at man asql as there are some useful examples in the docs. Post any queries you think would be useful in the comments.