
20. Regression testing

The standard purpose of regression testing is to avoid getting the same bug twice. When a bug is found, the programmer fixes the bug and adds a test to the test suite. The test should fail before the fix and pass after the fix. When a new version is about to be released, all the tests in the regression test suite are run and if an old bug reappears, this will be seen quickly since the appropriate test will fail.

The regression testing in GNU Go is slightly different. A typical test case specifies a position and asks the engine what move it would make. The answer is compared to one or more correct moves to decide whether the test case passes or fails. Each test case also records whether it is expected to pass or fail, and a deviation from this expected status shows that a change has solved some problem or broken something else. Thus the regression tests include both positions highlighting some mistake made by the engine, which are waiting to be fixed, and positions where the engine does the right thing, where we want to detect if a change breaks something.

20.1 Regression testing in GNU Go
20.2 Test suites
20.3 Performing tests
20.4 HTML Regression Views



20.1 Regression testing in GNU Go

Regression testing is performed by the files in the `regression/' directory. The tests are specified as GTP commands in files with the suffix `.tst', with the corresponding correct results and expected pass/fail status encoded in GTP comments following each test. To run a test suite, use the shell scripts `test.sh', `eval.sh', and `regress.sh'. There are also Makefile targets for this; make all_batches runs most of the tests.

Game records used by the regression tests are stored in the directory `regression/games/' and its subdirectories.



20.2 Test suites

The regression tests are grouped into suites and stored in files as GTP commands. A part of a test suite can look as follows:

 
# Connecting with ko at B14 looks best. Cutting at D17 might be
# considered. B17 (game move) is inferior.
loadsgf games/strategy25.sgf 61
90 gg_genmove black
#? [B14|D17]

# The game move at P13 is a suicidal blunder.
loadsgf games/strategy25.sgf 249
95 gg_genmove black
#? [!P13]

loadsgf games/strategy26.sgf 257
100 gg_genmove black
#? [M16]*

Lines starting with a hash sign, or in general anything following a hash sign, are interpreted as comments by the GTP mode and thus ignored by the engine. GTP commands are executed in the order they appear, but only those on numbered lines are used for testing. The comment lines starting with #? are magical to the regression testing scripts and indicate correct results and expected pass/fail status. The string within brackets is matched as a regular expression against the response from the previous numbered GTP command. A particularly useful feature of regular expressions is that `|' can be used to specify alternatives. Thus B14|D17 above means that test case 90 passes if the generated move is either B14 or D17. There is one important special case to be aware of. If the correct result string starts with an exclamation mark, the exclamation mark is stripped from the regular expression and the result of the matching is negated. Thus !P13 in test case 95 means that any move except P13 is accepted as a correct result.
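The matching rule can be illustrated with a small Python sketch. This is not the code used by the regression scripts; the function name result_matches is hypothetical, Python's regular expression dialect may differ in details from the one the scripts use, and the sketch assumes that the pattern must match the whole response rather than a part of it.

```python
import re


def result_matches(correct: str, response: str) -> bool:
    """Return True if `response` is accepted by the bracketed `correct` string."""
    # A leading '!' is not part of the regular expression; it negates the result.
    negate = correct.startswith("!")
    pattern = correct[1:] if negate else correct
    # Assumption: the pattern has to match the complete response.
    matched = re.fullmatch(pattern, response) is not None
    return matched != negate


print(result_matches("B14|D17", "D17"))  # test case 90: an accepted alternative
print(result_matches("B14|D17", "B17"))  # test case 90: the inferior game move
print(result_matches("!P13", "P13"))     # test case 95: the excluded move
print(result_matches("!P13", "Q10"))     # test case 95: any other move
```

Comparing matched against negate with != flips the outcome only when the exclamation mark is present, so the same function covers both the ordinary and the negated case.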

In test case 100, the brackets on the #? line are followed by an asterisk. This means that the test is expected to fail. If there is no asterisk, the test is expected to pass. The brackets may also be followed by a `&', meaning that the result is ignored. This is primarily used to report statistics, e.g. how many tactical reading nodes were spent while running the test suite.
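To make the file layout concrete, here is a hedged Python sketch that pairs each numbered command with the #? line that follows it and extracts the trailing flag. The names TestCase and parse_suite are hypothetical and exist only for this illustration; the actual scripts may parse the files differently.

```python
import re
from typing import List, NamedTuple


class TestCase(NamedTuple):
    number: int
    command: str
    correct: str  # text inside the brackets, e.g. "B14|D17" or "!P13"
    flag: str     # "" = expected to pass, "*" = expected to fail, "&" = ignored


NUMBERED = re.compile(r"^(\d+)\s+(.*)$")
MAGIC = re.compile(r"^#\?\s*\[(.*)\]([*&]?)\s*$")


def parse_suite(text: str) -> List[TestCase]:
    """Collect the test cases found in the text of a .tst file."""
    cases = []
    pending = None  # latest numbered command still waiting for its #? line
    for raw in text.splitlines():
        line = raw.strip()
        numbered = NUMBERED.match(line)
        magic = MAGIC.match(line)
        if numbered:
            pending = (int(numbered.group(1)), numbered.group(2))
        elif magic and pending is not None:
            cases.append(TestCase(pending[0], pending[1],
                                  magic.group(1), magic.group(2)))
            pending = None
        # Unnumbered commands and ordinary comments are not test cases.
    return cases


SUITE = """\
# The game move at P13 is a suicidal blunder.
loadsgf games/strategy25.sgf 249
95 gg_genmove black
#? [!P13]

loadsgf games/strategy26.sgf 257
100 gg_genmove black
#? [M16]*
"""

for case in parse_suite(SUITE):
    print(case)
```

Unnumbered commands such as loadsgf are still executed by the engine, but the sketch skips them because only numbered lines take part in testing.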



20.3 Performing tests

./test.sh blunder.tst runs the tests in `blunder.tst' and prints the results of the commands on numbered lines, which may look like:

 
1 E5
2 F9
3 O18
4 B7
5 A4
6 E4
7 E3
8 A3
9 D9
10 J9
11 B3
12 C6
13 C6

This is usually not very informative, however. More interesting is ./eval.sh blunder.tst, which also compares the results above against the correct ones in the test file and prints a report for each test of the form:

 
1 failed: Correct '!E5', got 'E5'
2 failed: Correct 'C9|H9', got 'F9'
3 PASSED
4 failed: Correct 'B5|C5|C4|D4|E4|E3|F3', got 'B7'
5 PASSED
6 failed: Correct 'D4', got 'E4'
7 PASSED
8 failed: Correct 'B4', got 'A3'
9 failed: Correct 'G8|G9|H8', got 'D9'
10 failed: Correct 'G9|F9|C7', got 'J9'
11 failed: Correct 'D4|E4|E5|F4|C6', got 'B3'
12 failed: Correct 'D4', got 'C6'
13 failed: Correct 'D4|E4|E5|F4', got 'C6'

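The result of a test can be one of four different cases: `passed' (an expected pass), `PASSED' (an unexpected pass), `failed' (an expected failure), and `FAILED' (an unexpected failure).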

If you want a less verbose report, ./regress.sh . blunder.tst does the same thing as the previous command, but only reports unexpected results. The example above is compressed to:

 
3 unexpected PASS!
5 unexpected PASS!
7 unexpected PASS!

For convenience, the tests are also available as Makefile targets. For example, make blunder runs the tests in the blunder test suite by executing eval.sh blunder.tst. make test runs all test suites in sequence using the regress.sh script.



20.4 HTML Regression Views

Very useful HTML views of the regression tests can be produced using two Perl scripts, `regression/regress.pl' and `regression/regress.plx':

  1. The driver program (regress.pl), which creates the XML data used to generate the HTML views.
  2. The interface to view the captured output (regress.plx), which runs as a CGI script under a web server.



20.4.1 Setting up the HTML Regression Views

This documentation assumes an Apache installation configured as in Debian's Apache 1.3 distribution, but the configuration should be similar for other distributions.

First, you will need to configure Apache to run CGI scripts in the directory you wish to serve the HTML views from. To do this, add the following to `/etc/apache/httpd.conf' (or to a user-specific conf file if applicable):

 
<Directory /path/to/script/>
    Options +ExecCGI
</Directory>

This allows CGI scripts to be executed in the directory used by regress.plx. Next, you need to tell Apache that `.plx' is a CGI script ending. Your `httpd.conf' file should contain a section <IfModule mod_mime.c>. Within that section, there may or may not be a line:

AddHandler cgi-script ....

If there is no such line, add one. In either case, add `.plx' to its list of extensions.

You will also need to make sure you have the necessary modules loaded to run CGI scripts; mod_cgi and mod_mime should be sufficient. Your `httpd.conf' should have the relevant LoadModule lines; uncomment them if necessary.

Next, put a copy of `regress.plx' in the directory that you plan to serve the HTML views from.

You will also need to install the Perl module GD, available from CPAN or via apt-get install libgd-perl on Debian.

Finally, run `regression/regress.pl' to create the XML data used to generate the HTML views; then copy the `html/' directory to the directory in which `regress.plx' resides.

At this point, you should have a working copy of the HTML regression views.



This document was generated by Martin Godisch on August, 14 2004 using texi2html