erlware_commons/doc/property_based_testing.md
Eric Merritt ff3557d883 add suport and documentation for signatures for dictionaries
Signed-off-by: Jordan Wilberding <jwilberding@gmail.com>
2011-10-16 22:26:43 -04:00

14 KiB
Raw Blame History

Property based testing for unit testers

Main contributors: Torben Hoffmann, Raghav Karol, Eric Merritt

The purpose of the short document is to help people who are familiar with unit testing understand how property based testing (PBT) differs, but also where the thinking is the same.

This document focusses on the PBT tool PropEr for Erlang since that is what I am familiar with, but the general principles applies to all PBT tools regardless of which language they are written in.

The approach taken here is that we hear from people who are used to working with unit testing regarding how they think when designing their tests and how a concrete test might look.

These descriptions are then "converted" into the way it works with PBT, with a clear focus on what stays the same and what is different.

Testing philosophies

A quote from Martin Logan:

For me unit testing is about contracts. I think about the same things I think about when I write statements like {ok, Resp} = Mod:Func(Args). Unit testing and writing specs are very close for me. Hypothetically speaking lets say a function should return return {ok, string()} | {error, term()} for all given input parameters then my unit tests should be able to show that for a representative set of input parameters that those contracts are honored. The art comes in thinking about what that set is.

The trap in writing all your own tests can often be that we think about the set in terms of what we coded for and not what may indeed be asked of our function. As the code is tried in further exploratory testing and in production new input parameter sets for which the given function does not meet the stated contract are discovered and added to the test case once a fix has been put into place.

This is a very good description of what the ground rules for unit testing are:

  • Checking that contracts are obeyed.
  • Creating a representative set of input parameters.

The former is very much part of PBT - each property you write will check a contract, so that thinking is the same.

xUnit vs PBT

Unit testing has become popular for software testing with the advent of xUnit tools like jUnit for Java. xUnit like tools typically provide a testing framework with the following functionality

  • test fixture setup
  • test case execution
  • test fixture teardown
  • test suite management
  • test status reporting and management

While xUnit tools provide a lot of functionality to execute and manage test cases and suites, reporting results there is no focus on test case execution step, while this is the main focus area of property-based testing (PBT).

Consider the following function specification

:::erlang
sort(list::integer()) ---> list::integer() | error

A verbal specification of this function is,

For all input lists of integers, the sort function returns a sorted list of integers.

For any other kind of argument the function returns the atom error.

The specification above may be a requirement of how the function should behave or even how the function does behave. This distinction is important; the former is the requirement for the function, the latter is the actual API. Both should be the same and that is what our testing should confirm. Test cases for this function might look like

:::erlang
assertEqual(sort([5,4,3,2,1]), [1,2,3,4,5])
assertEqual(sort([1,2,3,4,5]), [1,2,3,4,5])
assertEqual(sort([]         ), []         )
assertEqual(sort([-1,0, 1]  ), [-1, 0, 1] )

How many tests cases should we write to be convinced that the actual behaviour of the function is the same as its specification? Clearly, it is impossible to write tests cases for all possible input values, here all lists of integers, the art of testing is finding individual input values that are representative of a large part of the input space. We hope that the test cases are exhaustive to cover the specification. xUnit tools offer no support for this and this is where PBT and PBT Tools like PropEr and QuickCheck come in.

PBT introduces testing with a large set of random input values and verifying that the specification holds for each input value selected. Functions used to generate input values, generators, are specified using rules and can be simply composed together to construct complicated values. So, a property based test for the function above may look like:

:::erlang
FOREACH({I, J, InputList},  {nat(), nat(), integer_list()},
    SUCHTHAT(I < J andalso J < length(InputList),
    SortedList = sort(InputList)
    length(SortedList) == length(InputList)
    andalso
    lists:get(SortedList, I) =< lists:get(SortedList, J))

The property above works as follows

  • Generate a random list of integers InputList and two natural numbers I, J, such that I < J < size of InputList
  • Check that size of sorted and input lists is the same.
  • Check that element with smaller index I is less than or equal to element with larger index J in SortedList.

Notice in the property above, we specify property. Verification of the property based on random input values will be done by the property based tool, therefore we can generated a large number of tests cases with random input values and have a higher level of confidence that the function when using unit tests alone.

But it does not stop at generation of input parameters. If you have more complex tests where you have to generate a series of events and keep track of some state then your PBT tool will generate random sequences of events which corresponds to legal sequences of events and test that your system behaves correctly for all sequences.

So when you have written a property with associated generators you have in fact created something that can create numerous test cases - you just have to tell your PBT tool how many test cases you want to check the property on.

Shrinking the bar

At this point you might still have the feeling that introducing the notion of some sort of generators to your unit testing tool of choice would bring you on par with PBT tools, but wait there is more to come.

When a PBT tool creates a test case that fails there is real chance that it has created a long test case or some big input parameters - trying to debug that is very much like receiving a humongous log from a system in the field and try to figure out what cause the system to fail.

Enter shrinking...

When a test case fails the PBT tool will try to shrink the failing test case down to the essentials by stripping out input elements or events that does not cause the failure. In most cases this results in a very short counterexample that clearly states which events and inputs are required to break a property.

As we go through some concrete examples later the effects of shrinking will be shown.

Shrinking makes it a lot easier to debug problems and is as key to the strength of PBT as the generators.

Converting a unit test

We will now take a look at one possible way of translating a unit test into a PBT setting.

The example comes from Eric Merritt and is about the add/2 function in the ec_dictionary instance ec_gb_trees.

The add function has the following spec:

:::erlang
-spec add(ec_dictionary:key(), ec_dictionary:value(), Object::dictionary()) ->
          dictionary().

and it is supposed to do the obvious: add the key and value pair to the dictionary and return a new dictionary.

Eric states his basic expectations as follows:

  1. I can put arbitrary terms into the dictionary as keys
  2. I can put arbitrary terms into the dictionary as values
  3. When I put a value in the dictionary by a key, I can retrieve that same value
  4. When I put a different value in the dictionary by key it does not change other key value pairs.
  5. When I update a value the new value in available by the new key
  6. When a value does not exist a not found exception is created

The first two expectations regarding being able to use arbritrary terms as keys and values is a job for generators.

The latter four are prime candidates for properties and we will create one for each of them.

Generators

:::erlang
key() -> any().

value() -> any().

For PropEr this approach has the drawback that creation and shrinking becomes rather time consuming, so it might be better to narrow to something like this:

:::erlang
key() -> union([integer(),atom()]).

value() -> union([integer(),atom(),binary(),boolean(),string()]).

What is best depends on the situation and intended usage.

Now, being able to generate keys and values is not enough. You also have to tell PropEr how to create a dictionary and in this case we will use a symbolic generator (detail to be explained later).

:::erlang
sym_dict() ->
    ?SIZED(N,sym_dict(N)).

sym_dict(0) ->
    {'$call',ec_dictionary,new,[ec_gb_trees]};
sym_dict(N) ->
    ?LAZY(
       frequency([
                  {1, {'$call',ec_dictionary,remove,[key(),sym_dict(N-1)]}},
                  {2, {'$call',ec_dictionary,add,[value(),value(),sym_dict(N-1)]}}
                 ])).

sym_dict/0 uses the ?SIZED macro to control the size of the generated dictionary. PropEr will start out with small numbers and gradually raise it.

sym_dict/1 is building a dictionary by randomly adding key/value pairs and removing keys. Eventually the base case is reached which will create an empty dictionary.

The ?LAZY macro is used to defer the calculation of the sym_dict(N-1) until they are needed and frequency/1 is used to ensure that twice as many adds compared to removes are done. This should give rather more interesting dictionaries in the long run, if not one can alter the frequencies accondingly.

But does it really work?

That is a good question and one that should always be asked when looking at genetors. Fortunately there is a way to see what a generator produces provided that the generator functions are exported.

Hint: in most cases it will not hurt to throw in a -compile(export_all). in the module used to specify the properties. And here we actually have a sub-hint: specify the properties in a separate file to avoid peeking inside the implementation! Base the test on the published API as this is what the users of the code will be restricted to.

When the test module has been loaded you can test the generators by starting up an Erlang shell (this example uses the erlware_commons code so get yourself a clone to play with):

:::sh
$ erl -pz ebin -pz test
1> proper_gen:pick(ec_dictionary_proper:key()).
{ok,4}
2> proper_gen:pick(ec_dictionary_proper:key()).
{ok,35}
3> proper_gen:pick(ec_dictionary_proper:key()).
{ok,-5}
4> proper_gen:pick(ec_dictionary_proper:key()).
{ok,48}
5> proper_gen:pick(ec_dictionary_proper:key()).
{ok,'\036\207_là´?\nc'}
6> proper_gen:pick(ec_dictionary_proper:value()).
{ok,2}
7> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-14}
8> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-3}
9> proper_gen:pick(ec_dictionary_proper:value()).
{ok,27}
10> proper_gen:pick(ec_dictionary_proper:value()).
{ok,-8}
11> proper_gen:pick(ec_dictionary_proper:value()).
{ok,[472765,17121]}
12> proper_gen:pick(ec_dictionary_proper:value()).
{ok,true}
13> proper_gen:pick(ec_dictionary_proper:value()).
{ok,<<>>}
14> proper_gen:pick(ec_dictionary_proper:value()).
{ok,<<89,69,18,148,32,42,238,101>>}
15> proper_gen:pick(ec_dictionary_proper:sym_dict()).
{ok,{'$call',ec_dictionary,add,
        [[114776,1053475],
         'fª\020\227\215',
         {'$call',ec_dictionary,add,
             ['',true,
              {'$call',ec_dictionary,add,
                  ['2^Ø¡',
                   [900408,886056],
                   {'$call',ec_dictionary,add,[[48618|...],<<...>>|...]}]}]}]}}
16> proper_gen:pick(ec_dictionary_proper:sym_dict()).
{ok,{'$call',ec_dictionary,add,
        [10,'a¯\214\031fõC',
         {'$call',ec_dictionary,add,
             [false,-1,
              {'$call',ec_dictionary,remove,
                  ['d·ÉV÷[',
                   {'$call',ec_dictionary,remove,[12,{'$call',...}]}]}]}]}}

That does not look too bad, so we will continue with that for now.

Properties of add/2

The first expectation Eric had about how the dictionary works was that if a key had been stored it could be retrieved.

One way of expressing this could be with this property:

:::erlang
prop_get_after_add_returns_correct_value() ->
    ?FORALL({Dict,K,V}, {sym_dict(),key(),value()},
         begin
             try ec_dictionary:get(K,ec_dictionary:add(K,V,Dict)) of
                    V ->
                        true;
                    _ ->
                        false
             catch
                   _:_ ->
                       false
             end
          end).

This property reads that for all dictionaries get/2 using a key from a key/value pair just inserted using the add/3 function will return that value. If that is not the case the property will evaluate to false.

Running the property is done using proper:quickcheck/1:

:::sh
proper:quickcheck(ec_dictionary_proper:prop_get_after_add_returns_correct_value()).
....................................................................................................
OK: Passed 100 test(s).
true

This was as expected, but at this point we will take a little detour and introduce a mistake in the ec_gb_trees implementation and see how that works.