Property based testing for unit testers
=======================================

Main contributors: Torben Hoffmann, Raghav Karol, Eric Merritt

The purpose of this short document is to help people who are familiar
with unit testing understand how property based testing (PBT) differs,
but also where the thinking is the same.

This document focusses on the PBT tool
[`PropEr`](https://github.com/manopapad/proper) for Erlang since that is
what I am familiar with, but the general principles apply to all PBT
tools regardless of which language they are written in.

The approach taken here is that we hear from people who are used to
working with unit testing about how they think when designing
their tests and how a concrete test might look.

These descriptions are then "converted" into the way it works with
PBT, with a clear focus on what stays the same and what is different.

## Testing philosophies

### A quote from Martin Logan:

> For me unit testing is about contracts. I think about the same things
> I think about when I write statements like {ok, Resp} =
> Mod:Func(Args). Unit testing and writing specs are very close for me.
> Hypothetically speaking let's say a function should return {ok,
> string()} | {error, term()} for all given input parameters then my
> unit tests should be able to show that for a representative set of
> input parameters that those contracts are honored. The art comes in
> thinking about what that set is.

The trap in writing all your own tests can often be that we think
about the set in terms of what we coded for and not what may indeed be
asked of our function. As the code is tried in further exploratory
testing and in production, new input parameter sets for which the given
function does not meet the stated contract are discovered and added to
the test case once a fix has been put into place.

This is a very good description of what the ground rules for unit
testing are:

* Checking that contracts are obeyed.
* Creating a representative set of input parameters.

The former is very much part of PBT - each property you write will
check a contract, so that thinking is the same. The latter is where PBT
differs: instead of hand-picking the representative set, you describe
how input values are generated and let the tool produce them.

## xUnit vs PBT

Unit testing has become popular for software testing with the advent
of xUnit tools like JUnit for Java. xUnit-like tools typically
provide a testing framework with the following functionality:

* test fixture setup
* test case execution
* test fixture teardown
* test suite management
* test status reporting and management

While xUnit tools provide a lot of functionality to execute and manage
test cases and suites and to report results, they put no focus on the
test case execution step itself, which is the main focus area of
property-based testing (PBT).

Consider the following function specification:

    :::erlang
    sort(list::integer()) ---> list::integer() | error

A verbal specification of this function is:

> For all input lists of integers, the sort function returns a sorted
> list of integers.

For any other kind of argument the function returns the atom `error`.

The specification above may be a requirement of how the function
should behave or even how the function does behave. This distinction
is important; the former is the requirement for the function, the
latter is the actual API. Both should be the same and that is what our
testing should confirm. Test cases for this function might look like:

    :::erlang
    assertEqual(sort([5,4,3,2,1]), [1,2,3,4,5])
    assertEqual(sort([1,2,3,4,5]), [1,2,3,4,5])
    assertEqual(sort([]), [])
    assertEqual(sort([-1,0,1]), [-1,0,1])

How many test cases should we write to be convinced that the actual
behaviour of the function is the same as its specification? Clearly,
it is impossible to write test cases for all possible input values,
here all lists of integers. The art of testing is finding individual
input values that are representative of a large part of the input
space. We hope that the test cases are exhaustive enough to cover the
specification. xUnit tools offer no support for this and this is where
PBT and PBT tools like `PropEr` and `QuickCheck` come in.

PBT introduces testing with a large set of random input values and
verifying that the specification holds for each input value
selected. The functions used to generate input values, called
generators, are specified using rules and can be composed to construct
complicated values. So, a property based test for the function above
may look like this (in pseudocode):

    :::erlang
    FOREACH({I, J, InputList}, {nat(), nat(), integer_list()},
            SUCHTHAT(I < J andalso J < length(InputList),
                     SortedList = sort(InputList),
                     length(SortedList) == length(InputList)
                     andalso
                     lists:get(SortedList, I) =< lists:get(SortedList, J)))

The property above works as follows:

* Generate a random list of integers `InputList` and two natural numbers
  I, J, such that I < J < size of `InputList`.
* Check that size of sorted and input lists is the same.
* Check that element with smaller index I is less than or equal to
  element with larger index J in `SortedList`.

Notice that in the property above we only *specify* the property.
Verification of the property against random input values is done by the
property based tool. We can therefore generate a large number of test
cases with random input values and have a higher level of confidence in
the function than when using unit tests alone.

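
The pseudocode above can be turned into something `PropEr` will run. The following is only a sketch under a few assumptions: the module name `sort_props` is made up, `sort/1` is a stand-in that delegates to `lists:sort/1`, and instead of picking two random indices it checks every adjacent pair (the helper `is_ordered/1` is hypothetical), which avoids discarding test cases whose indices fall outside the list.

```erlang
-module(sort_props).
-include_lib("proper/include/proper.hrl").
-export([prop_sort_keeps_length/0, prop_sort_is_ordered/0]).

%% Stand-in for the function under test (an assumption of this sketch);
%% replace it with your own sort/1.
sort(List) -> lists:sort(List).

%% The sorted list has as many elements as the input list.
prop_sort_keeps_length() ->
    ?FORALL(InputList, list(integer()),
            length(sort(InputList)) =:= length(InputList)).

%% Every element is less than or equal to the element that follows it.
prop_sort_is_ordered() ->
    ?FORALL(InputList, list(integer()),
            is_ordered(sort(InputList))).

%% Hypothetical helper: true when each adjacent pair is in order.
is_ordered([A, B | Rest]) -> A =< B andalso is_ordered([B | Rest]);
is_ordered(_) -> true.
```

Each property can then be checked from the shell with `proper:quickcheck(sort_props:prop_sort_is_ordered()).`
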
But it does not stop at generation of input parameters. If you have
more complex tests where you have to generate a series of events and
keep track of some state, then your PBT tool will generate random
sequences of events which correspond to legal sequences of events and
test that your system behaves correctly for all sequences.

So when you have written a property with associated generators you
have in fact created something that can create numerous test cases -
you just have to tell your PBT tool how many test cases you want to
check the property on.

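
With `PropEr`, the number of test cases can be passed as the `numtests` option to `proper:quickcheck/2`. The sketch below assumes a made-up module `numtests_demo` with a simple illustrative property; only the option is the point here.

```erlang
-module(numtests_demo).
-include_lib("proper/include/proper.hrl").
-export([prop_reverse_twice/0, run/0]).

%% Illustrative property: reversing a list twice yields the original list.
prop_reverse_twice() ->
    ?FORALL(L, list(integer()),
            lists:reverse(lists:reverse(L)) =:= L).

%% Check the property on 1000 generated test cases rather than
%% the 100 that a plain proper:quickcheck/1 call runs.
run() ->
    proper:quickcheck(prop_reverse_twice(), [{numtests, 1000}]).
```

Raising the number costs run time, so a small number during development and a larger one in a nightly build is one possible trade-off.
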
## Shrinking the bar

At this point you might still have the feeling that introducing the
notion of some sort of generators to your unit testing tool of choice
would bring you on par with PBT tools, but wait, there is more to
come.

When a PBT tool creates a test case that fails there is a real chance
that it has created a long test case or some big input parameters -
trying to debug that is very much like receiving a humongous log from
a system in the field and trying to figure out what caused the system
to fail.

Enter shrinking...

When a test case fails the PBT tool will try to shrink the failing
test case down to the essentials by stripping out input elements or
events that do not cause the failure. In most cases this results in
a very short counterexample that clearly states which events and
inputs are required to break a property.

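
A quick way to watch shrinking happen is to check a property that is deliberately false. This sketch assumes a made-up module `shrink_demo`; the property wrongly claims that a random list never contains duplicates, and `proper:counterexample/0` is used to fetch the shrunk failing input after the run.

```erlang
-module(shrink_demo).
-include_lib("proper/include/proper.hrl").
-export([prop_no_duplicates/0, run/0]).

%% Deliberately false: random lists of integers do contain duplicates.
prop_no_duplicates() ->
    ?FORALL(L, list(integer()),
            length(lists:usort(L)) =:= length(L)).

%% Run the failing property, then return the counterexample that
%% PropEr kept after shrinking (a list holding the failing input).
run() ->
    false = proper:quickcheck(prop_no_duplicates()),
    proper:counterexample().
```

The list that first breaks the property is usually long, while the shrunk counterexample is typically close to the smallest input that still fails, that is, a very short list with one repeated value.
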
As we go through some concrete examples later the effects of shrinking
will be shown.

Shrinking makes it a lot easier to debug problems and is as key to the
strength of PBT as the generators.

## Converting a unit test

We will now take a look at one possible way of translating a unit
test into a PBT setting.

The example comes from Eric Merritt and is about the `add/3` function in
the `ec_dictionary` instance `ec_gb_trees`.

The add function has the following spec:

    :::erlang
    -spec add(ec_dictionary:key(), ec_dictionary:value(), Object::dictionary()) ->
        dictionary().

and it is supposed to do the obvious: add the key and value pair to
the dictionary and return a new dictionary.

Eric states his basic expectations as follows:

1. I can put arbitrary terms into the dictionary as keys
2. I can put arbitrary terms into the dictionary as values
3. When I put a value in the dictionary by a key, I can retrieve that same value
4. When I put a different value in the dictionary by key it does not change other key value pairs.
5. When I update a value the new value is available by the new key
6. When a value does not exist a not found exception is created

The first two expectations regarding being able to use arbitrary
terms as keys and values are a job for generators.

The latter four are prime candidates for properties and we will create
one for each of them.

### Generators

The simplest approach is to allow any term as key and value:

    :::erlang
    key() -> any().

    value() -> any().

For `PropEr` this approach has the drawback that creation and shrinking
become rather time consuming, so it might be better to narrow to
something like this:

    :::erlang
    key() -> union([integer(),atom()]).

    value() -> union([integer(),atom(),binary(),boolean(),string()]).

What is best depends on the situation and intended usage.

Now, being able to generate keys and values is not enough. You also
have to tell `PropEr` how to create a dictionary and in this case we
will use a symbolic generator (details to be explained later).

    :::erlang
    sym_dict() ->
        ?SIZED(N,sym_dict(N)).

    sym_dict(0) ->
        {'$call',ec_dictionary,new,[ec_gb_trees]};
    sym_dict(N) ->
        ?LAZY(
           frequency([
                      {1, {'$call',ec_dictionary,remove,[key(),sym_dict(N-1)]}},
                      {2, {'$call',ec_dictionary,add,[value(),value(),sym_dict(N-1)]}}
                     ])).

`sym_dict/0` uses the `?SIZED` macro to control the size of the
generated dictionary. `PropEr` will start out with small numbers and
gradually raise them.

`sym_dict/1` builds a dictionary by randomly adding key/value
pairs and removing keys. Eventually the base case is reached, which
creates an empty dictionary.

The `?LAZY` macro is used to defer the calculation of
`sym_dict(N-1)` until it is needed and `frequency/1` is used
to ensure that twice as many adds as removes are done. This
should give more interesting dictionaries in the long run; if
not, one can alter the frequencies accordingly.

But does it really work?

That is a good question and one that should always be asked when
looking at generators. Fortunately there is a way to see what a
generator produces provided that the generator functions are exported.

Hint: in most cases it will not hurt to throw in a
`-compile(export_all).` in the module used to specify the
properties. And here we actually have a sub-hint: specify the
properties in a separate file to avoid peeking inside the
implementation! Base the test on the published API as this is what the
users of the code will be restricted to.

When the test module has been loaded you can test the generators by
starting up an Erlang shell (this example uses the erlware_commons
code so get yourself a clone to play with):

    :::sh
    $ erl -pz ebin -pz test
    1> proper_gen:pick(ec_dictionary_proper:key()).
    {ok,4}
    2> proper_gen:pick(ec_dictionary_proper:key()).
    {ok,35}
    3> proper_gen:pick(ec_dictionary_proper:key()).
    {ok,-5}
    4> proper_gen:pick(ec_dictionary_proper:key()).
    {ok,48}
    5> proper_gen:pick(ec_dictionary_proper:key()).
    {ok,'\036\207_là´?\nc'}
    6> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,2}
    7> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,-14}
    8> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,-3}
    9> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,27}
    10> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,-8}
    11> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,[472765,17121]}
    12> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,true}
    13> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,<<>>}
    14> proper_gen:pick(ec_dictionary_proper:value()).
    {ok,<<89,69,18,148,32,42,238,101>>}
    15> proper_gen:pick(ec_dictionary_proper:sym_dict()).
    {ok,{'$call',ec_dictionary,add,
                 [[114776,1053475],
                  'fª\020\227\215',
                  {'$call',ec_dictionary,add,
                           ['',true,
                            {'$call',ec_dictionary,add,
                                     ['2^Ø¡',
                                      [900408,886056],
                                      {'$call',ec_dictionary,add,[[48618|...],<<...>>|...]}]}]}]}}
    16> proper_gen:pick(ec_dictionary_proper:sym_dict()).
    {ok,{'$call',ec_dictionary,add,
                 [10,'a¯\214\031fõC',
                  {'$call',ec_dictionary,add,
                           [false,-1,
                            {'$call',ec_dictionary,remove,
                                     ['d·ÉV÷[',
                                      {'$call',ec_dictionary,remove,[12,{'$call',...}]}]}]}]}}

That does not look too bad, so we will continue with that for now.

### Properties of `add/3`

The first expectation Eric had about how the dictionary works was that
if a key had been stored it could be retrieved.

One way of expressing this could be with this property:

    :::erlang
    prop_get_after_add_returns_correct_value() ->
        ?FORALL({Dict,K,V}, {sym_dict(),key(),value()},
             begin
                 try ec_dictionary:get(K,ec_dictionary:add(K,V,Dict)) of
                     V ->
                         true;
                     _ ->
                         false
                 catch
                     _:_ ->
                         false
                 end
             end).

This property reads that for all dictionaries, `get/2` using the key
from a key/value pair just inserted using the `add/3` function
will return that value. If that is not the case the property will
evaluate to false.

Running the property is done using `proper:quickcheck/1`:

    :::sh
    proper:quickcheck(ec_dictionary_proper:prop_get_after_add_returns_correct_value()).
    ....................................................................................................
    OK: Passed 100 test(s).
    true

This was as expected, but at this point we will take a little detour
and introduce a mistake in the `ec_gb_trees` implementation and see
how that works.