Don't get me wrong, I love unit testing. The practice of unit testing is probably the most important quality innovation in my whole career. Unit testing has spread beyond the agile development community where it started into the mainstream, and we are all better off for it.
We need to be aware of some serious limitations, though. For example, in "Out of the Tar Pit", Moseley and Marks say:
> The key problem with testing is that a test (of any kind) that uses one particular set of inputs tells you nothing at all about the behaviour of the system or component when it is given a different set of inputs. The huge number of different possible inputs usually rules out the possibility of testing them all, hence the unavoidable concern with testing will always be, "have you performed the right tests?" The only certain answer you will ever get to this question is an answer in the negative — when the system breaks.
We can only write unit tests with a certain number of input cases. Too few, and you miss an important edge case. Too many, and the cost of maintaining the tests themselves becomes onerous.
Worse yet, we know that unit tests are inadequate for testing overall system properties, for systems with GUIs, and for concurrent code.
So here are four testing strategies, each of which supplements unit tests with more ways to gain confidence in your fully assembled system.
# Automated Contract Testing

Automated Contract Testing uses a data-oriented specification of a service to help with two key tasks:

* Exercise the service and verify that it adheres to its invariants.
* Stand in for the service, so that clients can be tested against the same specification.
It looks like this:
Some things to consider when using this type of testing:
A nice library to help with this style of test is Janus.
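As a rough sketch of the idea, here is a minimal contract check in plain Python. The contract format, field names, and fake service are all invented for illustration; Janus's actual specification format is richer than this.

```python
# A data-oriented contract: each field maps to a predicate (an invariant)
# that any conforming response must satisfy. This format is invented for
# illustration, not taken from any real contract-testing tool.
contract = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
    "balance": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def fake_user_service(user_id):
    """Stand-in for a real call to the service under test."""
    return {"id": user_id, "email": "alice@example.com", "balance": 125.0}

def check_contract(contract, response):
    """Return the list of fields that violate the contract."""
    violations = []
    for field, invariant in contract.items():
        if field not in response or not invariant(response[field]):
            violations.append(field)
    return violations

print(check_contract(contract, fake_user_service(7)))  # prints []
```

Because the contract is just data, the same specification can drive both directions: verifying the real service's responses, and generating a conforming stub for testing clients.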
# Property-based Testing
Property-based testing is a derivative of the "formal specifications" ideas. It uses a model of the system to describe the allowed inputs, outputs, and state transitions. Then it randomly (but repeatably) generates a vast number of test cases to exercise the system. Instead of looking for success, property-based testing looks for failures. It detects states and values that could not have been produced according to the laws of the model, and flags those cases as failures.
Property-based testing looks like this:
The canonical property-based testing tool is QuickCheck. Many partial and fragmentary open-source tools claim to be "QuickCheck clones" but they lack two really important parts: search-space optimization and failure minimization. Search-space optimization uses features of the model to probe the "most important" cases rather than using a sheer brute-force approach. Failure minimization is an important technique for making the failures useful. Upon finding a failing case, minimization kicks in and searches for the simplest case that recreates the failure. Without it, understanding a failure case is just about as hard as debugging an end-user defect report.
Considerations when using property-based testing:

* A failing case is sometimes a defect in the system under test, and sometimes a problem in the model. Most business systems are not very well specified (in the rigorous CS sense of the term) and suffer from many edge cases. Even formal standards from international committees can be riddled with ambiguities and errors.
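To make the generate/check/minimize loop concrete, here is a toy, stdlib-only sketch. The buggy function and the property are invented so the shrinking step has a counterexample to find; real tools like QuickCheck do all of this far more cleverly.

```python
import random

def buggy_max(xs):
    # Deliberately wrong: returns the smallest element, not the largest.
    return sorted(xs)[0]

def holds(xs):
    """Property: the result must be >= every element of the input."""
    return all(buggy_max(xs) >= x for x in xs)

def shrink(xs):
    """Yield simpler candidates: first shorter lists, then smaller values."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, v in enumerate(xs):
        if v > 0:
            yield xs[:i] + [v // 2] + xs[i + 1:]

def minimize(xs):
    """Greedily replace a failing case with a simpler one that still fails."""
    improved = True
    while improved:
        improved = False
        for cand in shrink(xs):
            if cand and not holds(cand):
                xs, improved = cand, True
                break
    return xs

def property_check(n_cases=200, seed=0):
    """Generate random (but repeatable) cases; minimize the first failure."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        xs = [rng.randint(0, 100) for _ in range(rng.randint(1, 10))]
        if not holds(xs):
            return minimize(xs)  # report the simplest failing input found
    return None  # no counterexample found

print(property_check())
```

Whatever random failing list is found first, minimization shrinks it down to a two-element counterexample like `[0, 1]` — which makes the nature of the bug obvious in a way the original ten-element case would not.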
# Fault Injection

Fault Injection is pretty much what it sounds like. You run the system under test in a controlled environment, then force "bad things" to happen. These days, "bad things" mostly means network problems and hacking attacks. I'll focus on the network problems for now.
One particular fault injection tool that has produced some interesting results lately is Jepsen. Jepsen's author, Kyle Kingsbury, has been able to demonstrate data loss in all of the current crop of eventually consistent NoSQL databases. You can clone the repo to duplicate his results.
It looks like this:
Jepsen itself runs a bunch of VMs, then generates load. While the load is running against the system, Jepsen can introduce partitions and delays into the virtual network interfaces. By introducing controlled faults and delays in the network, Jepsen lets us try out conditions that can happen "in the wild" and see how the system behaves.
After running the test scenario, we use a validator to detect incorrect results.
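A drastically simplified sketch of the same idea in plain Python: inject message loss into a fake replication channel, then run a validator over the results afterward. Everything here (the lossy channel, the naive primary, the failover) is invented for illustration; Jepsen works at the level of real VMs and real network interfaces.

```python
import random

class FlakyNetwork:
    """Simulated lossy network, standing in for the partitions and delays
    that a tool like Jepsen injects at the network-interface level."""
    def __init__(self, drop_rate, seed=0):
        self.rng = random.Random(seed)
        self.drop_rate = drop_rate

    def send(self, replica, value):
        if self.rng.random() < self.drop_rate:
            return False  # message silently lost: our injected fault
        replica.append(value)
        return True

def run_scenario(n_writes=100, drop_rate=0.2):
    """A naive primary that acknowledges writes before replication completes."""
    net = FlakyNetwork(drop_rate)
    primary, replica, acked = [], [], []
    for v in range(n_writes):
        primary.append(v)
        acked.append(v)       # client sees success from the primary alone
        net.send(replica, v)  # asynchronous replication may silently fail
    # Simulate a failover: the replica becomes the source of truth.
    return acked, replica

def validate(acked, replica):
    """Post-hoc validator: every acknowledged write must survive failover."""
    durable = set(replica)
    return [v for v in acked if v not in durable]

acked, replica = run_scenario()
lost = validate(acked, replica)
print(f"{len(lost)} acknowledged writes were lost")
```

The point is the separation of concerns: the fault injector only creates trouble, and a separate validator decides, after the fact, whether the system's responses were actually correct.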
# Simulation Testing

Simulation testing is the most repeatable of these methods. In simulation testing, we use a traffic model to generate a large volume of plausible "actions" for the system. Instead of just running those actions, though, we store them in a database.
The activity model is typically a small number of parameters to describe things like distribution of user types, ratio of logged-in to not-logged-in users, likelihood of new registrations, and so on. We use these parameters to create a database of actions to be executed later.
The event stream database will be reused for many different test runs, so we want to keep track of which version of the model and event generator were used to create it. This will be a recurring pattern with simulation testing: we always know the provenance of the data.
The simulation runner then executes the actions against the system under test. The system under test must be initialized with a known, versioned data set. (We'll also record the version of starting dataset that was used.) Because this runner is a separate program, we can turn a dial to control how fast the simulation runs. We can go from real time, to double speed, to one-tenth speed.
Where most test methods would verify the system output immediately, simulation testing actually just captures everything in yet another database. This database of outputs includes the final dataset when the event stream was completed, plus all the outputs generated by the system during the simulation. (Bear in mind that this "database" could just be log files.) These are the normal outputs of the system, just captured permanently.
Like everything else here, the output database is versioned. All the parameters and versions are recorded in the test record. That way we can tie a test report to exactly the inputs that created it.
Speaking of the test report, we run validations against the resulting dataset as a final process. Validations may be boolean checks for correctness (is the resulting value what was expected?) but they may also verify global properties. Did the system process all inputs within the allowed time? Were all the inputs accepted and acted on? Does "money in" balance with "money out"? These global properties cannot be validated during the simulation run.
An interesting feature of this model is that validations don't all need to exist when you run the simulation. We can think of new things to check long after the test run. For example, on finding a bug, we can create a probe for that bug and then find out how far back it goes.
Simulant is a tool for building simulation tests.
* This approach has some similarities to property-based testing. There are two major differences. First, the event stream comes from a statistical model of user activity and is stored so it can be reused across many runs, rather than being freshly generated each time. Second, because all inputs and outputs are captured in versioned databases, validations can be written and run long after the simulation itself.
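The whole pipeline described above can be sketched in miniature. The banking-style activity model, the field names, and the "money in = money out" check are all invented for illustration; a tool like Simulant records these artifacts in real databases rather than in-memory dictionaries.

```python
import random

def generate_actions(model, seed):
    """Turn a small activity model into a reusable, versioned event stream."""
    rng = random.Random(seed)
    actions = []
    for i in range(model["n_actions"]):
        kind = "deposit" if rng.random() < model["deposit_ratio"] else "withdraw"
        actions.append({"id": i, "kind": kind, "amount": rng.randint(1, 100)})
    # Record provenance: which model and seed produced this event stream.
    return {"model_version": model["version"], "seed": seed, "actions": actions}

def run_simulation(event_db, initial_balance=1000):
    """Execute the stored actions against the system under test, capturing
    every output instead of checking results inline."""
    balance = initial_balance
    outputs = []
    for a in event_db["actions"]:
        if a["kind"] == "deposit":
            balance += a["amount"]
            outputs.append({"id": a["id"], "ok": True})
        elif balance >= a["amount"]:
            balance -= a["amount"]
            outputs.append({"id": a["id"], "ok": True})
        else:
            outputs.append({"id": a["id"], "ok": False})  # rejected withdrawal
    return {"inputs": event_db, "final_balance": balance, "outputs": outputs}

def validate_conservation(result, initial_balance=1000):
    """A global 'money in = money out' check, run after the fact over the
    captured outputs rather than during the simulation."""
    by_id = {o["id"]: o for o in result["outputs"]}
    delta = 0
    for a in result["inputs"]["actions"]:
        if by_id[a["id"]]["ok"]:
            delta += a["amount"] if a["kind"] == "deposit" else -a["amount"]
    return initial_balance + delta == result["final_balance"]

model = {"version": "v1", "n_actions": 50, "deposit_ratio": 0.5}
event_db = generate_actions(model, seed=42)
result = run_simulation(event_db)
print(validate_conservation(result))  # prints True
```

Note how the validator only touches the captured inputs and outputs. A new check written next month could be run against this same `result`, which is exactly what makes "how far back does this bug go?" answerable.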