On Software Testing in Research

Jan 05, 2019

Extensive software testing is not a widespread practice in academic research. In this short argumentative post, I’ll outline some of the negative consequences this entails. In the end, I hope to convince you that testing is not just for industrial software development but is also essential for research that relies on software prototypes.

From Bugs to False Results

People with an academic background often have little or no exposure to the practice of software testing. This is unfortunate, because if you want to make sure your software behaves correctly, there is practically no way around comprehensive software testing.

While recent developments such as the Graphics Replicability Stamp Initiative are a step in the right direction, I’m convinced that this is not yet sufficient. Ensuring that results are reproducible only ensures that the bugs are reproducible too. And I’m sure there are plenty of them out there, especially considering how a typical research prototype comes into existence: written in a rush to meet the next paper deadline.

Of course, there’s also the possibility that you’re a rock star developer and your software never contains a bug. But what about your collaborators? Your students? The third-party libraries you depend on? The industry average defect rate is around 1–25 bugs per 1000 lines of code¹. And that’s the industry average, not the research average. Do the math.
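To make "do the math" concrete, here is a back-of-the-envelope sketch. The defect range is the one quoted above; the prototype size of 5,000 lines is a purely hypothetical figure chosen for illustration, and the function name is my own.

```python
def expected_defects(lines_of_code, defects_per_kloc):
    """Estimate the defect count from a defect rate per 1000 lines of code."""
    return lines_of_code / 1000 * defects_per_kloc


# Hypothetical prototype size, assumed for illustration only.
PROTOTYPE_LOC = 5_000

# Industry-average range quoted in the text: 1 to 25 bugs per 1000 lines.
LOW_RATE, HIGH_RATE = 1, 25

low = expected_defects(PROTOTYPE_LOC, LOW_RATE)
high = expected_defects(PROTOTYPE_LOC, HIGH_RATE)
print(f"Expected defects: between {low:.0f} and {high:.0f}")
```

Even at the optimistic end of the range, a prototype of that size would be expected to ship with a handful of bugs, and any one of them could sit in the code path that produces your figures.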

The consequence of those undiscovered bugs is scientific publications that include false results. Nobody really needs that.

Validate Hypotheses on a Solid Basis

You might argue that testing is a waste of time when writing throw-away research prototypes. I think this is wrong. You write your prototype to obtain results, that’s for sure. More importantly, however, the prototype is your central tool for validating, or falsifying, your hypotheses. This is at the very core of the scientific method. If you do not relentlessly test your prototype, the results you obtain can be completely misleading. You might end up assuming your hypothesis is correct when it is not. Or you might discard your hypothesis because a bug made your results look unsatisfactory, and so a potentially brilliant idea goes down the drain.

Untested Prototypes Are Wasteful

The bad consequences do not stop here. Untested code is legacy code. The untested prototype will silently bit-rot, either in some online repository or, even worse, just on some department server or backup disk. A few years down the line, nobody will remember how to get things working or how to make changes. In other words, the code becomes practically useless. Given that a large part of research is publicly funded, one could consider this a rather direct waste of public money.

Conclusion

To make a long story short: stop making excuses. Software testing is an essential tool for making sure your research prototype produces correct results. Get familiar with it and use it extensively the next time you develop a prototype. Consider even using test-driven development (TDD) to make sure you test things right from the start.
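If you have never seen TDD in practice, the following is a minimal sketch of what the workflow might look like for one small piece of a prototype. The function normalize and its behavior are hypothetical and chosen only for illustration. The tests are written first and pin down the expected behavior; the implementation comes second and does just enough to make them pass.

```python
import math


# Step 1: write the tests first. They describe what the (hypothetical)
# function `normalize` is supposed to do before it exists.
def test_result_has_unit_length():
    result = normalize([3.0, 4.0])
    assert math.isclose(math.hypot(*result), 1.0)


def test_direction_is_preserved():
    result = normalize([3.0, 4.0])
    assert math.isclose(result[0], 0.6)
    assert math.isclose(result[1], 0.8)


def test_zero_vector_is_rejected():
    try:
        normalize([0.0, 0.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for the zero vector")


# Step 2: write the simplest implementation that makes the tests pass.
def normalize(vector):
    """Scale a vector to unit Euclidean length."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / length for x in vector]
```

Functions named test_* like these can be collected by a test runner such as pytest, or simply called by hand. The point is less the tooling than the habit: every assumption your results depend on gets written down as a check that runs again each time the code changes.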

Now I’m going back to my corner to feel ashamed of all the untested research prototypes I have written!

¹ McConnell, Steve. Code Complete. Microsoft Press, 2004, p. 521.