Statistical Structural Program Testing is an application of our works on Random-Based Model Exploration to the domain of structural (« white-box ») program testing.
Context
In classical statistical program testing, test cases are obtained by drawing input values according to some given probability distribution over the input domain. This allows for an easy generation of large test sets and a way to estimate the operational reliability of the program : if the operational input distribution is known and if one can use it as the probability distribution for the drawing, then the (expected) most frequent uses of the program will be tested most ! Its associated drawback is a weak coverage of rare cases, their associated input values being under-represented in the test set. Hence, these methods have to be complemented with additional strategies for ensuring that various parts of the program have been covered (expressed as a measure of the coverage of instructions, branch, loops or similar components of the control flow graph (CFG) or data flow graph for the program under test) or for dealing with rare cases (for instance unexpected exception raising, not directly expressible as a structural criterion).
As an alternative, "coverage-biased" statistical methods have been proposed where the drawing of test cases is designed for maximizing the probability of rare cases to be covered, although actual input values are still drawn randomly. The idea is to first draw uniformly among all the paths of a given length, something that can be done efficiently on a large class of programs, before computing input values for executing these paths. Hence, any path of the given length has a known probability to be drawn. The drawback is the possibility for the drawing to return unfeasible paths: paths in the CFG never followed by any possible execution of the program.
Structural testing and coverage criteria : In structural program testing, each criterion C defines a set of elements (instruction, branch, etc) to cover for C to be fulfilled. In the case of deterministic methods, each such element has to be exercised by at least one test. With this definition, a solution with a unique path is as good as a solution with many paths. In random testing the satisfaction of a criterion is a probabilistic notion since there is no guaranty that a set of paths drawn randomly will always cover the criterion: For a given number N of paths to be drawn, the “test quality” is defined as the minimal probability of covering all elements from the criterion when running the N paths. When the criterion is not expressed as a set of paths but of a different kind of elements, the assessment of test quality is more difficult and related to the minimal probability for an element of the criterion to be exercised by a path from the test set. Details of the method for maximizing this minimal probability can be found in [1].
Objectives
Our current work in the area of statistical structural testing of programs is to show that the uniform drawing of paths can be « smart » enough for not drawing too many unfeasible paths and can scale up reasonably well. Techniques involved come from compiler construction, semantics program annotation, (extended) static analysis, symbolic evaluation and constraint-solving. For instance the use of “program slicing” for automatically deriving “simplified control flow graphs” before drawing paths is under investigation. Another approach is to use annotations related to the “combinatorial” aspects of a computation (for instance hints expressing relations about the numbers of traversals of different loops or other parts of the program) in addition to the usual annotations about its functional behavior (assertions, pre/post contracts, etc).
We intend to implement a prototype for experiments within the Frama-C platform (http://frama-c.com) for a realistic subset of C, possibly using an adapted memory model in the lines of [6] integrated in the ACSL annotation language provided by Frama-C.
People
Frederic Voisin, Marie-Claude Gaudel, Romain Aissat (doctorant), Burkhart Wolff
Results
- The prototype Auguste, initially developed by Sandrine-Dominique Gouraud for the statistical structural testing of programs in a subset of the programming language [2].
- The Rukia-Toolbox for efficient uniform selection of paths in very large automata.[1]
Selected Publications
[1] Alain Denise, Marie-Claude Gaudel, Sandrine-Dominique Gouraud, Richard Lassaigne, Johan Oudinet, and Sylvain Peyronnet. Coverage-biased random exploration of large models and application to testing. STTT, International Journal on Software Tools for Technology Transfer, 14(1):73-93, 2012.
[2] Sandrine-Dominique Gouraud: Utilisation des structures combinatoires pour le test statistique,Thèse de doctorat en Informatique, Université Paris-Sud, 2004.
[3] R. Majundar and K. Shen : Hybrid Concolic Testing, ICSE '07 Proceedings of the 29th international conference on Software Engineering, pages 416-426 , 2007.
[4] C. Cadar, D. Dumbar, D. Engler : KLEE , Unassisted and Automatic Generation of High-Coverage Test for Complex Systems Programs. Usenix Symposium on Operating Systems Design and Implementation (OSDI), San Diego, 2008.
[5] N. Williams, Abstract path testing with PathCrawler. Proceedings of the 5th Workshop on Automation of Software Test, pages 35-42, 2010.
[6] Sascha Böhme, Michal Moskal, Wolfram Schulte, and Burkhart Wolff. HOL-Boogie - An Interactive Prover-Backend for the Verified C Compiler. Journal of Automated Resoning (JAR), 44(1-2):111-144, 2009. [ bib | DOI | .pdf ].