The Science of Deep Specification

QuickChick: Specification-based testing


Property-based random testing (PBRT)—or automatic specification-based testing—is a form of black-box testing where a software artifact is subjected to a large number of random checks derived from formal statements of its intended behav- ioral properties. For example, if the artifact under test is a function f that takes a list of numbers as its input and (hopefully) returns the same list in sorted order, then a reasonable specification of f might be sorted(f(oldlist)) ∧ permutation(f(oldlist),oldlist), where sorted is a simple function that checks whether its input is ordered and permutation checks whether one of its argument lists is a permutation of the other. To test whether f satisfies this specification, we generate a large number of random input lists, apply f to each one, and check that sorted and permutation both yield true.

PBRT was popularized in the functional programming world by the QuickCheck library for Haskell, and the past decade has seen an explosion of work in the area. QuickCheck has been ported to many other platforms (at least 23, according to Wikipedia), has been applied in a wide variety of software testing situations, has formed the basis for a long list of successors and refinements, and has become an essential tool in programming courses at many universities. (It is used heavily, for example, in the Advanced Programming course at Penn.) A recently formed company, QuviQ, is using QuickCheck for testing mission- and safety-critical software systems for the telecommunications and automotive industries. In 2010, the original QuickCheck paper by Claessen and Hughes was named the most influential paper of ICFP 2000.

The executable specifications used by QuickCheck and related tools are deep: they embody rich descriptions of component behaviors (not just specific use cases, as unit tests do), they are precise and formal, and they are connected, via random testing, to concrete implementations. Indeed, we should be able to use the same deep specifications for both random testing and formal verification—random testing for low-cost bug finding early in the development process (and for components that are not cost effective to verify), and deductive verification for high assurance after the specification and implementation have both stabilized.

In this Expedition, we are working to demonstrate this vision in practice. Doing so involves addressing some fundamental challenges. In particular: (i) PBRT demands that specifications be presented in an executable form—essentially, in the form ∀X1, ... ∀Xm, ...P1 ∧ ... ∧ Pn ⇒ Q, where Pi and Q are boolean functions over the Xi. However, the most natural style for writing many specifications is a relational style. To bridge this gap, new fundamental theory and tools will be developed for extracting executable specifications from relational ones, generalizing and extending prior work on random testing in proof assistants such as Isabelle, Agda, and ACL2. (ii) Current PBRT tools are limited in their ability to test nondeterministic systems such as operating systems. To overcome this limitation, we are developing tools allowing the testing framework to control the scheduling of the component under test.

The main experimental platform for PBRT is a native version of QuickCheck implemented within Coq, dubbed QuickChick. This tool will be used in developing specifications and implementations: (i) to debug early versions of new LLVM optimization passes with respect to the deeply specified LLVM semantics by testing that a large number of randomly generated programs behave the same before and after optimization; (ii) to debug our Bluespec compiler with respect to the deeply specified semantics of the Bluespec and Verilog languages; (iii) to validate the GHC compiler by generating Haskell expressions and checking that their behavior as specified by the Haskell Core semantics matches the behavior of the LLVM IR terms emitted by the LLVM back end for GHC—including the behavior of foreign functions specified in Verifiable C and system services as described by the CertiKOS specification; and (iv) to allow rigorous validation of CertiKOS components (such as device drivers) for which full formal verification is impractical. PBRT is also integral to the educational components of this Expedition.


QuickChick repository

Recent publications

Leonidas Lampropoulos, Diane Gallois-Wong, Cătălin Hritcu, John Hughes, Benjamin C. Pierce, and Li-yao Xia. Beginner's Luck: A Language for Random Generators. In 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), January 2017. To appear. [ bib | http ]

Benjamin C. Pierce. The Science of Deep Specification, November 2016. Invited keynote at SPLASH / OOPSLA. [ bib | slides | video/a> ]