Details of the Experiments

Results for the second technique performed experiments

The second technique repairs faulty situations by finding a correct output o for the given input i, employing the specification of the failing routine r.

This time, we considered the following case studies: a singly linked and a doubly linked implementation of the API of java.util.List (Singly Linked Lists and Abstract Linked Lists, respectively); a binary search tree (BinarySearch), a red-black tree (TreeSet), and an AVL tree (AVLTree) implementation of java.util.Set; and a red-black tree based implementation of java.util.Maps (TreeMap).

The table below summarizes the results of the experiments for the second technique.

As before, the results for different case studies are reported in different tabs. For each case study, we report the same information as before, with the exception of the average workaround length which makes no sense for this technique.

Our evaluation consists of an experimental assessment of the effectiveness of the two presented techniques for automatically computing workarounds, and repairing faulty states, respectively. The evaluation is based on a benchmark of collection implementations, accompanied by their corresponding JML contracts including requires/ensures clauses, loop variant functions and class invariants. These are the following:

•Two implementations of interface java.util.List, one based on singly linked lists, taken from [Galeotti et al.], the other a circular double linked list taken from AbstractLinkedList in Apache Commons.Collections.
•Three alternative implementations of java.util.Set, one based on binary search trees taken from [Visser et al.], another based on AVL trees taken from [Belt et al.], and the red-black trees implementation TreeSet from java.util.
•One implementation of java.util.Map, based on red-black trees, taken from class TreeMap in java.util.

In the following, a faulty situation refers to an execution of a routine r on an input i that produces a violation of r's specification. Furthermore, we say that Wagfinder repairs a faulty situation when it finds a workaround that can be used in place of r, since it produces correct results when executed on i.

In order to assess our workaround techniques we artificially built "faulty situations" (i.e., a set of inputs for the routines) for which we assume the routines to fail.

These inputs were randomly and automatically constructed using Randoop (see the article for details). As a result of the generation we obtained 116 lists, 136 sets and 138 maps to evaluate Wagfinder's workaround finding ability.

All the experiments were run on a commodity PC with Intel(R) Core(TM) i5-4460 CPU, running at 3.40Ghz and holding 8GB of RAM. We used GNU/Linux 3.2.0 as the OS. The workaround repair prototypes have a part implemented as Eclipse Java projects, run using OpenJDK 1.7 as the underlying Java platform, and another part in Python 2.7.

A timeout of 10 minutes was set for each individual experiment (i.e. an attempt to repair a single faulty situation): executions of Wagfinder exceeding the timeout were interrupted, and the faulty situation was deemed as not fixed.

We use H:MM:SS.mmm format for reporting time spent in any experiment

Results for the first technique performed experiments

Repairing a faulty situation --a failure produced by a routine r on input i-- using the first technique means finding a combination of routines (of the same module) that, when executed on input i, behave as r's specification dictates.

For the experiments we considered three Java modules: java.util.List (Lists), java.util.Set (Sets) and java.util.Map (Maps).

To measure Wagfinder's workaround finding ability, we ran the tool for all the available methods in all the automatically generated faulty situations. The results are reported in the table below.

There is one tab for each case study, where we report in each row:

•Method to fix: the name of the considered routine.
•Total time: the sum of the times spent by Wagfinder in repairing all the faulty situations involving the routine (excluding faulty situations not repaired within the timeout).
•Avg. time: the total time divided by the number of repaired faulty situations.
•Avg. wa. (average workaround length): the number of routines that the workaround found involves, in average (considering only repaired faulty situations).
•#WA found (workarounds found): the number of faulty situations repaired.
•#TO (timeouts): the number of faulty situations that could not be fixed in 10 minutes.
•Min. repair (time): the longest time spent in repairing a faulty situation where the input is of the smallest size.
•Min. repair (size): the size of the smallest input in any faulty situation.
•Max. repair (time): the longest time spent in repairing a faulty situation where the input is of the largest size.
•Max. repair (size): the size of the largest input in any faulty situation.