Title

X-code: the next mutant generation of first-class CS education
X-code: the bugs of future past
Mutation testing in computer science education
Mutation testing automatically generates programming assignments
Applying mutation testing to assignment generation
Intelligent mutation generates educational programming assignments
Generating mutant assignments
Professor-X: creating mutants for CS education
Creating mutant bugs for CS education

Abstract

Mutation testing is a method traditionally used in software engineering.
In computer science education, programming assignment generation is a laborious and time consuming part of instruction.
Debugging is a frequent activity in industry
Debugging is a skill that continues to improve beyond the bachelor’s level.
Academic integrity (cheating, copying) is a significant impediment to learning from the programming assignment experience.
Here, we define a novel application of selective intelligent mutation toward the goal of cheating-resistant, streamlined programming assignment creation.

Introduction

We bring together two largely exclusive fields: CS education and mutation testing in software engineering.

Evolutionary Computation

Broad discussion of EC methods for mutating source code.

Mutation Testing

Mutation testing is a second-order testing strategy that injects logical errors into a program to validate test adequacy.
For example, the less-than operator inside an if-else statement is changed to a greater-than-or-equal operator, then the mutated program is run against a unit test.
An adequate unit test must fail.
An inadequate test would not detect a logical mutation in the tested source code.
Thus, mutations represent errors that any competent programmer may make
Errors that change control flow have the greatest chance of propagating
Weak vs Strong Mutations
What is mutated? Why?:
- Offut [1] states that the following are effective operators:
  - Arithmetic Operators (AOR): 1+2 -> 1-2
  - Relation Operators (ROR) 1 < 2 -> 1 == 2
  - Logical Connect (LCR): 1 AND 0 -> 1 OR 0
  - Unary Operator Insertion (UOI) 1 -> NOT 1
  - Absolute Value Insertion (ABS) -1 -> abs(-1)
- Offut [1] differentiates between semantic and syntatic size:
  - syntactic size is the fewest number of changes needed to revert to original
  - semantic size depends on how the change affects the input domain
  - A mutant with semantic size 0 (does nothing to the size of the input domain) is "Equivalent"
  - A mutant with a large semantic size (killed by most, if not all inputs) is "Trivial"
- Harder assignments involve small semantic size, easy assignments, large semantic size

CS Education

Programming

While research in this field is primarily concerned with the detection of equivalent mutants, we introduce a new application for mutants:
- automatic programming assignment generation in computer science education: Starting with a correct program, random mutations are generated, and mutants are distributed to students.
It is then the job of the students to find and fix the error, and potentially to write tests to "kill" the mutants. The goal is to emulate a common scenario in software development, fixing bugs and writing unit tests.
This paper describes the methodology behind generating source mutations for the purpose of randomized debugging assignment creation in a language-independent way.

=== Debugging ===

Debugging is an important skill.

Academic Honesty and Cheating

CS education has a massive problem with academic honesty.
Disciplines like mathematics can parameterize assignments, for example by changing the numbers in a calculus problem.
On the other hand, CS implements a calculus solver, and thus has difficulty making new assignments.
To make matters worse, each new assignment has large overhead for creation.
Further, it is easier to detect cheating in CS than other disciplines, due to the unique fingerprint produced.
In addition, many solutions are readily available on the internet.
One goal of this project is to streamline the assignment creation process, while increasing difficulty of cheating.

Methods

A translation unit is the ultimate input to a C or C++ compiler from which an object file is generated.
It is generated from the source file after running through a preprocessor.
In interpreted languages like Python or Java, the translation unit is the source file.
A private function is a function that cannot be accessed outside the translation unit, as opposed to a public function.
Public functions may be explicitly tested.
A testable unit consists of code in the function, code of private functions that is called, excluding the code of called public functions
The behavior of private functions propagates to its public caller.
Thus, all private functions called by a public function are subject to mutation.

User Workflow

Write assignment
Write unit test for assignment
Ensure 100% line coverage
Generate and test mutants
Select killed mutants and distribute to students

Program Flow

program-flow

Given a translation unit...

Feed through parser, traverse syntax tree, and locate mutation candidates. Store in file. User can edit this file before step 3, which allows fine tuning of mutation parameters. The file is formatted as a CSV:
- Table 1
  
  LINE COL SYMBOL OP TYPE OPERATOR
  
  9 0 is_valid_sides BinaryOp >
  
  0 0 triangle_tritype SYMLINK is_valid_sides
- The SYMLINK operator type indicates that triangle_tritype , a public function, calls is_valid_sides, a private function

LINE	COL	SYMBOL	OP TYPE	OPERATOR
9	0	is_valid_sides	BinaryOp	>
0	0	triangle_tritype	SYMLINK	is_valid_sides

Feed the candidate file and the source file through a mutator program. The mutator program reads the candidate file and performs a mutation. The mutation is recorded and the output file is recorded.:

Table 2

LINE	COL	SYMBOL	OP TYPE	OPERATOR	NEW OP	LOCATION
9	13	is_valid_sides	BinaryOp	!=	<=	0-0-triangle.so
9	13	is_valid_sides	BinaryOp	!=	>	0-1-triangle.so
9	13	is_valid_sides	BinaryOp	!=	<	0-2-triangle.so
9	13	triangle_tritype	BinaryOp	!=	<=	0-0-triangle.so
9	13	triangle_tritype	BinaryOp	!=	>	0-1-triangle.so

Two new columns are added: NEW OP and LOCATION. The NEW OP column indicates how the symbol was mutated. Location indicates the file where the mutated symbol can be found. In our implementation for C, the new symbol is found in a shared object file.
The last two row indicates that the symbolic link from table 1 has been resolved

The DB file is read by the unit test executor. The data is stored in an associative array, keyed by the symbol. The value is a list of all mutations for that given symbol. Represented in JSON, the data would look like this:
```
    {
        “Is_valid_side”: [“0-0-triangle.so”, “0-1-triangle,so”, …]
        “triangle_tritype”: [“0-0-triangle.so”, “0-1-triangle.so”, …]
    }
```

Note that in these examples, is_valid_side is a private function. However, we expect references to private references to fail at link-time.

Run the mutation test, and record which mutants were successfully killed.

Results

When we applied these methods to varying types of small programs, we noticed several interesting results.
Diversity of mutants
Feasibility as a real assignment in a large class
Vary number of mutations, type of mutation
Potential for cheating
P2P attacks on this system, cite literature, like sybil attack, etc.
Make a plot: for a given number of mutations per code file, how many students does it take to cheat?
Use this as a group assignment, to turn it around?

Discussion

Applications:
- Create assignments to teach unit testing
Future Directions:
- Vim plugin to select mutation, see mutation result
- Implementation for ducktyped languages like python

References

[1] Offut, et al. "An Experimental Determination of Sufficient Mutation Operators"

TODO

Create graphs of workflow
Restrict operators for pointers

Contents