Title: Automatic Software Plagiarism Detection
- By Chris Collins - 226209
Background Research
- Reasons for plagiarism
  - Saving time.
  - Saving effort.
  - Lack of knowledge.
  - Lack of understanding of ownership.
Background Research
- Anti-detection tricks
  - Renaming of variables/functions.
  - Reordering blocks of code.
  - Changing comments and whitespace.
- Detecting plagiarism
  - Obfuscation.
Obfuscation

public class class1 {
    private int var1;
    private int var2;

    public class1(int param1, int param2) {
        var1 = param1;
        var2 = param2;
    }

    public int f1() {
        return var1 + var2;
    }
}

public class myAdder {
    private int numberOne;
    private int numberTwo;

    public myAdder(int num1, int num2) {
        numberOne = num1;
        numberTwo = num2;
    }

    public int addition() {
        return numberOne + numberTwo;
    }
}

public class adder {
    private int firstNumber;
    private int secondNumber;

    public adder(int first, int second) {
        firstNumber = first;
        secondNumber = second;
    }

    public int add() {
        return firstNumber + secondNumber;
    }
}
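
The three classes above differ only in their names, so renaming identifiers in order of first appearance should reduce them to the same text. The following is a minimal sketch of that idea, not the project's implementation: the class name IdentifierNormalizer, the generated names id1, id2, ... and the short keyword list are all assumptions made for illustration, and the regex tokenizer ignores string literals and comments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the system described in these slides.
final class IdentifierNormalizer {
    // Deliberately small keyword sample (assumption); a real tool would use the full Java grammar.
    private static final Set<String> KEYWORDS = Set.of(
            "public", "private", "class", "int", "return", "this", "void", "static", "new");
    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Replaces each distinct non-keyword word with id1, id2, ... in order of first use.
    static String normalize(String source) {
        Map<String, String> names = new HashMap<>();
        Matcher m = WORD.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String word = m.group();
            String replacement = word;
            if (!KEYWORDS.contains(word)) {
                replacement = names.get(word);
                if (replacement == null) {
                    replacement = "id" + (names.size() + 1);
                    names.put(word, replacement);
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling IdentifierNormalizer.normalize on the source text of myAdder and of adder should give identical strings, since both use the same sequence of keywords and the same pattern of identifier reuse.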
Background Research
- Detecting plagiarism
  - Obfuscation.
  - Removing whitespace and comments.
  - Structure metric systems.
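
Removing whitespace and comments can be sketched in a few lines. This is an illustrative simplification under stated assumptions: the class name SourceCleaner is hypothetical, and the regular expressions do not recognise comment markers that appear inside string literals.

```java
// Hypothetical helper for illustration; not part of the system described in these slides.
final class SourceCleaner {
    // Removes /* ... */ and // comments, then strips all whitespace.
    // Simplification (assumption): comment markers inside string literals are not handled.
    static String clean(String source) {
        String noBlockComments = source.replaceAll("(?s)/\\*.*?\\*/", " ");
        String noLineComments = noBlockComments.replaceAll("//[^\\n]*", " ");
        return noLineComments.replaceAll("\\s+", "");
    }
}
```

Two submissions that differ only in layout or commenting then compare as equal strings, which defeats the "changing comments and whitespace" trick before any metric is computed.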
Existing Systems
- Ottenstein's system.
  - Halstead metrics.
- MOSS. [1]
  - Whitespace insensitivity.
  - Noise suppression.
  - Position independence.
- JPlag. [2]
  - Karp-Rabin string matching.

[1] AIKEN, A., SCHLEIMER, S. AND WILKERSON, D., 2003. Winnowing: Local Algorithms for Document Fingerprinting. Stanford University, California.
[2] MALPOHL, G., PHILIPPSEN, M. AND PRECHELT, L., 2000. JPlag: Finding plagiarisms among a set of programs [online]. University of Karlsruhe. Available from: http://www.ubka.uni-karlsruhe.de/cgi-bin/psview?document/ira/2000/1search/ira/2000/1 [Accessed 16 October 2005].
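
As a rough illustration of the ideas behind these systems, the sketch below computes Karp-Rabin style rolling hashes over every k-character substring and then keeps the minimum hash from each window of w consecutive hashes, in the spirit of winnowing. It is not the MOSS or JPlag implementation: the class name FingerprintSketch, the constants BASE and MOD, and the decision to keep only hash values (not their positions) are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; constants and names are assumptions, not taken from MOSS or JPlag.
final class FingerprintSketch {
    private static final long BASE = 257;
    private static final long MOD = 1_000_000_007L;

    // Rolling hash of every k-character substring of text, computed in one pass.
    static List<Long> kGramHashes(String text, int k) {
        List<Long> hashes = new ArrayList<>();
        if (k <= 0 || text.length() < k) {
            return hashes;
        }
        long high = 1; // BASE^(k-1) mod MOD, the weight of the oldest character
        for (int i = 1; i < k; i++) {
            high = (high * BASE) % MOD;
        }
        long h = 0;
        for (int i = 0; i < k; i++) {
            h = (h * BASE + text.charAt(i)) % MOD;
        }
        hashes.add(h);
        for (int i = k; i < text.length(); i++) {
            h = (h - text.charAt(i - k) * high % MOD + MOD) % MOD; // drop the oldest character
            h = (h * BASE + text.charAt(i)) % MOD;                  // append the newest character
            hashes.add(h);
        }
        return hashes;
    }

    // Keeps the smallest hash of each window of w consecutive hashes.
    // Simplification: only hash values are kept, positions are discarded.
    static Set<Long> winnow(List<Long> hashes, int w) {
        Set<Long> selected = new LinkedHashSet<>();
        if (w <= 0) {
            return selected;
        }
        for (int start = 0; start + w <= hashes.size(); start++) {
            long min = Long.MAX_VALUE;
            for (int i = start; i < start + w; i++) {
                min = Math.min(min, hashes.get(i));
            }
            selected.add(min);
        }
        return selected;
    }
}
```

Two documents can then be compared by intersecting their selected hash sets. Because a hash depends only on the k characters it covers, a copied passage yields the same hashes wherever it is moved, which is the position independence listed above.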
Initial Design Considerations
- The type of system.
- Implementation and input languages.
Primary aims
- Creation of an attribute counting system for Java, in Java.
- Platform independence.
- The implementation of metrics that return a number that can be used in determining an amount or likelihood of plagiarism.
- To ensure that the system is bug-free and well documented.
- To make the program easily extensible.
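
To make "attribute counting" concrete, the sketch below counts Halstead-style attributes (distinct and total operators and operands) for a piece of Java source and reduces two sets of counts to a single number. It is only an illustration under assumptions: the class name AttributeCounts, the rough token rules, the short keyword list, and the use of Euclidean distance as the comparison are all hypothetical choices, not the metrics chosen for this project.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; names and token rules are assumptions, not the project's design.
final class AttributeCounts {
    // Small keyword sample (assumption); keywords are counted as operators here.
    private static final Set<String> KEYWORDS = Set.of(
            "public", "private", "class", "int", "return", "if", "else", "for", "while", "new");
    // A token is an identifier, an integer literal, or one punctuation character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[^\\sA-Za-z0-9_]");

    final int distinctOperators;
    final int distinctOperands;
    final int totalOperators;
    final int totalOperands;

    private AttributeCounts(int distinctOperators, int distinctOperands,
                            int totalOperators, int totalOperands) {
        this.distinctOperators = distinctOperators;
        this.distinctOperands = distinctOperands;
        this.totalOperators = totalOperators;
        this.totalOperands = totalOperands;
    }

    // Counts operators (keywords and punctuation) and operands (identifiers and literals).
    static AttributeCounts of(String source) {
        Set<String> operators = new HashSet<>();
        Set<String> operands = new HashSet<>();
        int operatorCount = 0;
        int operandCount = 0;
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            char first = token.charAt(0);
            boolean wordLike = Character.isLetterOrDigit(first) || first == '_';
            if (wordLike && !KEYWORDS.contains(token)) {
                operands.add(token);
                operandCount++;
            } else {
                operators.add(token);
                operatorCount++;
            }
        }
        return new AttributeCounts(operators.size(), operands.size(), operatorCount, operandCount);
    }

    // Euclidean distance between the two count vectors; 0 means identical counts.
    double distanceTo(AttributeCounts other) {
        double d1 = distinctOperators - other.distinctOperators;
        double d2 = distinctOperands - other.distinctOperands;
        double d3 = totalOperators - other.totalOperators;
        double d4 = totalOperands - other.totalOperands;
        return Math.sqrt(d1 * d1 + d2 * d2 + d3 * d3 + d4 * d4);
    }
}
```

Renaming variables leaves all four counts unchanged, so AttributeCounts.of("int a = b + b;") and AttributeCounts.of("int x = y + y;") are at distance 0. A small distance only suggests similarity; unrelated programs can share the same counts, which is one reason to combine several metrics.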
Secondary aims
- To extend the system, adding more metrics.
- To add metrics that deal with program structure.
- Investigate the possibility of having more input languages.
- The optimisation of processing and space overheads.
Progress So Far
- Research into plagiarism, its detection and existing systems carried out.
- Key design considerations made.
- Design and coding started for the back-end of the prototype.
- GUI designed and coded.
What Next?
- Continuation of design and the coding of the prototype.
- Expansion of the prototype to include more tests by experimenting with different metrics.
- Documentation of the program.
- Creation of a final, tested version.
Questions?