Monday, December 1, 2014

A visit to the Academy

The Berlin-Brandenburg Academy of Sciences and Humanities invited me to speak this past week at a non-public meeting about plagiarism detection software for the working group Zitat und Paraphrase (Quotation and Paraphrase). I was a bit leery of speaking there, as some of the members of the group have publicly demonstrated a quite problematic interpretation of plagiarism in connection with the dissertation of one particular person (see [1], [2], and [3] for detailed online articles in German about this particular case, and a recent German essay [4] that compares this plagiarism case with one from the early 90s).

Since I do enjoy a good discussion, I agreed to speak. Unfortunately, the meeting was not open to the public, so I am only able to repeat the points of my presentation here, not the ensuing discussion. As it turned out, there were not many members of the group there, and none of the vociferous members I had been expecting.

I first made it exceedingly clear that VroniPlag Wiki is not a machine or software of any sort, but an academic community in which I take part. After discussing Teddi Fishman's definition of plagiarism, which I would amend by adding "without properly attributing the work" to point 3 and removing point 5 entirely, I gave a few examples of some of the different forms of plagiarism. These were followed by screenshots of a few plagiarism detection systems that have complicated reports or report essentially meaningless numbers.

One important point that is often overlooked when using such systems is that they all suffer from both false positives and false negatives. This is an inherent problem with attempting to determine plagiarism using software. Quotations are difficult to detect reliably, especially if they are only indented; literature references should of course be similar to references used in other papers; and some systems begin to mark anything longer than 6 or 7 words as text similarity. All of these can be the source of a false positive, in addition to simple programming errors, which I have also seen. The other side of the coin is the false negatives, and they are quite simple to understand: If the software does not have access to a source, it will not be able to determine that it is indeed a source. Translated text, for example, is next to impossible to identify with software, as is non-digitized content.
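Both failure modes can be made concrete with a minimal sketch in Python. It assumes a deliberately naive matcher that flags any run of six words shared with an indexed text; it is not how any particular detection system works. The function names, the sample texts, and the literature reference in them are all invented for this illustration.

```python
import re

def word_ngrams(text, n=6):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(suspect, corpus, n=6):
    """Map each indexed source name to the n-grams it shares with the suspect text."""
    suspect_grams = word_ngrams(suspect, n)
    hits = {}
    for name, source_text in corpus.items():
        common = suspect_grams & word_ngrams(source_text, n)
        if common:
            hits[name] = common
    return hits

# Hypothetical index: the only text this toy "system" has access to.
# The reference it contains is invented, not a real publication.
corpus = {
    "indexed_paper": (
        "A completely different discussion. Doe, J. (2000). An invented "
        "title used only for this example. Journal of Examples, 1(1), 1-10."
    ),
}

# Honest text that merely cites the same reference: flagged anyway
# (a false positive).
honest = (
    "My own argument follows here. Doe, J. (2000). An invented title "
    "used only for this example. Journal of Examples, 1(1), 1-10."
)

# Text copied from a source that is not in the index: nothing is found
# (a false negative).
copied = (
    "This sentence was lifted word for word from a printed book "
    "that was never digitized."
)

print(sorted(shared_ngrams(honest, corpus)))
print(sorted(shared_ngrams(copied, corpus)))
```

The matching literature reference is reported for the honest text, while the copied sentence passes unnoticed because its source is simply not in the index. A person reading the report still has to decide which case is which.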

I then discussed the small, general tools that can be used to manually detect and document plagiarism. After a few examples of documented plagiarism from historic cases and from current cases at VroniPlag Wiki, I closed by asking some ethical questions that I include in my forthcoming chapter on plagiarism detection software for the "Handbook of Academic Integrity":
  • Is it necessary to find all the plagiarism in a text?
  • Is it ethical for a university to use plagiarism detection software?
  • Is it ethical for a university to use plagiarism detection software as a formative device?
  • Is it ethical for a university to offer plagiarism detection software for teachers to use?
  • Is it ethical for a university to offer plagiarism detection software for researchers to use?
We had a good discussion afterwards. Unfortunately, there was no time to linger and talk further over a cup of coffee. I do hope that those who were present can serve as multipliers, explaining to their peers that there is no silver-bullet software for finding plagiarism, just a number of useful tools, large and small, that all incur a cost of time and effort to use.

[1] Causa Schavan. (n.d.). Articles about "Zitat und Paraphrase." [Blog]. Retrieved December 1, 2014.
[2] Erbloggtes. (n.d.). Articles about "Zitat und Paraphrase." [Blog]. Retrieved December 1, 2014.
[3] Dannemann, G. (2013, March 3). Die Ex-Ministerin und ihre Unterstützer: Schavanzentrisches Weltbild. Retrieved December 1, 2014.
[4] Ebert, T. (2014). Sag mir, wie hältst Du es mit dem Plagiat? Von Elisabeth Ströker zu Annette Schavan. Merkur, 68(12), 1070–1080.

