Monday, March 9, 2009

Umeå Study of Plagiarism Detection Software

An article I found in a student newspaper in Lund, Sweden, complained that the university's choice of plagiarism detection software was poor because it found only 25% of the plagiarisms in a 2006 study by the University of Umeå.

After some unsuccessful Googling, I entered a few words from the article and quickly found them in the abstracts of two reports from 2006:
  • Anna Nordström and Susanne Sjöberg: UTVÄRDERING AV URKUND, ETT VERKTYG FÖR PLAGIATKONTROLL [Evaluation of Urkund, a tool for plagiarism checking]. October 2006. Report 16 (yes, the names of the reports seem to be a sick joke of the IT department on this sociology unit)
The first is a report commissioned from CERUM by an organisational unit of the University of Umeå about a test of the Swedish plagiarism detection software Urkund. CERUM is the Center for Regional Studies at the University of Umeå.

The university purchased a license for Urkund for four departments at the school for a year and had CERUM look at the usability, effectiveness, and moral problems associated with the use of the software. They interviewed teachers and students prior to and after use and describe very clearly the issues found. Among them:
  • Teachers found the system easy to use, as students send their papers to an email address at Urkund. Urkund then forwards the paper to the teachers and later sends them a report on anything found. This does, however, pose a problem, as students' email addresses are used as the subject line and were often difficult to connect to real names.
  • Teachers felt that the system was effective.
  • Using the system didn't save the teachers any time, but they felt that it was important for them to deal with plagiarism.
  • Teachers and students alike felt that using the system worked as a deterrent to plagiarism.
  • Students were generally happy with the software and had no moral or ethical problems, as they felt that everyone was being handled equally, something that is very important for Swedes.
  • There was no mention made of the copyright problems. Urkund keeps a copy of all papers unless the students answer their acknowledgement email and request their paper not be put in a database.
  • Neither teachers nor students felt that there was a problem in their relationship with each other based on the use of this system.
In order to test the effectiveness of the system, CERUM requested the four departments give them some common texts from textbooks, online resources, and magazines from their field, as well as from the Internet. Urkund says that it checks the Internet, many publications (including the Swedish National Encyclopedia), and of course their own database.

The researchers added material on their own in order to have 20 sources for each department. They then constructed test material and ran it through Urkund. (It is not clear from the report whether they put everything into one text or made a few papers; the report says different things in different chapters.) Only 18 of the 80 sources were found by Urkund, a rather sorry result.

While the investigation was being done, it was discovered that there were a handful of teachers at the school using the Genuine Text system. This Swedish system, which according to their web page is used in Sweden, Denmark, Russia, and some parts of Africa, only searches the Internet and its own database. It is a web-based system that has three ways of submitting: students upload a file, teachers upload files, or they copy and paste material into a field. They offer statistics for administrators and plagiarism reports that are suitable for submitting to the appropriate disciplinary bodies. It is not clear from the report if students have a way of opting out of having their material stored in the database.

CERUM was commissioned to have a look at this as well, and they interviewed the teachers and students, who gave responses similar to those of the Urkund subjects. Interestingly enough, although Genuine Text does not check publication databases, it managed to find 22 of the 80 plagiarisms. Only 14 of the plagiarisms were from the Internet, and Genuine Text found 7 of them.
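To put the counts from the two reports side by side, here is a minimal sketch that turns them into percentages. The counts are the ones quoted above; the labels and the `detection_rate` helper are my own naming for illustration, not anything taken from the reports.

```python
# Counts as quoted in this post: (sources found, sources planted).
# The labels are illustrative and do not come from the CERUM reports.
results = {
    "Urkund, all sources": (18, 80),
    "Genuine Text, all sources": (22, 80),
    "Genuine Text, Internet sources only": (7, 14),
}


def detection_rate(found, total):
    """Return the share of planted sources that were detected, in percent."""
    return 100 * found / total


for label, (found, total) in results.items():
    rate = detection_rate(found, total)
    print(f"{label}: {found}/{total} = {rate:.1f}%")
```

This works out to 22.5% for Urkund and 27.5% for Genuine Text overall, and 50% for Genuine Text on the Internet sources alone.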

Both reports lead me to the following conclusions:
  • The software proved just as ineffective in these tests as in mine. They find about half of the Internet sources and very little else.
  • Teachers are so happy to have something found that they believe the software to be effective.
  • The major use of plagiarism detection software is in deterrence. I found many forums in which students wondered how good the systems really were and what their chances of being found out were. There were tips being given on changing words around so as to confuse the system. Unfortunately, this works for many software systems, although Google can often deal with changed word order.
  • No one considers copyright (or patent!) issues in student works. Universities and companies keep copies without explicit permission, which would be required under EU copyright law.
  • If the students are informed properly (orally and in writing) about the use of the software, they are happy about it being used.
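As a toy illustration of the point above about changing words around, the sketch below compares texts by their overlapping word sequences ("shingles"). This is an assumption on my part about how a simple matcher might behave; it is not how Urkund, Genuine Text, or Google actually work, and the function names and example sentences are made up for the illustration.

```python
def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(suspect, source, n=3):
    """Fraction of the suspect's shingles that also occur in the source."""
    suspect_shingles = shingles(suspect, n)
    if not suspect_shingles:
        return 0.0
    return len(suspect_shingles & shingles(source, n)) / len(suspect_shingles)


# Made-up example texts, not taken from the study.
source = "the software finds about half of the internet sources and very little else"
copied = "the software finds about half of the internet sources"
shuffled = "about half of the sources on the internet are found by the software"

print("verbatim copy, 3-word sequences:", overlap(copied, source))
print("reworded copy, 3-word sequences:", round(overlap(shuffled, source), 2))
print("reworded copy, single words:    ", round(overlap(shuffled, source, n=1), 2))
```

The verbatim copy matches completely, while the reworded version shares only a few three-word sequences with the source even though most of its individual words are the same. A matcher that relies on exact phrases is easy to confuse in this way; one that tolerates changed word order is harder to fool.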
I am attempting to contact the authors, who are no longer with the department, to see if I can find further information about the study.


  1. Best of luck getting in touch with them, definitely let us know what you find out!

  2. Not very scientific to base criticism on a three year old test... It doesn't take a rocket scientist to realise that a lot can happen with software in three years. I'm sure the physics department at Lund have one or two they could ask if in doubt though. There are probably good reasons that the other company, Genuine Text, now are doing their business in Russia and Africa

  3. I agree, for my knowledge main part of Universities in Sweden are using Urkund. Therefore I think there must be plenty of tests that are done in more current years at Swedish Universities that shows that Genuinetext level of functionality and quality is evidently unsatisfactory in contrast to Urkund performance. Test done at Chalmers Tekniska Högskola is one of these. One would expect that Debora Weber-Wulff would refer to reports of current interest.

  4. Anonymous, do you have a reference to the Chalmers study? I do read Swedish and would love to summarize the results for my English-speaking readers.

  5. A note of interest on the matter of quality and user friendliness in anti-plagiarism systems:

    The word on the grapevine is that the University of Gävle has cancelled their subscription of Genuine Text as of end of May.

  6. Hello WiseWoman!

    Plagiarism is a type of cancer and should be overcome.

    But where in the world there is a central point/database where the student and academic works are stored? EU has no such point and that is not a good sign. I think it should be an initiative to be taken for such a central point. Physically the data can be distributed in many places, but this data should create one corpus. Stupid?