
Monday, February 24, 2020

Testing of Support Tools for Plagiarism Detection

It's out! Our pre-print about testing support tools for plagiarism detection, often mistakenly called plagiarism-detection tools. The European Network for Academic Integrity working group TeSToP tested 15 software systems in eight different languages in 2018 and 2019. Of course, everything has changed since then, the software people tell us, but whatever: here's the pre-print, which we have submitted to a journal.

arXiv:2002.04279 [cs.DL]

Testing of Support Tools for Plagiarism Detection

 
There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a widespread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.

Thursday, December 17, 2009

Microsoft Admits Plagiarizing Code

Microsoft has admitted plagiarizing code for the Chinese interface for their MSN-Buddy.

Is this the current trend? Plagiarize, and then apologize if caught? I hope not.

Thursday, September 25, 2008

Three words suffice!

I am currently testing about 20 plagiarism detection systems (PDS). During one of the tests we saw a very nice turn of phrase that we had plagiarized from one newspaper site (with permission!): "paranoide vorolympische Kraftmeierei" (paranoid pre-olympic muscle-flexing).

The system did not find our source, the Süddeutsche Zeitung, but instead the Swiss Tagesanzeiger. Putting just these three words into Google proved something I have been saying all along: three to five words suffice.


As it happens, Henrik Bork is the author of both of these identical articles. He sold one in March and one in April. The ethics of that is another discussion. But using a PDS is so time-consuming that one really just needs to pick out phrases like this while reading and use a search engine. Full stop.
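As a quick sketch of this phrase-search approach: the few lines below build a quoted (exact-match) search-engine query from a distinctive phrase. The Google URL pattern here is just an illustration of my own choosing; any search engine that supports quoted phrases will do.

```python
from urllib.parse import quote_plus

def phrase_query(phrase: str) -> str:
    """Build a quoted (exact-match) search URL for a distinctive phrase."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')

# The three-word phrase from the post:
print(phrase_query("paranoide vorolympische Kraftmeierei"))
# → https://www.google.com/search?q=%22paranoide+vorolympische+Kraftmeierei%22
```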

Wednesday, September 3, 2008

Plagiarism Detection Software Test 2008

We are currently conducting our Plagiarism Detection Software Test 2008. The current field of candidates is:
  • turnitin
  • Ephorus
  • Plagiarism-Finder
  • Docoloc
  • Urkund
  • StrikePlagiarism
  • TextGuard
  • CopyScape
  • WCopyFind
  • CatchItFirst
  • SafeAssign
  • ArticleChecker
  • JPlag
  • PaperSeek
  • YAPLAF
  • AntiPlag
  • PlagAware
  • PlagiatCheck
  • PlagiarismDetector
This year we are not only testing how well the systems find the plagiarisms, but also assessing the usability of the systems, awarding points in the categories of information, cost transparency, layout, readability of the reports, navigation, and integration into the teaching workflow.

If you have plagiarism detection software you would like to have tested, please leave a link here or contact me. We will publish our results on September 30, 2008.

Saturday, February 2, 2008

A Legal Twist on Plagiarism Detection

A representative of a German university asked for my help this past week. They want to purchase plagiarism detection software, but their legal department insists that they can only purchase software that they run locally, they may not send the papers to a third-party company for testing, as the papers are examination artefacts and not to be used outside of the university.

This gives plagiarism detection an interesting twist: software that runs locally does not normally have its own database, so it is basically just doing search-engine searches for you, in which case you might as well be doing the testing yourself. It is conceivable that a university might start a database of the locally submitted papers, but that will be of only marginal use, as copying from the Internet would not be found.

I have heard that makers of locally installed plagiarism detection software have trouble negotiating licenses with the large search-engine companies for fast, repeated searches. So maybe what we need is some sort of Plagiarism Workbench that helps teachers do the searches themselves, recording what they tested and when, and helping them with the documentation.
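The core of what such a workbench or local papers database would have to do is plain text matching. Here is a minimal, generic sketch using word n-gram ("shingle") overlap between two texts; this illustrates the general idea only and is not any vendor's actual algorithm.

```python
import re

def shingles(text: str, n: int = 4) -> set:
    """Set of lowercased word n-grams ('shingles') in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a: str, b: str, n: int = 4) -> float:
    """Jaccard similarity of the shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = shingles(a, n), shingles(b, n)
    return len(sa & sb) / len(sa | sb) if sa or sb else 0.0
```

A score near 1.0 between a new submission and a stored paper would flag the pair for a human to look at; both the threshold and the shingle length n are tuning choices, and the final judgment must remain with the teacher.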

But it seems there is no substitute for doing one's own searching. Since we are, one hopes, actually reading all the papers and not just assigning random grades, we might as well do a quick check on a few paragraphs after reading. As I have often shown: 3-5 nouns suffice.

Tuesday, January 8, 2008

BitScan

PlagiarismToday alerted me to a new software for plagiarism detection, BitScan. Since it just wanted URLs and 20 tests were free, I couldn't resist quickly trying a few of my tests:
  • Viking - a trivial shake-and-paste plagiarism of one source: the source and only the source was found
  • Döner - a complicated three-source plagiarism with Wikipedia: two mirrors of Wikipedia found, no other sources
  • Jelinek - another three-source plagiarism with an automatic translation: one of the three sources found as the only source
  • Djembe - an impossible (for machines) machine translation: nothing found
  • Lettau - an easy plagiarism of the German Wikipedia (his publication list also appears 1:1 in the English-language Wikipedia): nothing found
  • Blogs - a plagiarism from a pdf: nothing found
  • Atwood - a trivial plagiarism from Amazon: found, along with some copies
Okay, that's about par for the course. Flipping a coin is at least as good. I will put it on my list for a future test, though. Maybe they will have done some fine-tuning by then.

Thursday, October 4, 2007

Another Plagiarism Detection Test

The JISC (Joint Information Systems Committee) in the UK has just released their study of plagiarism detection systems: http://www.jiscpas.ac.uk/documents/resources/PDReview-Reportv1_5.pdf (September 2007).

It was interesting to see that they tested some of the same systems we did, often having similar experiences with them, although they only tested a few cases using computing-related texts. They did, however, have a very fine-grained points system looking at things such as legal issues, the technical basis for the server systems, and the presence of licenses for using search engines such as Google or Yahoo.

Not surprisingly, Turnitin comes out on top. Why do I say "not surprisingly"? Well, JISC seems tightly entwined with NorthumbriaLearning, and the latter is the European re-seller for Turnitin as well as a resource center for teachers. I am not quite clear on how close these two organizations are.

JISC did give this survey to an outside person to conduct, and had an academic advisory board look at the evaluation questions and suggest products to test. But the appendix entry on turnitin is a glowing sales document that avoids all of the issues with Turnitin (such as being overeager to store copies of papers in their database), whereas the others are more apt to have problems noted - problems that we, too, had in many cases.

The survey is still a very valuable collection of data - all the more so because they used questionnaires to elicit more data (or more refusals to give information) from the various companies. I am just curious as to how independent the study really is.

Update October 5, 2007: William Murray from NorthumbriaLearning has sent me this clarification of the relationship JISC/NL. Thanks, William, glad to post it!

"The relationship between JISC (the government funded Joint Information Systems Committee in the UK) and NL needs explaining. The confusion occurs because all JISC services are branded with JISC in front of them. We run JISC-PAS not JISC!

Turnitin won a national tender in 2002 put out by JISC to run a national detection service in the UK and Northumbria University (our original parent company) won a second national tender for the advisory service JISC-PAS (Plagiarism Advisory Service) that supports it.

We (Northumbria Learning) have been managing JISC-PAS and reselling Turnitin ever since with JISC’s endorsement. JISC wanted an independent survey to reaffirm (or otherwise) their support for their original choice of detection solution in 2002. NCC Group Ltd were chosen because they are independent of NL and JISC-PAS.

Within JISC-PAS our primary aim is to encourage holistic change within institutions through better information literacy, better course design, better research practice and better teaching of core skills. We happen to think that solutions like Turnitin provide the ‘ah-ha’ moment (Jude Caroll’s term not mine) that focuses the minds of all concerned. In my view detection is a change agent for better practice (I taught informatics at Northumbria University for ten years so I think this is a good thing. I would have loved to be able to use Turnitin, our class sizes were huge 300+ in some cases which made consistency in marking a nightmare). But specifically to address your points:

* JISC are not entwined with Northumbria Learning, we run the JISC-PAS and Turnitin service on behalf of JISC.

* NCC group ran an independent survey

* NCC group allowed *ALL* providers to vet *ALL* the information in their report and agree it as factually correct before publication.

* All providers were given the opportunity to improve their scores prior to publication

* The extent to which they contributed ‘sales’ information was entirely up to the companies concerned.

* Its aim was to identify which system could be deployed enterprise-wide, with high volumes of throughput, and used on a national scale *in the UK*, hence the questions about company stability and support in the UK.

* Having a central database was one of the reasons Turnitin was selected by JISC. This is why (in this context) it was not a flaw."

I think classes of 300+ people do not constitute higher education, and that certainly contributes to the plagiarism problem!

Thursday, September 27, 2007

Test of Plagiarism Detection Software

It's finished, it's published. We worked feverishly right up to the wire. On Sept. 26 we sent copies of the preliminary reports (they were still in line for some language polishing) to the companies tested, so that they could prepare a statement, if they so chose.

We held a press conference this afternoon, cutting over to the new version of the plagiarism portal and the E-Learning unit on plagiarism detection ("Fremde Federn Finden", in German) at the start of the conference. We had 5 reporters in attendance and many who requested virtual press materials. The online magazine "Spiegel Online" had requested that we write a summary article for them, so we just cut out sleep for a few days in order to get it done.

We have had a lot of interest from reporters and of course from the companies tested. If we learn of other systems, we will be glad to test them as we have time (which will be spare time, as the financing for this project runs out tomorrow), although the results might not be comparable, as the Internet is constantly changing.

Here is a copy of the ranking page:

Ranking

Excellent Systems

No system was ranked as excellent - but there have been many people who attended plagiarism detection seminars who scored 100% on the same tests!

Good Systems

Nr. 1 : Ephorus

Acceptable Systems

Nr. 2 : Docoloc
Nr. 3 : Urkund, Copyscape (premium), PlagAware
Nr. 6 : Copyscape (free)
Nr. 7 : TextGuard
Nr. 8 : turnitin, ArticleChecker
Nr. 10 : picapica

Unacceptable Systems

Nr. 11 : DocCop
Nr. 12 : iPlagiarismCheck, StrikePlagiarism
Nr. 14 : CatchItFirst

We hope that our work can help these companies produce better results. But our summary for 2007 is the same as for 2004: it is better to use a search engine yourself; the software just costs money and is not necessarily very good at finding all plagiarisms.

Friday, June 29, 2007

Test of Plagiarism Detection Software

I am currently redoing my E-Learning-Unit on plagiarism, "Fremde Federn Finden" (in German). As part of the work I am repeating the test that I conducted on plagiarism detection software in 2004. Then I used 10 papers that I wrote myself with a known amount of plagiarism / originality to see how well the software measured up. It was not a pretty sight - often flipping a coin was just as effective.

For the repeat of the experiment I have 10 more papers and will be conducting tests over the summer of the following products from various countries:
If any of my readers know of any other software, please let me know! If you produce plagiarism detection software, please contact me so that I can include it in the test.

The results will be published online in September 2007.

Tuesday, June 5, 2007

Plagiarism increased four-fold in Sweden

Svenska Dagbladet reports that the number of reported plagiarism cases at Södertörn University, near Stockholm, has increased four-fold during the last five years, and this while the total number of students is decreasing.

They cite a report put out by Högskoleverket, the government university agency, which will be published next month. The report finds more plagiarism in term-paper writing than in cheating on exams. Under the Swedish system, students who are caught cheating or plagiarizing are brought before a board, the disciplinnämnden, which decides whether punishment should be meted out. The punishment is suspension from school for a period of up to 6 months, usually pronounced just before exam time, so that the delinquent cannot take some exams.

Taking exams in Sweden is vital - if you pass 75% of the credits of your first year at college, you can get funding for the next year, and so on. So there is quite an incentive to get those 75% credits.

The university uses a so-called plagiarism detection software for checking term papers; the article does not mention which one. Out of 15,400 submitted papers last year, 36 suspensions were meted out; in 2003 there were only 10. That is a rate of 0.23%, far, far below what teachers report when they hand-check term papers. There are accumulating reports pointing to figures more in the 10-30% range.

The report continues that 3 of the suspended students took the university to court and had their suspensions rescinded. That looks to me like an 8% false-positive rate for the software. Perhaps they need to look hard at their software, or find other methods, like using search engines, for assessing this problem.
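The percentages here are easy to check from the article's own figures (15,400 papers, 36 suspensions, 3 overturned in court):

```python
papers = 15_400     # term papers submitted last year
suspended = 36      # suspensions meted out
overturned = 3      # suspensions rescinded in court

print(f"{suspended / papers:.2%}")      # → 0.23%
print(f"{overturned / suspended:.1%}")  # → 8.3%
```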

You can't solve social problems with software - and most certainly not with software that is this bad.

Sunday, March 4, 2007

Docoloc

A German professor, Martin Gutbrod, wrote a little software system called "Docoloc" in order to fight the plagiarisms he was finding. The media are singing the praises of this software, although the test I did of it in 2004 did not give the software any prizes: it was only able to correctly determine whether an essay was a plagiarism or not in 6 of 10 tests (after some problems getting it to run). I will be repeating this test this summer; more on that to come.

I find it troubling, though, that software that purports to fight plagiarism itself uses a layout that is a blatant plagiarism of Google's layout...

Tuesday, December 26, 2006

Plagiarism "finding" software

The German computer magazine c't has an article in issue 1/2007 (Plagiatfinder: Prüfzwang für Studienarbeiten, p. 78) about software for "finding" plagiarisms. Oh well, at least I am correctly cited: it is useless to try and solve social problems with software.

But the article is still quite euphoric about using software. Sigh. My tests in 2004 were not encouraging: often, you could just flip a coin and be just as right about whether a paper was plagiarized or not. But many companies now scream "We are NEW! We are IMPROVED!", so I am forced to spend my summer term's research allowance (all of 4 hours a week off my 18 hours of teaching) in order to repeat the tests. Stay tuned for the results after the summer break in 2007.

Until then: just use a search engine and your brain. You will get better results.