Friday, October 26, 2012

Hamburg doctorate rescinded, court case pending

Apparently, the University of Hamburg rescinded the doctorate in law in one of the cases documented on the VroniPlag Wiki at some point in the recent past, as reported by the Hamburger Abendblatt. The paper reports on the complexities surrounding this case, which involve lawyers suing lawyers over lawyers fighting other lawyers. But no matter what the circumstances, a thesis that has so many text parallels, often covering more than half of the page on over 86 % of the pages, is extremely problematic. The findings page lists the most important text parallels.

Monday, October 22, 2012

Frankfurt School of Management rescinds doctorate

The Frankfurt School of Management, a private institution of higher learning in Germany that confers doctoral degrees, has announced that it has rescinded a doctorate, the first one awarded at this school. The candidate had plagiarized on 94 of the 380 pages of his thesis, and the thesis had many other problems as well. The Frankfurter Allgemeine Zeitung reported on the case, but the article is behind a paywall.

Wednesday, October 17, 2012

How to find Plagiarism in Dissertations

Germany is awash in another wave of discussions about plagiarism. This time it is the Minister of Education and Research, Annette Schavan. The story about plagiarism in her dissertation broke in May, and the University of Düsseldorf has been examining the case since. Today, October 17, the committee is meeting to decide on the results, but the documentation that they prepared was leaked to the press this past weekend, and the press has been in a frenzy.

And I have laryngitis and can't talk. I have journalists pleading with me to explain how the "magic" VroniPlag Wiki software works. The problem is, there is no magic software. The method used to find plagiarism in dissertations (or any other written work) is called "research". Just normal research.

But since so many people need to know how this is done, here's a crib sheet with 10 easy steps:
  1. Obtain the thesis. If you are just trying to find the dissertation of a particular person who did their doctoral work in Germany, give the German National Library a try. Type in the name and see what it comes up with. Then use the catalog of your local library (often called an OPAC, online public access catalog) or a union catalog to try to locate a copy. Most German states have a union catalog; in Berlin it is the KOBV. If there is no copy in your locality, you can obtain a library card and then have the thesis sent to you using inter-library loan.
  2. Read the thesis. There is no royal road. The so-called plagiarism detection software can turn up the odd reference, but only if the sources are online. The best bet is to start reading it, and look for shifts in writing style, or places where the writing turns Spiegel-esque, or for sudden useless details, or misspellings, or just wrong content.
  3. Google. I've given up on other search engines. Just belly up to the search bar and type in three to five words from a sentence or paragraph and see what turns up. If you get a lead through Google Books, use step 1 to obtain a copy of the book. If you get lucky and the first paragraph is taken from the FAZ or the NZZ -- paydirt! Don't just try one paragraph, take a few from different parts of the book.
  4. Follow the footnotes. University teachers do this when teaching their students how to footnote, and it scares the daylights out of students when they see that the professor found out that they were just making up the footnotes. Does the reference exist? Is the thing being said found on that page? Is the whole paragraph taken from the reference with the quotation marks "forgotten"? Does the chapter in the dissertation continue on after the footnote without a further reference? Is this paragraph perhaps just a translation of the reference? 
  5. Browse the bibliography. What is the most recent source used? Is it five years older than the dissertation? In some fields, this would sound an alarm. Is there some strange or obscure literature listed? Obtain it! Do you need journal articles? Germany has a wonderful listing of the holdings of all libraries nationwide, the Zeitschriftendatenbank. It will tell you where they can be found, and many can even be delivered to your email account as a PDF for a few euros. Many libraries also subscribe to digital libraries that can be used when sitting at the library. A walk would do you good, anyway, so get over there and have a look.
  6. Digitize. If you have already found a source plagiarized in a dissertation, chances are that there is more. Have a good look at each, and now digitize the relevant portions. Use a book scanner in the library to get a high-quality scan of the pages as a PDF. You lay the book flat under the camera, press a button, turn the page, press a button, until you are done. Experienced scanners can do over 200 pages per hour. Now run optical character recognition (OCR) software on the PDF. There are free ones like Google's Tesseract or professional versions such as the one built into Adobe's Acrobat or OmniPage or ABBYY FineReader.
  7. Compare. This is one of the few software systems the VroniPlag Wiki people use. It is a text comparison tool that is based on the free algorithm of Dick Grune. The tool marks identical passages in the two documents that it is comparing. Put the dissertation in one side, the source in the other, and press "Texte vergleichen!". Don't forget to make a screenshot if the results turn out colorful.
  8. Document. If you find anything, document it exactly. Page and line numbers from the dissertation, URL or page and line numbers from the source, and a copy of each. A two-column side-by-side has proved easy to understand when showing the results to others.
  9. Need help? If you have already found some nasty text parallels, drop in at the VroniPlag Wiki chat (on channel #vroniplag) or use the drop if you want to be discreet. You might be able to interest someone in working on the case. But remember, they are all volunteers. Or you can continue on yourself, and then inform the ombud for good scientific practice at the university in question.
  10. Publish. If you feel that it is necessary to publish your results, you can either choose a wiki, such as the GuttenPlag Wiki or the VroniPlag Wiki, which makes it easier for others to help you with the documentation, or you can publish on a blog, like the SchavanPlag blog, which gives you complete control of what is published. Or you can print up a book, like Marion Soreth did in 1990 when she documented the dissertation of her colleague Elisabeth Ströker. 
All clear? If I've missed anything, please add in the comments!
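
For those who want to mechanize a bit of step 3, here is a minimal sketch of picking short search phrases from different parts of a digitized text. The function name `search_phrases` and its parameters are made up for illustration, and the sentence splitting is deliberately crude. It only prepares quoted phrases; pasting them into the search bar and reading the hits is still up to you.

```python
import re

def search_phrases(text, words_per_phrase=5, samples=3):
    """Pick a few short, quoted phrases from different parts of a text."""
    # Split into rough sentences on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Keep only sentences long enough to yield a full phrase.
    usable = [s for s in sentences if len(s.split()) >= words_per_phrase]
    if not usable:
        return []
    # Spread the samples over the text instead of taking only the beginning.
    step = max(1, len(usable) // samples)
    picked = usable[::step][:samples]
    phrases = []
    for sentence in picked:
        words = sentence.split()
        start = (len(words) - words_per_phrase) // 2  # words from the middle
        phrase = " ".join(words[start:start + words_per_phrase])
        # Quote the phrase so a search engine looks for the exact wording.
        phrases.append('"' + phrase.strip(".,;:!?") + '"')
    return phrases
```

Three to five words per phrase follows the advice in step 3; longer phrases tend to miss sources where a word or two was changed.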

Update: A correspondent noted that when you scan once you actually get two pages. And that an experienced person can do 200-400 pages an hour. I'll stick with the lowest number.
Update 2: A student of mine improved the comparison tool. It is available as similarity-texter free of charge online.  It runs in your browser. If you find something colored, just click on it - and the identical text on the other side will align with it. This makes examining large files very easy, and it is also easy to prepare a PDF with the passages marked using the tool. 
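
To give an idea of what such a comparison does, here is a minimal sketch that lists word sequences shared by two texts. It is not the VroniPlag Wiki tool, not similarity-texter, and not Dick Grune's algorithm; it swaps in `SequenceMatcher` from Python's standard `difflib` module. The function name `shared_passages` and the `min_words` threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

def shared_passages(thesis, source, min_words=4):
    """Return word runs of at least min_words that appear in both texts."""
    a = thesis.split()
    b = source.split()
    # Compare word lists rather than characters; autojunk=False keeps
    # frequent words from being ignored in longer texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passages.append(" ".join(a[block.a:block.a + block.size]))
    return passages

thesis = "The author claims that the quick brown fox jumps over the lazy dog every day"
source = "It is known that the quick brown fox jumps over the lazy dog at night"
print(shared_passages(thesis, source))
```

Note that `SequenceMatcher` aligns the two texts in order, so passages that were copied and then rearranged may be missed by this sketch.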

Friday, October 12, 2012

Stumping Plagiarism Software

A correspondent shared with me an email exchange he had with Ephorus, the Dutch plagiarism detection software company. It seems that his school pays good money for the Ephorus system for general use.

Although Ephorus had given a student's paper a clean bill of health, the professor had not been satisfied and she sat down to google. She found over 30 % of the paper was plagiarized from online sources!

They wrote to Ephorus to ask how this could be. The answer is rather shocking: the texts aren't identical, you see. The punctuation was changed, and the student paper often had two blanks where the source only had one. Ephorus wrote:
"The erroneous punctuation has implications for the effectiveness of the plagiarism scan. we [sic] will examine how large the effects are and what we can do about it."
Um, guys? If your system can be tricked by inserting a blank after every second or third word, we might just as well flip a coin to determine if a paper is plagiarized. This does, however, confirm that false negatives are a big problem with Ephorus. In our study using the doctoral thesis of former German defense minister Karl-Theodor zu Guttenberg, in which the GuttenPlag Wiki found plagiarism in 63 % of the lines and on 94 % of the pages, Ephorus reported only 5 % plagiarism.
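
Extra blanks and altered punctuation are not hard to neutralize before comparing. The following is a minimal sketch of such a normalization step; it makes no claim about how Ephorus or any other product works internally, and the function name `normalize` and the two sample sentences are made up for illustration.

```python
import re

def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation characters
    return " ".join(text.split())         # collapse blanks, tabs, newlines

source = "The results, however, were clear: no effect was found."
student = "The  results however  were clear - no effect  was found"

print(source == student)                        # verbatim comparison
print(normalize(source) == normalize(student))  # comparison after cleanup
```

The verbatim comparison fails, but after normalization the two sentences are identical, which is what a human reader sees at once.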

A French Puzzle

An anonymous correspondent dropped this link into my box this morning: Imposture à l'Université ?

Google Translate lets me know that this is a bit of a French puzzle. Professor Imad Saleh of the University of Paris 8 lists as an important paper in a CV:
Meziani Rachid et Saleh Imad (2011), « Towards a collaborative business
process management methodolgy [sic] », ICMCS ’09, IEEE, 6-8 April 2011
Maroc, 8 pages (article indexé).
That is, a paper from the 2009 conference ICMCS sponsored by IEEE in 2011. Okay, that might be a typographical error. The ICMCS'11 did take place in Morocco, but from 7-9 Apr 2011. Okay, off-by-one is normal for computer scientists.

The article posts a link to that paper. And it posts a link to a paper written by Rachid Meziani and Rodrigo Magalhães from the Center for Organizational Design and Engineering in Lisbon, Portugal in 2009: Proposals for an Agile Business Process Management Methodology.

Shall we compare the abstracts with the VroniPlag Wiki SIM_TEXT comparison tool?

Needless to say, the article continues pretty much word for word, table by table, picture by picture.

Saleh is professor and the director of PARAGRAPHE, an interdisciplinary research laboratory attached to the doctoral School (N°224) Cognition, Langage and Interaction (CLI) of the University of Paris 8. There is no Meziani listed there or at the web site of the University of Paris 8. There is a Rodrigo Magalhães to be found in Kuwait, and he does BPO, but there is not a complete bibliography listed there.

So the French Puzzle is: why are these two papers identical? What happened to Meziani and Magalhães? A case has been submitted to the French Council on Universities. It is interesting to note that Saleh is a member of that council.

And if I may add a question myself - why do we continue to prize conference publications in computer science? We can't tell the mock conferences from the substantial ones, and plagiarism seems to be rampant because the peer-review system is dead for conferences.

Thursday, October 4, 2012

Danish eLearning Unit on Avoiding Plagiarism

Three Danish universities, the University of Southern Denmark, Aarhus University, and Copenhagen University, cooperated in 2010 to produce an eLearning unit on avoiding plagiarism. There is a version in Danish and one in English.