Wednesday, October 17, 2012

How to find Plagiarism in Dissertations

Germany is awash in another wave of discussions about plagiarism. This time it is the Minister of Education and Research, Annette Schavan. The story about plagiarism in her dissertation broke in May, and the University of Düsseldorf has been examining the case since. Today, October 17, the committee is meeting to decide on the results, but the documentation that they prepared was leaked to the press this past weekend, and the press has been in a frenzy.

And I have laryngitis and can't talk. I have journalists pleading with me to explain how the "magic" VroniPlag Wiki software works. The problem is, there is no magic software. The method used to find plagiarism in dissertations (or any other written work) is called "research". Just normal research.

But since so many people need to know how this is done, here's a crib sheet with 10 easy steps:
  1. Obtain the thesis. If you are just trying to find the dissertation of a particular person who did their doctoral work in Germany, give the German National Library a try. Type in the name and see what it comes up with. Then use the catalog of your local library (often called an OPAC, online public access catalog) or a union catalog to try to locate a copy. Most German states have a union catalog; in Berlin it is the KOBV. If there is no copy in your locality, you can obtain a library card and then have the thesis sent to you using inter-library loan.
  2. Read the thesis. There is no royal road. The so-called plagiarism detection software can turn up the odd reference, but only if the sources are online. The best bet is to start reading it, and look for shifts in writing style, or places where the writing turns Spiegel-esque, or for sudden useless details, or misspellings, or just wrong content.
  3. Google. I've given up on other search engines. Just belly up to the search bar and type in three to five words from a sentence or paragraph and see what turns up. If you get a lead through Google Books, use step 1 to obtain a copy of the book. If you get lucky and the first paragraph is taken from the FAZ or the NZZ -- paydirt! Don't just try one paragraph, take a few from different parts of the book.
  4. Follow the footnotes. University teachers do this when teaching their students how to footnote, and it scares the daylights out of students when they see that the professor found out that they were just making up the footnotes. Does the reference exist? Is the thing being said found on that page? Is the whole paragraph taken from the reference with the quotation marks "forgotten"? Does the chapter in the dissertation continue on after the footnote without a further reference? Is this paragraph perhaps just a translation of the reference? 
  5. Browse the bibliography. What is the most recent source used? Is it five years older than the dissertation? In some fields, this would sound an alarm. Is there some strange or obscure literature listed? Obtain it! Do you need journal articles? Germany has a wonderful listing of the holdings of all libraries nationwide, the Zeitschriftendatenbank. It will tell you where they can be found, and many can even be delivered to your email account as a PDF for a few euros. Many libraries also subscribe to digital libraries that can be used when sitting at the library. A walk would do you good, anyway, so get over there and have a look.
  6. Digitize. If you have already found a source plagiarized in a dissertation, chances are that there is more. Have a good look at each, and now digitize the relevant portions. Use a book scanner in the library to get a high-quality scan of the pages as a PDF. You lay the book flat under the camera, press a button, turn the page, press a button, until you are done. Experienced scanners can do over 200 pages per hour. Now run optical character recognition (OCR) software on the PDF. There are free ones like Google's Tesseract or professional versions such as the one built into Adobe's Acrobat, OmniPage, or ABBYY FineReader.
  7. Compare. This is one of the few software systems the VroniPlag Wiki people use. It is a text comparison tool that is based on the free algorithm of Dick Grune. The tool marks identical passages in the two documents that it is comparing. Put the dissertation in one side, the source in the other, and press "Texte vergleichen!" ("Compare texts!"). Don't forget to make a screenshot if the results turn out colorful.
  8. Document. If you find anything, document it exactly. Page and line numbers from the dissertation, URL or page and line numbers from the source, and a copy of each. A two-column side-by-side has proved easy to understand when showing the results to others.
  9. Need help? If you have already found some nasty text parallels, drop in at the VroniPlag Wiki chat or use the drop if you want to be discreet. You might be able to interest someone in working on the case. But remember, they are all volunteers. Or you can continue on yourself, and then inform the ombud for good scientific practice at the university in question.
  10. Publish. If you feel that it is necessary to publish your results, you can either choose a wiki, such as the GuttenPlag Wiki or the VroniPlag Wiki, which makes it easier for others to help you with the documentation, or you can publish on a blog, like the SchavanPlag blog, which gives you complete control of what is published. Or you can print up a book, like Marion Soreth did in 1990 when she documented the dissertation of her colleague Elisabeth Ströker. 
All clear? If I've missed anything, please add in the comments!

Update: A correspondent noted that when you scan once you actually get two pages. And that an experienced person can do 200-400 pages an hour. I'll stick with the lowest number.
Update 2: A student of mine improved the comparison tool. It is available online, free of charge, as similarity-texter. It runs in your browser. If you find something colored, just click on it, and the identical text on the other side will align with it. This makes examining large files very easy, and it is also easy to prepare a PDF with the passages marked using the tool.
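
For the curious, here is a rough idea of what "marking identical passages" in step 7 means, as a minimal Python sketch. It is not Dick Grune's algorithm and not the code behind the VroniPlag Wiki tool or similarity-texter; it is a simple stand-in that looks for runs of identical words. The function names, the `min_words` threshold, and the two sample sentences are all made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def shared_passages(thesis, source, min_words=4):
    """Return word runs of at least min_words that appear in both texts.

    Hypothetical helper: a naive stand-in for a real comparison tool.
    """
    a, b = tokenize(thesis), tokenize(source)

    # Index every run of min_words words in the source by its start positions.
    index = {}
    for j in range(len(b) - min_words + 1):
        index.setdefault(tuple(b[j:j + min_words]), []).append(j)

    passages = []
    i = 0
    while i <= len(a) - min_words:
        best = 0
        for j in index.get(tuple(a[i:i + min_words]), []):
            # Extend the match word by word for as long as both texts agree.
            k = min_words
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
        if best:
            passages.append(" ".join(a[i:i + best]))
            i += best  # continue after the matched run
        else:
            i += 1
    return passages


# Invented sample sentences, not taken from any real thesis or source.
thesis = "The author argues that the conscience is formed by education and habit in early life."
source = "Some say that the conscience is formed by education and habit, but others disagree."

for passage in shared_passages(thesis, source):
    print(passage)
```

A sketch like this only catches word-for-word copying after OCR has gone well. Translations, paraphrases and shuffled sentences still have to be found by reading, which is why step 2 comes before step 7.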

