Saturday, December 20, 2014

Christmas Links

I seem to be getting more and more links I can't adequately deal with, but which I don't want to withhold from readers. So here is some Christmas reading:
  • The "Neuroskeptic" blog of Discover Magazine has a piece about The Strange Case of “Publication Integrity and Ethics” which details a number of integrity and ethics questions around the supposed new journal.
  • The Times Higher Education has a piece on post-publication peer-review that describes more of the chilling consequences that occur when lawyers meddle with scientific inquiry. Physics professor Philip Moriarty is quoted as saying: “If you are publicly funded and you put your research into the public domain but no one can criticise you for it without facing legal proceedings, that seems to me to be a very badly damaged system.” Exactly.
  • Retraction Watch obtained a $400,000 grant to set up a retractions database! This is great news. I hope that the database can be used to calculate a Retraction Index, that is, how many retractions a journal has per article published, and perhaps how long it took for the retractions to take place after the journal was first informed.
  • Bernd Kramer recently published a book in German about obtaining a doctorate in Germany without doing the work ("Der schnellste Weg zum Doktortitel. Warum selbst recherchieren, warum selbst schreiben, wenn's auch anders geht?"). The cover is a horrible stock photo, but the book makes quite interesting reading. Kramer gave an interview in Deutschlandradio in November 2014 about it.
  • Reports of fake peer reviews are increasing. Vox has an article about 110 papers retracted in the past two years on account of faking peer reviews. Retraction Watch reported on SAGE publishers retracting 60 papers from just one journal for this reason. The Minister of Education in Taiwan, Wei-ling Chiang, had been added to some of these papers as a co-author (he says without his knowledge). He stepped down because of the scandal in July 2014, according to IEEE Spectrum.
  • Taipei Times reported in August of 2013 that Andrew Yang, the former Taiwanese Minister of National Defense, was forced to resign in a plagiarism scandal a few days after taking office. He had published a book in 2007 that friends had ghostwritten for him. They had, however, plagiarized large parts of the book.
  • The University of Nevada in Las Vegas fired an English professor for "serial plagiarism." The student newspaper, The Rebel Yell, also reports on the case.
  • At the end of November 2014, the Vice Chancellor of Delhi University in India was jailed and released on account of plagiarism.
  • There is a nasty case of plagiarism reported from early 2014 at Chicago State University. The dissertation of the Senior Vice President and Provost of the university was being investigated, and the university confirmed to the press that it was doing so. She sued the university for violating privacy laws, stating that she did not plagiarize [1]. Documentation of plagiarism in her dissertation has been published in a blog ([2] - [3] - [4] - [5] - [6]). Despite the documentation, the University of Illinois, Chicago has ruled that her dissertation is not a plagiarism ([7]). The Chicago Tribune had three plagiarism experts (Tricia Bertram Gallant, Teddi Fishman, and Daniel Wueste) look at the thesis ([8]). All three find the thesis problematic. The question is, are the students to be held to a different standard than the person who is enforcing that academic standard? A thorny question.

Monday, December 1, 2014

Diverse links

Here are some links that need documenting:
There will be more, I'm afraid, to come. 

A visit to the Academy

The Berlin-Brandenburg Academy of Sciences and Humanities invited me to speak this past week at a non-public meeting about plagiarism detection software for the working group Zitat und Paraphrase (Quotation and Paraphrase). I was a bit leery of speaking there, as some of the members of the group have publicly demonstrated a quite problematic interpretation of plagiarism as far as it concerned the dissertation of one particular person (see [1] - [2] - [3] for detailed online articles in German about this particular case, and a recent German essay [4] that compares this plagiarism case with one from the early 90s).

Since I do enjoy a good discussion, I agreed to speak. Unfortunately, the meeting was not open to the public, so I am only able to repeat the points of my presentation here, not the ensuing discussion. As it turned out, there were not many members of the group there, and none of the vociferous members I had been expecting.

I first made it exceedingly clear that VroniPlag Wiki is not a machine or software of any sort, but an academic community in which I take part. After discussing Teddi Fishman's definition of plagiarism, which I would modify by adding "without properly attributing the work" to point 3 and removing point 5 entirely, I gave a few examples of some of the different forms of plagiarism. These were followed by screenshots of a few plagiarism detection systems that have complicated reports or report essentially meaningless numbers.

One important point that is often overlooked when using such systems is that they all suffer from both false positives and false negatives. This is an inherent problem with attempting to determine plagiarism using software. Quotations are difficult to detect reliably, especially if they are only indented; literature references should of course be similar to references used in other papers; and some systems begin to mark anything longer than 6 or 7 words as text similarity. All of these can be the source of a false positive, in addition to simple programming errors, which I have also seen. The other side of the coin is the false negatives, and they are quite simple to understand: if the software does not have access to a source, it will not be able to determine that it is indeed a source. Translated text, for example, is next to impossible to identify with software, as is non-digitized content.
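To make both kinds of error concrete, here is a minimal sketch of word-window matching, assuming a simple six-word window. It is not how any particular product works; the function names, the sample texts, and the reference entry are all invented for illustration.

```python
import re


def word_ngrams(text, n=6):
    """Return the set of n-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def flagged_sources(submission, corpus, n=6):
    """Map each source name in corpus to the number of n-word sequences it shares
    with the submission. Sources without any shared sequence are left out."""
    submission_grams = word_ngrams(submission, n)
    hits = {}
    for name, source_text in corpus.items():
        shared = len(submission_grams & word_ngrams(source_text, n))
        if shared:
            hits[name] = shared
    return hits


# Invented bibliography entry that two unrelated papers might both cite.
REFERENCE = "Doe, J. (2010). An invented title used only for this example. Example Press."

# Hypothetical corpus: the only texts this toy "system" has access to.
CORPUS = {"paper_a": "Some unrelated argument about enzymes. " + REFERENCE}

# False positive: nothing was copied, but the shared reference entry is flagged.
honest = "My own findings on a different topic. " + REFERENCE
false_positive = flagged_sources(honest, CORPUS)

# False negative: copied verbatim, but the source is not in the corpus.
copied = "This sentence was lifted word for word from a printed book that was never digitized."
false_negative = flagged_sources(copied, CORPUS)

print(false_positive)
print(false_negative)
```

The honest text is flagged because of the shared reference entry, while the copied sentence passes unnoticed because its source is not in the corpus. Either way, a person still has to read the report and decide.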

I then discussed the small, general tools that can be used to manually detect and document plagiarism. After a few examples of documented plagiarism from historic cases and from current cases at VroniPlag Wiki, I closed by asking some ethical questions that I include in my forthcoming chapter on plagiarism detection software for the "Handbook of Academic Integrity":
  • Is it necessary to find all the plagiarism in a text?
  • Is it ethical for a university to use plagiarism detection software?
  • Is it ethical for a university to use plagiarism detection software as a formative device?
  • Is it ethical for a university to offer plagiarism detection software for teachers to use?
  • Is it ethical for a university to offer plagiarism detection software for researchers to use?
We had a good discussion afterwards. Unfortunately, there was no time to linger on and talk further over a cup of coffee. I do hope that those who were present can serve as multipliers, explaining to their peers that there is no magic silver bullet software for finding plagiarism, just a number of useful tools, large and small, that all incur a cost of time and effort to use.

[1] Causa Schavan. (n.d.). Articles about "Zitat und Paraphrase." [Blog]. Retrieved December 1, 2014, from https://causaschavan.wordpress.com/?s=Zitat+und+Paraphrase
[2] Erbloggtes. (n.d.). Articles about "Zitat und Paraphrase." [Blog]. Retrieved December 1, 2014, from https://erbloggtes.wordpress.com/?s=Zitat+und+Paraphrase
[3] Dannemann, G. (2013, March 3). Die Ex-Ministerin und ihre Unterstützer: Schavanzentrisches Weltbild. Retrieved December 1, 2014, from http://www.tagesspiegel.de/wissen/die-ex-ministerin-und-ihre-unterstuetzer-schavanzentrisches-weltbild/7863836.html
[4] Ebert, T. (2014). Sag mir, wie hältst Du es mit dem Plagiat? Von Elisabeth Ströker zu Annette Schavan. Merkur, 68(12), 1070–1080.

Thursday, November 20, 2014

French journalism school executive suspended during plagiarism investigation

The Guardian reports that Agnès Chauveau, an executive from a journalism school in Paris, has been suspended for plagiarizing in columns that she published for the French-language web site Le Huffington Post.

The columns in question have been updated with a notice that the references have now been fixed:
Update: This post is a reprise of a column written for and read each Sunday on France Culture. Certain references were already missing in the oral version. They were added here as soon as these errors were pointed out, so that the quotations and sources appear more clearly.
The Institute of Political Sciences has launched an inquiry and suspended her during the inquiry.

Chauveau is said to have lifted material from various online and printed publications for her weekly radio show, then re-used the texts for her online column. Chauveau is quoted as having said that she had “forgotten to cite certain papers, but never on purpose”, and insisted: “I’ve rectified this each time there’s been a problem.” According to the Guardian, she is quoted as not having had the time “to cite all of her sources on the radio.”

In French media, there are articles in Libération (with the quotations in French: «J’oublie de citer certains papiers mais ce n’est jamais volontaire et je rectifierai chaque fois que ça pose problème.» Elle a aussi expliqué qu’elle n’avait «pas le temps de citer à l’antenne toutes [ses] sources».) and Le Monde.

Saturday, November 15, 2014

Münster tackles plagiarism problem head-on

The medical colloquium for advanced medical students in Münster, Germany, invited me to speak about plagiarism there on Nov. 15, 2014. VroniPlag Wiki has identified 23 medical doctoral dissertations from the University of Münster to date that have extensive text parallels that could constitute plagiarism, including one thesis with plagiarism on 100% of the pages. This was widely reported on in the local media, so in addition to inviting me, they decided to open the seminar to all members of the university, and invited alumni and the general public to attend as well.

Imagine 120-130 people in a typical medical school lecture theater with steep seating on a Friday afternoon at 4 pm. I was glad there was so much interest in the topic, and that the dean of the medical school, Wilhelm Schmitz, participated actively in the discussion.

After introducing the topic and noting that Münster has had plagiarisms documented in their school of law (Jam - Psc - Tr - Mb), in the political science department (Ahe), and in a book published by a retired computer science professor that was withdrawn for extensive plagiarism from Wikipedia (FAZ article), I pointed out that Münster had a case of a duplicate dissertation in 2011. At that time the dean had spoken of a singularity. Now, with 23 additional dissertations documented, it is clear that this is a systemic problem, not 23 additional singularities.

I spoke a bit about the history of doctoral degrees, drawing on the work of Ulrich Rasche (Geschichte der Promotion in absentia. Eine Studie zum Modernisierungsprozess der deutschen Universitäten im 18. und 19. Jahrhundert. In: R. D. Schwinges (Ed.) Examen, Titel, Promotionen – Akademisches und staatliches Qualifikationswesen vom 13. bis zum 21. Jahrhundert. Basel: Schwabe, pp. 275–352, 2007; Mommsen, Marx und May: Der Doktorhandel der deutschen Universitäten im 19. Jahrhundert und was wir daraus lernen sollten. In: Forschung & Lehre, No. 3, pp. 196–199, 2013) and then briefly presented Bernd Kramer's theory about why medical doctors in Germany are so in love with their titles. This has to do with the history of the field, Kramer postulates.

Early on the clergy was also occupied with health matters. Pope Alexander III proclaimed in 1163 at the Council of Tours that the clergy was not to sully their hands with blood. Two professions sprang up to fill the void, the academic internal medicine scholars and the practical surgeons. More and more "specialists" and quacks sprang up touting their sure-fire cures for what ails you. The academic doctors were often personal physicians to the aristocracy, making house calls at the castle. Even though they were just a special sort of servant, they were learned doctors of medicine.

When Otto von Bismarck introduced free health insurance in Germany at the end of the 19th century, Kramer theorizes, there was a sudden change. The free health care was available only if you went to a medical doctor with a diploma, not to any of the various quacks. The "unwashed workers" now came to the doctor's surgery, and the doctors needed something to make them feel special. That something, according to Kramer, was the doctoral degree, a symbol held dear that needed to be obtained at all costs. For the general public, the difference between a quack and a real doctor was that the latter had a doctorate from a university. So it came to be taken as a sign of quality. Kramer goes on to note that the law profession soon picked this up as well.

So the main reason for getting a doctorate in medicine in Germany was to have that symbol of quality on the nameplate, not an interest in research. The quality of many of the dissertations leaves much to be desired. Even the Wissenschaftsrat, normally a very reserved body, lashed out at the medical profession in 2004, but this was generally ignored. A discussion arose in 2009 when Ulrike Beisiegel, at that time the Ombud for good scientific practice for the German research funding organization and currently the president of the University of Göttingen, published an article (p. 488-9) about "Türschildforschung", research for the name plates.

She met with a lot of resistance in the medical field, including a flaming defense (p. 582-583) of the current medical practice by Dieter Bitter-Suermann, the president of the medical school in Hanover and the chair of the German medical school association. He focused there on the quantity of dissertations accepted and stressed how important it was that students start to understand research as early as possible.

I then gave some examples of the plagiarism in Münster. In my experience, people will talk about cases of plagiarism only on the basis of what they have read in newspapers; they seldom make the effort to actually look at the documentation that is available online. This was shocking for the audience, as it was utterly clear that what they were seeing was unacceptable. When I got to the data falsification that was found by accident, a sort of "collateral damage" while documenting plagiarism, the anger in the audience was palpable. Text is often seen as not so important, but making up data is a major violation of research ethics.

The closing of my talk was about possibilities for changing the situation, moving to avoidance of plagiarism and inculcation of good scientific practice. The medical school in Münster is already moving to include obligatory courses in scientific writing and the scientific process for their medical students, and they are examining all the dissertations accepted in the past few years with plagiarism software, although I explained to them that due to false negatives they will not find all the plagiarisms. Dean Schmitz noted that it was indeed a lot of hard work to interpret the results, but that it was necessary to take care of this now and then to see how to avoid plagiarisms being accepted in the future.

He then opened the floor for questions, and a lively, hour-long discussion ensued. We touched on the role of the advisors, on how to properly reuse descriptions of methods, and on the question of having a dual doctorate program: MDs for all, PhDs for those interested in research. Doctoral candidates asked about how to go about avoiding plagiarism and what they needed to reference, and also wondered who these VroniPlag Wiki people are, anyway.

An important point came up in connection with the scandal over plagiarism in habilitations in Freiburg that recently moved Handelsblatt to print the front page headline "Dr. med. plagiat". I noted that Münster does not oblige their researchers to print their habilitations and they do not even have to deposit a copy in the library. I had tried unsuccessfully to obtain a copy of one that was referenced in a dissertation. The dean was surprised - he thought that they, too, had to be published. He promised to look into the regulations for habilitations and to insist on them, too, being publicly available. Even if it is a cumulative habilitation with a number of published journal articles bound together with an explanatory text, it has to be possible for any researcher to see which journal articles were used.

The session closed at 6pm, but a long line formed up front with people who had personal questions. One was on how to deal with their advisors publishing their own work, one was on where to find more information on scientific writing. The speaker of the students' group was concerned that I was making the University of Münster look bad by only showing examples from Münster. I assured him that I had chosen Münster examples for this talk only; I normally have other examples that I use. But since I had such a wide selection of plagiarized medical text from Münster, it was natural to use them. A radio journalist hung around to interview the dean, me, and a number of students. (Link)

On a final note, I feel that the universities need to be utterly transparent about how they deal with cases of plagiarism. The informer and the accused need to be heard by the investigating committee. They need to be informed about how the investigation is proceeding, and that needs to be timely. It is unacceptable for a plagiarism investigation to take more than a year (some are currently entering their fourth year, probably because the universities in question hoped that the problem would go away if they ignored it). Especially when the plagiarism documentation was raised publicly, as is the case both in printed book reviews as well as online documentations of text parallels, the university needs to publicly announce the results.

Since the doctorate was granted in public, it must also be publicly announced when it has been rescinded. That means naming the person. If they published a plagiarism, they have to accept the consequences. If the degree is kept and the grade lowered, or an expression of concern written, this needs to be made public as well. The text parallels are visible to all, as both dissertation and source are published. The reasons why this is acceptable need to be made clear: perhaps the plagiarism was the other way around, the supposed source may have been published first. If the reason for not rescinding a doctorate is that the advisor told the doctoral student to do so, all the more reason for it to be made clear that this advisor has problems with good scientific practice. Science does not thrive in secrecy.

The introduction of lawyers into the process of determining bad scientific process does not help, either. The publisher of the plagiarism should respond to the accusations, explaining why the texts are the way they are, not send lawyers to find possible problems in the process. The university grants doctorates, and the university can take them away again. And the government should quit putting the doctorate on identification papers; that alone would do a world of good.

Wednesday, November 12, 2014

Chinese students in Australia use ghosting service

The Sydney Morning Herald and Western Australia Today are reporting on a Sydney company called MyMaster that is offering ghostwriting services to Chinese students enrolled in Australian universities. I've collected the links and the first paragraphs of the articles here. It is excellent to see such widespread reporting on academic misconduct.
  • WA's Curtin University caught in NSW 'essay writing' scandal
    "Western Australia's Curtin University has been caught up in a cash-for-results scandal involving thousands of students who paid a Sydney company up to $1000 each to write essays and assignments for them, as well as sit online tests." The article has links to other articles on grade changing scandals.
  • Students enlist MyMaster website to write essays, assignments
    "Thousands of students have enlisted a Sydney company to write essays and assignments for them as well as sit online tests, paying up to $1000 for the service. Their desire to succeed threatens the credibility and international standing of some of our most prestigious institutions."
  • Students buying assignments online could be charged with fraud
    "Students who pay essay writing services to complete their university assignments are not only breaching university plagiarism protocols but could also be charged with fraudulent conduct under NSW [New South Wales] legislation, legal experts say."
  • Yingying Dou: The mastermind behind the University essay writing machine
    "At the helm of the company embroiled in a large-scale academic cheating scandal is a Chinese-born businesswoman named Yingying Dou. The enterprising 30-year-old, who also goes by 'Serena', has used her accounting degree to build a lucrative ghostwriting service, called MyMaster, aimed at Chinese international students."
  • Yingying Dou takes the day off as students and tutors tell of others who cheat
    "Tutors and students at Yingcredible Tutoring, the coaching college run by the mastermind of essay-selling website MyMaster, Yingying Dou, have spoken of the widespread practice of international students paying for university essays as they struggle with language barriers."
  • Universities in damage control after widespread cheating revealed
    "NSW universities are in damage control following a Fairfax Media investigation that revealed hundreds of students across the state were engaging the services of an online essay writing business.
    On Wednesday, the Herald exposed an online business called MyMaster, run out of Sydney's Chinatown, that had provided more than 900 assignments to students from almost every university in NSW, turning over at least $160,000 in 2014."
The site has now been taken offline.
Thanks to Sven for spotting these articles!

Sunday, November 9, 2014

Short links

Here are some diverse and interesting links from the world of academic misconduct:
  • Research misconduct in Australia: The article by Mark Israel in "The Conversation" lists a number of cases of research misconduct that have been made public in Australia, including a recent one at the University of Queensland.
    "Bruce Murdoch and Caroline Barwood resigned from the University of Queensland in 2013 after a whistleblower claimed that they had not undertaken an experiment on Parkinson’s, despite reporting results in various journals. [...] The university failed to find any evidence that the experiment had been conducted. Instead, it discovered duplicate publication, statistical error and misattribution of authorship."
  • The new president of the German "Federation of Expellees" organization (Bund der Vertriebenen), Bernd Fabritius, is originally from Romania (he belongs to the German minority there) and did his doctorate in Hermannstadt/Sibiu and in Tübingen. A fascinating 54-page documentation of text parallels and other problems with this thesis was published recently online.
    The text was photographed using pens to mark the text and then boxes and explanatory text were added to the pictures. A discussion of the documentation (in German) can be found in the blog Erbloggtes.
  • The Berlin-Brandenburg Academy of Sciences in Germany considered awarding former Minister of Education Annette Schavan (who was found to have plagiarized in her dissertation) the Leibniz medal, which is given in honor of outstanding service for the promotion of the goals of the Academy („zur Ehrung besonderer Verdienste um die Förderung der Aufgaben der Akademie“). Apparently, though, there was no unanimous vote, and the discussion leaked its way into the newspapers. There is also more biting commentary on the research group "Zitat und Paraphrase" (quotation and paraphrase) in the Causa Schavan blog ([1] - [2], in German).
  • Dr. med. plagiat: The German newspaper Handelsblatt has an extensive report on the plagiarism scandal in medicine at the University of Freiburg, the University of Münster and the Charité. 
  • There is a call for papers out (abstract submission deadline: November 16, 2014) for an international conference on plagiarism at the Mendel University in Brno, Czech Republic 10 - 12 June 2015 "PLAGIARISM ACROSS EUROPE AND BEYOND" (http://plagiarism.cz/ Disclosure: I am on the program committee).
  • I found an IFQ report (in German) from 2006 on the history of doctorates in Germany with some interesting statistics on the prevalence of doctorates in various fields.
  • It seems that Elsevier has been charging $30 for copies of book chapters that consist only of one page containing the wording "This page intentionally left blank". A tongue-in-cheek systematic review has been published, and indeed, if one googles "This page is intentionally left blank" together with "site:http://www.sciencedirect.com" there are 55 hits across a wide spectrum of fields. Apparently, the automatic publishing system has trouble with blank pages, or else the blank pages were not caught during the rigorous peer review.
  • Wildly off topic: There is even a Lego figurine for a university graduate in a cap & gown.

Saturday, October 25, 2014

intihal - Plagiarism in Turkey

Eurasian Institute Lecture Hall
I was recently invited to speak at a symposium organized by the Inter-Universities Ethics Platform and held at the Eurasian Institute of the University of Istanbul on October 17, 2014. They kindly organized two interpreters who took turns interpreting the talks given in Turkish for me, and my talk into Turkish for those who had need of it. Apparently, even in academic circles English is not a common language. I will describe the talks as far as I was able to understand them here. The conference was focused on intihal, the Turkish word for plagiarism.

The deputy rector of the Istanbul University welcomed the 60-70 people present (more would come and go during the course of the day), noting that he himself is the editor of an international journal that tests articles submitted for plagiarism. They reject half of the articles submitted for this reason.

The first speaker was Hasan Yazıcı, a retired professor of rheumatology who sued the Turkish government in the European Court of Human Rights and won. He first described his case, which was recently decided (April 2014) and is available online. Since he was speaking to a room of people who had followed the case more or less closely, he did not go into details, but they are given in the judgement:
In 1997 Yazıcı had informed the Turkish Academy of Sciences that a book entitled Mother's Book by a Turkish professor (I.D.), the founder and former president of the Higher Education Council of Turkey (YÖK), was basically a plagiarism of the popular US book on rearing children by Dr. Spock, Baby and Child Care. In 2000 Yazıcı published an article about the plagiarism in the Turkish Journal of Physical Medicine and Rehabilitation and a shortened version in a Turkish daily newspaper.

In the article Yazıcı praised YÖK for establishing a committee to examine the scientific ethics of candidates for associate professorships, and proposed that YÖK start the conversation about plagiarism by asking their founder to apologize for the plagiarism in his book. In response, I.D. filed charges against Yazıcı, stating that this publication violated his personality rights. In the following six years the case wound its way back and forth through the court system, with expert witnesses who were close colleagues of I.D. stating that they found no plagiarism in the book, but that the passages in question were "anonymous" information regarding child health and care and that this was a handbook without bibliography or sources, not a scientific work. Yazıcı was found guilty of defamation because his allegations were thus untrue and fined. Yazıcı challenged the selection of experts, and the Court of Cassation kept referring the case back to the lower courts. Again and again close friends were appointed experts, found no plagiarism, and thus Yazıcı was found to be guilty.
Yazıcı finally gave up on the Turkish courts and paid the fine, but took his case to the European Court of Human Rights, stating that his right to freedom of expression (here, stating that he found the book to be a plagiarism) had been interfered with and that the Turkish courts had not properly dealt with the case. He noted that due to the plagiarism, there was outdated information on baby sleeping positions in the book that had been updated by Dr. Spock in his 1998 edition, but was not changed by I.D. The European court found in its judgement that it is indeed necessary in a democratic society for persons to be able to state value judgements, which are impossible to prove either true or false. However, according to the court (p. 13), there must exist a sufficient factual basis to support the value judgement. In this case, the court found sufficient factual basis for the allegations, and ordered the fine paid by Yazıcı to be refunded and his costs for the court cases to be reimbursed.
Yazıcı made the point in his speech that the extent of plagiarism in a country correlates strongly with a lack of freedom of speech. He sees Turkey in the same league as China on this aspect. He noted that everyone knows about plagiarism, but no one speaks about it.
In order to decrease plagiarism we have to speak about plagiarism. He stated in later discussions that it is imperative that Turkish judges understand what plagiarism is, most particularly because there is a law in Turkey now declaring that plagiarism is a crime punishable by prison, but it is still not clear what exactly constitutes plagiarism.

The second talk on "Plagiarism and Philosophy of Law" was given by Sevtap Metin. She described the Turkish legal situation, in particular the law of intellectual property. She noted that there are many sanctions for plagiarism, for example academics can be cut off from their university jobs or from funding. She also described the process for application for a professorship and noted that the committees are currently not doing their job in vetting the publications provided by the applicants. The reason for this is that if they note a suspicion of plagiarism that they cannot prove, they can be sued for defamation of character by the applicant. This discourages people from looking closely at publication lists. However, with Yazıcı recently winning his case at the European Court of Human Rights, it must now be possible to speak freely about plagiarism. Citing Kant's categorical imperative, she feels that we must not plagiarize unless we want everyone to plagiarize. And if we tell our children not to lie, but lie ourselves, they will follow our actions and not our words.

The third talk was by Mustafa Kıcalıoğlu, a former judge now retired from the Court of Cassation, on "Plagiarism in Turkish Law." He spoke about the problems that occur in plagiarism cases in which personality rights have to be weighed against intellectual property rights. He noted that Ernst Eduard Hirsch, a German legal expert who taught at the University of Ankara, was instrumental in drafting the Turkish Copyright Act. Kıcalıoğlu went into some detail on copyright and intellectual property. I noted in the discussion that plagiarism and violation of copyright are not the same things: there is plagiarism that does not violate copyright law and violations of copyright law that are not plagiarisms. Kıcalıoğlu also discussed another long, drawn-out plagiarism case of a business management professor who plagiarized on 65 out of 500 pages in a book. He was demoted from the faculty after YÖK found that he had plagiarized, and he sued YÖK, but lost. This person is now a high government official. The discussion on this talk was quite long and emotional, as many people in the audience wanted to relate a story or call for all academic institutions to take action against plagiarism.

After a lunch and tea break I photographed this fine statue of a dervish before we got into the technical part of the symposium. Altan Gürsel of TechKnowledge, the Turkey and Middle East representatives of iParadigms (the company that markets Turnitin and iThenticate), spoke about that software. He first gave the definition of intihal from the Turkish Wikipedia, showed a few cases of cheating that made the news, and then launched into the standard Turnitin talk. He did note, however, that the reports have to be interpreted by an expert and cannot determine plagiarism, so it appears that my constant repeating of this has at least been understood by the software companies themselves, if not by all of the users of such systems. He reported on some new features of Turnitin, for example that Excel sheets can now also be checked, and that Google Drive and Dropbox can be used for submitting work. In answering a question, he noted that YÖK now scans all dissertations handed in to Turkish universities with iThenticate, but not those from the past. They are planning on including open access dissertations in their database in the future.

I gave my standard talk on the "Chances and Limits of Plagiarism Software", noting that software cannot determine plagiarism, it can only indicate possible plagiarism, and that there are many false positives and false negatives. During questions a number of people were perplexed that there were so many plagiarisms documented in doctoral dissertations in Germany, since dissertations need to be original research and Germany has a reputation for having a solid academic tradition. They had only heard about the politicians being forced to resign, and wanted to know what was different in Germany that a politician would actually resign on the basis of plagiarism found in his dissertation. They wanted to know if judges in Germany understand plagiarism. I noted that indeed, they understand plagiarism much better than many universities and than the persons suing their universities because their doctoral degrees have been rescinded. The judgements of the VG Cologne and the VG Düsseldorf are very clear and very exact in their application of law to plagiarism cases, as are the judgements in many other cases.



After a tea break Tayfun Akgül, a professor of Electrical Engineering at the Technical University of Istanbul who also serves on the Ethics and Member Conduct Committee of the IEEE, spoke on "Plagiarism in Science." Akgül is also a professional cartoonist, and his lively presentation was peppered with cartoons that kept the audience laughing and caused the interpreters to apologize for not being able to translate them. He outlined the IEEE organizations and policies for dealing with scientific misconduct on the part of its members. He spoke at length about the case of Turkish physicists having to retract almost 70 papers from the preprint server arXiv. Nature reported on the case in 2007; the authors complained thereafter that they were just borrowing better English.


Özgür Kasapçopur, the speaker of the ethics committee of Istanbul University, gave the facts and figures of the committee itself and the cases that it has looked at since it was set up in 2010. They have had 29 cases submitted to the committee, but only determined plagiarism in 3 cases.



Nuran Yıldırım spoke about YÖK and plagiarism. She is a former prefect who was on the ethical boards of both the University of Istanbul and YÖK. The Higher Education Council was established in 1981. From 1998 plagiarism was added to the cases that are investigated there, as plagiarism is considered a crime that can incur a sanction. However, there was only a 2 year statute of limitations in place. This has since been removed, and all applications for assistant professor need to be investigated by YÖK. If they find plagiarism, they have a process to follow, and if plagiarism is the final decision, the person applying for a professorship is removed from the university. However, this harsh sentence has now been changed to "more reasonable punishments", whatever those are. She noted that at small universities it is hard to have only a local hearing, as often the members of the committee investigating a case are relatives of the accused. She had some fascinating stories, especially from the military universities, including one about a General Prof. Dr. found to have plagiarized. She also noted that people do accuse their rivals of plagiarism just to try and get them out of the way. Her final story was about someone who published a dissertation, and eventually found that all of his tables and data were being used in a paper by someone else. He informed YÖK, and the second researcher defended himself by saying that he had used the same laboratory, so the lab must have confused the results and given him the results from the other person instead. YÖK then requested the lab notebooks from both parties, but only the author of the dissertation could produce them. Since the journal paper author couldn't find his, he was found guilty of plagiarism.

In the final round, İlhan İlkılıç, a professor of medical ethics at the University of Istanbul, on leave from the University of Mainz and a member of the German national ethics committee, presented a to-do list that included setting out better definitions of plagiarism and academic misconduct and finding ways of objectively looking at plagiarism without personal hostilities or ideologies getting in the way. Discussion about plagiarism is essential, even if it won't prevent plagiarism or scientific misconduct from happening.

Sadat Murat, chairman of the Turkish national ethics committee, spoke about their work, which is to investigate complaints about state servants. However, low-level state servants are exempt from this, as are the top-ranking politicians. The committee only reports on violations; it cannot impose sanctions. They also try to disseminate ethical culture in Turkey by providing ethics training.

I especially want to thank the interpreters for their work—any errors here are mine for not paying exact attention, they did a great job permitting me to understand a small portion of what is happening in the area of intihal in Turkey.

Monday, October 6, 2014

Belgian Rector resigns over plagiarized speech

The rector of the Free University of Brussels in Belgium, Alain Delchambre, gave a speech on the opening day of the academic year on Sept. 19, 2014 that turned out to have been heavily plagiarized from a number of sources, among them former French president Jacques Chirac, according to media reports (French: Le Monde [paywall], La Libre [with a good synopsis of the plagiarized portions], Flemish: De Standaard). The speech was written by a speech writer who was fired on the spot.

German Spiegel Online reports that Delchambre has resigned, as the university takes a hard line against plagiarists among the students, and Delchambre felt that this step was in the best interests of the institution. Of course, it took a media outrage to encourage him to take this step, but this is, perhaps, a warning signal to others: If you must use a speechwriter, make it clear that plagiarism (from the Wikipedia or elsewhere) is not going to be tolerated.


Tuesday, September 16, 2014

Quick PhD

The newspaper "The Herald", owned by the state of Zimbabwe, reports that first lady Grace Mugabe was awarded a PhD in sociology by the chancellor of the University of Zimbabwe, her husband and ruler of the country Robert Mugabe. The Guardian, enhancing the story with many details, points to The Standard's quite critical report. It seems that Mrs. Mugabe's first degree, in Chinese, was awarded in 2011 on the basis of a correspondence course from the People’s University of China.

It is not clear how the former secretary, who apparently dismally failed a Bachelor of Arts program at the University of London in 2001, completed the necessary coursework in sociology, conducted the research, and wrote the thesis. The registration for the degree happened just a few months ago. One does hope that the thesis will be published, so that the scientific community can have a closer look at the research - and the writing. 

Monday, September 15, 2014

Montenegrin minister caught plagiarizing, another German minister steps down

Retraction Watch reports today that the Montenegrin daily newspaper Vijesti has reported that the science minister of the former Yugoslavian state, Sanja Vlahovic, has been caught plagiarizing. And a paper that supported her "election" as professor at the private Mediterranean University that she supposedly published in the Emerald Publishing Group's International Journal of Contemporary Hospitality Management can't be found. I couldn't find the paper, entitled "Destinations Competitiveness in Modern Tourism", in this journal either. There are calls for her to step down, as science ministers should not be plagiarizing.

The German Spiegel Online today reports that the state minister of education, Waltraud Wende, has stepped down over a long, drawn-out spat about an accusation of bribery against her. She is accused of having offered, while she was president of the University of Flensburg, to support the chancellor of that university for re-election if he promised her a chair at the university for her to return to when her term of office was up. She had been a professor in Groningen (Netherlands) when elected president of the university, so she had no chair of her own in Flensburg. The police searched her private home and the university in this matter.
 

Tuesday, September 2, 2014

Homebrew Collusion Detection

tl;dr -- One can use free tools to identify collusion, a special sort of plagiarism, but there is still much manual work involved.

[Note: I promised this post 3 months ago - then life and a lot of dissertations got in the way. Sorry for the delay. --dww]

In a previous blog post I described the situation the University of Münster is currently facing with at least 23 dissertations in medicine documented as containing massive text overlap from dissertations submitted to that same university or other universities in previous years. The renowned Charité Medical School in Berlin is currently at 20 dissertations in medicine with massive text overlap, and the number there is steadily rising.

This re-use of text (and images or even data) from the same department can be considered to be collusion, a special form of plagiarism. When looking at questions of collusion, there is a closed number of documents that are to be compared with each other, for example all of the dissertations from one department. Text overlap is much easier to find in a closed set of documents than finding a potential source somewhere on the internet.

How were these theses with such extensive text overlap identified? It has been postulated that VroniPlag Wiki has some sort of "deep search" tool, but actually, it is a time-intensive manual process, aided by small software tools. About 50,000 dissertations in medicine, dental medicine, veterinary medicine and biology have been downloaded and compared with each other, with some of the major plagiarisms thus discovered documented at the VroniPlag Wiki. Medicine was chosen for this investigation, as these theses tend to be quite short in Germany and many are available online.

Data collection

The first step was obtaining the dissertations from the various university libraries. One would think that this would be a trivial step, as most university libraries offer e-publication services to their members. It would seem that all one would need to do would be to download the files. But each university seems to have its own, intricate database and retrieval structure. An API would be wonderful that could be queried and would return a JSON map with relevant metadata such as name, title, field, year, and URL to the thesis. Indeed, there are a few libraries that offer such a service. Most just have some sort of web page for each dissertation that includes the metadata, but without markup indicating the semantic meaning of the text. Some libraries seem to make it intentionally difficult to automatically download all of the theses. With a little bit of work, the data needed can be automatically scraped from such pages, but the scraper needs to be adjusted for each library.

The most important data item for this task is the file name for the PDF. One library goes to the trouble of splitting every thesis into chapters, so there is not just one PDF but a directory containing all of the files. These have to be merged before continuing. Another library does not publish the file names, but only a key value used to generate the file name. However, if one downloads a few theses by hand, it is easy to see how to construct the thesis PDF name, if one has the correct key value, so that these theses, too, can be automatically downloaded.

The names of the files are at times quite amusing, as they appear to be named by the candidates themselves: "copyshop-fassung" [copyshop version], "dissertation_finish", or just "doktor". Most are called "dissertation" or "doktorarbeit"; my favorite is "Microsoft_Word_-_DoktoarbeitAmAktuellsten" (misspelled "most recent doctoral thesis done with Word"). Apparently most of the libraries don't have a procedure for giving the files meaningful names. Sometimes the same thesis is offered under different names for unknown reasons. There are also universities that co-publish dissertations in their online libraries, so the same thesis will be available from two different universities under two different names.

Pre-processing

As is usual for data mining applications, one of the most time-consuming parts of the exercise is getting the data ready for work. A directory was set up for each of the 44 departments from various medical schools and life science departments chosen throughout Germany and Austria. The downloaded files were renamed to include the name of the university and the year published (if available).
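The renaming step can be sketched in a few lines of Python. This is only an illustration: the function name archive_name and the pattern University_Year_original are my assumptions here, not the exact scheme that was used.

```python
import re
from pathlib import Path


def archive_name(university, year, original_name):
    """Build a name such as 'Muenster_2009_doktorarbeit.pdf' (hypothetical scheme)."""
    stem = Path(original_name).stem
    suffix = Path(original_name).suffix.lower()
    # Replace anything that is not a letter, digit, hyphen or underscore,
    # so that the name is safe to use on the command line.
    safe_stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    # The year is not always available from the library metadata.
    year_part = str(year) if year else "unknown"
    return f"{university}_{year_part}_{safe_stem}{suffix}"
```

Putting the university first means that an alphabetical listing of a mixed directory groups the theses by institution, which helps later when sorting the comparison results.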

The PDF files now needed to be converted into plain text in order to be compared. The free program pdf2txt, which can be run as a batch job, can be set up to automate this process. Around 10% of the dissertations in the collections downloaded could not be extracted with this tool. Some of the theses were locked, others had the text stored as images, some just produced garbage, so they had to be disregarded.
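A batch conversion along these lines could be prepared as follows. This is a rough sketch under two assumptions: that the tool is the pdf2txt.py script that ships with pdfminer, called with its -o option for the output file, and that a crude letter-ratio test is good enough to flag garbage output. The thresholds are invented for illustration.

```python
from pathlib import Path


def conversion_commands(pdf_dir, txt_dir, tool="pdf2txt.py"):
    """Return one command line per PDF; nothing is executed here."""
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = Path(txt_dir) / (pdf.stem + ".txt")
        # Assumed call form: <tool> -o <output file> <input file>
        commands.append([tool, "-o", str(txt), str(pdf)])
    return commands


def looks_unusable(text, min_chars=1000, min_letter_ratio=0.6):
    """Heuristic (an assumption): very little text, or very few letters,
    suggests a locked or image-only PDF, or garbled extraction."""
    stripped = "".join(text.split())
    if len(stripped) < min_chars:
        return True
    letters = sum(ch.isalpha() for ch in stripped)
    return letters / len(stripped) < min_letter_ratio
```

Each command list can be handed to subprocess.run, and any output file for which looks_unusable returns True should be set aside and inspected by hand rather than silently dropped.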

The result was about 9 GB of plaintext files.

Text crunching

Now with directories of appropriately named text files, the text crunching can begin. The pairwise comparison of the text files can be done with the sim_text algorithm, a powerful open-source text comparison tool developed by Dick Grune & Matty Huntjens [1], originally as a tool for finding program code replication in large collections of program files. With the following command, all of the files in a directory can be compared with each other.
# -o Output to out.log
# -d use diff format for output
# -p use percentage format for output
# -t cutoff percentage is 1
# -r minimum run size is 7
# use all files ending in .txt
sim_text -o out.log -d -p -t 1 -r 7 *.txt
Or you can compare all the files in two directories d1 and d2 with themselves and each other:
sim_text -o out.log -d -p -t 1 -r 7 d1\*.txt d2\*.txt
The main point of using sim_text as given above is the use of the -p option. This suppresses the standard output from the algorithm, which consists of long lists of overlapping portions of text and their positions within the text. Instead, only an approximate percentage of the text overlap is printed out, as shown here:

Diss_371.txt consists for 31 % of Diss_45.txt material

The results are sorted by amount of overlap, so that largest overlapping pairs are shown first. However, one must understand that this is only an indication of a possible plagiarism. The two files now must be closely examined, manually. They could be joint work and note that fact in the theses themselves; it could be just the title pages that are on deposit, so of course there will be a lot of similarity between the files; the theses could be quite short, but there is a large overlap in the references used; or they could be copies of the same thesis, just with different file names or from different library servers. These are false positives, and there are many of them. Filtering out the false positives is time-intensive and can only be done manually.
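The percentage lines are easy to read into a program for further filtering. The following is a minimal sketch that assumes the line format shown above and file names without spaces. The names Overlap, parse_line and parse_log are mine.

```python
import re
from typing import NamedTuple


class Overlap(NamedTuple):
    target: str   # the file that "consists of" foreign material
    percent: int  # approximate share reported by sim_text
    source: str   # the file the material is shared with


LINE = re.compile(
    r"^(?P<target>\S+) consists for (?P<percent>\d+) % of (?P<source>\S+) material$"
)


def parse_line(line):
    """Return an Overlap for a percentage line, or None for anything else."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return Overlap(m["target"], int(m["percent"]), m["source"])


def parse_log(lines, min_percent=0):
    """Keep only percentage lines at or above min_percent, largest first."""
    hits = [o for o in map(parse_line, lines)
            if o is not None and o.percent >= min_percent]
    return sorted(hits, key=lambda o: o.percent, reverse=True)
```

Note that "target" and "source" only mirror the wording of the output. As discussed next, they say nothing about who actually copied from whom.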

Even when two theses are found with large amounts of text overlap, there is still the question of which author copied and which one was copied from. If the theses are a number of years apart, it could be relatively clear, although it is a problem in Germany in medicine, as the doctoral thesis is often written in parallel with the studies, but can only be handed in when all the coursework has been completed. If they are in the same year, or both defended on the same day, then one cannot really say which one was copied.

Comparison

Once a pair of theses has been identified as candidates for further investigation, it is necessary to directly compare them.  sim_text can also be used for this step, as it also works as an "anti-diff-tool", one that quickly highlights the identical portions of two files so that the differences are very easy to see. VroniPlag Wiki has implemented the algorithm in JavaScript so that it can be run locally (and offline!) in any browser with JavaScript enabled.

One text file is copied into the left hand box, one into the right hand box. The drop-down list sets the minimum number of identical words in a run for it to be colored; the default is 4. When the button "Texte vergleichen!" (compare texts) is pressed, text which is the same on both sides is colored with the same color on each side, changing colors only when the exact text match terminates. The algorithm has been adapted to ignore punctuation, super- and subscripts, and special characters when matching.

Output of JavaScript implementation of sim_text with text coloring
If a thesis turns up with many colored parts, it then needs to be fragmented and double-checked – manually, by the researchers at VroniPlag Wiki who have developed a good way of documenting and categorizing text overlap that constitutes plagiarism.
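To give an idea of what such a comparison does, here is a small Python sketch that finds runs of identical words in two texts. It is not the VroniPlag Wiki implementation and not the sim_text code, just a simple stand-in for illustration. It ignores case and punctuation and reports only the longest match at each position in the left text.

```python
import re


def tokens(text):
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def common_runs(left, right, min_run=4):
    """Return (start_left, start_right, length) for shared runs of words."""
    a, b = tokens(left), tokens(right)
    # Index every window of min_run words in the right text by its position.
    index = {}
    for j in range(len(b) - min_run + 1):
        index.setdefault(tuple(b[j:j + min_run]), []).append(j)
    runs = []
    i = 0
    while i <= len(a) - min_run:
        best = None
        for j in index.get(tuple(a[i:i + min_run]), []):
            length = min_run
            # Extend the match word by word for as long as both sides agree.
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if best is None or length > best[2]:
                best = (i, j, length)
        if best is None:
            i += 1
        else:
            runs.append(best)
            i += best[2]  # continue after the matched run
    return runs
```

Each triple corresponds to one colored stretch: the word positions in the left and right text and the number of matching words. The positions refer to the token lists, so mapping them back to characters for coloring would need some additional bookkeeping.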

The big picture

Examining thousands of dissertations in this manner can quickly get confusing. In order to be able to see the big picture, for example, to determine if there are groups of theses with common text in one department or other patterns, a graphical representation of the sim_text output can be created. This makes it relatively simple to plot the similarities between dissertations as a collection of graphs.

A simple Python script can be run on the output produced by sim_text in order to create input for Graphviz. Graphviz is a free and open graph-drawing program that takes a standardized input form as a text file and produces graphical output.  Now clusters of overlapping theses (sadly, often with the same supervisor) just pop out visually. The similarities are sorted by degree of overlap, but they still need to be closely examined as above, as many false positives are generated.

Text overlap in dissertations from one faculty at one university
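A script of this kind might look like the following sketch. It assumes the percentage line format shown earlier and file names without spaces. The function name to_dot and the 5% default cutoff are my choices for illustration, not necessarily what the original script did.

```python
import re

LINE = re.compile(r"^(\S+) consists for (\d+) % of (\S+) material$")


def to_dot(lines, min_percent=5):
    """Turn sim_text percentage lines into a Graphviz digraph description."""
    edges = []
    for line in lines:
        m = LINE.match(line.strip())
        if m and int(m.group(2)) >= min_percent:
            target, percent, source = m.group(1), int(m.group(2)), m.group(3)
            # The arrow follows the wording of the output ("target consists of
            # source material"); it does not prove the direction of copying.
            edges.append(f'  "{source}" -> "{target}" [label="{percent}%"];')
    return "digraph overlap {\n" + "\n".join(edges) + "\n}\n"
```

The resulting text can be written to a file such as overlap.dot and rendered with the Graphviz command dot -Tpng overlap.dot -o overlap.png. Raising min_percent is the simplest way to keep small commonalities from cluttering the picture.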

It is not possible to create a graph for all 50,000 dissertations, especially as there are many theses with small amounts of overlap that would just clutter up the graph. Even just for one department there can be very many small commonalities. For example, in one department with 534 dissertations, there were 340 overlaps reported by sim_text with run length of 7, but only 60 of these consisted of more than 5% of the document.

Small amounts of overlap from different theses can, however, add up to quite a substantial amount. The complete output of sim_text for two or more institutions can be loaded into a spreadsheet and then sorted by various criteria. For example, since the data preparation step included a name for the department, cross-cluster text overlap can be isolated and identified. The data can also be sorted by name of the file. If there are a number of text parallels in one thesis from a number of different other ones, the file should be more closely examined. During one investigation in which the theses from the University of Vienna were compared with all of the theses from the other universities, one thesis was identified, Ves, that used portions from quite a number of other dissertations. It turned out that on at least 57% of the pages there was text overlap from dissertations from other universities. 
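The same sorting can be done in a few lines of Python instead of a spreadsheet. This sketch works on (target, percent, source) triples and assumes, hypothetically, that file names start with the university name followed by an underscore. Keep in mind that summed percentages are only a rough signal, as the overlapping portions may coincide.

```python
from collections import defaultdict


def summarize(overlaps):
    """overlaps: iterable of (target, percent, source) triples.
    Returns {target: (number_of_sources, summed_percent)}."""
    sources = defaultdict(set)
    total = defaultdict(int)
    for target, percent, source in overlaps:
        if source not in sources[target]:  # count each source only once
            sources[target].add(source)
            total[target] += percent
    return {t: (len(sources[t]), total[t]) for t in sources}


def department(name):
    """Assumes the hypothetical naming scheme 'University_Year_rest.txt'."""
    return name.split("_", 1)[0]


def cross_department(overlaps):
    """Keep only overlaps between theses from different departments."""
    return [o for o in overlaps if department(o[0]) != department(o[2])]
```

A thesis with many different sources and a high summed percentage is a candidate for closer manual examination, even if no single pairing looks dramatic.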

Time constraints

Since the algorithm compares each thesis pairwise with all of the others, the number of comparisons for such a data set grows quadratically with the number of texts examined. In a first investigation on a simple dual-core laptop with 4 GB main memory, comparing 1,000 theses needed only a few minutes. When about 24,000 were tried on the same machine, the system eventually crashed after 3 days of computation unfortunately without outputting any useful results.

Using a faster computer with 8 GB of main memory, 3,000 theses were compared in 18 minutes, 7,000 needed almost 2 hours. An attempt to compare 29,000 files ran for a bit more than 2 hours before crashing without results.

Since it was possible to compare each departmental cluster with each of the others, it was decided to set up such a sequence of pairwise cluster comparisons. With 44 departments, this meant 946 cluster comparisons (44*43/2) needed to be run, each taking between 10 minutes and 3 hours, depending on the size of the clusters. In all, at least 1.25 billion individual comparisons of pairs of dissertations needed to be made.
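The numbers follow from the formula for unordered pairs, n(n-1)/2. A quick check in Python, taking the round figure of 50,000 theses mentioned above:

```python
from math import comb


def pair_count(n):
    """Number of unordered pairs among n items, i.e. n * (n - 1) / 2."""
    return comb(n, 2)


departments = pair_count(44)    # 946 cluster comparisons
theses = pair_count(50_000)     # 1,249,975,000, roughly 1.25 billion
```

This is also why doubling the number of theses roughly quadruples the work.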

It was determined that using quad-core computers it was possible to run sim_text in parallel on each core without interference. So four machines were set up in order to have 16 processes running at the same time. It took about 20 minutes to load the 9 GB of text data locally onto each machine (accessing the network drive would have slowed down the investigation tremendously). A batch file was generated with all the 946 cluster comparisons and then simply split into 16 files. One was loaded onto each core, and the processes started chugging away. 40 hours later, much earlier than expected, the processes had all finished without crashing!
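Generating and splitting such a batch file can be sketched as follows. The directory names, the log file naming and the round-robin split are assumptions for illustration; only the sim_text options are taken from the commands shown earlier.

```python
from itertools import combinations


def cluster_commands(departments):
    """One sim_text call per unordered pair of department directories."""
    cmds = []
    for a, b in combinations(departments, 2):
        # Hypothetical naming: one log file per pair of clusters.
        cmds.append(
            f"sim_text -o {a}_{b}.log -d -p -t 1 -r 7 {a}/*.txt {b}/*.txt"
        )
    return cmds


def split_round_robin(items, parts):
    """Deal the items out into `parts` lists of nearly equal length."""
    return [items[i::parts] for i in range(parts)]
```

With 44 directories this gives 946 commands, dealt out into 16 batches of 59 or 60 commands each. A round-robin split balances only the number of commands, not the running time, so a batch that happens to collect the largest clusters will finish last.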

Of course, the logfiles now included an excessive number of duplicates, as the intra-cluster comparisons had been repeated 43 times, bringing the number of individual comparisons of one dissertation to another up to around 50 billion. So the duplicates had to be eliminated before looking at the data. 
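Removing the duplicates is straightforward if one assumes that a repeated comparison of the same two files produces exactly the same report line. A minimal sketch under that assumption:

```python
def deduplicate(lines):
    """Keep the first occurrence of each non-empty line, in original order."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

The lines from all of the logfiles can simply be concatenated and passed through this function before sorting or plotting.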

Additional investigations
 
Attempts to run sim_text on a Mac computer or under Linux turned up an interesting anomaly. Calculations that run for about 5 minutes on a PC will take just under 2 hours on a Mac, and may never terminate under Linux. It is not clear why this is so.

The program sim_text also has an option for only comparing new files with "old" ones, that is, those that have already been checked against each other.
sim_text.exe -o output.log -d -p -t 1 -r 7 newdir/*.txt / olddir1/*.txt olddir2/*.txt
This command should only compare files in newdir with those in olddir1 and olddir2, not the files in olddir1 with olddir2 and each of those with themselves. However, tests with this option were not conclusive, as somewhat different results, including overlap reported where there actually was none at all, were obtained using this option as opposed to a full comparison. Theoretically, this is what would be needed in order to set up a system for comparing a newly submitted thesis with all of the older ones from the same department. This needs looking into to see why it does not work.

Future work will be seeing if adding more main memory can speed up the process, and trying to work out an alternative algorithm for a Hadoop-based supercomputer.

What have we seen?

At the beginning of the investigation, we suspected that there would be a few theses that used material from other universities. It seemed ludicrous that people would actually take some text without attribution, or even entire dissertations, from other people at the same university or even with the same supervisor and submit it as their own.

We were wrong.

The detailed analysis of Münster and the Charité has to date uncovered three theses that are completely (100% of the pages) taken from other dissertations. Scores of others have used text without reference from others in their research group. There are chains and nets of text overlap that violate the principles of good scientific practice. And these are the only two clusters that have been looked at in detail up until now.

There are theses that no one can have read, or they would have seen the Wikipedia links underlined and embedded in the PDF or the disastrous formatting and layout problems. Or found the large amount of text overlap with the supervisor's own habilitation.

So there is both plagiarism within the faculty and plagiarism from other universities, plagiarism from Internet sources in general and from the Wikipedia in particular.

Why do they do that? I've had one person whose dissertation has been documented on VroniPlag Wiki call and tell me that his supervisor told him to write it like that. There were a few laminated pages attached to the machine he was using for his research, and the doctoral students were told to put that text verbatim in their theses. Do they not realize that they are publishing a scientific document with their names attached that is readable by everyone in the world? Anyone can compare this thesis with other published ones and ask: Why can't they refer to the source?

In conclusion, this investigation was not the result of applying any sort of magic software that ruminated and spat out the offending theses. There was no research money needed, just some free and open software, many dissertations published in Open Access, some university computers otherwise idle over weekends, and some researchers with a good bit of time.

For the universities in question, the conclusion must be: Start reading your dissertations carefully, especially before they are published online! Don't expect to solve the problem quickly by purchasing expensive software; that won't help. Software can only be a tool, it does not catch everything automatically, and there are some systems out there that are little more than snake oil. Do note that these plagiarism cases are not singularities, individual persons who have cheated. The amount of plagiarism found to date points to a systemic problem within the universities which must be solved, the quicker the better.
 ---------------------------
 1 Dick Grune writes in January 2015: "[Y]ou write that sim_text was developed by Matty Huntjens and me, but that is not correct. I had the idea and wrote the code for comparing C program files. I then extended the code to handle Pascal programs (this was 1986 or so). Matty Huntjens, who was in charge of the C and Pascal programming workshops, wrote a bunch of (Unix) shell scripts to mass-compare workshop hand-ins from several years back, with overwhelming results. Matty and me then (1989) wrote a short paper on these shell scripts and their use."

Friday, August 29, 2014

Google censors link

Well, what does the morning's email bring? A letter from Google:

Notice of removal from Google Search

Due to a request under data protection law in Europe, we are no longer able to show one or more pages from your site in our search results in response to some search queries for names or other personal identifiers. Only results on European versions of Google are affected. No action is required from you.
These pages have not been blocked entirely from our search results, and will continue to appear for queries other than those specified by individuals in the European data protection law requests we have honored. Unfortunately, due to individual privacy concerns, we are not able to disclose which queries have been affected.
Please note that in many cases, the affected queries do not relate to the name of any person mentioned prominently on the page. For example, in some cases, the name may appear only in a comment section.
The following URLs have been affected by this action:
http://copy-shake-paste.blogspot.de/p/vroniplagwiki-scorecard.html
All right, that means that it must concern one of the following 36 persons, each of whom has published a dissertation, a habilitation or a textbook under their own name that has extensive text parallels with other works of a kind generally considered to be plagiarism, even if the university in question has not decided to withdraw the degree:
Karl-Theodor zu Guttenberg, Veronika Saß, Matthias Pröfrock, Silvana Koch-Mehrin, Georgios Chatzimarkakis, Bijan Djir-Sarai, Uwe Brinkmann, Margarita Mathiopoulos, Siegfried Haller, Jürgen Goldschmidt, Cornelia Eva Scott, Arne Heller, Martin Winkels, Daniel Volk, Ulf Teichgräber, Patrick Ernst Sensburg, Nalan Kayhan, Andreas Wolfgang Bonz, Michael Heun, Loukas A. Mistelis, Asso Omer Saiwani, Arne Herting, Nasrullah Memon, Bernhard Fischel, Bernd Holznagel, Pascal Schumacher, Thorsten Ricke, Jesu-Paul Manikonda, Rodrigo Herrera, Mareike Bonnekoh, Christian Huber, Ruth Angela Wernsmann, Qiang Fang, Dariusz Malan, Tristan Nguyen, or Alexandros Philippos Anastasiadis
It's called the Streisand effect, people.

You published something that contains unexplained text parallels. These text parallels have been documented publicly, in a review. Explain them in public. That's what we do in academia, we discuss and exchange arguments publicly.

Or as the yearly conference of German language and literature scholars put it in 1967, when they were protesting the decision of the University of Bonn to not rescind the doctorate of Pater Udo Nix which contained extensive plagiarism:
Die auf der Bochumer Tagung versammelten Hochschulgermanisten halten es für ihre Pflicht, sich von dieser an der Universität Bonn getroffenen Entscheidung nachdrücklich zu distanzieren. [. . . ] Wenn eine Rezension in einer unserer Fachzeitschriften gegen eine wissenschaftliche Veröffentlichung den Vorwurf des Plagiats erhebt, hat es als selbstverständlich zu gelten, daß diejenigen, die ein solcher Vorwurf trifft, in angemessener Weise dazu Stellung nehmen. Versuchen die Betroffenen, die Angelegenheit durch bloßes Stillschweigen zu erledigen, und bleibt dieses Verhalten auch noch ungerügt, so muß man fragen, was unser Rezensionswesen eigentlich noch wert sei und bis zu welchem Grade die Regeln wissenschaftlichen Anstands denn außer acht gesetzt werden dürfen. [. . . ] Angesichts einer solchen Häufung von Entlehnungen, wie sie im Falle Nix festzustellen ist, kann weder die Erklärung befriedigen, daß vorsätzliche Täuschung nicht eindeutig nachweisbar und daher bloße Fahrlässigkeit zu unterstellen sei, noch die Behauptung, daß die plagiierten Stellen für die Beurteilung der wissenschaftlichen Leistung irrelevant blieben. Auch wenn sie zuträfen, höben beide Feststellungen den Tatbestand nicht auf, daß die oben genannte wesentliche Voraussetzung für die Verleihung des Doktorgrades irrigerweise als gegeben angenommen wurde. Es wäre schlechthin verderblich, wenn in solchen Fällen die gesetzlichen Vorschriften in einer Weise ausgelegt würden, welche eben diejenigen Grundlagen wissenschaftlicher Forschung und Publikation bedroht, deren Sicherung die gesetzlichen Vorschriften zu dienen haben.
[Moser, H. (1968). Notiz. In: Zeitschrift f. dt. Philologie, Vol. 87, No. 1, pp. 312–316]

The scholars of German Letters gathered at the conference in Bochum feel that it is their duty to distance themselves from the decision reached by the University of Bonn. [. . . ] When a review of an academic paper is published in one of our academic periodicals and contains the accusation of plagiarism, it is taken for granted that the person such accused must respond in an appropriate manner. If the person in question tries to solve the matter by remaining silent, and if this behavior is not condemned, then one must ask oneself of what worth our system of reviews actually is and to what degree the rules of good academic conduct may be set aside. [. . . ] In the face of the sheer amount of borrowed material that can be determined in the case of Nix, it is not satisfactory to declare that it is impossible to prove beyond a shadow of doubt that the deception was not done with malice aforethought and thus only an accusation of negligence remains. It is also not satisfactory to assert that the plagiarized passages are irrelevant for the determination of the academic content. Even if this were so – it would not change in the least the fact that the above mentioned preconditions for granting a doctoral degree were erroneously assumed to have existed. It would be ruinous if in such cases the legalities were to be interpreted in such a manner as to threaten the exact same basic tenets of academic research and publication that they purport to uphold. [Selection and translation from my book False Feathers, p. 50]


Wikipedia by any other name

Back in May I reported on the uproar surrounding the assertion that a book published by C. H. Beck in Germany, Grosse Seeschlachten -- Wendepunkte der Weltgeschichte von Salamis bis Skagerrak, contained plagiarism from the Wikipedia. The publisher withdrew the book, although "only" 5% of the book was affected, they stated. Well, there is actually quite a bit, and although the Wikipedia texts have been patchwritten (words inserted or deleted, words swapped with synonyms, phrases reordered) so they are not completely identical, it is clear that the text closely follows the Wikipedia. Some of the fragments have been documented by a VroniPlag Wiki researcher; however, they have not yet been double-checked [volunteers are welcome!].
A representative of the publisher has agreed to participate in a discussion about the use of the Wikipedia by researchers on October 3, 2014 at the WikiCon in Cologne.

The next German publication with heavy borrowing from the Wikipedia was published by Springer Vieweg, Geschichte der Rechenautomaten, the history of computing in three volumes by a retired German computer science professor. Anyone who has given a lecture on the history of computing recognizes that many of the pictures are taken from the Wikipedia and other Internet pages, and many are not in the public domain. But it turns out that a good bit of the text is also from the Wikipedia.

I don't normally link to the FAZ, but they published an excellent article on the problem by Eleonor Benítez. She quotes the author as stating that these volumes are not scientific writing, but reference books. He defines a reference book as 80% data, while scientific writing contains didactical editing and thus contains more intellectual property. Data, he continues, are facts and not copyrightable. And anyway, there are only so many ways to state something in German.

Again, a VroniPlag Wiki researcher has documented just a few pages that have not yet been double-checked, but there are some very long passages that are identical.

Springer has withdrawn the books from their home page, but the books are still easily obtainable through other booksellers. I asked the executive editor if they were going to put out a press release about the issue; he said no. It seems it is hoped that this will quietly die down.

And now a third German book using Wikipedia without attribution has been identified. The Wagenbach Verlag recently published Aldo Manuzio. Vom Drucken und Verbreiten schöner Bücher; a scathing review in artmagazine pointing out the copying was published in July 2014.

A few questions arise:
  • Why do academic authors use the Wikipedia in their work without respecting the CC-BY-SA license? Okay, they probably find it embarrassing to have Wikipedia references all over the place. But isn't it worse to be found out after the book is in print?
  • Why don't the publishers have editors read the books critically before they are published? The prices are high enough, and the justification for those prices is supposed to be that the publishers add value to the process by ensuring a high-quality product. If the publishers are trying to save money by cutting out the editors, then perhaps we don't need publishers any more.
  • Do the universities where the book authors work get rewarded financially by their ministries of education for these "publications"? Some are still listed on the publication lists of the authors, even though they have been withdrawn. This is also often the case for retracted papers: even after retraction, they remain on the lists of publications for which one assumes the university and perhaps the researcher obtained a reward.
  • I've asked the German Wikimedia e.V. if they cannot sue in the name of the collective authors for the Wikipedia articles. However, only the authors themselves would be able to sue over copyright misuse. I still think, though, that since the license is not being respected by the publishers (especially if pictures are being used), a suit or two should be in order.
  • Above all: if researchers are publishing Wikipedia material under their own names, how can I explain to my students that it is not acceptable for them to do the same?
I'm sure there will be more to come. 

Thursday, August 28, 2014

Swedish scholar to be disciplined for plagiarism

Retraction Watch noted in March 2014 that a 2012 paper by a Swedish scholar from Linnaeus University in Växjö had been retracted for plagiarism. A recent commentator on the article noted that the university actually investigated the case and determined that he was guilty of plagiarism. They put out a press release stating that plagiarism is a serious matter and that the scholar has been suspended from the university, pending a decision on the part of the personnel department about the extent of sanctions to be meted out.

The right-wing online press in Sweden, which is gaining much momentum in the current election year, posted the name of the researcher in question, making sure to comment that he was a "leftist" researcher investigating problems of racism, as if that somehow had something to do with the case (I'm not linking to the publication in question).

It is quite disconcerting to have an academic discussion about good scientific conduct and plagiarism dragged into a political fight. This has also happened in Germany, where the media only seem to report on cases of plagiarism if they involve politicians. Many universities in Germany and Austria drag their feet when investigating allegations of plagiarism, and answer, as one did today, stating that on the grounds of official secrecy and data privacy no information about administrative processes will be published. It is important that we speak about academic plagiarism cases in the open, but we must be focused on the plagiarism itself and not on other details about the person in question.