Friday, August 29, 2014

Wikipedia by any other name

Back in May I reported on the the uproar surrounding the assertion that a book published by C. H. Beck in Germany, Grosse Seeschlachten -- Wendepunkte der Weltgeschichte von Salamis bis Skagerrak, contained plagiarism from the Wikipedia. The publisher withdrew the book, although "only" 5% of the book was affected, they stated. Well, there is actually quite a bit, and although the Wikipedia texts have been patchwritten (words inserted or deleted, words swapped with synonyms, phrases reordered) so they are not completely identical, it is clear that the text closely follows the Wikipedia.  Some of the fragments have been documented by a VroniPlag Wiki researcher, however they have not yet been double-checked [volunteers are welcome!]:
A representative of the publisher has agreed to participate in a discussion about the use of the Wikipedia by researchers on October 3, 2014 at the WikiCon in Cologne.

The next German publication with heavy borrowing from the Wikipedia was published by Springer Vieweg, Geschichte der Rechenautomaten, the history of computing in three volumes by a retired German computer science professor. Anyone who has given a lecture on the history of computing recognizes that many of the pictures are taken from the Wikipedia and other Internet pages, and many are not in the public domain. But it turns out that a good bit of the text is also from the Wikipedia.

I don't normally link to the FAZ, but they published an excellent article on the problem by Eleonor Benítez. She quotes the author as stating that these volumes are not scientific writing, but reference books. He defines a reference book as 80% data, while scientific writing contains didactical editing and thus contains more intellectual property. Data, he continues, are facts and not copyrightable. And anyway, there are only so many ways to state something in German.

Again, a VroniPlag Wiki researcher has documented just a few pages that have not yet been double-checked, but there are some very long passages that are identical:

Springer has withdrawn the books from their home page, but the books are still easily obtainable through other booksellers. I asked the executive editor if they were going to put out a press release about the issue, he said no. It seems it is hoped that this will quietly die down.

And now a third German book using Wikipedia without attribution has been identified. The Wagenbach Verlag recently published Aldo Manuzio. Vom Drucken und Verbreiten schöner Bücher, a scathing review in artmagazine pointing out the copying was published in July 2014.

A few questions arise:
  • Why do academic authors use the Wikipedia in their work without respecting the CC-BY-SA license? Okay, they probably find it embarrassing to have Wikipedia references all over the place. But isn't it worse to be found out after the book is in print?
  • Why don't the publishers have editors read the books critically before they are published? The prices are high enough, and that is supposed to be the justification for the price, that the publishers are somehow adding value to the process by ensuring a high-quality product. If the publishers are trying to save money by cutting out the editors, then perhaps we don't need publishers any more. 
  • Do the universities where the book authors work get rewarded financially by their ministries of education for these "publications"? Some are still listed on the publication lists of the authors, even though they have been withdrawn.  This is also often the case for retracted papers, they remain on the lists of publications for which one assumes the university and perhaps the researcher obtained a reward, even after retraction. 
  • I've asked the German Wikimedia e.V. if they cannot sue in the name of the collective authors for the Wikipedia articles. However, only the authors themselves would be able to sue over copyright misuse. I still think, though, that since the license is not being respected by the publishers (especially if pictures are being used), that a suit or two should be in order.
  • Above all: if researchers are publishing Wikipedia material under their own names, how can I explain to my students that it is not acceptable for them to do the same?
I'm sure there will be more to come. 

3 comments:

  1. The problem with Wikipedia (without an article) is more complicated, since it is quite feasible that plagiarism goes both ways – which tends to suggest foul play on the part of an individual author rather than a Wikipedia editor. Certainly, such a suggestion is much more damaging to a scholar than it is to Wikipedia. It requires quite a bit of effort to make sure who wrote first, especially when an author circulated pieces of text that made their way into Wikipedia *before the book was published* (I've seen it happen). That is probably not the case here, but extreme caution is due before passing judgment, because it can do a lot of damage.

    ReplyDelete
  2. That is certainly an important question and one that VroniPlag Wiki looks into closely when double-checking fragments. There are tools such as WikiBlame http://wikipedia.ramselehof.de/wikiblame.php that can be used to determine when a particular sentence was introduced into a Wikipedia (and by whom), since all of the revisions are available online. This way it can be seen that the Wikipedia article was available in 2012, the book published in 2014.

    ReplyDelete
  3. Yes, in this case it looks pretty clear-cut. But the Wikiblame tool is anything but reliable, unfortunately, albeit luckily it mainly errs on the conservative side.

    ReplyDelete