Monday, March 30, 2020

Bored? How about documenting plagiarism?

So you are all stuck at home because of the coronavirus and have already binge-watched 15 series. How about contributing to cleaning up the academic world? Not all of us have the biomedical chops to debunk a supposed cure, like Elisabeth Bik writing in her Science Integrity Digest: Thoughts on the Gautret et al. paper about Hydroxychloroquine and Azithromycin treatment of COVID-19 infections.

How about some plagiarism documentation? The German platform VroniPlag Wiki, which I have been working with since 2011, has many unfinished cases. Yes, the platform tends to be in German, but the most recent documentation is in English: a recent dissertation (2017) from the Humboldt University of Berlin, Ids. From the executive summary:
The investigation has documented extensive plagiarism in the thesis. Over 90% of the pages of the main text contain plagiarized passages. Over two-thirds of the main text is taken almost verbatim from other sources, generally without any or the proper reference. The passages are taken from around 100 mostly online sources. Among these sources are the Wikipedia, a doctoral dissertation available online, a master's thesis, some organizational home pages, many open access publications, and various online religious reference works. The published PDF of the dissertation contains many copy-and-paste artefacts such as numerous hidden (embedded) web links that are also found as visible links in the source material. In conclusion, the dissertation could be categorized as an outright collage of easily obtained and quite diverse sources.
Drop in to the weekly chat on Mondays at 21:00 MESZ (UTC+2) and we'll be glad to help you get started. No specialized knowledge is necessary; we'll show you the ropes, and there are plenty of English-language cases still unfinished.

Monday, February 24, 2020

Testing of Support Tools for Plagiarism Detection

It's out! Our pre-print about testing support tools for plagiarism detection, often mistakenly called plagiarism-detection tools. The European Network for Academic Integrity Working Group TeSToP worked in 2018 and 2019 to test 15 software systems in eight different languages. Of course, everything has changed since then, the software people tell us, but be that as it may: here is the pre-print, and we have submitted the paper to a journal.

arXiv:2002.04279 [cs.DL]

Testing of Support Tools for Plagiarism Detection

There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic. 
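As the abstract says, these systems do not detect plagiarism; they flag text similarity that a human must then assess. The core idea can be illustrated with a toy sketch: compare the word n-grams of a suspect passage against a candidate source and report the overlap. This is purely illustrative and not the algorithm of any of the tested tools; the sample texts and the choice of trigrams are invented for the example.

```python
def ngrams(text, n=3):
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(suspect, source, n=3):
    """Jaccard similarity of the two n-gram sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = ngrams(suspect, n), ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented sample texts for illustration only.
source = "the dissertation contains passages taken almost verbatim from online sources"
suspect = "the dissertation contains passages taken almost verbatim from various websites"

score = similarity(suspect, source)
print(f"similarity: {score:.2f}")  # high overlap, but a human still has to judge it
```

Even this toy makes the paper's point visible: a high score only says the texts share phrasing, while deciding whether that constitutes plagiarism (quotation? common phrase? copied passage?) remains a human judgment.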

Thursday, January 9, 2020

Predatory Publishing 2020

It's 2020 and I'm still bogged down; I haven't finished my notes on the ENAI conference from half a year ago. What can I say? Life and all....

So let's start the new year with a discussion on predatory publishers. Deborah Poff gave a keynote speech at the ENAI conference 2019 on the topic, and as COPE chair she has now published a discussion paper on the topic. There are a number of irritating points, as Elisabeth Bik points out in a Twitter thread, but on the whole this is a good paper to get this very important discussion going in the new year.

How can we tell whether or not a journal is legitimate? Legitimate in the sense that rigorous peer review is not just claimed, but actually done? We are currently in a world situation in which certain groups attack science because it informs us of uncomfortable truths. Predatory publishers offer a welcome point of attack, as the weaknesses of the "science" they publish are immediately assumed to apply to all science. The "self-regulation" of science has been shown in recent years not to actually do the work it is supposed to do, despite the efforts of so many to point out issues that need attention.

Researchers need guidance about publication venues. Beall's list was taken down for legal reasons, but there is a web site that publishes an archived copy of the list as it stood on 15 January 2017, soon after the 2017 list was published.

There is a checklist available at thinkchecksubmit.org that is useful, but not a list of problematic publications, probably for legal reasons.

We can't keep putting our heads in the sand about the problems of academic misconduct. If we only look away, we let people get away with bad science, and that reflects on us all.