Tuesday, October 13, 2020

Plagiarism Detection Software: Publication, Mergers, News

Finally found some time for a post!

First off: The TeSToP working group (of which I am a member) at the European Network for Academic Integrity has finally published its test of support tools for plagiarism detection. It examines the results from several angles, such as effectiveness across various European languages, single-source versus multi-source plagiarism, and the amount of rewriting done.

Foltýnek, T., Dlabolová, D., Anohina-Naumeca, A. et al. Testing of support tools for plagiarism detection. Int J Educ Technol High Educ 17, 46 (2020). https://doi.org/10.1186/s41239-020-00192-4

Abstract:
There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.
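As the abstract stresses, these systems do not determine plagiarism; they flag text similarity. To make that distinction concrete, here is a toy sketch (my own illustration, not the algorithm of any of the tested systems) of word n-gram overlap, one of the most basic similarity measures such tools build on:

```python
# Toy illustration of text matching via word n-gram overlap.
# This is NOT how any particular commercial system works; it just
# shows why such tools find *similarity*, not *plagiarism*.

def ngrams(text: str, n: int = 3) -> set:
    """Return the set of word n-grams (as tuples) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of word n-grams: 0.0 (disjoint) to 1.0 (identical)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over a sleeping dog"
unrelated = "plagiarism detection is harder than it looks"

print(jaccard_similarity(source, copied))     # → 0.4 (partial overlap)
print(jaccard_similarity(source, unrelated))  # → 0.0 (no shared trigrams)
```

Note how even light rewriting (swapping "the lazy" for "a sleeping") already destroys several trigrams, which is exactly why systems miss heavily paraphrased plagiarism, and why shared boilerplate phrases can trigger false positives on non-plagiarized text.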

So just a few months later, these two press releases showed up:

  • Turnitin announced in June 2020 that they have purchased the company Unicheck. Both systems participated in the TeSToP test.  
  • Urkund and PlagScan, two more systems that were in the TeSToP test, announced a merger in September 2020: They will now be known as Ouriginal, and will be combining the plagiarism detection results of Urkund with the author metrics of PlagScan. 

These four systems just happened to be the best ones in combined coverage and usability, although none of the systems is perfect: they averaged 2.5 ± 0.3 on a scale of 0 to 5. We plan on retesting in three years, so it will be very interesting to see how the combined systems fare then.

In other news, the proceedings of "Plagiarism Across Europe and Beyond 2020" (PAEB2020), which ended up being held online instead of in Dubai, are now ready and available for download. PAEB2021 will be held in Vienna, September 22-24, 2021, COVID-19 permitting.

And in very sad news, academic integrity researcher Tracey Bretag from Australia passed away in October 2020. Jonathan Bailey has written an excellent obituary on his blog Plagiarism Today. I am glad that I was able to meet her many times and experience her great ideas and energy. It was a pleasure to contribute to her Handbook of Academic Integrity. She will be sorely missed.

Saturday, June 27, 2020

New Brazilian Minister - of Education - sported a false doctorate

Brazilian and Argentine press is awash with reports on Bolsonaro's new Minister of Education, Carlos Decotelli da Silva ([1] - [2] - [3] - [4], to name just a few). It seems that the CV that Bolsonaro presented to the press rather exaggerated at least one item: the doctorate.

Bolsonaro stated that "Professor" Decotelli held a doctorate from the Argentine University of Rosario. The rector of the University of Rosario, Franco Bartolacci, tweeted that he wanted to make it clear that Decotelli did not have a doctorate from @unroficial. According to Folhapress, the thesis that was presented by Decotelli was assessed negatively by the dissertation committee.

Decotelli then said that he did begin a doctoral program, but did not actually finish it. He has now corrected this portion of his CV.


Monday, March 30, 2020

Bored? How about documenting plagiarism?

So you are all stuck at home because of the coronavirus and have already binge-watched 15 series. How about contributing to cleaning up the academic world? Not all of us have the biomedical chops to debunk a supposed cure, like Elisabeth Bik writing in her Science Integrity Digest: Thoughts on the Gautret et al. paper about Hydroxychloroquine and Azithromycin treatment of COVID-19 infections.

How about some plagiarism documentation? The German platform VroniPlag Wiki, which I have been working with since 2011, has so many unfinished cases. Yes, I know, the platform tends to be in German, but the most recent documentation is in English: a recent dissertation (2017) from the Humboldt University of Berlin, Ids. From the executive summary:
The investigation has documented extensive plagiarism in the thesis. Over 90% of the pages of the main text contain plagiarized passages. Over two-thirds of the main text is taken almost verbatim from other sources, generally without any or the proper reference. The passages are taken from around 100 mostly online sources. Among these sources are the Wikipedia, a doctoral dissertation available online, a master's thesis, some organizational home pages, many open access publications, and various online religious reference works. The published PDF of the dissertation contains many copy-and-paste artefacts such as numerous hidden (embedded) web links that are also found as visible links in the source material. In conclusion, the dissertation could be categorized as an outright collage of easily obtained and quite diverse sources.
Drop in to the weekly chat, Mondays at 21:00 MESZ (UTC+2), and we'll be glad to help you get started and show you the ropes. No specialized knowledge is necessary, and there are plenty of English-language cases still unfinished.

Monday, February 24, 2020

Testing of Support Tools for Plagiarism Detection

It's out! Our pre-print about testing support tools for plagiarism detection, systems that are often mistakenly called plagiarism-detection tools. The European Network for Academic Integrity working group TeSToP worked in 2018 and 2019 to test 15 software systems in eight different languages. Of course, everything has changed since then, the software people let us know, but whatever: here's the pre-print; we have submitted it to a journal.

arXiv:2002.04279 [cs.DL]

Testing of Support Tools for Plagiarism Detection

Abstract:
There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic. 

Thursday, January 9, 2020

Predatory Publishing 2020

It's 2020 and I'm still bogged down, not finished with my notes from half a year ago on the ENAI conference. What can I say? Life and all....

So let's start the new year with a discussion on predatory publishers. Deborah Poff gave a keynote speech at the ENAI conference 2019 on the topic, and as COPE chair she has now published a discussion paper on the topic. There are a number of irritating points, as Elisabeth Bik points out in a Twitter thread, but on the whole this is a good paper to get this very important discussion going in the new year.

How can we tell whether or not a journal is legitimate? Legitimate in the sense that rigorous peer review is not just claimed, but actually done? We currently find ourselves in a world situation in which certain groups attack science because it informs us of uncomfortable truths. Predatory publishers offer a welcome point of attack, as the weaknesses of the "science" they publish are immediately assumed to apply to all science. The "self-regulation" of science has been shown in recent years not to actually do the work it is supposed to do, despite the efforts of so many to point out issues that need attention.

Researchers need guidance about publication venues. Beall's list was taken down for legal reasons, but there is a web site that publishes an archived copy of the list as of 15 January 2017, soon after the 2017 list was published.

There is a checklist available at thinkchecksubmit.org that is useful, but not a list of problematic publications, probably for legal reasons.

We can't keep putting our heads in the sand about the problems of academic misconduct. If we just look away, we let people get away with bad science, and that then reflects on us all.