### Plagiarism Detection Software: Publication, Mergers, News

First off: The TeSToP working group (of which I am a participant) at the European Network for Academic Integrity has finally published its test of support tools for plagiarism detection. It looks at the results from various angles, such as effectiveness on various European languages, single-source or multi-source plagiarism, and the amount of rewriting done.

Foltýnek, T., Dlabolová, D., Anohina-Naumeca, A. et al. Testing of support tools for plagiarism detection. Int J Educ Technol High Educ 17, 46 (2020). https://doi.org/10.1186/s41239-020-00192-4

Abstract:
There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.

So just a few months later these two press releases show up:

• Turnitin announced in June 2020 that they have purchased the company Unicheck. Both systems participated in the TeSToP test.
• Urkund and PlagScan, two more systems that were in the TeSToP test, announced a merger in September 2020: They will now be known as Ouriginal, and will be combining the plagiarism detection results of Urkund with the author metrics of PlagScan.

These four systems just happened to be the best ones in combined coverage and usability, although none of the systems are perfect, averaging 2.5 ± 0.3 on a scale of 0 to 5. We plan on retesting in 3 years, so it will be very interesting to see how these combined systems fare then.

In other news, the proceedings of the "Plagiarism Across Europe and Beyond 2020" (PAEB2020) conference, which ended up being held online instead of in Dubai, are now ready and available for download. PAEB2021 will be held in Vienna, September 22-24, 2021, COVID-19 permitting.

And in very sad news, academic integrity researcher Tracey Bretag from Australia passed away in October 2020. Jonathan Bailey has written an excellent obituary on his blog Plagiarism Today. I am glad that I was able to meet her many times and experience her great ideas and energy. It was a pleasure to contribute to her Handbook of Academic Integrity. She will be sorely missed.

### New Brazilian Minister - of Education - sported a false doctorate

Brazilian and Argentine press is awash with reports on Bolsonaro's new Minister of Education, Carlos Decotelli da Silva ([1] - [2] - [3] - [4] for just a few). It seems that the CV that Bolsonaro presented to the press rather exaggerated in at least one item: the doctorate.

Bolsonaro stated that "Professor" Decotelli held a doctorate from the Argentine University of Rosario. The rector of the University of Rosario, Franco Bartolacci, tweeted that he wanted to make it clear that Decotelli did not have a doctorate from @unroficial. According to Folhapress, the thesis that was presented by Decotelli was assessed negatively by the dissertation committee.

Decotelli then said that he did begin a doctorate program, but didn't actually finish. He has now corrected this portion of his CV.

Minister of Education.

### Bored? How about documenting plagiarism?

So you are all stuck at home because of the coronavirus and have already binge-watched 15 series. How about contributing to cleaning up the academic world? Not all of us have the biomedical chops to debunk a supposed cure, like Elisabeth Bik writing in her Science Integrity Digest: Thoughts on the Gautret et al. paper about Hydroxychloroquine and Azithromycin treatment of COVID-19 infections.

How about some plagiarism documentation? The German platform VroniPlag Wiki that I have been working with since 2011 has so many unfinished cases. I know, the platform tends to be in German, but the most recent documentation is in English: a recent dissertation (2017) from the Humboldt University of Berlin, Ids. From the executive summary:
The investigation has documented extensive plagiarism in the thesis. Over 90% of the pages of the main text contain plagiarized passages. Over two-thirds of the main text is taken almost verbatim from other sources, generally without any or the proper reference. The passages are taken from around 100 mostly online sources. Among these sources are the Wikipedia, a doctoral dissertation available online, a master's thesis, some organizational home pages, many open access publications, and various online religious reference works. The published PDF of the dissertation contains many copy-and-paste artefacts such as numerous hidden (embedded) web links that are also found as visible links in the source material. In conclusion, the dissertation could be categorized as an outright collage of easily obtained and quite diverse sources.
Drop in to the weekly chat on Mondays at 21:00 CEST (UTC+2). No specialized knowledge is necessary: we'll be glad to show you the ropes and help you get started, and there are plenty of English-language cases still unfinished.

### Testing of Support Tools for Plagiarism Detection

It's out! Our pre-print about testing support tools for plagiarism detection, often mistakenly called plagiarism-detection tools. The European Network for Academic Integrity Working Group TeSToP worked in 2018 and 2019 to test 15 software systems in eight different languages. Of course, everything has changed since then, the software people let us know, but whatever: here's the pre-print, which we have submitted to a journal.

arXiv:2002.04279 [cs.DL]

#### Testing of Support Tools for Plagiarism Detection

There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.

### Predatory Publishing 2020

So let's start the new year with a discussion on predatory publishers. Deborah Poff gave a keynote speech on the topic at the ENAI conference 2019, and as COPE chair she has now published a discussion paper on it. There are a number of irritating points, as Elisabeth Bik points out in a Twitter thread, but on the whole this is a good paper to get this very important discussion going in the new year.

How can we tell whether or not a journal is legitimate? Legitimate in the sense that rigorous peer-review is not just stated, but actually done? We are currently in a world situation in which certain groups attack science because it is informing us of uncomfortable truths. Predatory publishers offer a welcome point of attack, as the weaknesses of the "science" they publish are immediately assumed for all science. The "self-regulation" of science has been shown in recent years to not actually do the work it is supposed to do, despite the efforts of so many to point out issues that need attention.

Researchers need guidance about publication venues. Beall's list was taken down for legal reasons, but there is a web site that publishes an archived copy of the list that was taken on 15 January 2017. That was soon after the 2017 list was published.

There is a useful checklist available at thinkchecksubmit.org, but it is not a list of problematic publications, probably for legal reasons.

We can't keep putting our heads in the sand about the problems of academic misconduct. If we only look away, we let people get away with bad science, and that then reflects on us all.

### ENAI2019 - Day 1

The European Network for Academic Integrity (ENAI) held its 2019 conference in Vilnius, Lithuania. I was presenting about the work of VroniPlag Wiki and the preliminary results of ENAI's test of software support tools for plagiarism detection. I've taken notes, so I will try and at least get summaries of what people are doing online.

Deborah C. Poff, the current chair of COPE, a retired professor for business ethics, and a former dean at the University of Northern British Columbia, Canada, opened the conference with a keynote on "The role of research integrity and publication ethics in university education for the 21st century." She started off speaking about the purpose of a university. It may once have been instilling in the students the values of truth, honesty, competency, hard work, & study, but over the years this has changed. Many are focused on getting a good job, and they are paying a lot of money (in North America) for their education. She notes that as many as 50 % of students in the US say that they are disengaged from their studies. At universities, administration has risen in importance as education and scholarship have declined. There is a discernible shift from a student-focused institution to a parents-as-consumers, global "excellence" mindset.

She noted that students don't understand the serious nature of plagiarism violations and told the story from her university of twins who plagiarized something from the Harvard Business Review. When confronted, they became angry and aggressive, and threatened to bring in their father, a lawyer. He threatened to sue the faculty member reporting the plagiarism and the entire university.

She is currently putting together a book on Corporate Social Responsibility in the university, that is, helping universities to understand how to be responsible in the area of academic integrity.

In the session on "Addressing contract cheating (including legal practices)" I attended the talk by Wendy Sutherland-Smith & Kevin Dullaghan on "You don’t always get what you pay for: A user’s experiences of engaging with contract cheating websites."

They actually went out and purchased (!) ghostwriter work to evaluate the quality. They bought 54 assignments of between 825 and 2000 words from 18 sites that students use in various fields. They tried out both standard and premium quality work in order to look at cost vs. quality. The results were all over the map! The cost was between 50 and 300 Australian dollars, but 52 % of the work purchased failed to meet a passing grade in that subject. A full 15 % of the papers were so unsatisfactory that they had to ask for them to be revised. But as of July 2018 they were still waiting for one company to respond. Premium was not much better than standard, and some failed to deliver on time.

They also looked at the privacy policies published on the web sites. Students should note that the companies have their identities and some threaten to publish the names of students who used the services if they stop payment.

The hardest part of the research was obtaining ethical board approval. Some felt that they were just supporting the industry, but they were able to convince the board that it is important to test something, not just guess how it is. They guarded the privacy of the students participating in the effort: if passport copies were needed or only credit cards were accepted, they stopped the test. Only PayPal was used for payment.

One interesting side effect that they discovered was that apparently ghostwriting companies are sending people to classes so that they are registered in the learning management systems and are able to send back to the company a list of emails of fellow students and a list of topics and dates of papers due. This permits the companies to send out targeted advertising to the students.

Thomas Lancaster followed Wendy with his talk on "Exploring low-cost contract cheating provision enabled through micro-outsourcing web sites" about trying to find out who exactly is providing the contract cheating services.

He noted that there is a demand for such work, a ready-made supply of labor, and that it is an established industry. There are even conferences being held for contract cheating writers. The salary for a full-time writer for a provider in Pakistan starts at $84 USD/month. The price for students connecting directly with writers, for example via sites such as Fiverr, is about $30 USD/1000 words.

Thomas did two studies, one in July 2016 and one in October 2018, contacting all of the writers he could find on Fiverr with the keywords "write essay". There were 93 providers in 2016, 197 in 2018. He noted that the advertised prices have gone down, although for \$6/1000 words chances are slim that one will get a good assignment.

This is why it is important to educate staff that contract cheating is NOT expensive, and to develop assignments that cannot be turned around quickly.

Anna Krajewska spoke on "Attitudes to eradicating contract cheating and collusion amongst Widening Participation students in the UK: reflections from Foundation Year students at Bloomsbury Institute."

The "Foundation Year" is a bridging year before beginning university studies and is often taken by non-traditional and diverse students. That is, they are older, or may speak English as an additional language, or have young children, or are Black, Asian, or from another minority ethnic group. The institute launched a campaign "Integrity matters!" and interviewed students on integrity. Most had a good understanding of cheating and collusion, but are often so afraid of writing in English that they resort to copying.

When asked what would help, they responded that they would like additional English-language classes, clearer instructions, additional workshops, stricter penalties, more frequent but smaller assignments, more information, exams and presentations instead of essays and reports, posters & videos, and a whistleblowing policy.

Penny Bealle, Prof. of Library Service, Suffolk County Community College, Riverhead, NY, offered "Need concise academic integrity lessons? Try these!" I was expecting either short e-learning lessons or 10-minute quick discussion topics, but this was more a set of multiple-choice questions on a very basic level, and a video made in 2010.

Turnitin, as one of the sponsors of the conference, got to nominate a keynote speaker. Erica Flinspach, from the University of South Africa, spoke on "Encouraging originality & celebrating diversity on a mega scale: The UniSA story."

Her university has 400,000 distance learning students, 7,000 staff members and 30,000 tutors. They have used Turnitin since 2008. They apparently now use it as a "teaching tool", although I am afraid that that rather encourages "re-sentencing" (a new term I learned at the conference for rewriting a sentence). She notes that instructors must set an example when referencing in the study material they compile and respect the authorship of the student. The student's aim should not be to reduce the similarity index, but to give his or her own interpretation of the study/research done. Although certain percentages might be acceptable under certain conditions, blatant plagiarism is completely unacceptable regardless of similarity score. There is no acceptable similarity index; the evaluation is influenced by the purpose of the document, the expectation from the instructor, and the relevant subject field.

I asked her if the instructors are aware that all software systems produce false negatives. She only answered that instructors are taught to be alert to signs that a text is plagiarized.
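To make the point about false negatives and "re-sentencing" concrete, here is a minimal sketch of naive text matching. It is not how Turnitin or any of the tested systems actually work; the function names, the word-trigram approach, and the example sentences are all my own assumptions for illustration. It only shows why a score based on overlapping word sequences can drop to zero once a sentence has been reworded.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in a text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(suspect, source, n=3):
    """Fraction of the suspect text's n-grams that also occur in the source.

    A hypothetical score for illustration only, not any vendor's similarity index.
    """
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & word_ngrams(source, n)) / len(suspect_grams)


# Made-up example sentences: one copied verbatim, one "re-sentenced".
source = "software cannot determine plagiarism but it can help identify text similarity"
verbatim = source
reworded = "programs are unable to decide on plagiarism yet they may help to find similar text"

print(similarity(verbatim, source))  # every trigram matches
print(similarity(reworded, source))  # no trigram matches, although the idea is the same
```

The reworded sentence takes over the idea completely, yet shares no three-word sequence with the source, so a matcher of this kind reports nothing. A low score is therefore no proof that a text is clean.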

Teddi Fishman chaired my session, in which I spoke about "Plagiarism in German doctoral dissertations – still a marginal issue 8 years after the Guttenberg case." I explained what VroniPlag Wiki does and that despite a bit of chatter on the part of the universities, not much has really changed.

The last talk of the day was by Anthony E. Gortzis on "Pathos for ethics, leadership and the quest for a sustainable future." He noted that problems arise from a lack of Business Ethics in corporate routine operations and from loose or even non-existent external audits & controls by the state / the stock exchange / other international organizations. He presented a Responsible Management Model with the dimensions Moral Culture, Moral Conduct, Communication and Regulations. From these, social responsibility and corporate governance can arise. He lists the four "Whos", which I find to be good questions to ask in cases of plagiarism in doctoral dissertations:
• Who is Responsible? The person who was assigned to do the work.
• Who is Accountable? The person who makes the final decision and has the ultimate ownership.
• Who is Consulted? A person who must be consulted before a decision or action is taken.
• Who is Informed? The person who must be informed that a decision or action has been taken.

The day closed with the ENAI business meeting.

### Plagiarism around the world

I've just realized that I didn't get the promised ENAI posts done in June. I'll see if I can scratch something together. In the meantime, a few plagiarism links I've got saved in tabs:
• Plagiarism in work of departing Dean Dymph van den Boom
The University of Amsterdam reported in June 2019 that an interim dean's public address and parts of her thesis have been found to have been plagiarized.
• Kenyatta University Revokes Lecturers PhD For Cheating
A recent PhD grantee who was lecturing at Kenyatta University was found to have plagiarized the thesis of a Nigerian don. It appears that the don himself discovered the plagiarism.
• The Neue Zürcher Zeitung reports (in German) that the Serbian Minister of Finance is charged with plagiarism in his dissertation granted by the University of Belgrade. The university was reluctant to deal with the situation, but the plagiarism is apparently so clear that students have been protesting, insisting that the university take up a proper investigation and publish the secret report. The university has reluctantly agreed to a November 4, 2019 date of publication. The minister himself, the NZZ wryly notes, doesn't seem to care. He participated in the Berlin Marathon last week, putting down his name as "Dr. Mali".
• "Inspiration" or plagiarism? Journal du Geek reports (in French). Apparently, a French comedian is using copyright to take down video reports on what some say is plagiarism, but he insists is just inspiration or "the spirit of the times".

I gave a talk at the Leibniz Institute's PhD Network Day in Potsdam last week and spoke with a great bunch of PhDs about power hierarchies and academic misconduct. Two students from the Research Center Borstel told me that the institution has really gotten proactive about good academic conduct after the scandals there (see 1 - 2 - 3). They have orientation for new PhDs on good academic conduct, and insist on half-yearly reviews. They have a published plan, but I can only find it in German, as their web site doesn't properly redirect to the translated pages.

Update: Just as I finished, another one dropped in by way of ENAI (European Network for Academic Integrity): Mr. Rinat Maratovich Iskakov has published documentation demonstrating that the dissertation of the Vice Minister of Education and Science of Kazakhstan is plagiarized. The analysis is published as a Google Docs document. The first half of the document is the original and the second half is in English, translated by Ali Tahmazov. Apparently, the Polish plagiarism detection software StrikePlagiarism was used:
"The analysis of the dissertation submitted by F. N. Zhakypova for the degree of Doctor of Economic Sciences was carried out using the StrikePlagiarism system from the company Plagiat.pl" (translated from the Russian).