## Saturday, June 27, 2020

### New Brazilian Minister - of Education - sported a false doctorate

Brazilian and Argentine press is awash with reports on Bolsonaro's new Minister of Education, Carlos Decotelli da Silva ([1] - [2] - [3] - [4] for just a few). It seems that the CV that Bolsonaro presented to the press rather exaggerated in at least one item: the doctorate.

Bolsonaro stated that "Professor" Decotelli held a doctorate from the Argentine University of Rosario. The rector of the University of Rosario, Franco Bartolacci, tweeted that he wanted to make it clear that Decotelli did not have a doctorate from @unroficial. According to Folhapress, the thesis that was presented by Decotelli was assessed negatively by the dissertation committee.

Decotelli then said that he did begin a doctorate program, but didn't actually finish. He has now corrected this portion of his CV.

Minister of Education.

## Monday, March 30, 2020

### Bored? How about documenting plagiarism?

So you are all stuck at home with the coronavirus and have already binge-watched 15 series. How about contributing to cleaning up the academic world? Not all of us have the biomedical chops to debunk a supposed cure, like Elisabeth Bik writing in her Science Integrity Digest: Thoughts on the Gautret et al. paper about Hydroxychloroquine and Azithromycin treatment of COVID-19 infections.

How about some plagiarism documentation? The German platform VroniPlag Wiki, which I have been working with since 2011, has many unfinished cases. I know, the platform tends to be in German, but the most recent documentation is in English: a recent dissertation (2017) from the Humboldt University of Berlin, Ids. From the executive summary:
The investigation has documented extensive plagiarism in the thesis. Over 90% of the pages of the main text contain plagiarized passages. Over two-thirds of the main text is taken almost verbatim from other sources, generally without any or the proper reference. The passages are taken from around 100 mostly online sources. Among these sources are the Wikipedia, a doctoral dissertation available online, a master's thesis, some organizational home pages, many open access publications, and various online religious reference works. The published PDF of the dissertation contains many copy-and-paste artefacts such as numerous hidden (embedded) web links that are also found as visible links in the source material. In conclusion, the dissertation could be categorized as an outright collage of easily obtained and quite diverse sources.
Drop in to the weekly chat on Mondays at 21:00 CEST (UTC+2). No specialized knowledge is necessary; we'll be glad to show you the ropes and help you get started, and there are plenty of English-language cases still unfinished.

## Monday, February 24, 2020

### Testing of Support Tools for Plagiarism Detection

It's out: our pre-print about testing support tools for plagiarism detection, often mistakenly called plagiarism-detection tools. The European Network for Academic Integrity Working Group TeSToP worked in 2018 and 2019 to test 15 software systems in eight different languages. Of course, everything has changed since then, the software people let us know, but whatever: here's the pre-print, which we have submitted to a journal.

arXiv:2002.04279 [cs.DL]

#### Testing of Support Tools for Plagiarism Detection

There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.
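To make "text similarity" a bit more concrete, here is a minimal sketch of one common idea, counting shared word n-grams between a suspect text and a candidate source. It is an illustration only and not how any of the 15 tested systems is known to work; the function names `shingles` and `containment` and the sample sentences are made up for this example.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) found in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspect: str, source: str, n: int = 3) -> float:
    """Fraction of the suspect's shingles that also occur in the source."""
    suspect_shingles = shingles(suspect, n)
    if not suspect_shingles:
        return 0.0
    shared = suspect_shingles & shingles(source, n)
    return len(shared) / len(suspect_shingles)


# Hypothetical sample texts, invented for this sketch.
source = "Software cannot determine plagiarism, but it can work as a support tool."
copied = "As we know, software cannot determine plagiarism, but it can help."
reworded = "Programs are unable to decide whether a text was plagiarised."

print(containment(copied, source))    # part of the wording is shared
print(containment(reworded, source))  # no shared three-word sequence
```

The reworded sentence says much the same thing and still scores zero, which is the kind of false negative the paper warns about. A high score, in turn, only says that text matches; whether that match is plagiarism remains a human decision.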

## Thursday, January 9, 2020

### Predatory Publishing 2020

It's 2020 and I'm still bogged down, not finished with my notes from half a year ago on the ENAI conference. What can I say? Life and all....

So let's start the new year with a discussion on predatory publishers. Deborah Poff gave a keynote speech on the topic at the ENAI conference 2019, and as COPE chair she has now published a discussion paper on it. There are a number of irritating points, as Elisabeth Bik points out in a Twitter thread, but on the whole this is a good paper to get this very important discussion going in the new year.

How can we tell whether or not a journal is legitimate? Legitimate in the sense that rigorous peer-review is not just stated, but actually done? We are in a current world situation in which certain groups attack science because it is informing us of uncomfortable truths. Predatory publishers offer a welcome point of attack, as the weaknesses of the "science" they publish are immediately assumed for all science. The "self-regulation" of science has been shown in recent years to not actually do the work it is supposed to do, despite the efforts of so many to point out issues that need attention.

Researchers need guidance about publication venues. Beall's list was taken down for legal reasons, but there is a web site that publishes an archived copy of the list that was taken on 15 January 2017. That was soon after the 2017 list was published.

There is a checklist available at thinkchecksubmit.org that is useful, but not a list of problematic publications, probably for legal reasons.

We can't keep putting our heads in the sand about the problems of academic misconduct. If we only look away, we let people get away with bad science, and that then reflects on us all.

## Friday, October 4, 2019

### ENAI2019 - Day 1

Very sorry about this - I took notes and then just never found the time to transcribe my notes.

ENAI 2019  Day 1 - ENAI 2019  Day 2 - ENAI 2019  Day 3

The European Network for Academic Integrity (ENAI) held its 2019 conference in Vilnius, Lithuania. I was presenting about the work of VroniPlag Wiki and the preliminary results of ENAI's test of software support tools for plagiarism detection. I've taken notes, so I will try and at least get summaries of what people are doing online.

Deborah C. Poff, the current chair of COPE, a retired professor of business ethics, and a former dean at the University of Northern British Columbia, Canada, opened the conference with a keynote on "The role of research integrity and publication ethics in university education for the 21st century." She started off speaking about the purpose of a university. It may once have been instilling in the students the values of truth, honesty, competency, hard work, & study, but over the years this has changed. Many are focused on getting a good job, and they are paying a great deal of money (in North America) for their education. She notes that as many as 50 % of students in the US say that they are disengaged from their studies. At universities, administration has risen in importance as education and scholarship have declined. There is a discernible shift from a student-focused institution to a parents-as-consumers, global "excellence" mindset.

She noted that students don't understand the serious nature of plagiarism violations and told the story from her university of twins who plagiarized something from the Harvard Business Review. When confronted, they became angry and aggressive, and threatened to bring in their father, a lawyer. He threatened to sue the faculty member reporting the plagiarism and the entire university.

She is currently putting together a book on Corporate Social Responsibility in the university, that is, helping universities to understand how to be responsible in the area of academic integrity.

In the session on "Addressing contract cheating (including legal practices)" I attended the talk by Wendy Sutherland-Smith & Kevin Dullaghan on "You don’t always get what you pay for: A user’s experiences of engaging with contract cheating websites."

They actually went out and purchased (!) ghostwriter work to evaluate the quality. They bought 54 assignments of between 825 and 2000 words from 18 sites that students use in various fields. They tried out both standard and premium quality work in order to look at cost vs. quality. The results were all over the map! The cost was between 50 and 300 Australian dollars, but 52 % of the work purchased failed to meet a passing grade in that subject. A full 15 % of the papers were so unsatisfactory that they had to ask for them to be revised. But as of July 2018 they were still waiting for one company to respond. Premium was not much better than standard, and some failed to deliver on time.

They also looked at the privacy policies published on the web sites. Students should note that the companies have their identities and some threaten to publish the names of students who used the services if they stop payment.

The hardest part of the research was obtaining ethical board approval. Some felt that they were just supporting the industry, but they were able to convince the board that it is important to test something, not just guess how it is. They guarded the privacy of the students participating in the effort: if passport copies were needed or only credit cards were accepted, they stopped the test. They only paid via PayPal.

One interesting side effect that they discovered was that apparently ghostwriting companies are sending people to classes so that they are registered in the learning management systems and are able to send back to the company a list of emails of fellow students and a list of topics and dates of papers due. This permits the companies to send out targeted advertising to the students.

Thomas Lancaster followed Wendy with his talk on "Exploring low-cost contract cheating provision enabled through micro-outsourcing web sites" about trying to find out who exactly is providing the contract cheating services.

He noted that there is a demand for such work, a ready-made supply of labor, and that it is an established industry. There are even conferences being held for contract cheating writers. The salary for a full-time writer for a provider in Pakistan starts at $84 USD/month. The price for students connecting directly with writers, for example via sites such as Fiverr, is about $30 USD/1000 words.

Thomas did two studies, one in July 2016 and one in October 2018, contacting all of the writers he could find on Fiverr with the keywords "write essay". There were 93 providers in 2016, 197 in 2018. He noted that the advertised prices have gone down, although for $6/1000 words chances are slim that one will get a good assignment. This is why it is important to educate staff that contract cheating is NOT expensive, and to develop assignments that cannot be turned around quickly.

Anna Krajewska spoke on "Attitudes to eradicating contract cheating and collusion amongst Widening Participation students in the UK: reflections from Foundation Year students at Bloomsbury Institute." The "Foundation Year" is a bridging year before beginning university studies and is often taken by non-traditional and diverse students. That is, they are older, or may speak English as an additional language, or have young children, or belong to a Black, Asian, or other Minority Ethnic group. They launched a campaign "Integrity matters!" and interviewed students on integrity. Most had a good understanding of cheating and collusion, but are often so afraid of writing in English that they resort to copying. When asked what would help, they responded that they would like additional English-language classes, clearer instructions, additional workshops, stricter penalties, more frequent but smaller assignments, more information, exams and presentations instead of essays and reports, posters & videos, and a whistleblowing policy.

Penny Bealle, Prof. of Library Service, Suffolk County Community College, Riverhead, NY, offered "Need concise academic integrity lessons? Try these!" I was expecting either short e-learning lessons or 10-minute quick discussion topics, but this was more a set of multiple choice questions on a very basic level, and a video made in 2010.

Turnitin, as one of the sponsors of the conference, got to nominate a keynote speaker.
Erica Flinspach from the University of South Africa spoke on "Encouraging originality & celebrating diversity on a mega scale: The UniSA story." Her university has 400,000 distance learning students, 7000 staff members and 30,000 tutors. They have used Turnitin since 2008. They apparently now use it as a "teaching tool", although I am afraid that that rather encourages "re-sentencing" (a new term I learned at the conference for rewriting a sentence). She notes that instructors must set an example when referencing in the study material they compile and respect the authorship of the student. The student's aim should not be to reduce the similarity index, but to give his or her own interpretation of the study/research done. Although certain percentages might be acceptable under certain conditions, blatant plagiarism is completely unacceptable regardless of similarity score. There is no acceptable similarity index; the evaluation is influenced by the purpose of the document, the expectation from the instructor, and the relevant subject field. I asked her if the instructors are aware that all software systems produce false negatives, but she only answered that instructors are taught to be alert to signs that a text is plagiarized.

Teddi Fishman chaired my session, in which I spoke about "Plagiarism in German doctoral dissertations – still a marginal issue 8 years after the Guttenberg case." I explained what VroniPlag Wiki does and that despite a bit of chatter on the part of the universities, not much has really changed.

The last talk of the day was by Anthony E. Gortzis on "Pathos for ethics, leadership and the quest for a sustainable future." He noted that problems arise from a lack of Business Ethics in corporate routine operations and loose or even non-existent external audits & controls from the state / the stock exchange / other international organizations.
He presented a Responsible Management Model with dimensions: Moral Culture, Moral Conduct, Communication and Regulations. From this social responsibility and corporate governance can arise. He lists the four "Whos", I find these good questions to ask in cases of plagiarism in doctoral dissertations:
• Who is Responsible? The person who was assigned to do the work.
• Who is Accountable? The person who makes the final decision and has the ultimate ownership.
• Who is Consulted? A person who must be consulted before a decision or action is taken.
• Who is Informed? The person who must be informed that a decision or action has been taken.

The day closed with the ENAI business meeting.

## Thursday, October 3, 2019

### Plagiarism around the world

I've just realized that I didn't get the promised ENAI posts done in June. I'll see if I can scratch something together. In the meantime, a few plagiarism links I've got saved in tabs:
• Plagiarism in work of departing Dean Dymph van den Boom: The University of Amsterdam reported in June 2019 that an interim dean's public address and parts of her thesis have been found to have been plagiarized.
• Kenyatta University Revokes Lecturers PhD For Cheating: A recent PhD grantee who was lecturing at Kenyatta University was found to have plagiarized the thesis of a Nigerian don. It appears that the don himself discovered the plagiarism.
• The Neue Zürcher Zeitung reports (in German) that the Serbian Minister of Finance is charged with plagiarism in his dissertation granted by the University of Belgrade. The university was reluctant to deal with the situation, but the plagiarism is apparently so clear that students have been protesting, insisting that the university take up a proper investigation and publish the secret report. The university has reluctantly agreed to a November 4, 2019 date of publication. The minister himself, the NZZ wryly notes, doesn't seem to care.
He participated in the Berlin Marathon last week, putting down his name as "Dr. Mali".
• "Inspiration" or plagiarism? Journal du Geek reports (in French). Apparently, a French comedian is using copyright to take down video reports on what some say is plagiarism, but he insists is just inspiration or "the spirit of the times".

I gave a talk at the Leibniz Institute's PhD Network Day in Potsdam last week and spoke with a great bunch of PhDs about power hierarchies and academic misconduct. Two students from the Research Center Borstel told me that the institution has really gotten proactive about good academic conduct after the scandals there (see 1 - 2 - 3). They have orientation for new PhDs on good academic conduct, and insist on half-yearly reviews. They have a published plan, but I can only find it in German, as their web site doesn't properly redirect to the translated pages.

Update: Just as I finished, another one dropped in by way of ENAI (European Network for Academic Integrity): Mr. Rinat Maratovich Iskakov has published a documentation that demonstrates that the dissertation of the Vice Minister of Education and Science of Kazakhstan is plagiarized. The analysis is published as a Google Docs document. The first half of the document is the original and the second half is in English, translated by Ali Tahmazov. Apparently, the Polish plagiarism detection software StrikePlagiarism was used: "The analysis of the review of F. N. Zhakypova's dissertation for the degree of Doctor of Economic Sciences was carried out using the StrikePlagiarism system from the company Plagiat.pl"

## Friday, June 7, 2019

### WCRI 2019 - Day 3

Day 0 - Day 1a - Day 1b - Day 2 - Day 3

One's brain is already exploding, and there is one more day ahead. I decided to miss the first plenary about fostering research integrity in Malaysia, Korea and China.
Session: Publishing 1

Ana Jeroncic, University of Split School of Medicine, Split: "History of scientific publishing requirements: a systematic review and meta-analyses of studies analysing instructions to authors"

It is interesting to see all of the things that can be investigated. This one was looking at Instructions to Authors (ItAs) that describe manuscript submission procedures and journal policies. In particular, they conducted a systematic review of papers about ItAs. They found 153, the number increasing as digital publishing takes over. The topics slide was only up for a few seconds, but ItAs address issues beyond manuscript formatting such as publication ethics, clinical trial registration, authorship, conflicts of interest.... I asked about plagiarism of ItAs, that is, non-affiliated journals just copying ItAs from other journals, but they didn't look at that.

Michael Khor, Professor at Nanyang Technological University, Singapore, managed to fit something like 40 slides on "Global trends in research integrity and research ethics analysed through bibliometrics analysis of publications" into his allotted 10 minutes. It was quite entertaining, but one could barely take notes, as looking down momentarily meant that you missed a slide or two. It seems he looked at over 25,000 publications on research integrity and research ethics, using a graph representation tool to visualize relationships. He was showing topic maps, selecting by country to show how the topics are quite different from country to country and how the topics have changed over time. I would love to see this in print, as I need time to look over the graphs and take in what exactly has changed (and what disappears).
It was noted in the discussion that Scottish authors self-identify as Scottish and not as UK :)

The talk I was waiting for was Harold "Skip" Garner, VCOM (Via College of Osteopathic Medicine), Blacksburg, speaking about "Identifying and quantifying the level of questionable abstract publications at scientific meetings." Skip is the driving force behind eTBLAST and Déjà vu, a technique that uncovered many duplicate publications and plagiarisms in biomedical publications. He currently runs HelioBLAST, a text similarity engine that finds text records in Medline/PubMed that are similar to the submitted query. You plug in up to 1000 words and look at what bubbles up.

He collected conference abstracts found on the open web and has set up an Ethics DB that lets one browse through or do some text mining on the data. There are a lot of false positives, such as people submitting five versions of their manuscript and the conference having all of them available web-facing. But there were questionable things that turned up, such as the same abstract at different conferences with different author orders. Interestingly, he was able to find some instances of salami slicing using this method.

He then compared the abstracts of 2018 to Medline. Here he turned up things such as previously published material being submitted to a conference 2 years later. He has classified these as "old findings." It seems that since there is such a time lag between abstract submission and acceptance or rejection, people submit their work to multiple conferences. As a side-effect of his similarity investigations he can take the accepted papers for a conference and let the computer organize them into tracks of similar papers.

Catriona Fennell, Elsevier, Amsterdam: "Citation manipulation: endemic or exceptional? Estimated prevalence of citation manipulation by reviewers based on the citation patterns of 69,000 reviewers"

She started off with a Dutch saying, "never let a good crisis go to waste".
There was a scandal involving citation stacking in soil science that had affected Elsevier. They investigated the entire area of citation coercion through reviewers, citation pushing done by editors, and citation stacking done in journals. She noted what a journal can do to fight this:
• Make it clear that citation coercion is unacceptable
• Educate editors
• Remove reviewer privileges
• Inform institutes and funding bodies
• Create editorial systems to detect self-citations in reviews or revision letters?
• Retract citations?
• Black-list worst offenders?
• Share information with other journals?

The last four are not really possible; in particular, citations cannot be retracted. There are COPE guidelines for reviewers, and Elsevier ethical guidelines. There is also an article by Christopher Tancock about the practice: "The ugly side of peer review".

Elsevier looked through 54,000 reviews stored in their systems and identified 49 persons to look more closely at. In particular there was "Dr. X" with an h-index of 90 and 20,000 citations in Scopus. They contacted him/her, but the person was entirely unrepentant, the institute was unresponsive, and there was no funding body for the research. The person is active as an author and even more so as a reviewer. The person is now no longer a reviewer for Elsevier.

She also spoke about generic reviews that are so unspecific that they fit every paper. She called them "horoscope reviews". They saw some reviewers apparently copying and pasting these reviews into their responses.

The last speaker in the session (and rightly so the winner of one of the best speaker awards) was Alexander Panchin, Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, on "Concealed homeopathy: a natural test of peer-review quality". A Russian pharmaceutical inventor (and holder of a patent on a homeopathic "remedy") has "discovered" that it cures pretty much all ailments.
Alexander had pictures of it being sold in stores in Russia and heavily advertised. It is made up of "diluted" antibodies, supposedly 1:10^16. There are variations that "combine" dilutions of 1:10^24 and 1:10^30. There is essentially nothing in the pills except sugar, which is why it is a tad off to take these pills to "cure" diabetes. In the patent application it is called a homeopathic drug, but it is now called an "ultra-low dosage" or "release-active" drug.

Alexander tracked down many papers published by this gentleman; he was even an editor of a special edition published at SpringerLink that included 46 of his own papers! The papers do not disclose his conflict of interest, and often have very flawed study designs, showing peer-review not kicking in. Alexander wrote to the journals and has managed to get three retractions and two promises to retract, but the authors of a review article that includes many references to this stuff refuse to issue a correction until ALL of the flawed papers are retracted.... Even though the Minister of Science has named this manufacturer as the most damaging pseudoscience project, scientists and newspapers that have reported on this have been sued, so I am keeping the name off the blog.

After lunch we had the Plenary session on Predatory publishing and other challenges of new models to share knowledge. I was really looking forward to this session and it didn't disappoint!

Deborah C. Poff, the new COPE chair and a philosopher from Ottawa, titled her talk "Complexities and approaches to predatory publishing". She spoke at lightning speed, getting faster as time began running out. It could have been at least a two-hour lecture, so jam-packed it was with really good stuff. I could barely keep up, so I hope I get the highlights right.

A definition for predatory publishing is problematic, as there is much overlap with legitimate but new or smallish publishers.
She looked at necessary and sufficient conditions for a definition, but found that while deceit is necessary, sufficient conditions are vexing to try and capture. Predatory publishers (PP) cheat and deceive some authors, charging publishing-related fees without providing services; PP deceive academics into serving on editorial boards; PP appoint editorial board members without their knowledge; they do no peer review; they refuse to retract or withdraw problematic papers; etc. The list goes on: misleading reporting, language issues, lack of ethical oversight, lack of declarations of conflicts of interest, lack of corrections or retractions, lack of a qualified EiC (if any), made-up rejection rates, false impact factors, false claims of being indexed in legitimate indexes, and falsely claiming membership in publication ethics organizations, including forging and falsifying the logos of such organizations. COPE apparently had to fight a forged COPE logo.

What should we call them, anyway? Arguments against the term "predatory": It is not descriptive or instructive, so some suggest using fake, rogue, questionable, parasitic, deceptive, etc.; predatory suggests victims, powerless people who are acted upon without their full knowledge, while a number of studies have shown that some scholars knowingly publish in such journals; calling the issue "predatory" obviates or mitigates the personal responsibility for choosing where to publish. The best argument for using the term: Since Jeffrey Beall coined the term, why not use it? COPE is undecided on what name is best.

I particularly liked Deborah's stakeholder analysis of who or what is harmed by these publishers:
• The innocent author who is duped into paying for services without receiving them. They may lose status when peers discover that they have published in such a journal, and it can even lead to investigations. Since many such publishers refuse to retract, the damage done may be long-term.
• Legitimate Open Access Journals are easily confused with predatory Open Access Journals.
• Legitimate journals which are not top ranked or may not follow best practice are also easily confused with them.
• Research and funding sources: This depends on whether the research published is legitimate or not. If the research is shoddy and gets published by a PP journal, it may be cited and thus pollutes the scholarly record. If a scandal arises, the scandal may tarnish publicly funded research.
• Universities and their role in knowledge creation.
• Citizens who pay taxes.

She pointed out that predatory publishers make a great business ethics case. In closing, she sees only two things that can be done:
1. Caveat Emptor (let the buyer beware) - use Think / Check / Submit: do you read the journal yourself? Do you cite research published there? Do your colleagues? Who is the editor-in-chief?
2. Addressing and pursuing predatory publishers as businesses committing criminal acts. The USA Federal Trade Commission won a court case against the owner of OMICS and the company itself. The courts fined OMICS $50.1 million.

Bhushan Patwardhan, Professor of medicine, Vice chairman, University Grants Commission, New Delhi, spoke on "Research integrity and publication ethics: Indian scenario". Bhushan first spoke about the University Grants Commission and gave an overview of the Indian higher education sector.

There are more than 900 universities and more than 10,000 other institutions with 1.2 million teachers somehow coping with 36.6 million students. There are just shy of 150,000 publications produced in India per year, and unfortunately, many of these appear in problematic journals.

There is a paper about the situation in India: the authors selected 2000 Indian authors of papers in journals on Beall's list and sent them a survey. Of these, 480 responded, and almost 60 % were unaware that they were publishing in a predatory journal:
G. S. Seethapathy, J. U. Santhosh Kumar & A. S. Hareesha. (2016 December 10). India's scientific publication in predatory journals, need for regulating quality of Indian science and education. Curr Sci, 111(11), pp. 1759-64

Bhushan was shocked to find out just how many Indian publications were in predatory journals. India has just set up the Consortium for Academic and Research Ethics (CARE) in 2019. The goals of the CARE project are to
• create and maintain a CARE list of reputable journals
• promote research publications in reputable journals
• develop an approach and a methodology for identification of quality journals
• discourage publications in dubious journals
• avoid long-term damage due to academic misconduct
• promote academic and research integrity and publication ethics
He put up the URL of the site for CARE: http://ugccare.unipune.ac.in/index.html, but the site was down for "maintenance," as it had not even been up for a day before the site was cloned and published on a similar URL by unknown persons.

Then Matt Hodgkinson, Head of Research Integrity @ Hindawi Ltd., London, took the stage to give "A view of predatory publishing from an open access publisher". He first gave a bit of a historical overview and told us a bit about Hindawi. It was founded in Cairo in 1997, publishing the first subscription journals in 1999. In 2007 all journals were flipped to Open Access. In 2016 they created their Research Integrity team that handles all issues that arise at their journals. The headquarters of Hindawi moved to London in 2017.

He spoke of the impact that predatory journals have on legitimate Open Access journals: they are tarred with the same brush. They also create false impressions for authors, who now expect undue speed from legitimate publishers and, out of impatience, make dual submissions to see which journal publishes first (Matt called it "gazumping"). They have had so many instances of this, Matt told me over coffee, that they check for text similarity online twice: once at submission, and once more just before publication. Many times they have caught double dippers this way.

He expanded the concept of predatory publishers to what he called the "Cargo cult" publishers (ones who publish unedited theses or the Wikipedia as "books"), paper mills, the selling of authorship and faked peer-review. He also noted that the subscription model is not immune to fakery - there are subscription journals that closely mirror the titles of legitimate publishers, something called hijacking.

He closed with some scandals (publications about elephant autism or space octopi) and then listed some of the newest ideas, such as the various pre-print servers. The question arises, however, how sustainable such initiatives are.

Although I was planning on visiting another session, Jenny Byrne insisted that the session on checking data and images would be very interesting, and she was right. I had thought that Elisabeth Bik was the only person around perusing doctored images, but it turns out there are quite a number of initiatives.

First up was Jana Christopher from FEBS Press, Heidelberg, speaking about "Image Integrity in Scientific Publications."

She observed that the prevalence of image aberrations in publications is generally underestimated. Although there are ways to catch simple-minded manipulators, much like with plagiarism, people are getting more sophisticated in hiding their tracks. Her focus is on Western blots, micrographs, or photos, anything that can be overlaid in Photoshop. If they match identically, there's a problem. She showed in a quick demo how she loads suspected duplicates into different color channels and overlays them. The result is black for identical portions of the image.
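The overlay idea can be sketched in a few lines of code. This is only my own minimal illustration, not Jana's Photoshop workflow: it assumes two grayscale images of the same size, given as plain lists of pixel rows, and the function names and toy pixel values are made up. Subtracting one image from the other leaves 0 (black) wherever the two are identical.

```python
def difference_overlay(img_a, img_b):
    """Per-pixel absolute difference of two equally sized grayscale images.

    Images are lists of rows of integer intensities (0-255).
    Identical pixels come out as 0, i.e. black.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def identical_fraction(diff, tolerance=0):
    """Share of pixels whose difference is at most `tolerance`."""
    total = sum(len(row) for row in diff)
    same = sum(1 for row in diff for value in row if value <= tolerance)
    return same / total if total else 0.0


# Hypothetical toy "panels": the first row is identical, the second is not.
panel_a = [[10, 200, 30], [40, 50, 60]]
panel_b = [[10, 200, 30], [40, 55, 90]]

diff = difference_overlay(panel_a, panel_b)
print(diff)
print(identical_fraction(diff))
```

A large black area is only a reason to take a closer look by hand; real images would first need to be aligned and scaled, which this sketch ignores.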

She differentiated between manipulated images and wrong images being used to illustrate a finding. Why do people do this? Some apparently want a cleaner, more striking image. Others want to show a particular feature more clearly. Then there are those who wish to show a result that was not actually produced.

She showed some more examples of pictures that have crossed her desk, cut-outs clearly shown as transparent background, the clone tool being used to overwrite undesirable portions of an image, or images that are supposed to show different plants but because of the pattern of the soil are clearly the same plant.

Renee Hoch, the Senior Manager and Team Manager of the Publication Ethics Team at PLOS ONE, San Francisco, sang the same song, second verse with her talk on the "Impact of data availability on resolution of post-publication image concern cases."

She noted that image concerns make up 39 % of the concerns raised in her department, but 75 % of the retractions. She took 100 post-publication image cases from 2017-2019 and had a statistical look at them. The numbers flew by so fast, I was unable to keep up. 94 of the cases involved image duplication, the other 6 manipulation or fabrication. All fabrications have been retracted; for manipulations or duplications, about half have an Expression of Concern or a Correction.

Their big issue is that when a concern is raised, they request the original data, and none is forthcoming. The excuses are similar: can't find the files, hard-disk crash, person left the lab. Concerns are coming in up to 5 years after publication, but some countries only have a three-year retention policy. So that is clearly not sufficient. At times they wonder if the data ever existed at all, although there is a lot of honest error or poor practice.

What can a journal do? They can require submission of the raw image data, and have the peer-review done with the raw image data, as well as publishing that as supplementary material. This permits better assessment and the journal can make sure that the images are archived properly.

In the discussion it turned out that many journals, upon requesting original data, get sent PowerPoint slides with screenshot images - completely useless for the task.

Daniel Acuna, a computer scientist from Syracuse University in New York State, USA, provides tools to Research Integrity Officers (RIOs) to help investigate cases. His talk on "Helping research misconduct investigations: methods for statistical certainty reporting of inappropriate figure reuse" was about a statistical tool that helps evaluate whether the excuse of a scientist ("it just happened by chance") really makes sense.

Similar instruments might indeed generate similar artefacts, image processing software might generate similar noise, software reproducibility might generate similar results, and there is some reuse of images that is legitimate, for example, generic brains used as underlays for captions.

They scraped about a million images they could find on PubMed Central, and had to scrape them from PDFs, which does not actually make things better. They calculated a similarity index, set a high likelihood threshold, and then looked at the results. They managed a 63 % area under the ROC curve, which is not brilliant, but marginally better than flipping a coin (50 %). They need more images in order to refine their algorithm.
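For readers who have not met the "area under the ROC curve" before: it is the probability that a randomly chosen true case (here, an image pair that really is a reuse) gets a higher score than a randomly chosen false one. The sketch below computes it by comparing every positive with every negative. It is my own illustration with made-up labels and scores, and has nothing to do with Daniel's actual code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a true reuse, 0 for an innocent pair.
    scores: similarity scores, higher meaning "more suspicious".
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: four image pairs, two of them true reuses.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

A detector that scores at random ends up around 0.5 on this measure, which is why 63 % is only a modest improvement over the coin.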

Thorsten Beck from the HEADT center (funded by Elsevier) at the Humboldt University, Berlin, spoke about the image integrity database that they are putting together. Bik, Fang & Casadevall have shown in their 2016 and 2018 papers that about 4 % of all published images have issues; a good 35 000 papers are in want of retracting for this reason.

They want to build a structured database with images from retracted papers, recording as much information as they can about the authors of the publications, their institutions, the reason for the retraction, etc. However, retraction notices are famous for being vague, on account of authors suing journals. They want to keep track of who manipulated the image and who detected it, but seeing as how institutions are highly reluctant to disclose the results of an investigation, good luck in trying to obtain that data. [Although Nature has a WorldView column this week by C. K. Gunsalus calling for institutions to be more transparent about their decisions.] And then there are copyright issues, so there are many challenges.

Jennifer Byrne, an oncologist from the Children's Hospital at Westmead, Australia, presented her work together with Cyril Labbé (University of Grenoble, France) on the Seek&Blast tool.

She first gave us a two-minute introduction to genetics, noting that the nucleotide sequences for certain genes are such long strings of letters that no human being can easily remember them. She does, however, remember the name given to some cell line, TPD52L2, that she had worked with ages ago. There had been a dozen and a half papers about this many years ago, and suddenly it was popping up all over the place in papers by various Chinese authors for a wide variety of cancers, which is impossible. The cells come from only one organ.

[Matt Hodgkinson has sent in a correction: "Small correction - TPD52L2 is a gene Jennifer cloned in 96. The authors of suspect papers often reported studying it in cell lines known by https://t.co/1Lci9g8bfb to be really HeLa & they often got primer sequences for detecting & knocking down the genes wrong." I can't pretend to understand that, but I'm thankful for the correction!]

As she began reading the papers, she realised that they didn't make sense at all, something about the targeting sequence being off. In speaking with Cyril about this issue, he immediately saw that the nucleotide sequence is just one big word, so it is simple to parse them out of papers. He went and did so, and was even able to identify the context in which these nucleotide sequences were used, so that impossible uses of them could also be identified.
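To show why "one big word" makes the sequences easy to fish out of a text, here is a small sketch using a regular expression. It is my own illustration and not the Seek&Blast code: the minimum length of 15 letters, the function name and the sample sentence (with two obviously artificial sequences) are all assumptions.

```python
import re


def extract_sequences(text, min_length=15):
    """Return runs of nucleotide letters (A, C, G, T, U) in `text`.

    A run must be at least `min_length` letters long and must not be
    part of a longer word, so ordinary English words are skipped.
    """
    pattern = re.compile(
        r"(?<![A-Za-z])[ACGTUacgtu]{%d,}(?![A-Za-z])" % min_length
    )
    return [match.group(0).upper() for match in pattern.finditer(text)]


# Hypothetical sentence in the style of a methods section.
sample = (
    "The targeting sequence was 5'-ACGTACGTACGTACGTACGT-3' and the "
    "control sequence was 5'-GGGGCCCCAAAATTTTGGCC-3'."
)
print(extract_sequences(sample))
```

Finding the strings is the easy part. Deciding whether a sequence really targets the gene the paper claims requires comparing it against a reference database, which this sketch does not attempt.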

The system, like many software systems in this area, has a large false positive and false negative rate. The positives must thus be manually examined before flagging a paper. They published a paper about it in Scientometrics, "Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines," identifying the flagged papers. We had a look at the papers they identified with nucleotide sequence overlap and the ones I was reporting on with text overlap, and found that the same journal was publishing these papers. They are having problems very similar to mine in getting the offending papers retracted.

The service is available online at http://scigendetection.imag.fr/TPD52/ for checking whether there are any publications with a particular sequence. They caution users to verify a paper manually before taking any action such as commenting or contacting someone. This is not an automatic detector! Cyril will continue refining the algorithm used, he said after the presentation.

We were now down to the final session.

Maura Hiney and Daniel Barr reported on their results from the focus track on ensuring integrity in innovation and impact, and Klaas Sijtsma reported on the progress being made with the Registry for Research on the Responsible Conduct of Research. He now revealed what some of the seemingly odd data being collected at submission time was for: They wanted to see how many of the accepted papers had been pre-registered. It wasn't many. I think that pre-registration is fine for clinical trials, but there are many other methods of doing research that do not fit in the pre-registration mindset. In particular, when you observe something odd and end up chasing down a crooked alley and suddenly having a great big new field show up, you will hardly have pre-registered what you are writing up for other scientists.

David Moher reported on the Hong Kong Manifesto for Assessing Researchers, discovering that the principles need a good bit of re-drafting.

The best speaker awards for young researchers were announced, and then Lex Bouter extended an invitation to attend the 7th World Conference for Research Integrity, to be held in 2021 at the University of Cape Town, South Africa.

That's it for blogging. I'm writing this on the plane with a screaming baby in my aisle, so I will now put some music on the noise-cancelling headphones in the hopes of drowning out the piercing screams. We've still got 4 hours to go....

Update 8 June 2019: Vasya Vlassov had a friend film his talk about Dissernet.