Copy, Shake, and Paste: WCRI 2019

One's brain is already exploding, and there is one more day ahead. I decided to miss the first plenary about fostering research integrity in Malaysia, Korea and China.

Session: Publishing 1

Ana Jeroncic, University of Split School of Medicine, Split
"History of scientific publishing requirements: a systematic review and meta-analyses of studies analysing instructions to authors"

It is interesting to see all of the things that can be investigated. This study looked at Instructions to Authors (ItAs), which describe manuscript submission procedures and journal policies. In particular, the team conducted a systematic review of papers about ItAs. They found 153 such papers, with the number increasing as digital publishing takes over. The topics slide was only up for a few seconds, but ItAs address issues beyond manuscript formatting, such as publication ethics, clinical trial registration, authorship, and conflicts of interest...
I asked about plagiarism of ItAs, that is, non-affiliated journals just copying ItAs from other journals, but they didn't look at that.

Michael Khor, Professor at Nanyang Technological University, Singapore, managed to fit something like 40 slides on "Global trends in research integrity and research ethics analysed through bibliometrics analysis of publications" into his allotted 10 minutes. It was quite entertaining, but one could barely take notes, as looking down for a moment meant missing a slide or two. It seems he looked at over 25 000 publications on research integrity and research ethics, using a graph representation tool to visualize relationships. He showed topic maps, selecting by country to show how the topics differ quite a bit from country to country and how they have changed over time. I would love to see this in print, as I need time to look over the graphs and take in what exactly has changed (and what has disappeared).

It was noted in the discussion that Scottish authors self-identify as Scottish and not as UK :)

The talk I was waiting for was Harold "Skip" Garner, VCOM (Via College of Osteopathic Medicine), Blacksburg, speaking about "Identifying and quantifying the level of questionable abstract publications at scientific meetings." Skip is the driving force behind eTBLAST and Déjà vu, the tools that uncovered many duplicate publications and cases of plagiarism in the biomedical literature. He currently runs HelioBLAST, a text similarity engine that finds records in Medline/PubMed that are similar to a submitted query. You plug in up to 1000 words and look at what bubbles up.
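HelioBLAST's internals were not part of the talk, so the following is only a minimal Python sketch of the general idea behind a text similarity engine, using a simple stand-in measure: each text is broken into overlapping three-word sequences ("shingles"), and two texts are compared with the Jaccard index (shared shingles divided by all distinct shingles). The function names `shingles`, `jaccard`, and `rank_matches`, as well as the record IDs, are hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of lower-cased word n-grams ("shingles") in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_matches(query, records, n=3, top=5):
    """Hypothetical helper: score every record against the query, best first."""
    scored = [(jaccard(query, text, n), rid) for rid, text in records.items()]
    scored.sort(reverse=True)
    return scored[:top]


# Two made-up records standing in for a literature database:
records = {
    "PMID-A": "We measured the expression of the gene in human cancer cell lines.",
    "PMID-B": "The weather in Hong Kong was hot and humid during the conference.",
}
print(rank_matches("expression of the gene in human cancer cell lines", records))
# [(0.7, 'PMID-A'), (0.0, 'PMID-B')]
```

Shingles reward runs of identical wording, which is what copied text tends to have; they are easily fooled by light paraphrasing, which is one reason such tools produce both false positives and false negatives.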

He collected conference abstracts found on the open web and has set up an Ethics DB that lets one browse through or do some text mining on the data. There are a lot of false positives, such as people submitting five versions of their manuscript and the conference making all of them available on the web. But some questionable things turned up, such as the same abstract at different conferences with different author orders. Interestingly, he was able to find some instances of salami slicing using this method. He then compared the 2018 abstracts to Medline. Here he turned up things such as previously published material being submitted to a conference two years later, which he has classified as "old findings." It seems that since there is such a time lag between abstract submission and acceptance or rejection, people submit their work to multiple conferences.
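The "same abstract, different author order" pattern is easy to illustrate. This is a hypothetical Python sketch, not the Ethics DB code: it only catches abstracts whose text is identical after lower-casing and stripping punctuation, whereas a real system would use a fuzzier similarity measure. The function names and the data are made up.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lower-case the text and drop punctuation so trivial edits don't hide a copy."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def reordered_duplicates(abstracts):
    """Flag abstracts that appear more than once with the same normalized text
    and the same set of authors, but in a different author order.

    abstracts: list of (conference, authors, text) tuples.
    Returns a list of groups; each group is a list of (conference, authors) pairs.
    """
    by_text = defaultdict(list)
    for conference, authors, text in abstracts:
        by_text[normalize(text)].append((conference, tuple(authors)))

    flagged = []
    for entries in by_text.values():
        if len(entries) < 2:
            continue
        same_people = len({frozenset(a) for _, a in entries}) == 1
        different_order = len({a for _, a in entries}) > 1
        if same_people and different_order:
            flagged.append(entries)
    return flagged


# Made-up example data:
abstracts = [
    ("Conference A", ["Li", "Smith"], "Gene X drives tumour growth."),
    ("Conference B", ["Smith", "Li"], "Gene X drives tumour growth!"),
    ("Conference C", ["Jones"], "Something else entirely."),
]
print(reordered_duplicates(abstracts))
```

Comparing author *sets* (frozensets) separately from author *sequences* (tuples) is what distinguishes a reshuffled author list from a plain resubmission.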

As a side-effect of his similarity investigations he can take the accepted papers for a conference and let the computer organize them into tracks of similar papers.
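Grouping accepted papers into tracks can be sketched as clustering on text similarity. The Python sketch below assumes a bag-of-words cosine similarity, a tiny stop-word list, and a threshold of 0.3, all illustrative choices rather than anything described in the talk: any two papers at or above the threshold end up in the same track (single-link clustering, implemented with union-find). All names and titles are hypothetical.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "on", "the", "to", "with"}


def bag_of_words(text):
    """Count the content words in a title or abstract."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 = nothing in common)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_into_tracks(abstracts, threshold=0.3):
    """Single-link grouping: any two papers at or above the similarity threshold
    land in the same track. Tracks are the connected components (union-find)."""
    ids = list(abstracts)
    vectors = {i: bag_of_words(abstracts[i]) for i in ids}
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    tracks = {}
    for i in ids:
        tracks.setdefault(find(i), []).append(i)
    return sorted(tracks.values())


papers = {
    "p1": "Image duplication in Western blots",
    "p2": "Detecting duplication of Western blot images",
    "p3": "Citation manipulation by peer reviewers",
    "p4": "Coercive citation requests from reviewers",
}
print(group_into_tracks(papers))  # [['p1', 'p2'], ['p3', 'p4']]
```

Single-link clustering is the simplest choice, but it can chain loosely related papers into one oversized track; a real conference scheduler would probably also cap the track size.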

Catriona Fennell, Elsevier, Amsterdam

"Citation manipulation: endemic or exceptional?"

Estimated prevalence of citation manipulation by reviewers based on the citation patterns of 69,000 reviewers

She started off with a Dutch saying, "never let a good crisis go to waste." A scandal involving citation stacking in soil science had affected Elsevier, so they investigated the entire area: citation coercion by reviewers, citation pushing by editors, and citation stacking by journals.

She noted what a journal can do to fight this:

Make it clear that citation coercion is unacceptable
Educate editors
Remove reviewer privileges
Inform institutes and funding bodies
Create editorial systems to detect self-citations in reviews or revision letters?
Retract citations?
Black-list worst offenders?
Share information with other journals?

The last four are not really possible; in particular, citations cannot be retracted. There are COPE guidelines for reviewers and Elsevier ethical guidelines, and there is an article by Christopher Tancock about the practice: "The ugly side of peer review."
Elsevier looked through 54 000 reviews stored in their systems and identified 49 people to look at more closely.

In particular, there was "Dr. X", with an h-index of 90 and 20 000 citations in Scopus. Elsevier contacted this person, who was entirely unrepentant; the institute was unresponsive; there was no funding body for the research; and the person is active as an author and even more so as a reviewer. The person is now no longer a reviewer for Elsevier.
She also spoke about generic reviews that are so unspecific that they fit any paper. She called them "horoscope reviews." They saw some reviewers apparently copying and pasting these reviews into their responses.

The last speaker in the session (and, rightly so, the winner of one of the best speaker awards) was Alexander Panchin, Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, speaking on "Concealed homeopathy: a natural test of peer-review quality."

A Russian pharmaceutical inventor (and holder of a patent on a homeopathic "remedy") has "discovered" that it cures pretty much all ailments. Alexander had pictures of it being sold in stores in Russia, where it is heavily advertised. It is made of "diluted" antibodies, supposedly at 1:10^16. Some variations "combine" dilutions of 1:10^24 and 1:10^30. There is essentially nothing in the pills except sugar, which is why it is a tad off to take these pills to "cure" diabetes.

In the patent application it is called a homeopathic drug, but such products are now called "ultra-low dosage" or "release-active" drugs.

Alexander tracked down many papers published by this gentleman, who was even the editor of a special issue published on SpringerLink that included 46 of his own papers! The papers do not disclose his conflict of interest and often have badly flawed study designs, showing that peer review did not kick in.

Alexander wrote to the journals and has managed to get three retractions and two promises to retract, but the authors of a review article that includes many references to this stuff refuse to issue a correction until ALL of the flawed papers are retracted...

Even though the Minister of Science has named this manufacturer as the most damaging pseudoscience project, scientists and newspapers that have reported on this have been sued, so I am keeping the name off the blog.

After lunch we had the plenary session on "Predatory publishing and other challenges of new models to share knowledge."
I was really looking forward to this session and it didn't disappoint!

Deborah C. Poff, the new COPE chair and a philosopher from Ottawa, titled her talk "Complexities and approaches to predatory publishing."

She spoke at lightning speed, getting faster as time began running out. It could have been at least a two-hour lecture, so jam-packed it was with really good stuff. I could barely keep up, so I hope I get the highlights right.

A definition for predatory publishing is problematic, as there is much overlap with legitimate but new or smallish publishers. She looked at necessary and sufficient conditions for a definition, but found that while deceit is necessary, sufficient conditions are vexing to try and capture.

Predatory publishers (PP) cheat and deceive some authors by charging publishing-related fees without providing services; they deceive academics into serving on editorial boards; they appoint editorial board members without their knowledge; they do no peer review; they refuse to retract or withdraw problematic papers; etc.

The list goes on: misleading reporting, language issues, lack of ethical oversight, lack of declarations of conflicts of interest, lack of corrections or retractions, lack of a qualified editor-in-chief (if there is one at all), made-up rejection rates, false impact factors, false claims of being indexed in legitimate indexes, and false claims of membership in publication ethics organizations, including forging the logos of such organizations. COPE apparently had to fight a forged COPE logo.

What should we call them, anyway? There are several arguments against the term "predatory": it is not descriptive or instructive, so some suggest using fake, rogue, questionable, parasitic, deceptive, etc.; "predatory" suggests victims, powerless people who are acted upon without their full knowledge, while a number of studies have shown that some scholars knowingly publish in such journals; and calling the issue "predatory" obviates or mitigates the personal responsibility for choosing where to publish.

The best argument for using the term: Since Jeffrey Beall coined the term, why not use it?
COPE is undecided on what name is best.

I particularly liked Deborah's stakeholder analysis of who or what is harmed by these publishers:

The innocent author who is duped into paying for services without receiving them. They may lose status when peers discover that they have published in such a journal, and it can even lead to investigations. Since many such publishers refuse to retract, the damage done may be long-term.
Legitimate Open Access Journals are easily confused with predatory Open Access Journals
Legitimate journals which are not top ranked or may not follow best practice are also easily confused with them.
Research and funding sources: This depends on whether the research published is legitimate or not. If the research is shoddy and gets published by a PP journal, it may be cited and thus pollutes the scholarly record. If a scandal arises, the scandal may tarnish publicly funded research.
Universities and their role in knowledge creation.
Citizens who pay taxes.

She pointed out that predatory publishers make a great business ethics case. In closing, she sees only two things that can be done:

Caveat Emptor (let the buyer beware) - use Think / Check / Submit: do you read the journal yourself? Do you cite research published there? Do your colleagues? Who is the editor-in-chief?
Addressing and pursuing predatory publishers as businesses committing criminal acts. The US Federal Trade Commission won a court case against the owner of OMICS and the company itself, and the court ordered OMICS to pay $50.1 million.

Bhushan Patwardhan, Professor of Medicine and Vice Chairman of the University Grants Commission, New Delhi, spoke on "Research integrity and publication ethics: Indian scenario." Bhushan first spoke about the University Grants Commission and gave an overview of the Indian higher education sector.

There are more than 900 universities and more than 10 000 other institutions, with 1.2 million teachers somehow coping with 36.6 million students. Just shy of 150 000 publications are produced in India per year, and unfortunately, many of these appear in problematic journals.

There is a paper about the situation in India: the authors selected 2000 Indian authors of papers in journals on Beall's list and sent them a survey. 480 responded, and almost 60 % of them were unaware that they were publishing in a predatory journal:

Seethapathy, G. S., Santhosh Kumar, J. U., & Hareesha, A. S. (2016, December 10). India's scientific publication in predatory journals: need for regulating quality of Indian science and education. Current Science, 111(11), 1759-1764.

Bhushan was shocked to find out just how many Indian publications were in predatory journals. In 2019, India set up the Consortium for Academic and Research Ethics (CARE). The goals of the CARE project are to

create and maintain a CARE list of reputable journals
promote research publications in reputable journals
develop an approach and a methodology for identification of quality journals
discourage publications in dubious journals
avoid long-term damage due to academic misconduct
promote academic and research integrity and publication ethics

He put up the URL of the CARE site, http://ugccare.unipune.ac.in/index.html, but the site was down for "maintenance": it had not even been up for a day before it was cloned and published on a similar URL by unknown persons.

Then Matt Hodgkinson, Head of Research Integrity @ Hindawi Ltd., London, took the stage to give "A view of predatory publishing from an open access publisher." He first gave a brief historical overview of Hindawi. It was founded in Cairo in 1997 and published its first subscription journals in 1999. In 2007 all journals were flipped to Open Access. In 2016 they created their Research Integrity team, which handles all issues that arise at their journals. Hindawi's headquarters moved to London in 2017.

He spoke of the impact that predatory journals have on legitimate Open Access journals: they are tarred with the same brush. Predatory journals also create false impressions for authors, who now expect undue speed from legitimate publishers and, out of impatience, make dual submissions to see which journal publishes first (Matt called it "gazumping"). They have had so many instances of this, Matt told me over coffee, that they check for text similarity online twice: once at submission, and once more just before publication. They have caught many double dippers this way.

He expanded the concept of predatory publishers to include what he called "Cargo cult" publishers (ones who publish unedited theses or Wikipedia articles as "books"), paper mills, the selling of authorship, and faked peer review. He also noted that the subscription model is not immune to fakery: there are subscription journals that closely mimic the titles of legitimate journals, something called hijacking.

He closed with some scandals (publications about elephant autism or space octopi) and then listed some of the newest ideas, such as the various preprint servers. The question remains, however, how sustainable such initiatives are.

Although I was planning on visiting another session, Jenny Byrne insisted that the session on checking data and images would be very interesting, and she was right. I had thought that Elisabeth Bik was the only person around tracking down doctored images, but it turns out there are quite a number of initiatives.

First up was Jana Christopher from FEBS Press, Heidelberg, speaking about "Image Integrity in Scientific Publications."

She observed that the prevalence of image aberrations in publications is generally underestimated. Although there are ways to catch simple-minded manipulators, much as with plagiarism, people are getting more sophisticated in covering their tracks. Her focus is on Western blots, micrographs, and photos: anything that can be overlaid in Photoshop. If the images match identically, there's a problem. She showed in a quick demo how she loads suspected duplicates into different color channels and overlays them. The result is black for identical portions of the image.
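The overlay trick works because identical pixels cancel out. As a rough Python illustration, assuming two grayscale images that are already the same size and perfectly aligned (real tools also have to cope with cropping, scaling, rotation, and contrast changes), a per-pixel absolute difference gives 0, i.e. black, wherever the images match. This mimics a "difference" overlay rather than reproducing the exact Photoshop steps shown in the demo, and the function names are hypothetical.

```python
def difference_map(img_a, img_b):
    """Per-pixel absolute difference of two equal-sized grayscale images,
    given as lists of rows of 0-255 values. Identical pixels come out as 0 (black)."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same dimensions")
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]


def black_fraction(img_a, img_b, tolerance=0):
    """Share of pixels whose difference is within the tolerance, i.e. that would
    show up black in the overlay. Values near 1.0 suggest a duplicate."""
    diff = difference_map(img_a, img_b)
    total = sum(len(row) for row in diff)
    black = sum(1 for row in diff for d in row if d <= tolerance)
    return black / total if total else 0.0


# Two tiny made-up "images" that differ in a single pixel:
blot_1 = [[10, 20], [30, 40]]
blot_2 = [[10, 20], [30, 90]]
print(difference_map(blot_1, blot_2))  # [[0, 0], [0, 50]]
print(black_fraction(blot_1, blot_2))  # 0.75
```

A small non-zero `tolerance` helps when one copy has been re-saved with lossy compression, which shifts pixel values slightly without changing the picture.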

She differentiated between manipulated images and wrong images being used to illustrate a finding. Why do people do this? Some apparently want a cleaner, more striking image. Others want to show a particular feature more clearly. Then there are those who wish to show a result that was not actually produced.

She showed more examples of pictures that have crossed her desk: cut-outs that clearly show up as transparent backgrounds, the clone tool being used to overwrite undesirable portions of an image, and images that are supposed to show different plants but, judging from the pattern of the soil, clearly show the same plant.

Renee Hoch, the Senior Manager and Team Manager of the Publication Ethics Team at PLOS One, San Francisco, sang the same song, second verse, with her talk on the "Impact of data availability on resolution of post-publication image concern cases."

She noted that image concerns make up 39 % of the concerns raised in her department, but 75 % of the retractions. She took 100 post-publication image cases from 2017-2019 and had a statistical look at them. The numbers flew by so fast that I was unable to keep up. 94 of the cases involved image duplication, the other 6 manipulation or fabrication. All fabrication cases have been retracted; for manipulations or duplications, about half have received an Expression of Concern or a Correction.

Their big issue is that when a concern is raised, they request the original data, and none is forthcoming. The excuses are similar: can't find the files, hard-disk crash, person left the lab. Concerns come in up to 5 years after publication, but some countries only have a three-year retention policy, which is clearly not sufficient. At times they wonder if the data ever existed at all, although there is also a lot of honest error or poor practice.

What can a journal do? It can require submission of the raw image data, have peer review done on the raw image data, and publish that data as supplementary material. This permits better assessment, and the journal can make sure that the images are archived properly.

In the discussion it turned out that many journals, upon requesting original data, get sent PowerPoint slides with screenshot images - completely useless for the task.

Daniel Acuna, a computer scientist from Syracuse University in New York State, USA, provides tools to Research Integrity Officers (RIOs) to help investigate cases. His talk on "Helping research misconduct investigations: methods for statistical certainty reporting of inappropriate figure reuse" was about a statistical tool that helps evaluate whether a scientist's excuse ("it just happened by chance") really makes sense.

Similar instruments might indeed generate similar artefacts, image processing software might generate similar noise, reproducible software might generate similar results, and some reuse of images is legitimate, for example generic brains used as underlays for captions.

They scraped about a million images from PubMed Central, and had to extract them from PDFs, which does not make things easier. They calculated a similarity index, set a high likelihood threshold, and then looked at the results. They achieved an area under the ROC curve of 63 %, which is not brilliant, but marginally better than flipping a coin (50 %). They need more images in order to refine their algorithm.
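The area under the ROC curve has a handy interpretation: it is the probability that a randomly chosen truly reused figure pair gets a higher similarity score than a randomly chosen innocent pair, so 63 % means the tool ranks such a pair correctly a bit less than two times out of three, while 50 % is a coin flip. A minimal Python sketch of that pairwise computation, using made-up scores rather than Acuna's data (the function name is hypothetical):

```python
def roc_auc(scores_reused, scores_innocent):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen truly reused figure pair scores higher than a randomly chosen
    innocent pair (ties count as half). 0.5 is coin-flipping, 1.0 is perfect."""
    if not scores_reused or not scores_innocent:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in scores_reused:
        for n in scores_innocent:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_reused) * len(scores_innocent))


# Made-up similarity scores, not Acuna's data:
print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75: three of the four pairs are ranked correctly
```

Comparing every pair is quadratic, which is fine for a sketch; for a million images one would sort the scores and count ranks instead, which gives the same number.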

Thorsten Beck from the HEADT center (funded by Elsevier) at the Humboldt University, Berlin, spoke about the image integrity database that they are putting together. Bik, Fang & Casadevall have shown in their 2016 and 2018 papers that about 4 % of published papers contain problematic images, and that a good 35 000 papers may be in need of retraction for this reason.

They want to build a structured database of images from retracted papers, recording as much information as they can about the authors of the publications, their institutions, the reason for the retraction, etc. However, retraction notices are famously vague, on account of authors suing journals. They want to keep track of who manipulated the image and who detected it, but seeing as how institutions are highly reluctant to disclose the results of an investigation, good luck obtaining that data. [Although Nature has a WorldView column this week by C. K. Gunsalus calling for institutions to be more transparent about their decisions.] And then there are copyright issues, so there are many challenges.

Jennifer Byrne, an oncologist from the Children's Hospital at Westmead, Australia, presented her work with Cyril Labbé (University of Grenoble, France) on the Seek&Blast tool.

She first gave us a two-minute introduction to genetics, noting that the nucleotide sequences for certain genes are such long strings of letters that no human being can easily remember them. She does, however, remember the name given to some cell line, TPD52L2, that she had worked with ages ago. There had been a dozen and a half papers about it many years ago, and suddenly it was popping up all over the place in papers by various Chinese authors on a wide variety of cancers, which is impossible. The cells come from only one organ.

[Matt Hodgkinson has sent in a correction: "Small correction - TPD52L2 is a gene Jennifer cloned in 96. The authors of suspect papers often reported studying it in cell lines known by https://t.co/1Lci9g8bfb to be really HeLa & they often got primer sequences for detecting & knocking down the genes wrong." I can't pretend to understand that, but I'm thankful for the correction!]

As she began reading the papers, she realised that they didn't make sense at all; something about the targeting sequence was off. When she spoke with Cyril about this issue, he immediately saw that a nucleotide sequence is just one big word, so the sequences are simple to parse out of papers. He did so, and was even able to identify the context in which these nucleotide sequences were used, so that impossible uses of them could also be identified.
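Cyril's observation that a nucleotide sequence is "just one big word" translates directly into a regular expression. This minimal Python sketch only covers the extraction step, assuming that candidate sequences are unbroken runs of at least 15 A/C/G/T/U letters (a hypothetical cut-off); it is not the Seek&Blast code, and checking what each sequence actually targets is a separate step that is not shown. The example sentence and sequence are invented.

```python
import re

# Hypothetical cut-off: PCR primers and siRNA/shRNA target sequences are
# typically around 18-25 nucleotides, so any run of 15 or more A/C/G/T/U
# letters is treated as a candidate. Sequences split by spaces are missed.
SEQUENCE = re.compile(r"\b[ACGTUacgtu]{15,}\b")


def extract_sequences(text, context=40):
    """Return (sequence, surrounding text) pairs so that a human can check
    how each candidate sequence is used in the paper."""
    hits = []
    for match in SEQUENCE.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append((match.group().upper(), text[start:end]))
    return hits


# An invented sentence, not taken from any real paper:
methods = "The shRNA target sequence was 5'-GCTGAAGAGTTCAATGTCAAT-3' and cells were harvested after 48 h."
for sequence, snippet in extract_sequences(methods):
    print(sequence, "|", snippet)
```

Keeping a window of surrounding text is what makes the "context" check possible: the same sequence described as a control in one paper and as a targeting sequence in another is exactly the kind of impossible use a human reviewer then has to verify.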

Like many software systems in this area, the system has high false-positive and false-negative rates, so positives must be manually examined before a paper is flagged. They published a paper about it in Scientometrics, "Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines," identifying the flagged papers. We compared the papers they had identified through nucleotide sequence overlap with the ones I was reporting on through text overlap, and found that the same journal was publishing these papers. They are having very similar problems to mine in getting the offending papers retracted.

The service is available online at http://scigendetection.imag.fr/TPD52/ for checking whether there are any publications with a particular sequence. They caution users to manually verify a paper before taking any action such as commenting or contacting someone. This is not an automatic detector! Cyril said after the presentation that he will continue refining the algorithm.

We were now down to the final session.

Maura Hiney and Daniel Barr reported on their results from the focus track on ensuring integrity in innovation and impact, and Klaas Sijtsma reported on the progress being made with the Registry for Research on the Responsible Conduct of Research. He now revealed what some of the seemingly odd data collected at submission time was for: they wanted to see how many of the accepted papers had been pre-registered. It wasn't many. I think that pre-registration is fine for clinical trials, but there are many other methods of doing research that do not fit the pre-registration mindset. In particular, when you observe something odd, end up chasing it down a crooked alley, and suddenly see a great big new field open up, you will hardly have pre-registered what you are writing up for other scientists.

David Moher reported on the Hong Kong Manifesto for Assessing Researchers, and it became clear that the principles need a good bit of redrafting.

The best speaker awards for young researchers were announced, and then Lex Bouter extended an invitation to attend the 7th World Conference on Research Integrity, to be held in 2021 at the University of Cape Town, South Africa.

That's it for blogging. I'm writing this on the plane with a screaming baby in my aisle, so I will now put some music on my noise-cancelling headphones in the hope of drowning out the piercing screams. We've still got 4 hours to go...

Update 8 June 2019: Vasya Vlassov had a friend film his talk about Dissernet.

Friday, June 7, 2019

WCRI 2019 - Day 3
