Copy, Shake, and Paste: Computational Research Integrity

All right! I slept in this morning to try to get my body onto New York time instead of Berlin time. I'm looking forward to the talks today; I will be on second, after Elisabeth Bik. I changed my slides about 17 times yesterday to adapt to the discussions, so it's about time I gave the talk.

Day 1 - Day 2, 24 March 2021 - Day 3

Elisabeth Bik, the human image duplication spotter, gave us some great stories: How she got started on this (a plagiarism of her own work), what tools she uses, what tools she wishes she had, and even gave us some images to try and spot ourselves. On her Twitter feed (@MicrobiomDigest) she runs an #imageforensics contest. I'm usually too slow to respond to them. What really puzzles me is: Why are people messing with the images? Why not do the experiments for real? Or if you must fake, use a different picture? We just need to let her get her hands on Ed Delp's tool! That would bring her superpowers up to warp speed!
I was up next with "Responsible Use of Support Tools for Plagiarism Detection". Elisabeth did a great tweet thread on the talk, thanks! I referred to Miguel Roig's work on self-plagiarism in response to a discussion yesterday. Here's our paper on the test of support tools for plagiarism detection and our web-page with all the gory details. And of course, the similarity-texter, a tool for comparing two texts. Sofia Kalaidopoulou implemented it as her bachelor's thesis. It is free, works in your browser, and nicely colors identical text so the differences jump out and hit you in the eye.
Michael Lauer from the National Institutes of Health then spoke about "Roles and Responsibilities for Promoting Research Integrity." He set off a firework of misconduct cases, involving things like exfiltrating knowledge and research to China or misusing NIH funds, so fast that I couldn't keep up. Some of the schemes were really brazen! A few that I managed to note: the Darsee Affair in the 1980s (article in the New England Journal of Medicine) - an internal peer-review tampering case - the Duke University affair around Anil Potti - Chinese Researcher Sentenced for Making False Statements to Federal Agents. Espionage seems to be a really big problem!
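To give a feel for what "comparing two texts" involves, here is a minimal sketch in Python. It is not the similarity-texter's own algorithm (the post does not describe that); it simply uses the standard library's `difflib.SequenceMatcher` on word lists. The function name `shared_passages` and the `min_words` threshold are made up for illustration.

```python
from difflib import SequenceMatcher


def shared_passages(text_a, text_b, min_words=3):
    """Return word runs of at least min_words that appear in both texts.

    Illustrative only: matching is case-sensitive, splits on whitespace,
    and is NOT how the similarity-texter itself works.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps frequent words from being ignored in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel, filtered out here.
        if block.size >= min_words:
            passages.append(" ".join(words_a[block.a:block.a + block.size]))
    return passages


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "a quick brown fox leaps over the lazy cat"
    print(shared_passages(original, suspect))
    # prints ['quick brown fox', 'over the lazy']
```

A tool meant for people would then color these shared runs in both texts; the sketch stops at finding them.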
Matt Turek, Program Manager in the Information Innovation Office (I2O) at DARPA, spoke on "Challenges and Approaches to Media Integrity." He calmly and matter-of-factly presented some absolutely TERRIFYING, bleeding-edge research on image generation. We had seen some of this in what Ed Delp spoke about yesterday. But things like a deepfake video of Richard Nixon appearing to read a speech that was written in case the moon shot (the Apollo 11 mission, I watched this in black and white on my grandmother's TV) ended in tragedy make me despair that we will ever manage to deal with fake news. Nixon's lips move to the text he is reading, and it is almost impossible to tell that this is a fake - except that I know that I saw a different ending in my youth. Matt ended with the possibility of "Identity Attacks as a Service", that is, ransomware that threatens to publish real-looking videos of someone unless they pay up. I'm glad his time was up, as I was afraid that he would have more deeply unsettling things to show. Much as I personally do not agree with a lot that the military is wasting money on, this seems to be a good investment.
Zubair Afzal spoke on "Improving reproducibility by automating key resource tables". I have no idea what key resource tables are, but they seem to be useful to biomedical researchers.
Colby Vorland, with "Semi-automated Screening for Improbable Randomization in PDFs", tries to see whether reported data make sense by looking at the distribution of p values, which should be spread roughly uniformly between 0 and 1 if the randomization was genuine. (Note from Elisabeth Bik: See e.g. Carlisle's work on p values in 5,000 RCTs). He has to go to enormous trouble to scrape table data out of PDFs. I suggest using Abbyy FineReader, which does a good job of OCRing tables. Why, oh why do PDFs not have semantic markup?
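The idea of checking whether a set of p values looks uniform can be sketched in a few lines of Python. This is not Colby's pipeline (the post does not describe it). It is a generic one-sample Kolmogorov-Smirnov check against the uniform distribution, written with the standard library only. The function names are hypothetical, and the 1.36 / sqrt(n) cutoff is the usual large-sample approximation for a 5% significance level, so treat it as a rough screen and not a verdict.

```python
import math


def ks_uniform_statistic(p_values):
    """Largest gap between the empirical CDF of the p values and Uniform(0, 1)."""
    ps = sorted(p_values)
    n = len(ps)
    if n == 0:
        raise ValueError("need at least one p value")
    d = 0.0
    for i, p in enumerate(ps, start=1):
        # Compare the step function just after and just before each point.
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


def looks_non_uniform(p_values, coefficient=1.36):
    """Rough screen: True if the KS statistic exceeds coefficient / sqrt(n).

    Assumption: coefficient=1.36 approximates the 5% critical value and is
    only reasonable for larger samples.
    """
    n = len(p_values)
    return ks_uniform_statistic(p_values) > coefficient / math.sqrt(n)


if __name__ == "__main__":
    evenly_spread = [(i + 0.5) / 20 for i in range(20)]
    bunched_near_one = [0.9 + 0.005 * i for i in range(20)]
    print(looks_non_uniform(evenly_spread))     # prints False
    print(looks_non_uniform(bunched_near_one))  # prints True
```

A flag from a screen like this is only a prompt to look more closely at the trial, since correlated baseline variables or rounding in the reported tables can also distort the distribution.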
Panel 3: Funders
Benyamin Margolis (ORI), Wenda Bauchspies (NSF), Michael Lauer (NIH), and Matt Turek (DARPA) discussed various aspects of the funding of research integrity research. All sorts of topics were addressed with the links flying in the chat as usual:
Report Fraud, Waste, Abuse, or Whistleblower Reprisal to the NSF OIG
A link to help PIs prepare to teach or learn more about RCR
NIH Policy for Data Management and Sharing
Deep Nostalgia
The Heilmeier Catechism
Find US government funding
Build and Broaden for encouraging diversity, equity and inclusion
DORA
The tabs I still have open probably came from this session; they are in the list below.
Daniel Acuna and Benyamin Margolis introduced a competition: Artificial Intelligence for Computational Research Integrity. ORI is offering a grant (ORIIR200062: Large-scale High-Quality Labeled Datasets and Competitions to Advance Artificial Intelligence for Computational Research Integrity) for running the competition.
Panel 4: Tool Developers
Daniel Acuna (Syracuse University), Jennifer Byrne (University of Sydney), James Heathers (Cipher Skin), and Amit K. Roy-Chowdhury (UC Riverside) were the panelists.
Jennifer and Cyril Labbé have published their protocol for using Seek & Blastn at protocols.io. And they have a paper on biomedical journal responses that closely mirrors my own experiences.
James talked about his four projects: GRIM (Preprint), SPRITE (Preprint), DEBIT, and RIVETS. His statistical work should scare the daylights out of data fabricators. As he points out: by the time they falsify their data to fit the statistical models, they might as well have done the experiments.
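The simplest of these tests, GRIM, fits in a few lines. With n integer-valued responses, the mean can only be a whole number divided by n, so a reported mean that no integer total can produce is suspect. The Python sketch below is a simplified illustration of that idea, not James's own implementation; the function name and parameters are made up, and it assumes the underlying data are integers and that the mean was rounded to the stated number of decimals.

```python
import math


def grim_consistent(reported_mean, n, decimals=2):
    """Check whether some integer total divided by n rounds to reported_mean.

    Simplified sketch: assumes integer-valued data and accepts either
    rounding direction at the half-way point.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    half_step = 0.5 * 10 ** (-decimals)
    approx_total = reported_mean * n
    # Only integer totals next to reported_mean * n can possibly match.
    low = math.floor(approx_total) - 1
    high = math.ceil(approx_total) + 1
    for total in range(low, high + 1):
        if abs(total / n - reported_mean) <= half_step + 1e-12:
            return True
    return False


if __name__ == "__main__":
    print(grim_consistent(5.18, 28))  # prints True  (145 / 28 = 5.1786)
    print(grim_consistent(5.19, 28))  # prints False (no total gives 5.19)
```

The check loses its bite as n grows: with two decimals and a sample of 100 or more, every reported mean is reachable.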
Amit spoke a bit more in depth about the work Ghazal presented yesterday and the challenges involved in developing an image analysis tool.
Daniel talked about Dr. Figures (Preprint).
Someone (I didn't catch who, James?) said "Death to PDF!" Indeed, or rather, it needs to be easily parseable so that we can easily mine metadata, get the text and images separated, etc. Cyril posted a link to a good PDF extractor in the chat; I shall look into this very soon.

Links to things in tabs I still have open that someone put in the chat at some time:

Jodi Schneider, et al.: Reducing the Inadvertent Spread of Retracted Science: Shaping a Research and Implementation Agenda
David Barnes (who wrote the fantastic textbook I use for teaching introductory programming in Java) sent me a private link to a YouTube demo of his prototype Image Duplication Analyser.
A 2016 paper by Bik, Casadevall & Fang on image duplication, often quoted yesterday and today: The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications
Dorothy Bishop's blog entry "Time for publishers to consider the rights of readers as well as authors" really resonates with me. I keep pleading with people to understand publishing as communication between writers and readers and to quit with the write-only publications no one reads.
Jana Christopher's "Systematic fabrication of scientific images revealed" in FEBS Letters
The New England Journal of Medicine article with the citation analysis of the 5-sentence 1980 letter that many people cite as a "study" proving that opioids are not addictive, which I referred to in my talk.
The ICAI International Academic Integrity Survey that is being conducted, also mentioned in my talk.

And now for a terribly geeky note on my talk. It has bothered me that when presenting online with Zoom I couldn't have my notes. I use Keynote on a MacBook Pro, and it either assumes the second screen is a projector (and I can't talk it out of it), or I can only present on my laptop. And I either share the laptop or the second screen on the Mac. There has to be a better way! So I googled yesterday. And found this lovely article with the exact solution to my problem: How to use Keynote’s new Play Slideshow in Window feature with videoconferencing services.

I had just upgraded my iPad to a new operating system, so my Mac (Catalina) needed to install some doohickey. Then all I had to do was: Start Keynote sharing in the window of my laptop, then share only Keynote on Zoom, and click on the little remote thingy on Keynote on the iPad. I then set the iPad down on my keyboard, and I had the audience on Zoom (and myself, to make sure I'm still in the camera view when speaking) on my second screen behind the laptop, my slides on the laptop screen, and on the iPad I selected the presentation of my notes and the next slide! How utterly perfect! I just needed to tap anywhere on the iPad to advance the slide. If I needed to go back, I could tap on the slide number and it would open up a long string of slides for me to choose how far back I wanted to go. It felt so good being in complete control, although I didn't have any brain cells left to read the chat, as I normally do when presenting. I'll learn to do that once I can relax and trust that this really does work. So thank you Glenn Fleishman from Macworld!

Thursday, March 25, 2021

Computational Research Integrity - Day 2
