Copy, Shake, and Paste: March 2021

Thursday, March 25, 2021

Computational Research Integrity - Day 3

Uff. All-day Zoom is tough. We are now on the last day of the Computational Research Integrity Conference 2021. More exciting talks to come!
[This ended up being a mess of links, many apologies. But they are good links, so I am focusing more on documenting them than on summarizing the talks. I probably have some links in twice, sorry about that.]

Day 1 - Day 2 - Day 3: 25 March 2021

First up is Boris Barbour of the PubPeer Foundation, which runs the world's most extensive online journal club. He gave a good overview of the system and some fun statistics about the number of comments. He noted that, apparently due to the anonymity of PubPeer (it used to be partial anonymity, the staff knew who was writing, but now it is total anonymity), they have many more comments than PubMed Comments. He quoted Max Planck "Science advances one funeral at a time" [Wikiquote: "Eine neue wissenschaftliche Wahrheit pflegt sich nicht in der Weise durchzusetzen, daß ihre Gegner überzeugt werden und sich als belehrt erklären, sondern vielmehr dadurch, daß ihre Gegner allmählich aussterben und daß die heranwachsende Generation von vornherein mit der Wahrheit vertraut gemacht ist." - Wissenschaftliche Selbstbiographie, Johann Ambrosius Barth Verlag, Leipzig, 1948, S. 22] A sister site: Peeriodicals.
Walter Scheirer (University of Notre Dame) "Understanding the Provenance of Visual Disinformation Targeting Science" started off with anti-vaxxers misusing memes and then took us on a long hunt through the internet to find a surprising source for one of the current stupid memes of bat soup. He then showed how they are trying to find the source of images (of course, using graph theory, the Swiss Army Knife for computer scientists). What a fantastic idea! He published A Pandemic of Bad Science.
Mario Biagioli (UCLA) "Ignorance or mimicry? Lessons from the merchants of doubt." Mario discussed a sinister development of ethical sounding cover ("transparency", "conflict of interest") for nefarious purposes. I learned a new word: "Agnotology", the organized production and distribution of ignorance. Links:
NY Times piece by Lisa Friedman about the E.P.A - A book Merchants of doubt talks about this problem in the tobacco industry, global warming, etc. (reviewed 2010 in The Guardian) - Video of testimony to the House Science, Space, and Technology committee "Strengthening Transparency or Silencing Science? The Future of Science in EPA Rulemaking" - Reasonable Versus Unreasonable Doubt -
Cargo Cult Science - Dorothy Bishop blogged about why she thought serious scientists should NOT attend meetings like this NAS one. - The Guardian: How the truth gets lost (1 Jan 2020) - How to tell the difference between merchants of doubt and those who genuinely disagree? Qualifications can be good, but we need more disclosure about who these people are. People have been coached to look and act like a thoughtful scientist. Mario published a text on Gaming Metrics in 2020.
Thorsten Beck (HEADT Centre - Humboldt University of Berlin) "Image Manipulation Detection — From Visual Inspection to Technology Driven Procedures?"
HEADT Image Integrity Dataset (but you can't see high-resolution images or download them for legal reasons) - Workshop on data visualization - Synthetic data sets for use in training (Paper) - Ansari & Tyagi (2014): Pixel-Based Image Forgery Detection: A Review - Botched Steve McCurry Print Leads to Photoshop Scandal - Bronx documentation center: Altered images - What is a picture? The temptation of image manipulation (2004).
Yury Kashnitsky (Elsevier) spoke on "How near-duplicate detection improves editors' and authors' publishing experience". It seems to me that Elsevier has re-invented eTBLAST (now HelioBLAST) not just for published papers but for submissions so that they can identify simultaneous submissions and re-submissions.
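For readers who wonder what near-duplicate detection looks like in its simplest form, here is a minimal sketch. It is not Elsevier's system and not how eTBLAST works; it only illustrates one common textbook idea, comparing sets of overlapping word triples ("shingles") with the Jaccard index. The function names and the shingle size of 3 are assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(text_a, text_b, k=3):
    """Share of k-word shingles the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


first = "We propose a new method for image forensics in biomedical papers"
second = "We propose a novel method for image forensics in biomedical papers"
print(jaccard_similarity(first, second))
```

A single changed word already removes three shingles from the overlap, which is why real systems tend to combine such scores with thresholds and further checks before flagging a submission.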
Ivan Oransky (Retraction Watch) "From Cancer to COVID-19, Does Science Self-Correct?" He was invited to review Covid-19 papers (!), apparently on the basis of algorithmic recommendations. He just recently published an article about Covid-19 retractions and one in 2020 about the Covid-19 science meltdown. MDPI recently invited Jeffrey Beall to guest edit a pharmacy journal - Here's a poster paper Jodi Schneider did with a student for SIGMET about really obvious data quality problems with several databases - On the topic of self-retraction, [someone] had a few examples here where people had actually boosted their reputation by doing this - people expect it will be terrible, but others are impressed at the integrity it displays- Dorothy Bishop: Fallibility in Science (2018) - Important list of retracted Covid-19 papers - Chris Graf "At Wiley we have an escalating scale: Corrections < Notes/notifications (which are new, for when there's community 'interest' but no conclusive finding, yet) < Expression of concern < Retraction (with a notice, linked, watermarked, content retained) < Withdrawal (with a notice, content removed). All have DOIs."
Panel 5: Journalists
Ivan Oransky (Retraction Watch), Richard Van Noorden (Nature), Stephanie Lee (Buzzfeed News) - Here is a good Ed Yong Tweet on the relationship between journalists and scientists - Talking of mystery novels, Henry Forman is an academic researcher, now retired, who has a second career as a novelist. One of his books is all about a case of scientific misconduct that turns murderous ... - Two articles by Stephanie: Those Studies About Pasta Being Good For You? Some Are Paid For By Barilla. (2018) - An Elite Group Of Scientists Tried To Warn Trump Against Lockdowns In March (2020). Collaboration between Zotero and Retraction Watch (2019).
Panel 6: Investigators / Whistleblowers
Paul Brookes (Panel chair) University of Rochester, Elisabeth Bik, Boris Barbour, Erica Boxheimer (EMBO Press), Jana Christopher (FEBS Press; Image-Integrity). There was a very lively discussion in the chat about the purpose of retractions, the length of time they take (if anything happens), and how allegations are proven. The links were flying fast, so here's what I managed to grab and the tabs I still had open at the end of the conference: Referee3 - COPE flowchart on image manipulation - A university went to great lengths to block the release of information about a trial gone wrong. A reporter fought them and revealed the truth. (2018, from a story told by Ivan) - Challenges in irreproducible research (2018) - Many scientists citing two scandalous COVID-19 papers ignore their retractions (2021) - Appreciating data: warts, wrinkles and all (2006) - What is Recklessness in Scientific Research?: The Frank Sauer Case (2017) - Hidden Data: The Blind Eye of Science (book by Helene Z. Hill) -
I was sued for libel under an unjust law (2012) - Courts refuse scientists' bids to prevent retractions (2015) - Survival bias (2019) - Scientists Make Mistakes. I Made a Big One (2020) - Innovations in scholarly communication (Bianca Kramer's tool overview site) - Embassy of science.
Daniel Acuna then chaired a 56 person brainstorming session about what voices are missing from the discussion. And it worked! Susan Garfinkel noted that the institutions (RIOs) and their processes for research misconduct investigations are missing. Other voices: Information exchange between reviewers. In biomedical research: patients! Non-journal publishers. Money!!! Physicians and professional organizations (impact on treatments). Federal regulations. Insights from the fakers - The Mind of a Con Man (2013) - Faking science A True Story of Academic Fraud Diederik Stapel Translated by Nicholas J. L. Brown (2014). Legal barriers to sharing of sensitive information between stakeholders. A sense that some of these tools are being used for scientists, to help them avoid errors. What if the institution is not responding? Cultural influences (gift authorship being regarded as positive). International perspective. Different types of misconduct in the various scientific areas. Clearer COPE guidelines for corrections/retractions/expressions of concern. COPE representative. Usability testing between tool developers and users. Someone from MOST (Chinese Ministry of Science and Technology)
CRICONF attendees may be interested to attend the WCRI2021 digital event webinars from 30 May – 2 June 2021. Free registration and more information is available at https://wcri2022.org/digital-event-2021/.

Over and out, I need some sleep!

Computational Research Integrity - Day 2

All right! I slept in this morning to try and have my body be in New York time and not Berlin time. Looking forward to the talks today, I will be on second after Elisabeth Bik. I changed my slides about 17 times yesterday to adapt to the discussions, it's about time I give the talk.

Day 1 - Day 2, 24 March 2021 - Day 3

Elisabeth Bik, the human image duplication spotter, gave us some great stories: How she got started on this (a plagiarism of her own work), what tools she uses, what tools she wishes she had, and even gave us some images to try and spot ourselves. On her Twitter feed (@MicrobiomDigest) she runs an #imageforensics contest. I'm usually too slow to respond to them. What really puzzles me is: Why are people messing with the images? Why not do the experiments for real? Or if you must fake, use a different picture? We just need to let her get her hands on Ed Delp's tool! That would bring her superpowers up to warp speed!
I was up next with "Responsible Use of Support Tools for Plagiarism Detection". Elisabeth did a great tweet thread on the talk, thanks! I referred to Miguel Roig's work on self-plagiarism in response to a discussion yesterday. Here's our paper on the test of support tools for plagiarism detection and our web-page with all the gory details. And of course, the similarity-texter, a tool for comparing two texts. Sofia Kalaidopoulou implemented it as her bachelor's thesis. It is free, works in your browser, and nicely colors identical text so the differences jump out and hit you in the eye.
Michael Lauer from the National Institutes of Health then spoke about "Roles and Responsibilities for Promoting Research Integrity." He fired off a firework of misconduct cases that had to do with things like exfiltrating knowledge and research to China or misusing NIH funds, so many that I couldn't keep up. Some of the schemes were really brazen! A few that I noted: The Darsee Affair in the 1980s (Article in the New England Journal of Medicine) - an internal peer-review tampering case - Duke University affair around Anil Potti - Chinese Researcher Sentenced for Making False Statements to Federal Agents. Espionage seems to be a really big problem!
Matt Turek, Program Manager in the Information Innovation Office (I2O) at DARPA, spoke on "Challenges and Approaches to Media Integrity." He calmly and matter-of-factly presented some absolutely TERRIFYING, bleeding-edge research on image generation. We had seen some things Ed Delp spoke about yesterday. But things like a deepfake video of Richard Nixon appearing to read a speech that was written in case the moon shot (the Apollo 11 mission, I watched this in black and white on my grandmother's TV) ended in tragedy make me despair that we will ever manage to deal with fake news. Nixon's lips move to the text he is reading; it is almost impossible to tell that this is a fake - except that I know that I saw a different ending in my youth. Matt ended with the possibility of "Identity Attacks as a Service", that is, ransomware that threatens to publish real-looking videos of someone unless they pay up. I'm glad his time was up, as I was afraid that he would have more deeply unsettling things to show. Much as I personally do not agree with a lot that the military is wasting money on, this seems to be a good investment.
Zubair Afzal spoke on "Improving reproducibility by automating key resource tables". I have no idea what key resource tables are, but they seem to be useful to biomedical researchers.
Colby Vorland, with "Semi-automated Screening for Improbable Randomization in PDFs", checks whether reported data make sense by looking at the distribution of p values, which should be random, that is, roughly uniformly distributed, if the randomization was done properly. (Note from Elisabeth Bik: See e.g. Carlisle's work on p values in 5,000 RCTs). He has to go to enormous trouble to scrape table data out of PDFs. I suggest using Abbyy FineReader, which does a good job of OCRing tables. Why, oh why do PDFs not have semantic markup?
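To make the idea of "p values that should look random" a bit more concrete, here is a minimal sketch. It is not Colby's tool and not Carlisle's method; it merely compares a list of p values with a uniform distribution using the Kolmogorov-Smirnov statistic. The function names are made up for illustration, and the cut-off 1.36 / sqrt(n) is the usual large-sample approximation for the 5 % level, so it is only a rough screen. Real p values from tables are also often rounded or correlated, which such a simple check ignores.

```python
import math


def ks_statistic_uniform(p_values):
    """Largest gap between the empirical distribution and Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Check the gap just after and just before each observed value.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def looks_non_uniform(p_values):
    """Rough screen: True if the p values deviate strongly from uniform."""
    n = len(p_values)
    critical = 1.36 / math.sqrt(n)  # approximate 5 % cut-off, assumes larger n
    return ks_statistic_uniform(p_values) > critical


evenly_spread = [(i + 0.5) / 20 for i in range(20)]
clustered_high = [0.80 + 0.01 * i for i in range(20)]
print(looks_non_uniform(evenly_spread))
print(looks_non_uniform(clustered_high))
```

A flag from a check like this is only a reason to look more closely at a paper, not evidence of misconduct.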
Panel 3: Funders
Benyamin Margolis (ORI), Wenda Bauchspies (NSF), Michael Lauer (NIH), and Matt Turek (DARPA) discussed various aspects of the funding of research integrity research. All sorts of topics were addressed with the links flying in the chat as usual:
Report Fraud, Waste, Abuse, or Whistleblower Reprisal to the NSF OIG - A link to help PIs prepare to teach or learn more about RCR. - NIH Policy for Data Management and Sharing - Deep Nostalgia - The Heilmeier Catechism - Find US government funding - Build and Broaden for encouraging diversity, equity and inclusion - DORA. The tabs I still have open probably came from this session, they are in the bullet list below.
Daniel Acuna and Benyamin Margolis introduced a competition: Artificial Intelligence for Computational Research Integrity. ORI is offering a grant (ORIIR200062: Large-scale High-Quality Labeled Datasets and Competitions to Advance Artificial Intelligence for Computational Research Integrity) for running the competition.
Panel 4: Tool Developers
Daniel Acuna (Syracuse University), Jennifer Byrne (University of Sydney), James Heathers (Cipher Skin), and Amit K. Roy-Chowdhury (UC Riverside) were discussing.
Jennifer and Cyril Labbé have published their protocol for using Seek & Blastn at protocols.io. And they have a paper on biomedical journal responses that closely mirrors my own experiences.
James talked about his four projects GRIM (Preprint), SPRITE (Preprint), DEBIT, and RIVETS. His statistical work should scare the daylights out of data fabricators. As he points out: by the time they falsify their data to fit the statistical models, they might as well have done the experiments.
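The idea behind GRIM is simple enough to sketch in a few lines: if the underlying data are whole numbers (for example answers on a 1 to 7 scale), then the reported mean multiplied by the sample size must be close to a whole number. The following is a minimal sketch under that assumption, not James' own code; the function name is made up, and the mean is passed as a string so that the number of reported decimals is preserved.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP


def grim_consistent(reported_mean, n):
    """True if some whole-number total divided by n rounds to reported_mean.

    reported_mean is a string such as "5.19"; n is the sample size.
    Assumes the underlying data are integers.
    """
    mean = Decimal(reported_mean)
    # Rounding unit matching the reported decimals, e.g. 0.01 for "5.19".
    unit = Decimal(1).scaleb(mean.as_tuple().exponent)
    approx_total = mean * n
    candidates = {
        int(approx_total.to_integral_value(rounding=ROUND_FLOOR)),
        int(approx_total.to_integral_value(rounding=ROUND_CEILING)),
    }
    for total in candidates:
        rounded = (Decimal(total) / Decimal(n)).quantize(unit, rounding=ROUND_HALF_UP)
        if rounded == mean:
            return True
    return False


print(grim_consistent("5.18", 28))
print(grim_consistent("5.19", 28))
```

With 28 participants, a total of 145 gives a mean of 5.18 and a total of 146 gives 5.21, so a reported mean of 5.19 cannot come from whole-number data. This sketch rounds halves up; papers that round differently may need a more lenient check.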
Amit spoke a bit more in depth about the work Ghazal presented yesterday and the challenges involved in developing an image analysis tool.
Daniel talked about Dr. Figures (Preprint)
Someone (I didn't catch who, James?) said "Death to PDF!" Indeed, or rather, it needs to be easily parseable so that we can easily mine metadata, get the text and images separated, etc. Cyril posted a link to a good PDF extractor in the chat, I shall look into this very soon.

Links to things in tabs I still have open that someone put in the chat at some time:

Jodi Schneider, et al.: Reducing the Inadvertent Spread of Retracted Science: Shaping a Research and Implementation Agenda
David Barnes (who wrote the fantastic textbook I use for teaching introductory programming in Java) sent me a private link to a demo of a prototype he has on YouTube on his Image Duplication Analyser.
A 2016 paper often quoted yesterday and today by Bik, Casadevall & Fang on image duplication: The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications
Dorothy Bishop's blog entry "Time for publishers to consider the rights of readers as well as authors" really pings with me, I keep pleading with people to understand publishing as communication between writers and readers and to quit with the write-only publications no one reads.
Jana Christopher's "Systematic fabrication of scientific images revealed" in FEBS letters
The New England Journal of Medicine article with the citation analysis of the 5-sentence 1980 letter that many people cite as a "study" proving that opioids are not addictive, which I referred to in my talk.
The ICAI International Academic Integrity Survey that is being conducted, also mentioned in my talk.

And now for a terribly geeky note on my talk. It has bothered me that when presenting online with Zoom I couldn't have my notes. I use Keynote on a MacBook Pro, and it either assumes the second screen is a beamer (and I can't talk it out of it), or I can only present on my laptop. And I either share the laptop or the second screen on the Mac. There has to be a better way! So I googled yesterday. And found this lovely article with the exact solution to my problem: How to use Keynote’s new Play Slideshow in Window feature with videoconferencing services.

I had just upgraded my iPad to a new operating system, so my Mac (Catalina) needed to install some do-hickey. Then all I had to do was: Start Keynote sharing in the window of my laptop, then share only Keynote on Zoom, and click on the little remote thingy on Keynote on the iPad. I now set the iPad down on my keyboard, and I had the audience on Zoom (and myself to make sure I'm still in the camera view when speaking) on my second screen behind the laptop, my slides on the laptop screen, and on the iPad I selected the presentation of my notes and the next slide! How utterly perfect! I just needed to tap anywhere on the iPad to advance the slide. If I needed to go back, I could tap on the slide number and it would open up a long string of slides for me to choose how far back I wanted to go. It felt so good being in complete control, although I didn't have any brain cells left to read the chat, as I normally do when presenting. I'll learn that once I can relax and trust that this really does work. So thank you Glenn Fleishman from MacWorld!

Wednesday, March 24, 2021

Computational Research Integrity 2021 - Day 1

This week I am attending (and presenting at) the Computational Research Integrity Conference 2021 that is sponsored by the US Office of Research Integrity. I will try and record the highlights here.

The purpose of this conference is to bring computer scientists together with RIOs (Research Integrity Officers) so that a good discussion and exchange about tools for dealing with research integrity issues can take place. The conference was organized by Daniel Acuna from Syracuse University.

Day 1: 23 March 2021 - Day 2 - Day 3

Ranjini Ambalavanar, the Acting Director of the Division of Investigative Oversight at ORI kicked off the conference explaining the workflow at ORI from allegation to decision. It takes a long time, and there are many things to think about, from saving a copy of perhaps a terabyte of data to discovering small discrepancies. She showed us a few cases of really blatant fabrication of data and falsification of images, some of which can be found with simple statistical analysis or image investigation. ORI has a list with some forensic tools they use to produce evidence for their cases. She pleaded with computer scientists to produce more and better tools.
Jennifer Byrne presented some research that she is doing with Cyril Labbé on gene sequences that are used in cancer research. They found numerous papers that said they were using specific genes for some purpose, but that were actually not using them correctly or were stating that they were using one gene but actually using another. Genes are written as long strings of characters representing the bases involved. These sequences are not easily human-understandable, but are easy to find in publications. They have a tool, "Seek & Blastn", that looks for nucleotide sequences in PDFs and queries each sequence against the human genomic + transcript database to output a human-readable name for the sequence, which helps to show up problems.
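Why such sequences are "easy to find" can be shown with a tiny sketch: long runs made up only of the letters A, C, G and T almost never occur in ordinary English text. This is not the Seek & Blastn code; the function name and the minimum length of 15 are assumptions for illustration, and the second step, querying a sequence database, is left out completely.

```python
import re


def find_nucleotide_sequences(text, min_len=15):
    """Return runs of A/C/G/T of at least min_len letters found in the text."""
    pattern = re.compile(r"\b[ACGT]{%d,}\b" % min_len)
    return pattern.findall(text)


sample = "The forward primer was 5'-GCTAGCTAGGATCCGATTACA-3' targeting TP53."
print(find_nucleotide_sequences(sample))
```

Text extracted from real PDFs is messier: sequences get broken across lines or contain spaces, so a practical tool needs considerably more cleanup than this.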
Lauren Quakenbush & Corinna Raimundo are RIOs from Northwestern University. They train young researchers with RI bootcamps and gave us some good insights into how research misconduct investigations are done for serious deviations at a university in the USA. They have many new issues that are arising: an increasing volume of data that needs to be sequestered (terabytes!), unrealistic time frames, measures to protect the identity of whistleblowers, determining responsibility and intent, co-authors who are at other institutions, respondents who leave the university, the litigious nature of the cases, communication with journals, and so on. Germany really needs to see that universities need staff and resources and not just a lone RIO....
Kyle Siler gave a short presentation about predatory publishing. He began by making it clear that predatory publishing is not a binary attribute, but quite a spectrum of problematic publishing. He spoke of some fascinating investigations that he is doing in trying to identify what is meant by a predatory publisher. He scraped a large database of metadata from various publishers and is trying to measure some things like time-to-publish and number of articles published per year. His slides flew by so fast and I was so engrossed that I forgot to take any snapshots. He found one very strange oddity while cleaning his data: a presumably predatory journal that scraped an article from a reputable journal with Global North authors, and reprinted it. BUT: they made some odd formatting mistakes and some VERY odd substitutions (like the first name "Nancy" becoming "urban center"). He assumes that the journal is trying to build up an image of looking respectable in order to gain APC-paying customers. Some are even back-dated, so that the true publication looks like a duplicate publication, or even a plagiarism.
Edward J. Delp described a tool for image forensics that he is developing with a large research group at Purdue + other governmental organizations, in particular with Wanda Jones from ORI. His Scientific Integrity System seems to be just what many of the RIOs need, they wanted to know when he will be releasing the system! The problem is that it can probably only be used for people working for the US government, not for real cases, for legal reasons apparently involving the US military. But he has a user manual online: https://skynet.ecn.purdue.edu/~ace/si/manual/user-manual-scientific-integrity-v5.pdf and a demo video: https://skynet.ecn.purdue.edu/~ace/si/video/sci-int-system-demo_v5.mp4. He uses Generative Adversarial Networks to produce synthetic data for training his neural networks. They use retracted papers with images and non-retracted ones for populating their database.
David Barnes noted that getting annotations off of PDFs is not easy. Ed replied that it is indeed hard, but his group knows how to do it!
Update 2021-03-26: Wanda wrote to me to make it clear that it is of course an entire team of people at ORI and NIH who are working with Ed on this project. She also notes:
"the reason we’re not using it on active ORI cases is because of evidence integrity standards, and a federal computing requirement that we operate within the HHS Cloud environment with anything involving personally identifiable information (PII). New systems must undergo rigorous testing before they can “go live” in our internal environment. (Even commercial products must be reviewed, though it’s not as arduous as a newly-developed product.) Purdue hosts the system in its own secure cloud, but we cannot put information that might identify anyone named in our active cases into a non-HHS system. We have full freedom to develop what we need, though, using the thousands of published/retracted PDFs and other file formats that Ed and his team have assembled, including a growing library of GANS-generated images. We couldn’t be more excited about where this is going, and we’re hopeful we can go live in the next year or so. We’re exploring how best to do that.

Further, we’re not restricted because of any military uses of the technology – everything being used with it is published and/or open-source and has been, for years. We’re merely benefitting from DARPA’s years of investment in technology (albeit for national security purposes) that clearly has other worthwhile uses. After all, DARPA gave us the internet (for better or worse)! "
Thank you, Wanda, for clearing this up!
Panel 1: We segued right into the first panel with institutional investigators. [Note to the conference organizers: More and longer breaks needed!] Wanda Jones, the deputy director of ORI moderated the session with William C. Trenkle from the US Department of Agriculture, Wouter Vandevelde from KU Leuven in Belgium, and Mary Walsh from Harvard Medical School. They again picked up on the problem of an overwhelming amount of data and file management they have to do. The panel members briefly presented the processes at their institutions. Will noted that there is no government-wide definition of scientific integrity, although I am very pessimistic on any government deciding on a definition of anything. I was impressed that they made it clear that any analyses are only done on copies, never the original data itself, and that the tools that they use only detect problems, they do not decide if academic misconduct has taken place. A lively discussion raged on in the chat, with Dorothy Bishop noting that the young researchers are the ones who come to research with high ideals and get corrupted as they work. Will noted that agriculture integrity issues are different from medical ones, stating that it is a bit more difficult to fake 2000-pound cows than mice. Ed offered to generate an image of one for him, I really want to see that! It was made clear that there has to be some senior person, be it an older academic or a vice president of research, that protects the RIOs when they are investigating cases, particularly if "cash cows" of the university are under suspicion. [Will got us started on cows, I wonder how many there will be tomorrow!] I asked what one thing the panelists would wish for if a fairy godmother was to come along and grant that wish. The wishes were for a one-stop image forensic tool, more resources, and the desire for people who commit FFP (Fabrication, Falsification, Plagiarism) to have their hands starting to burn 🙌. 
The chat started discussing degrees of burn, starting with a light burn for QRP (questionable research practices) and a harder burn for FFP 😂.
Panel 2: We were awarded a 5 minute break before the next round, I made it to the refrigerator for a hunk of cheese. Bernd Pulverer from EMBO Press was moderating the panel with Renee Hoch from PLOS, IJsbrand Jan Aalbersberg from Elsevier and Maria Kowalczuk from Springer Nature. Renee detailed the pre-publication checks that they run in order to siphon off as much problematic material as possible before publication. IJsbrand had some nice statistics about the causes for retractions from Elsevier journals. There are more author-reported scientific errors causing retractions, so this helps make it clear that retractions do not always mean academic misconduct. About 20 % of the retractions are for plagiarism and image manipulation is 10-20 %. Bernd was of the opinion that plagiarism is infrequent, so I was back over at my slides, which I must have changed 17 times during the talks, to include a statement that it is NOT infrequent, just not found. He noted that it costs about $8000 per article published in Nature, because so many are evaluated and rejected. An interesting question from Dorothy Bishop was: What do we do if editors are corrupt? There was much discussion in the chat about how to find an appropriate address to report issues and how journals cooperate with institutions. A number of people want to move to versioning in publishing, something I find abhorrent unless there is a wiki-like way of being able to specify the exact version of an article that you are quoting. IJsbrand had a list of twelve (!) grades of ~~retraction~~ corrective tools ranging from in-line correction to retraction. The fairy godmother was now granting two wishes, which brought out things like a fingerprinting or microattribution tool (it's called a wiki, people, you can see exactly who did what when to the article), a user verification tool (sometimes people are listed as authors who do not know that they are being listed), an iThenticate for images, and so on. 
It was also noted that once the NIH started suing universities for research misconduct, they started perking up and getting on with research integrity training. Money focuses university administration minds!

There were various interesting links that popped up in the chat on Day 1, I gave up trying to put them in context and just started a bullet list here:

Kyle Siler, Philippe Vincent-Lamarre, Cassidy R. Sugimoto and Vincent Larivière, "The Lacuna Database: Empirical Data to Identify Obscure, Unconventional, Questionable and/or Predatory Journals"
2018 UK Parliamentary report on Research Integrity - I think every country needs one of these, especially Germany! The oral evidence of Dorothy Bishop is great.
A Nature news feature from 23 March 2021: The fight against fake-paper factories that churn out sham science
NIST Digital/Multimedia Scientific Area Committee
Reducing the Inadvertent Spread of Retracted Science - Paper: Reducing the Inadvertent Spread of Retracted Science: Shaping a Research and Implementation Agenda
Preprint: Towards minimum reporting standards for life scientists
bioRxiv: Amending published articles: time to rethink retractions and corrections?
Hot topic: The Economics of Reproducibility in Preclinical Research: "An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone." - Cochrane Collaboration used figure of USD170 billion (2019) here: https://www.cochrane.org/news/apply-now-cochrane-reward-prize-reducing-waste-research - This paper from 2014 reported average ~$400k costs per retracted paper, in wasted grant money... https://elifesciences.org/articles/02956 - From https://www.bmj.com/content/308/6924/283 (1994). See also Chalmers, Glasziou http://doi.org/10.1136/bmj.k4645 (2018) and http://doi.org/10.1097/AOG.0b013e3181c3020d (2009) - Elizabeth Gammon has done good work on economics of misconduct (using retracted articles), e.g. Gammon, E., & Franzini, L. (2013). Research misconduct oversight: Defining case costs. Journal of Health Care Finance, 40(2), 75–99. - her related dissertation https://mdsoar.org/handle/11603/4071 - [Jodi Schneider] been doing a scoping review of empirical literature about retracted research - there’s a bibliography (up to April 2020) here: https://infoqualitylab.org/projects/risrs2020/bibliography/ and we’re currently screening items up to Feb 2021. I’d love to share what we’ve found with anybody who wants to look more into what’s known about economics of misconduct (based on studies of retracted papers), email jodi@illinois.edu if you want to discuss!
Renee Hoch mentioned a FORCE11 initiative on research data publication ethics, that’s this: https://www.force11.org/group/research-data-publishing-ethics
Example of folks working on training, in the US - National Center for Professional & Research Ethics https://ethicscenter.csl.illinois.edu

I'm exhausted and heading for bed, looking forward to day two!

Saturday, March 13, 2021

Rector of Turkish university accused of plagiarism

There was an article in the German taz this weekend (13/14 March 2021) about the rector of the Boğaziçi University in Turkey that just briefly mentioned that there have been plagiarism allegations against him. Turns out that Elisabeth Bik has already done a deep dive into the allegations in her Science Integrity Digest blog. She has documented a substantial bit of plagiarism. Note: I didn't develop the similarity texter, that was the work of my student Sofia Kalaidopoulou, adapting and enhancing code published by Dick Grune. It is a great tool for documenting plagiarism, though!

There is a brief article in duvaR, an English-language news site about Turkey, and the Times Higher Education also reported on this in January 2021.

The rector, of course, finds all this slander, stating that it's only about a few missing quotation marks and that citation styles have changed since he wrote his thesis.

Styles may change, but it has been the case for quite some time that you have to make a clear distinction between words by someone else and words from you. Just slamming a reference on the end of a paragraph or putting it in the literature list does not cut it.

Thursday, March 4, 2021

ICAI Annual Conference 2021

There are advantages to the pandemic. Many conferences that I would have been otherwise unable to attend in person are now online, so I actually can go. I do miss the small talk (and the inside information no one would dare tell me in writing), especially over breakfast at the conference hotel or with a glass of wine at dinner. But Zoom we must, so we have to make the best of it.

Luckily, the International Center for Academic Integrity (ICAI) is making the best of Zoom by keeping the chat window open during all of the sessions. There have been a number of very lively discussions going on there! I want to report on the sessions I attended (or watched the video later). During one session I realized that my notes from a conference 11 years ago were actually quite useful for determining when a discussion had begun. That has encouraged me to do a detailed discussion of this conference!

1 March 2021

Amanda Mckenzie and Camilla J. Roberts opened the pre-conference explaining what ICAI is. With over 1000 attendees, there were many first-time participants.
Amanda Mckenzie, Camilla J. Roberts, Valerie Denney, and James Orr (all board members of ICAI) then gave a short overview of what academic integrity entails. The six fundamental values of academic integrity that ICAI defines are a commitment to: honesty, trust, fairness, respect, responsibility, and courage.
Jen Simonds, Maureen O'Brien, Kelly Lockwood, Carissa Pittsenberger, and Christian Moriarty opened the conference with a panel on project COIN, the Consortium for Online Integrity. One of the points they made was how important it is that we clearly communicate to students what we expect from them.
Since Thomas Lancaster was on in parallel with my talk, I watched the video later. Surprise, he was not talking about contract cheating! There were many other people discussing that topic. He talked about a series of modules in STEMM developed at Imperial College London including one on academic integrity. It surprised me that the students in his initial course did not realize that academic integrity was not just something for students, but involved all participants in teaching, learning, and research!
Tomáš Foltýnek and I presented the results of our test of support tools for plagiarism detection, which we published in 2020.
I was fascinated by the talk by Olu Popoola on "Detecting Contract Cheating Using Investigative Linguistics." I have been doing some stylometry myself in recent years, and it turns out there is actually a term for one of the strategies VroniPlag Wiki uses for finding potential plagiarism sources in doctoral dissertations: Bibliography forensics. Olu identified 164 linguistic features of text and then boiled these down to 32 that he applied to a corpus he has with 250 student papers and 75 papers known to have been written by ghostwriters. They were split into 500-word chunks and then the question was asked: can it be predicted when a paper is written by a student and when it is written by a ghostwriter? Of course, he can't do this 100 % correctly, but he did boil it down to 8 significant components that I was not quick enough to write down but which he kindly has blogged about.
David Ison (with Greer Murphy and Alexis Ramsey-Tobienne) offered a workshop about "Assessing Academic Integrity from a Faculty Perspective." They were asking for advice on aspects of the ICAI McCabe Survey 2.0 which is going live soon. This is an attempt to get more current data on academic integrity issues. Don McCabe did many surveys in the past on academic integrity. He passed away in 2016, so I am glad they are continuing this work at ICAI.
I thought there was a W(h)ine Bar this evening, but it is tomorrow, so I had my wine while addressing my overflowing email inbox.
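The chunking step in Olu Popoola's approach (papers split into 500-word chunks, each described by linguistic features) can be illustrated with a rough sketch. This is only my assumption of what such a step might look like: the two features computed here (average word length and type-token ratio) are hypothetical stand-ins, not his 32 features, and the function names are made up for illustration.

```python
import re
from statistics import mean


def split_into_chunks(text, chunk_size=500):
    """Split a text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def simple_features(words):
    """Compute two illustrative (hypothetical) stylometric features for one chunk."""
    # Strip punctuation and lowercase so "Paper," and "paper" count as one type.
    tokens = [re.sub(r"\W+", "", w).lower() for w in words]
    tokens = [t for t in tokens if t]
    if not tokens:
        return {"avg_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_word_length": mean(len(t) for t in tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


def features_per_chunk(text, chunk_size=500):
    """Return one feature dictionary per chunk of the text."""
    return [simple_features(chunk) for chunk in split_into_chunks(text, chunk_size)]
```

The per-chunk feature dictionaries could then be handed to whatever classifier one prefers; which features and which model actually separate students from ghostwriters is exactly the part that needs a real corpus.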

2 March 2021

I started the day off attending the Canadian Consortium Meeting. Since my mother was Canadian, I am officially Canadian as well and wanted to hear their perspective on academic integrity. Wow! There are four regional networks of academic integrity officers and researchers that have formed. They tend to be rather informal, exchanging war stories and best practices, and usually meeting up during the ICAI meetings. What struck me was that when presenting, the speakers would first state the indigenous peoples upon whose lands their university resides, a so-called land acknowledgement. What a respectful way to make it clear that we realize that the lands used to belong to others, and to keep their memories alive!
David Rettinger & co held a keynote panel on the validation of the McCabe Survey 2.0. They have really gone to lengths to focus on getting the wording of the questions to be unambiguous, and to make them translatable so that this instrument can be offered in many different languages. This will enable a good comparison between countries, and I really look forward to the results. I will certainly try to convince my university to a) join ICAI and b) use the survey.
Cath Ellis, Kane Murdoch and Mark Ricksen organized a session on contract cheating detection. They get a prize for dedication to the cause for being online at 4 am local time in order to present their work. They first noted that people don't tend to report contract cheating suspicions, because they think it is difficult to prove anything. They listed some of the whack-a-mole things universities do, such as blocking essay mill sites, trying to outlaw such businesses, making 2D barcode stickers with a link to the academic integrity site to stick over the stickers of the companies in the bathrooms, etc. I learned about a company called Chegg that has apparently become very popular during the pandemic. For about $15 a month students have access to many questions and answers, often linked to a textbook. I tried it by idly typing in the textbook I use for my class: and there they were, answers to pretty much all of the questions the textbook asks!
They noted some hints one can use to look for contract cheating: students using processes not taught in class, multiple similar wrong answers, sudden improvement in a student's work, alternative labeling or notation that differs from in-class notation, or identical idiosyncratic answers. What shocked me was that students also upload photos of exam questions, and the answers are often back within 6-10 minutes!
So I went to the next session on Chegg with Kelly Ahuna and Loretta Frankovitch. They noted that Chegg does work with faculty to take down intellectual property and to let the universities know if students were using the site to cheat. The company will share with the university what it knows: Name, email, and IP-address of the person posing the question with a time-stamp; name, email, and IP-address of people who viewed the answers; link to the page, etc. etc. My European privacy antennae were bleeping like crazy!
Rick Robinson and Jason Openo spoke about how reporting academic integrity violations impacts faculty relationships and in particular how this affects faculty on an emotional level.
Tonight was the W(h)ine Bar, so I poured a glass and chatted with some very nice folks!

3 March 2021

Jen Simonds, Mariko L. Carson, Amy Mobley, Sharon Spencer, Wendy Williams-Sumpter, and Jillian Orfeo held a keynote panel on implementing a new academic integrity policy. I realize that American, Canadian, Australian, and UK institutions are miles ahead of many European institutions that I am familiar with in having already developed academic integrity policy and are now *improving* their policies! They have offices with many people who work in academic integrity, cultivating good academic practice in students and faculty alike. We have a lot to learn.
I wanted to know more about this Chegg thing, so I attended Tricia Bertram Gallant and Marilyn Derby's session on "Expanding the Conversation about Chegg". They point out that these sites offer something we don't at university: 24/7 help. Professors and tutors are not available at 3 am (well, most aren't) when the student pulling an all-nighter needs a quick answer to a question. And once you are in, you find more and more easy answers. They had some ideas for dealing with the rapid turn-around on Chegg like splitting an exam into two parts time-wise, or even not letting students go back to previous questions. I do find this pedagogically questionable, however. I don't want to stress students, I want to find out how much, if anything, they learned in my course. Tricia has started a discussion on 24/7 help on the ICAI Blog.
Jennifer Lawrence and Kylie Day from the University of New England in Australia organized a session about "Walking the Line Between Academic Integrity and Privacy with Online Exams." I was very interested in this topic, as I just don't see why we should have the power to make students show us their rooms in a 360° pan, let us listen in to what they are doing and record a video of them for the full time of the exam. Since they are primarily an online university, they say that the students know that this will be the way the exams are proctored when they join the school. And since their students are all over the world, they feel they need this. There is a paid proctor at the company they use who is responsible for watching 8-10 students and jumping in when the monitoring system detects "suspicious behavior" or something. This is such a "1984" scenario that they are getting students used to, I find it highly unethical even if it is useful to the university. I find it troubling how easily the students seem to accept this proctoring online (according to the presenters).
Bob Ives spoke on "Applying the Hofstede Model of National Culture to Academic Integrity". Hofstede defined six cultural dimensions: Power Distance, Individualism, Masculinity, Uncertainty Avoidance, Long-Term Orientation, Indulgence. If you want to see how different countries are seen on this scale, there's a web site for it. He looked at which dimensions could predict whether students are open to using paper mills, asking other students for help, or asking friends/family for help. I came in late, so I didn't catch how large his data collection was or how he sampled the data.
And then there was another networking session, mostly discussing: Chegg!

4 March 2021

I started off the day attending the COIN discussion about technology in online instruction. I was completely shocked to understand how deeply US institutions distrust their students! They call it "being fair", but they want to use text-matching software on *everything* a student does, including what they contribute to a discussion board! And many permit the students to test a draft of their papers before they submit. That goes strongly against my understanding of what an education is about! We should be trusting our students, seeing them as beginners that we need to train - and to respect! Making people worry about "accidentally" plagiarizing instead of teaching them how to write scientifically seems problematic to me.
The keynote was on Stories of Resilience and Academic Integrity during the Pandemic, with a number of participants on the panel. Students Hiranniya Yogaratnarajah and Dennis Tzavaras gave a great perspective from the students' point of view! Hiranniya said that students realized that professors were humans too with kids and cats and what-not. They also organized a discussion with professors about academic integrity when they attended college! That must have been fun. Dennis noted that he and his fellow students are so grateful to the professors for making emergency remote instruction possible despite the pandemic. The important thing for everyone is to learn to ask for help if you need it.
I asked what we need to do more of and what to do less of. They answered that many students want to know how to say no to friends when they ask for help in cheating. They noted that so many students are scared to talk to their professors for some reason. We need to open up that channel of communication and include students in the academic integrity conversation.
Fiona O'Riordan and Gillian Lake from Dublin City University spoke about an academic integrity awareness campaign organized across their university. Their slides are publicly available at http://bit.ly/ICAI-4March2021.
Zachary Dixon spoke about "Triangulating Academic Misconduct Online," digital mutations of classic misconduct. One advantage of digital misconduct: digital analysis methods can be applied. They have a system, CourseVillain, that crawls online coursework sites to find university content and to auto-populate "copyright infringement" forms! It's still half web application, half desktop application so it is not ready for prime time. But it is doable! They analysed how many artefacts could be found for a number of classes and found a significant amount of artefacts available online.
Even though I know the journal and all of the people on the editorial board, I attended the session on "Publishing Your Academic Integrity Research: Advice From the Editorial Board of the International Journal for Educational Integrity": Tomáš Foltýnek, Zeenath Khan, Thomas Lancaster, Ann Rogerson, and Sarah Elaine Eaton. I posed the nasty question of why they are with SpringerNature and charging an APC of 800 €. The authors, the peer reviewers, and the editorial board are all paid by their institutions. SpringerNature has its corporate offices in Berlin, but sits in Luxembourg for tax purposes and thus only pays a minimum tax on its earnings. None of that tax feeds back into the institutions that pay us. Scholarly publishing is broken and needs to be fixed, fast. I realize that one single journal can't fix the problem, but I find it important to point out this problem.
I had to attend Muberra Sahin, Abigail Pfeiffer, Carissa Pittsenberger, and Jessi Bullock's session on "Identifying Authenticity Issues in Student Papers Using a Plagiarism Checker" of course! They broke out into 4 sub-sessions, and I joined Muberra Sahin, who showed us how they deal with paraphrased papers that are not directly findable with the system they use at their university. They hand-color (!) similar text, so I showed them the similarity checker that a student of mine made that might make their life easier.
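To give an idea of how the hand-coloring of similar text could be automated, here is a minimal sketch that marks every word in one text that belongs to a word sequence also found in another text. This is not the algorithm used by the similarity checker mentioned above; it is a plain word n-gram overlap that I am assuming for illustration, and the function names and the `[[ ]]` markers are made up.

```python
def word_ngrams(text, n=3):
    """Return the set of lowercased word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def mark_shared(text, other, n=3):
    """Wrap each word of text in [[ ]] if it is part of an n-gram also in other."""
    words = text.split()
    lowered = [w.lower() for w in words]
    common = word_ngrams(other, n)
    marked = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(lowered[i:i + n]) in common:
            # Mark every word covered by this matching n-gram.
            for j in range(i, i + n):
                marked[j] = True
    return " ".join(f"[[{w}]]" if m else w for w, m in zip(words, marked))


print(mark_shared("the quick brown fox jumps", "a quick brown fox sleeps"))
# prints: the [[quick]] [[brown]] [[fox]] jumps
```

Punctuation is left attached to the words here, so "fox," and "fox" do not match. Exact word sequences like this also miss reworded passages, so paraphrased text still needs a human eye.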

And then the conference was over - I met some new people, learned a lot of new stuff, had some great discussions: Thanks to the organizers for getting such a great virtual conference off the ground!

Monday, March 1, 2021

Cleaning up my browser tabs

The second Corona semester has now come to an end and I seem to have about 200 tabs open in various browsers. Some of the tabs concern interesting plagiarism and academic integrity questions, so here's just a brief list:

The Minister for Education and Science in Ukraine was being investigated for plagiarism (German) in his doctoral dissertation as of July 2020. He is apparently also the rector of the university of technology in Chernihiv.
The HEADT Centre at Humboldt University, Berlin has a series of recorded seminars on plagiarism, image manipulation, authorship, and content ownership, sponsored by Elsevier.
An arXiv preprint "Forms of Plagiarism in Digital Mathematical Libraries" of a conference presentation at the Intelligent Computer Mathematics - 12th International Conference, CICM 2019, Prague, Czech Republic, July 8-12, 2019
Michael V. Dougherty published a book in 2020 on "Disguised Academic Plagiarism - A Typology and Case Studies for Researchers and Editors". (Conflict of interest: I reviewed this book for Springer)
There was quite a spat over the diploma thesis and dissertation of an Austrian minister who stepped down over the incident. This led to many publications around the topic, for example one about degree mills at Der Standard (in German).
Simone Belli (Spain), Cristian López Raventós (Mexico), and Teresa Guarda (Ecuador) published a paper "Plagiarism Detection in the Classroom: Honesty and Trust Through the Urkund and Turnitin Software" in the Proceedings of ICITS 2020. Of course, I find it very problematic to be using the numbers returned by Turnitin and Urkund as the basis of judging anything. The numbers are meaningless and do NOT give a percentage of plagiarism but an indication of text similarity. They are NOT the same thing. They write: "Thanks to these programs, teachers have a powerful tool to assess the level of honesty of students. [...] Thanks to this tool, the teacher can easily justify a bad grade that shows the percentage of plagiarism in the work presented by the student. At the same time, it saves time spent reviewing a text that is not evaluable due to its illegitimate origin." This is wrong on so many levels. I will be talking about this on March 24, 2021, at the conference sponsored by the Office of Research Integrity.
My university, HTW Berlin, now has ethical guidelines for research! (in German)
A court in Berlin has decided that a Berlin university was correct in exmatriculating a student for plagiarism. Berlin universities have a policy of "two strikes and you are out": if a student is found plagiarizing twice, they are exmatriculated. In this case, the student was found to be plagiarizing once in his Bachelor's program and once in his Master's program. He felt that he should be "allowed" one plagiarism in each program; the university insisted that the programs are consecutive, and thus he is out. Need I mention that the student was studying .... ethics and philosophy?
