Thursday, May 26, 2022

ECAIP 2022, Day 3

And now to conclude the conference! Unfortunately, I had THREE things booked for today: the conference and two others. Luckily, a friend took over one and I was able to do the other via Zoom (having to get up early, as the 8:30 meeting started at 7:30 local time). I was able to use the internet at the Porto university, and then joined the conference a little later.

Day 1 - Day 2 - Day 3


I missed the keynote by Elisabeth Bik, "The Dark Side of Science: Misconduct in Biomedical Research". I've heard her a number of times; her talks are always fascinating (and scary when you see how much research misconduct she is uncovering - imagine how much more is out there that she does not see).

The next talk I attended was by Thomas Lancaster on "Artificial Intelligence Led Threats To Academic Integrity." He casually demonstrated text generators, even a scientific literature review generator and an image and slide generator. With all of this out there (and being used, with the resulting papers accepted by predatory publishers), we have to recognize that there is a big threat to scientific integrity for which we currently have no idea what to do.

Suraj Ajit (University of Northampton) was to speak on "A rule-based decision support system for detecting, reporting, and substantiating contract cheating within assignments in computing courses in UK Higher Education," with an emphasis on computer science assignments. However, he used most of his time to tell us about the processes at his university for dealing with academic misconduct. At an academic integrity conference (as opposed to an introductory course for teachers), one can assume that people know about this. So he didn't have time to actually speak about his own work, other than flashing a few decision tables that I didn't have time to read. He has unfortunately not posted his slides on the Sched site for the conference, so I was unable to read them later.

Rafael Ball (Director of the ETH Library and Collections) then spoke on "Awareness Mentality and Strategic Behaviour in Scientific Publishing and Dissemination." He bemoaned a perceived shift in academic behavior from "being good" to "looking good" in bibliometrics and altmetrics. The awareness mentality deals with the strategic behavior of scientists and publishers: scientists focus on career building, awards, and funding, while publishers focus on competition, high rejection rates, and of course on spectacular results. He has noted that more and more article titles are ending with a question mark :) External goals are pre-empting scientific goals. He asks whether publishing a translation of a paper is self-plagiarism, and whether publishing with a slight shift in focus is an unnecessary second publication.

Beatriz Antonieta Moya and Alex Paquette (University of Calgary) spoke about "Graduate students' reflections as partners of academic integrity advocacy during Covid-19" (Slides). In the past, grad students were not part of the Academic Integrity Week, but their mentor Sarah Elaine Eaton got them interested. They started a number of activities, such as daily academic integrity trivia quizzes on Instagram and a live AMA session with Sarah.

Kelley Packalen (Queen's University) spoke on "What’s the Harm? The Professor Will Never Know: Understanding How Students Justify Participating in the “Grey Areas” of Academic Integrity." She and Kate Rowbotham looked at three research questions:

  1. Under which scenarios do students determine it is permissible to engage in specific trivial and common violations of academic integrity? 
  2. Is there a slippery slope effect as related to violations of academic integrity? 
  3. What explanations do students use to justify violating academic integrity in general?

They determined that there is a slippery slope effect and derived the following practical implications, which are similar to strategies used to discourage students from binge drinking:

  1. Share your thinking with students.
  2. Debunk myths that everyone is cheating.
  3. Reframe the choice as a moral decision instead of a business decision.

Erja Moore closed the session with "Internationalisation of higher education in Finland – A challenge for integrity in academic writing at Master's level." She chose 28 Finnish Master's theses written in English (1 % of all Master's theses from 2020, and 15 % of those written in English) and gave them a close read. They varied in length from 23 to 101 pages; the reference lists were between 2 and 11 pages long. She found a lot of problems in the theses: no written methodology, no in-text references, pseudoreferences, invented sources, inappropriate sources, etc.

The conference closed with a keynote by Teddi Fishman, "How we Succeed? Goals, Metrics, and Successes for Academic Integrity Initiatives in a Post-Covid, "Post-Truth" World." In order to have some fun, Sonja Bjelobaba had inserted random slides into Teddi's slide deck. Teddi asked us how we define "success" and in particular whether our methods are valid, reliable, attainable, and useful. We have to conclude that we are using data that is deeply flawed. We don't know how many students are cheating, we only know what we catch! We are in the post-truth era already (nicely illustrated with a shop selling "Genuine Fake Watches"). She introduced us to the "Overton Window", the idea that leaders are limited to those possibilities that already enjoy popular support. In effect, we have moved from stocks and pillories through shunning and public flogging to restorative justice in many areas. In academic integrity we are stuck at point penalties, revocation of degrees, loss of position or prestige, or demotion. Restorative justice is not quite inside the window yet. A random slide of Teddi in a Flying Spaghetti Monster costume got us off topic a bit, but she steered us back to some food for thought:

  • To what extent have we shown sufficient transparency and accountability in our research practices so that the public can have faith in our outputs?
  • Students who learn about knowledge production in concert with integrity become researchers with greater appreciation for integrity, who become supervisors of integrity.
  • How do we bring about institutional as well as societal change? From the ground up!   

And then we had the closing ceremony! A large group of (male) medical students in traditional Porto garb, playing traditional instruments, serenaded us and Laura Ribeiro, the organizer of the conference, and included some spectacular gymnastics:


The next conference will be in June 2023 in Derby, UK, organized by Shiva Sivasubramaniam under the motto "Reflecting the Past for Reforming the Future".

Thursday, May 19, 2022

ECAIP 2022, Day 2

Day 1 - Day 2 - Day 3


And on to the second day of the conference!

The day began with a keynote speech by Ana Marušić, "Challenges in publishing ethics and integrity". Ana is a professor at the University of Split School of Medicine, Croatia, co-editor-in-chief of the Journal of Global Health, a COPE council member, and President of The Embassy of Good Science Foundation. Coming from the standpoint of a journal editor, she discussed research and publishing ethics, noting that there is a spectrum from honest error through poor reporting to outright fraud (detrimental research practices). She first ran us through the history of journals, starting with the first journal, the Journal des sçavans in 1665. She noted that the concept of peer review didn't appear until the middle of the 18th century, and Nature didn't even introduce peer review until the 1950s! She had a good SWOT analysis of the challenges that editors face:


The main challenges of today are: dealing with image manipulations, correcting articles with honest errors, sorting out pre-prints, and trying to avoid paper mills and predatory publishers. 

Her summary is: Quality assurance in editing is the key to responsible publishing!

I then attended the workshop on "Coming Clean – Addressing the Issues Where a Student Self Declares Contract Cheating" with Thomas Lancaster, Michael Draper, Sandie Dann, Robin Crockett, and Irene Glendinning. I liked that they explicitly asked for consent to record the session, as they wanted to have input from the audience. Contract cheating represents choosing the wrong path - what if a student wants to come clean? The information that we provide to students should highlight and detail the whistleblowing processes and the support that is available to them, should they wish to admit to having used contract cheating. Some cheating companies attempt to blackmail the students, saying that they will tell the university if the students don't pay more for the services used. There was a good discussion about the appropriate level of penalty that should be assessed in this situation. Teddi Fishman made it clear that there should be a path towards amnesty, with a focus on restorative rather than retributive justice. Others felt that there should be no credit given for contract-cheated work. Mike Reddy spoke about the 4 Cs: Conscious copying of content/concepts for credit. If they do not submit, it is not a crime; the lack of any one of the Cs is just poor academic practice, he thinks.

Then we had a plenary session with a panel discussion (sponsored by Turnitin) moderated by Sonja Bjelobaba:

  • Andreas Ohlson (Turnitin, former head of Urkund and Ouriginal, from Sweden)
  • Tomáš Foltýnek (Researcher, ENAI president), from Brünn, in the Czech Republic 
  • Martine Peters (Prevention researcher, professor), from Québec, Canada
  • Katrīna Sproģe (European students union), from Latvia

KS noted that not all students have access to Turnitin. I realize that students want this because they are afraid of plagiarizing by "mistake", but this won't help. They will rewrite to match the number the software spits out (a number the software really, really needs to lose!), but their writing won't be better. She also noted that students often translate texts they find online, and indeed, our 2020 test showed that translations are in general not found by such systems.

TF (who led the 2020 test effort mentioned above) noted that even German-to-English cases are being found, so it is not just non-English languages that are being used. He criticized the interface that spits out a number that people take to be the decision. He also noted that with Turnitin buying up all of the good competitors, we are losing competition in the market. Each system finds different plagiarism, so it was better when we had more choice.

AO asked where the market will be in 5-10 years. He feels the consolidation is important, as it is easier to try new things if you are part of a larger company. The company has lots of discussions and research going on, in particular about how to use text-matching software in education. [The Times Higher Education reported on his comments in detail.]

MP suggested using the tool together with the student, right there in front of them.

TF noted that many universities have quite different policies, and many use the number reported by Turnitin as a discriminator. There can be many reasons why there is a lot of matched text (for example, the student submitted the text earlier, or there are tables, illustrations, etc.), and not every plagiarism is detectable as a text match. He also noted that paraphrasing tools are getting better and better - how do we deal with this?

AO admits that if you look at a single document, it is hard to do. But he notes that Turnitin is looking at the issue. One key is that the same student's work could be compared over time: if universities use the company's solution for everything a student does, they can see when the style changes. [But we WANT the students to change their writing style to become more academic in their writing! -- dww]

MP notes that we often don't even bother teaching referencing at universities; we focus just on the number.

KS wants the playing field to be more level. Students need an understanding of what the teachers expect and the teachers must understand what the students know. Who is the person you are teaching? Why are you teaching this material?

Thomas Lancaster from the audience asked "What answers are most different to the ones being given 10 years ago?"

MP: We as educators did not reflect as much on our role in plagiarism detection and prevention. We just blamed students. Now we look more at our role; it is a rude awakening.

KS: I wasn't a student 10 years ago :)  but we are involved in the discussion now.

TF: There has been a huge shift from the technological point of view to pedagogical approaches. From "What do we do when we discover plagiarism?"  to how to prevent plagiarism.

AO: Percentages :)

And then we had to hurry to the hotel to drop our stuff, and we were off for a bus tour of the city, a port wine tasting, and dinner. Tomáš and Dita had this nice picture of Elisabeth Bik and me standing with them taken while we were waiting.

Monday, May 16, 2022

ECAIP 2022, Day 1

After two years of virtual conferences (2020, 2021), we are back in person in Porto, Portugal - with the talks of the European Conference on Academic Integrity and Plagiarism being broadcast so that people can attend at a distance. How wonderful it is to see people!! I can't help hugging people I've known for more than a decade and haven't seen in person since 2019 or earlier. And wow, I'm not the only person from Germany here, there is someone from the University of Konstanz. And wonderful technology: the European wireless network eduroam works seamlessly here! But there are no desks in the auditorium, which makes it a bit hard to organize one's work. I will report on the talks that I attended.

The book of abstracts and the final program are on the general conference web site. I was going to post this during the conference, but I ended up with no free time and spent Saturday enjoying Porto and Sunday returning home (with sore calf muscles from all those steep streets). So I will try and get this out as soon as possible.

Day 1 -- Day 2 -- Day 3 


The conference was opened with a keynote speech by Daniele Fanelli (Fellow, London School of Economics and Political Science), "Research integrity in a complex world". Among other publications, Daniele is the author of "How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data" from 2009. He dove right into the question of complexity and how we can go about actually measuring it. In a nutshell, the more "moving parts" a topic has, the more complex it gets, and the more complex a system is, the more prone it is to questionable research practices (QRPs). And the more QRPs, the greater the possibility of research integrity problems. One of the big questions is the irreproducibility crisis, which he tried to boil down to a mathematical formula that I think very few understood. He closed by looking at various factors and concluding that we really don't know exactly, but his formula is an attempt to get a handle on it. It was a great start, as the theme of complexity ran through the discussions during the rest of the conference. In the first breakout session I attended 3 talks:

  • Patrick Juola from Duquesne University in Pittsburgh, PA, USA (and that is pronounced "do-cane", I come from that neck of the woods!) created a controlled test corpus for looking at text overlap by having 91 participants write two short texts: one on how to get from A to B on a map and one on how to make lemonade. He then used the Jaccard similarity coefficient to calculate pairwise how many words the texts had in common: 0.0 means no words in common, 1.0 means all words in common (a minimal sketch of this measure follows below the list). The average similarity was only 0.21 +/- 0.07 for Map and 0.19 +/- 0.07 for Lemon (thus the name of the corpus, MapLemon). We giggled at one of the outliers that described making lemonade without using the words "lemon", "sugar", or "water": the author just wrote to go to the store and buy it. This provides empirical evidence that even people writing on the same topic will use different words, because they are using their own voice. He wants to replicate this for other languages and additional topics.
  • Tutku Sultan Budak-Özalp from the Canakkale Onsekiz Mart University in Turkey looked at perceptions of academic integrity among English as a Foreign Language teachers. She interviewed 25 Turkish EFL (we tend to call it ESL, English as a Second Language) instructors in an online survey with 39 questions. Unfortunately, she was presenting at a distance and we hadn't yet figured out how to make the PowerPoint slides larger (Zoom and PowerPoint don't play nicely together), so I was unable to read the results. She said in summary that the teachers are knowledgeable about academic integrity, but there is not yet a nationwide policy, although the Higher Education Council in Turkey is working on one.
  • Pegi Pavletić and other students from the European Students' Union presented on the work of students in the area of academic integrity. As Teddi Fishman, the chair of the session, noted: there should be "no talks about us without us!" It was great to hear that students are getting active! The ESU is an organization of 45 student unions from 40 countries; in Ireland the National Academic Integrity Network (NAIN) has 13 student members among its 92 and has published guidelines; Croatia is dealing with a number of academic freedom issues. These students want to serve as ombuds, go-betweens between students and the various decision-making bodies. But there are so many different forms of decision-making bodies at universities that the concept rather becomes an enigma.
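
Since the measure itself is simple, here is a minimal sketch of the pairwise Jaccard computation, assuming plain lower-cased whitespace tokenization - the actual MapLemon preprocessing is surely more careful, and the example texts below are made up:

    # Minimal sketch of pairwise Jaccard similarity over word sets.
    # Tokenization here is naive (lower-case + whitespace split); the real
    # MapLemon pipeline may normalize punctuation etc. differently.
    from itertools import combinations

    def jaccard(text_a: str, text_b: str) -> float:
        """Share of distinct words the two texts have in common (intersection over union)."""
        a, b = set(text_a.lower().split()), set(text_b.lower().split())
        if not a and not b:
            return 1.0  # two empty texts are trivially identical
        return len(a & b) / len(a | b)

    # Made-up example texts, just to show the pairwise comparison:
    texts = {
        "p01": "squeeze the lemons add sugar and water and stir",
        "p02": "mix lemon juice with water and sugar then chill",
        "p03": "go to the store and buy it",
    }

    for (id_a, text_a), (id_b, text_b) in combinations(texts.items(), 2):
        print(id_a, id_b, round(jaccard(text_a, text_b), 2))

Averaging these pairwise values over all participants is what yields corpus-level figures like the 0.21 and 0.19 reported above.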

Lunch was awesome, the medical school in Porto has a dining room with linens, cutlery, and wine glasses laid out. They had three buffets, salad, main course, and dessert, although without signs it was hard to determine what was vegetarian. The lentil salad turned out to have chicken in it, and there was ham mixed into the noodles, but one managed and the food was delicious. I turned down the wine, however, until the last day. Without a siesta after lunch it would be difficult to not fall asleep during the afternoon sessions, and I didn't want that.  

  • Zeenath Reza Khan, Sreejith Balasubramanian, and Ajrina Hysaj spoke on "Using the IEPAR Framework - a workshop to build a culture of integrity in higher education". They have started the Centre for Academic Integrity in the United Arab Emirates. IEPAR stands for "Inspiration, Education, Pedagogical Consideration, Assessment Design, Response and Restorative Practice". They are focusing on prevention instead of policing and sanctioning.
  • One of Thomas Lancaster's students, Pundao Lertratkosum, wrote their thesis on "Contract Cheating Marketing in Thailand". The question was whether or not there is marketing for contract cheating going on in non-English-speaking countries. The answer is a sad and resounding yes! They looked at various social media and found lots of marketing, even videos that try to make it sound normal to use such services. The transactions themselves are then often conducted by messenger; the essays were requested in Thai, English, or Mandarin. The typical advertised price for a 1000-word essay in Thai: 80 to 140 US dollars. There are serious risks to the students, who for example post receipts or testimonials on Twitter; these can be reverse-engineered and the students blackmailed. Thailand closely mirrors other countries, with offers even for exam impersonation and admissions fraud. Even though this is a localized market (because most of the papers are written in Thai), we need to look at how we deal with this normalization of cheating.
  • The talk on "Contract Cheating in Lithuania" by Simona Vaškevičiūtė and Eglė Ozolinčiūtė was unfortunately quite short, as Zoom seemed to be misbehaving. We eventually figured out the problem: when told to "share their slides", remote speakers would share only PowerPoint, not their screens, so we would only see the last slide they had opened while they saw the current slide on their own screens. Speaking without their slides, they described visiting various web sites every day for one month and copying the advertisements found there. In particular, they found that the advertising focuses visually on achievement, with photos showing people reading, writing, or wearing a cap & gown.
  • The session ended with a shocking talk by Anna Abalkina on "Publication and collaboration anomalies in academic papers originating from a paper mill: evidence from a Russia-based paper mill." She, too, had issues with Zoom, but we managed to get it sorted out. The organization "International Publisher LLC" is no longer just Russia-based; it has clients all over the world. It publishes papers online and then sells additional authorships, with first authorship more expensive than a slot somewhere in the middle. She looked at 1000 paper offers and found 441 that had been published online. Some 800 authors could be identified, from more than 300 universities in 39 countries. 152 of the journals appeared to be authentic, and 3 were so-called hijacked journals. In all, more than 6000 co-authorship slots appear to have been sold. She contacted many editors, only to be brushed off with statements like "We have strict peer review!" For an additional price, a city could be inserted into the abstract to localize the paper. Most of the authors were from Russia, but there were also authors from Kazakhstan, China, Ukraine, UAE, Azerbaijan, Uzbekistan, UK, Israel, Vietnam, Egypt, Jordan, Spain, and more. Most of the purchased authorships were first authorships. She calculates that this brought the company more than 6 million dollars in 3 years. So there are many problems in publishing that have not yet been discovered. The journals' (non)reactions to information about what appears to have happened are a serious challenge for academic integrity.

The next session I attended was more about technical tools for dealing with academic integrity questions.

  • First off was Clare Johnson (working with Ross Davies and Mike Reddy) with her tool Clarify. It can be used to do forensic research on Word documents, as Word stores a lot of metadata in the saved version. There is information about formatting, revisions, cropped images, sources, and so on that is compressed, as it were, in the document. The tool decompresses all of that and looks to see whether the document appears to have genuinely been written over a period of time, or whether one big copy & paste action put in all the text. She visualizes the text runs, i.e. the text that was inserted at one time, and showed us some examples (see the first sketch below this list for the kind of metadata involved). I really want to give this a test-drive, but have been unable to find it online. I have written to Clare to ask her if she can please let me have a copy of the tool.
  • Evgeny Finogeev from the Russian company Antiplagiat spoke about "Image reuse detection in a large-scale scientific document collection." It was made clear that this paper had been accepted before the war, and that the sponsorship money that Antiplagiat had paid to the conference was being donated to a Ukrainian relief charity. They were not allowed to advertise at the conference, only to present the academic paper. They took 1.9 million papers from the DOAJ, extracted the images, classified them, vectorized them, and used a Siamese neural network to try and identify images that had been reused. The neural network identified 43 000 cases, 4051 of which were checked by hand. Most of them were self-reuse, with very little correctly attributed re-use. Possible plagiarism was found in 8 cases, possible falsification in 11, there was permission to use the images in 4, 93 were paper copies, and the rest were no problem. I objected to their using the "Lena" image from Playboy in their presentation; they did not seem to understand that we are trying to convince people to stop using this image.
  • Finally, Christopher Nitta (UC Davis) spoke on "Detecting Potential Academic Misconduct in Canvas Quizzes." The learning management system Canvas has an API, and there is a Python library (canvasapi) that makes use of it. The problem is that the lockdown browser doesn't work with Linux, which many students use. [My solution is to devise exams that use the entire internet - after all, at work they can Google... -- dww]
    Their solution tries to identify potential misconduct and highlights these exams for further analysis. How long did students spend away from the quiz? Of course, this could just be someone dealing with a child at home. Formatting from copy & paste is preserved on Canvas, so web links and other formatting are tell-tale signs of misconduct. Large exam time windows permit answers to be shared with others, so the timing of the questions is analyzed for outliers. If students start all questions within seconds of one another, this could be a sign of collusion (a sketch of this heuristic follows below the list). Of course, if they are using secondary devices, this cannot be seen. [See? So just quit with the proctored exams already! -- dww]
    UC Davis had 1415 referrals to the academic integrity office in the 4 terms prior to the pandemic, but 3246 in the first 4 terms of the pandemic. They see this as evidence of increased cheating. [I see this as evidence of increased looking! We don't know how much cheating was going on prior to this, only how much cheating we found. -- dww]
    Almost 20 % of the cases were misconduct on Canvas quizzes. The manual review takes much more time. Of course, the false negative rate is unknowable, and the false positive rate also seems to be muddled.
    The code is not open source because they don't want the students to figure out how to get around the system. [I wish to remind them of Kerckhoffs's Principle. They are smart, they will figure it out, so make it public anyway! -- dww]
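
Since I could not get my hands on Clarify itself, here is a rough sketch of the kind of metadata such a tool presumably looks at, using only the Python standard library. A .docx file is a ZIP archive, and the text runs in word/document.xml carry revision-save identifiers ("rsid" attributes) that change with each editing session; counting how much text hangs on each rsid gives a crude first impression of whether a document grew over many sessions or arrived in one big paste. To be clear, this is not Clare's tool, just my own illustration of the general idea:

    # Rough sketch of .docx metadata forensics (NOT Clarify itself):
    # a .docx is a ZIP archive, and the runs in word/document.xml carry
    # w:rsidR attributes that change with each editing/saving session.
    import sys
    import zipfile
    import xml.etree.ElementTree as ET
    from collections import Counter

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

    def rsid_profile(path: str) -> Counter:
        """Count how many characters of text belong to each revision-save ID."""
        with zipfile.ZipFile(path) as zf:
            root = ET.fromstring(zf.read("word/document.xml"))
        chars_per_rsid = Counter()
        for run in root.iter(W + "r"):                  # w:r = one text run
            rsid = run.get(W + "rsidR", "unknown")
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            chars_per_rsid[rsid] += len(text)
        return chars_per_rsid

    if __name__ == "__main__":
        profile = rsid_profile(sys.argv[1])
        print(f"{len(profile)} distinct editing-session ids (rsids)")
        for rsid, count in profile.most_common(10):
            print(f"  {rsid}: {count} characters")

A document in which nearly all of the text hangs on a single rsid is not proof of anything - people legitimately draft elsewhere and paste - but it is exactly the kind of signal that warrants a closer look.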
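
The timing heuristic for collusion is also easy to sketch. The snippet below assumes the per-question start times have already been pulled out of Canvas (e.g. via the canvasapi library mentioned above); the data layout, the names, and the three-second threshold are my own illustrative assumptions, not the authors' actual parameters:

    # Sketch of the "students start every question within seconds of each
    # other" heuristic. The input format and the 3-second threshold are
    # illustrative assumptions; real timestamps would come from the Canvas
    # API (e.g. via the canvasapi Python library).
    from itertools import combinations

    # student id -> {question id -> start time in seconds after the quiz opened}
    question_starts = {
        "student_a": {"q1": 12.0, "q2": 95.0, "q3": 240.0},
        "student_b": {"q1": 13.5, "q2": 96.0, "q3": 241.5},  # suspiciously close to student_a
        "student_c": {"q1": 30.0, "q2": 400.0, "q3": 555.0},
    }

    def flag_pairs(starts, threshold=3.0):
        """Yield student pairs whose start times agree within `threshold` seconds on every shared question."""
        for (s1, times1), (s2, times2) in combinations(starts.items(), 2):
            shared = times1.keys() & times2.keys()
            if shared and all(abs(times1[q] - times2[q]) <= threshold for q in shared):
                yield s1, s2

    for pair in flag_pairs(question_starts):
        print("possible collusion, review manually:", pair)

As the authors themselves stress, such flags are only candidates for manual review, and students using secondary devices will not show up at all.
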
And that concluded the first day! There was a reception in the foyer with some nibbles and a bit of port wine. After dumping our stuff at the hotel, we braved the tram system out to Matosinhos for a grilled sardine dinner. The staff didn't bat an eye at 20 people showing up, and they kept the good wine and food flowing. The only issue was another hour on the tram back to the hotel. For Irene I found this picture of the Man Eating Fish graffiti mural by Mr Dheo next to a Burger King.

Monday, May 9, 2022

President of Peru being investigated for plagiarism

ABC News reports that the president of Peru is being investigated for plagiarism in his Master's thesis, which he wrote together with his wife. Journalists apparently used a freedom of information request to obtain the thesis, then shoved it through a system that markets itself as a plagiarism detection system. They report, as people are wont to do, on the number that the system produced.

This bears repeating: Software cannot determine plagiarism! The most recent study that I participated in was published in 2020; we sum it up as follows:

There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. [...] The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.

Emphasis added. Quit thinking software will solve this problem!