– Debora Weber-Wulff
Can citation patterns help detect heavily disguised plagiarism in academic documents?
by Bela Gipp
A while back, Retraction Watch, a blog on scientific integrity, reported on five plagiarism cases discovered in Neuroscience Letters. Three of the articles were English translations of Chinese originals, and another was an English translation of a French text. None of them acknowledged being a translation.
Translated plagiarism remains one of the most difficult forms of academic misconduct to detect. Since few researchers actively follow the literature in multiple languages, peer review is unlikely to recognize translated plagiarism. Software offers little help either, because today’s plagiarism detection systems rely on a minimum amount of text similarity to raise suspicion, yet translations typically contain little or no textual overlap. When the documents use different writing systems, e.g. Chinese, Korean, or Russian characters compared to Latin characters, available detection systems stand no chance.
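To make the point about missing textual overlap concrete, here is a minimal sketch of a word n-gram overlap check, a common basis for text-matching tools. It is an illustration only, not the method of any particular detection system; the function names and the three sample sentences are invented for this example.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up sentences: a French "original", an English translation of it,
# and a lightly edited copy of that English sentence.
original = "la plasticité synaptique dépend de l'activité des récepteurs"
translation = "synaptic plasticity depends on the activity of the receptors"
near_copy = "synaptic plasticity depends on the activity of these receptors"

translated_overlap = jaccard(word_ngrams(original), word_ngrams(translation))
copied_overlap = jaccard(word_ngrams(translation), word_ngrams(near_copy))
```

In this toy case the near copy shares most of its trigrams with the English sentence, while the translation shares none with the French one, so a threshold on textual overlap would flag only the near copy.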
A new approach for plagiarism detection, termed Citation-based Plagiarism Detection (CbPD), goes beyond literal text similarity to detect potential plagiarism. The citation-based approach examines the in‑text placement of academic citations to form a language- and text-independent “fingerprint” of semantic similarity. The practicability of this citation-based approach was initially demonstrated in an analysis of the translated plagiarism in the prominent plagiarism case of K.-T. zu Guttenberg. Recently, a group of researchers, in cooperation with students from the HTW-Berlin, developed the first citation-based plagiarism detection prototype, “CitePlag”.
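As a rough illustration of how citation order can serve as a language-independent fingerprint, the sketch below compares two documents by the longest sequence of citations that appears in the same order in both. This is one simple measure chosen for illustration, not a description of what CitePlag actually implements; the function names, the normalization, and the citation keys are all hypothetical.

```python
from typing import List, Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of two in-text citation orders."""
    n, m = len(a), len(b)
    # dp[i][j] holds the length of the longest common subsequence of a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one matching sequence.
    shared: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


def citation_order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Shared-sequence length divided by the shorter citation list (assumed normalization)."""
    if not a or not b:
        return 0.0
    return len(longest_common_citation_sequence(a, b)) / min(len(a), len(b))


# Hypothetical citation keys in order of appearance in each document.
suspect = ["Smith2001", "Li2005", "Chen2003", "Smith2001", "Wang2007"]
source = ["Smith2001", "Li2005", "Kim1999", "Chen2003", "Wang2007"]

shared = longest_common_citation_sequence(suspect, source)
score = citation_order_similarity(suspect, source)
```

Because the comparison uses only which sources are cited and in what order, the result does not depend on the language in which either document is written. A high score is a reason for a human to inspect the pair, not proof of plagiarism.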
In the image below, CitePlag visualizes one of the five articles that were retracted from Neuroscience Letters. No textual similarity remains between the two publications, since the plagiarism (left) is a translation into English of the Chinese original (right). The citation-based approach, however, identifies and connects matching citations in a central scrollable column for human inspection. Examine this example for yourself in CitePlag. For more information on the prototype and the algorithms it implements, refer to this publication.
A medical article in the Indian Journal of Urology was recently retracted after the CbPD approach identified a notably high citation pattern overlap with an article published in another journal two years earlier. Both the citation-based similarities and the text that the retracted article shared with its source can be examined using the prototype here.