– Debora Weber-Wulff
Can citation patterns help detect heavily disguised plagiarism in academic documents?
by Bela Gipp
A while back, Retraction Watch, a blog on scientific
integrity, reported on five plagiarism cases discovered in Neuroscience Letters. Three of the articles
were English translations of Chinese originals, and another was an English translation of a
French text. None of them acknowledged being a translation.
Translated plagiarism remains one of the most
difficult forms of academic misconduct to detect. Since few researchers
actively follow the literature in multiple languages, peer review is unlikely
to recognize translated plagiarism. Software is of little help in
identifying translated plagiarism, because today’s plagiarism detection systems rely
on a minimum amount of text similarity to spark suspicion, yet translations
typically contain very little or no textual overlap. When documents use different
writing systems, e.g. Chinese, Korean, or Russian characters compared to Latin
characters, available detection systems stand no chance.
A new approach for plagiarism detection, termed
Citation-based Plagiarism
Detection (CbPD), goes beyond literal text similarity to detect potential
plagiarism. The citation-based approach examines the in‑text placement of academic citations to form a language- and
text-independent “fingerprint” of semantic similarity. The practicability of this
citation-based approach was initially demonstrated in an analysis of the translated
plagiarism in the prominent plagiarism case of K.-T. zu Guttenberg. Recently, a group of
researchers in cooperation with students from the HTW-Berlin developed the first citation-based plagiarism
detection prototype, “CitePlag”.
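To make the idea of a citation "fingerprint" more concrete, here is a minimal sketch in Python. It assumes each document has already been reduced to the ordered list of sources it cites, and it uses a standard longest-common-subsequence comparison to find citations that appear in the same order in both documents. The function names, the source identifiers, and the normalization by the shorter list are illustrative assumptions, not a description of how CitePlag itself works.

```python
from typing import List, Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest run of citations that occurs in the same order in
    both documents (other citations may lie in between)."""
    rows, cols = len(a), len(b)
    # table[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover the shared citations.
    shared: List[str] = []
    i = j = 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


def citation_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Shared-order citations as a fraction of the shorter citation list
    (a hypothetical score chosen for this illustration)."""
    if not a or not b:
        return 0.0
    return len(longest_common_citation_sequence(a, b)) / min(len(a), len(b))


# Hypothetical citation sequences; the text of the two documents may be in
# different languages, only the order of the cited sources is compared.
original = ["smith2001", "li2005", "chen2007", "li2005", "wang2009"]
suspect = ["smith2001", "li2005", "jones2010", "chen2007", "wang2009"]

print(longest_common_citation_sequence(original, suspect))
# ['smith2001', 'li2005', 'chen2007', 'wang2009']
print(citation_overlap(original, suspect))  # 0.8
```

Because the comparison only looks at which sources are cited and in what order, it is unaffected by translation or a change of alphabet. A high score is a reason for a human to take a closer look, not proof of plagiarism.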
In the image below, CitePlag visualizes one of the five articles that were retracted
from Neuroscience Letters. No textual
similarity remains between the two publications, since the plagiarized article (left) is
an English translation of the Chinese original (right). The citation-based
approach, however, identifies and connects matching citations in a central scrollable
column for human inspection. Examine this example for yourself in CitePlag. For more information
on the prototype and the algorithms it implements, refer to this publication.
A medical article in the Indian
Journal of Urology was recently retracted after the CbPD approach identified a notably high
citation pattern overlap with an article published in another journal
two years earlier. The citation-based similarities, as well as the text that
the retracted article shared with its source, can be examined using the prototype
here.