– Debora Weber-Wulff
Can citation patterns help detect heavily disguised plagiarism in academic documents?
by Bela Gipp
A while back, Retraction Watch, a blog on scientific
integrity, reported on five plagiarism cases discovered in Neuroscience Letters. Three of the articles were translations of Chinese originals into
English, and a fourth translated a French text into English. None of them
acknowledged being a translation.
Translated plagiarism remains one of the most
difficult forms of academic misconduct to detect. Since few researchers
actively follow the literature in multiple languages, peer reviewers are unlikely
to recognize translated plagiarism. Software is of little help either, because
today's plagiarism detection systems need a minimum amount of textual similarity
to raise suspicion, yet a translation typically shares little or no text with its
source. When the documents use different writing systems, e.g. Chinese, Korean, or
Russian characters compared to Latin characters, available detection systems stand no chance.
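To make the problem concrete, here is a minimal sketch of word n-gram ("shingle") overlap, a common building block of text-matching detectors. It is not the algorithm of any specific product; the example sentences, the trigram size and the Jaccard measure are assumptions chosen for illustration. A lightly edited copy still shares most trigrams with its source, while a German translation of the same sentence shares none:

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) of a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical example sentences: a source, a lightly edited copy, and a translation.
original = "the results confirm that the protein regulates neuronal growth in mice"
copy_paste = "our results confirm that the protein regulates neuronal growth in rats"
translation = "die ergebnisse bestätigen dass das protein das neuronale wachstum bei mäusen reguliert"

print(round(jaccard(shingles(original), shingles(copy_paste)), 2))  # 0.64
print(jaccard(shingles(original), shingles(translation)))  # 0.0
```

With a non-Latin script the overlap would typically be zero as well, since apart from numbers or Latin-script names hardly any tokens can coincide.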
A new approach to plagiarism detection, termed
Citation-based Plagiarism
Detection (CbPD), goes beyond literal text similarity to detect potential
plagiarism. The citation-based approach examines the in-text placement of academic citations to form a language- and
text-independent “fingerprint” of semantic similarity. The feasibility of this
citation-based approach was first demonstrated in an analysis of the translated
plagiarism in the prominent plagiarism case of K.-T. zu Guttenberg. Recently, a group of
researchers, in cooperation with students at HTW Berlin, developed the first citation-based plagiarism
detection prototype, “CitePlag”.
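The basic idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not CitePlag's implementation: each document is reduced to the ordered sequence of sources it cites, with every reference already mapped to a shared, hypothetical ID (in practice, matching bibliography entries across documents and languages is itself a nontrivial step). Two simple measures are then computed: bibliographic coupling, i.e. the number of distinct shared sources, and the length of the longest common subsequence of citations, which additionally requires the shared citations to appear in the same order.

```python
from typing import Sequence


def bibliographic_coupling(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of distinct sources cited by both documents (order ignored)."""
    return len(set(a) & set(b))


def longest_common_citation_subsequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two citation sequences.

    Classic O(len(a) * len(b)) dynamic programming with two rolling rows;
    matched citations need not be adjacent, only in the same relative order.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


# Hypothetical citation sequences: normalized reference IDs in the order the
# in-text citations occur. The language of the surrounding prose plays no role.
source = ["Smith2001", "Lee2005", "Wang2008", "Smith2001", "Chen2010", "Kim2011"]
suspect = ["Lee2005", "Wang2008", "Novak2009", "Smith2001", "Chen2010", "Kim2011"]
unrelated = ["Garcia2003", "Kim2011", "Rossi2007"]

print(bibliographic_coupling(source, suspect))  # 5
print(longest_common_citation_subsequence(source, suspect))  # 5
print(longest_common_citation_subsequence(source, unrelated))  # 1
```

Because only reference IDs and their order enter the computation, the text around the citations can be in any language or script.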
In the image below, CitePlag visualizes one of the five articles that were retracted
from Neuroscience Letters. No textual
similarity remains between the two publications, since the plagiarism (left) is
an English translation of the Chinese original (right). The citation-based
approach, however, identifies matching citations and connects them in a central scrollable
column for human inspection. You can examine this example yourself in CitePlag. For more information
on the prototype and the algorithms it implements, refer to this publication.
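CitePlag connects matching citations across the two documents. One simple way to derive such connections, assumed here for illustration and not taken from CitePlag's code, is to compute an order-preserving alignment of the two citation sequences (a longest common subsequence) and report which citation positions pair up; each pair would correspond to one connecting line in a side-by-side view. The function name and the reference IDs are hypothetical.

```python
from typing import Sequence


def matched_citation_pairs(a: Sequence[str], b: Sequence[str]) -> list[tuple[int, int]]:
    """Return (index_in_a, index_in_b) pairs of one longest order-preserving match.

    table[i][j] holds the LCS length of the suffixes a[i:] and b[j:]; walking
    forward through the table recovers which citation positions are linked.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))  # same source cited at a[i] and b[j]
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping a[i] loses nothing
        else:
            j += 1  # skipping b[j] loses nothing
    return pairs


# Hypothetical citation sequences of an original and its unacknowledged translation.
chinese = ["Zhang2004", "Liu2006", "Zhang2004", "Huang2009", "Xu2010"]
english = ["Liu2006", "Zhang2004", "Huang2009", "Xu2010"]

print(matched_citation_pairs(chinese, english))  # [(1, 0), (2, 1), (3, 2), (4, 3)]
```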
A medical article in the Indian
Journal of Urology was recently retracted after the CbPD approach identified a notably high
overlap in citation patterns with an article published in another journal
two years earlier. The citation-based similarities, as well as the text that
the retracted article shared with its source, can be examined using the prototype
here.
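Finding a case like this means comparing a document's references against many candidate documents. A minimal screening sketch, with a hypothetical corpus and an arbitrary threshold of 0.6, scores each pair by the overlap coefficient: shared references divided by the size of the smaller reference list. This measure is an assumption, chosen because it does not penalize a copy that cites only part of its source's references; it ignores citation order and placement, so it could only serve as a pre-filter before an order-aware comparison and human inspection.

```python
from itertools import combinations


def overlap_coefficient(a: set[str], b: set[str]) -> float:
    """Shared references divided by the size of the smaller reference set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def flag_pairs(corpus: dict[str, set[str]], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return document pairs whose citation overlap meets the threshold, highest score first."""
    flagged = []
    for (name_a, refs_a), (name_b, refs_b) in combinations(corpus.items(), 2):
        score = overlap_coefficient(refs_a, refs_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return sorted(flagged, key=lambda t: t[2], reverse=True)


# Hypothetical corpus: document name -> set of normalized reference IDs.
corpus = {
    "paper_2009": {"R1", "R2", "R3", "R4", "R5", "R6"},
    "paper_2011": {"R1", "R2", "R3", "R4", "R7"},
    "paper_2012": {"R8", "R9", "R2", "R10"},
}
print(flag_pairs(corpus))  # [('paper_2009', 'paper_2011', 0.8)]
```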
Thanks for sharing this! This sort of detection seems like it could be promising in detecting secondary source abuse (using references from a non-cited source) as well as translation. This seems to happen often in student research essays. I'd love to see this tool made available online or as part of an existing suite like Turnitin.
"secondary source abuse" is not generally accepted as plagiarism at all, for example:
http://arxiv.org/pdf/0803.1526.pdf
http://www.uwgb.edu/dutchs/PSEUDOSC/PlagiarNonsense.HTM
http://itre.cis.upenn.edu/~myl/languagelog/archives/004608.html (already posted in a comment on http://copy-shake-paste.blogspot.de/2013/08/another-german-politician.html#comment-form)
Oh, but this is not about secondary source abuse or citation plagiarism, but about the use of citation patterns to detect types of plagiarism (translation plagiarism or structural plagiarism) that would otherwise go unnoticed. If an author checks all of the sources obtained from another source, I do not see a problem. The problem arises when a statement is taken over without checking. If it turns out to be false or non-existent, then one can see that shortcuts have been taken. Otherwise, I don't see a problem.
I agree that using citations that were taken over without checking is not the correct way to work, but I don't regard it as plagiarism or cheating in general. The authors of the first and third texts linked above seem to agree that there is no plagiarism in this case. In my opinion, whether it counts as cheating depends on the discipline and the task. If the task of a thesis lies in analyzing and elaborating on other texts, then taking references from secondary sources without checking them is, in my opinion, of course cheating, because the required work has not been done (the recent case of a German politician showed a different point of view in evaluating the charges made). If the task of a thesis lies in experimental research, I'd regard the use of unread references as a technical flaw.