Systematic Characterizations of Text Similarity in Full Text Biomedical Publications
(2010) Systematic Characterizations of Text Similarity in Full Text Biomedical Publications. PLoS ONE 5(9): e12704. doi:10.1371/journal.pone.0012704
The authors of eTBLAST, a text-similarity search engine, have expanded their earlier work, which investigated text and author similarity in PubMed abstracts (CSP article from 2008). They have now accessed full-text articles to dig deeper into text similarities.
They investigated over 70,000 full-text papers and determined that abstract similarity is a good predictor of full-text similarity. They caution, however, that automatically identified cases of possible plagiarism must be checked by hand to determine whether plagiarism is indeed present. They uncovered only 34 highly similar papers, and all were updates or multi-part articles that did indeed share larger sections of text.
However, they note that many of the plagiarized publications uncovered so far, for example in Chile and Peru [1], were translations, and these are not included in the PubMed database.
[1] Sources given in the article about the Chilean and Peruvian cases:
- (2010) Characteristics and publication patterns of theses from a Peruvian medical school. Health Info Libr J 27(2): 148–154.
- (2008) [Duplicate publication: a Peruvian case]. Revista de Gastroenterologia del Peru 28: 390–391.
- (2007) [Plagiarism in undergraduate publications: experiences and recommendations]. Revista Medica de Chile 135: 1087–1088.
- (2007) [Ethics in articles published in medical journals]. Revista Medica de Chile 135: 529–533.