Sunday, October 31, 2010

Biomedical Text Similarity

Science Daily alerted me to this publication in PLoS ONE:
Systematic Characterizations of Text Similarity in Full Text Biomedical Publications
Sun Z, Errami M, Long T, Renard C, Choradia N, et al. (2010) Systematic Characterizations of Text Similarity in Full Text Biomedical Publications. PLoS ONE 5(9): e12704. doi:10.1371/journal.pone.0012704
The authors of eTBLAST, a text-similarity search engine, have extended their earlier work, which investigated text and author similarity in PubMed abstracts (CSP article from 2008). They have now accessed full-text articles to dig deeper into text similarities.

They investigated over 70,000 full papers and determined that abstract similarity is a good predictor of full-text similarity. They caution, however, that automatically identified cases of possible plagiarism must be checked by hand to determine whether plagiarism is indeed present. They uncovered only 34 highly similar papers, and all were updates or multi-part articles that did indeed share larger sections of text.

However, they note that many of the plagiarized publications uncovered so far, for example in Chile and Peru [1], were translations, and these are not included in the PubMed database.



[1] Sources given in the article about the Chilean and Peruvian cases:
  1. Arriola-Quiroz I, Curioso WH, Cruz-Encarnacion M, Gayoso O (2010) Characteristics and publication patterns of theses from a Peruvian medical school. Health Info Libr J 27(2): 148–154. 
  2. Salinas JL, Mayta-Tristan P (2008) [Duplicate publication: a Peruvian case]. Revista de Gastroenterologia del Peru 28: 390–391. 
  3. Rojas-Revoredo V, Huamani C, Mayta-Tristan P (2007) [Plagiarism in undergraduate publications: experiences and recommendations]. Revista Medica de Chile 135: 1087–1088. 
  4. Reyes H, Palma J, Andresen M (2007) [Ethics in articles published in medical journals]. Revista Medica de Chile 135: 529–533.

