Showing posts with label eTBLAST. Show all posts

Sunday, October 31, 2010

Biomedical Text Similarity

Science Daily alerted me to this publication on PLoS:
Systematic Characterizations of Text Similarity in Full Text Biomedical Publications
Sun Z, Errami M, Long T, Renard C, Choradia N, et al. (2010) Systematic Characterizations of Text Similarity in Full Text Biomedical Publications. PLoS ONE 5(9): e12704. doi:10.1371/journal.pone.0012704
The authors of eTBLAST, a text-similarity search engine, have extended their earlier work, which investigated text and author similarity in PubMed abstracts (CSP article from 2008). They have now turned to full-text articles to dig deeper into text similarities.

They investigated over 70,000 full papers and determined that abstract similarity is a good predictor of full-text similarity. They caution, however, that automatically identified cases of possible plagiarism must be checked by hand to determine whether plagiarism is indeed present. They uncovered only 34 highly similar papers, and all were updates or multi-part articles that did indeed share larger sections of text.

However, they note that many of the plagiarized publications uncovered so far, for example in Chile and Peru [1], were translations, and these are not included in the PubMed database.



[1] Sources given in the article about the Chilean and Peruvian cases:
  1. Arriola-Quiroz I, Curioso WH, Cruz-Encarnacion M, Gayoso O (2010) Characteristics and publication patterns of theses from a Peruvian medical school. Health Info Libr J 27(2): 148–154. 
  2. Salinas JL, Mayta-Tristan P (2008) [Duplicate publication: a Peruvian case]. Revista de Gastroenterologia del Peru 28: 390–391. 
  3. Rojas-Revoredo V, Huamani C, Mayta-Tristan P (2007) [Plagiarism in undergraduate publications: experiences and recommendations]. Revista Medica de Chile 135: 1087–1088. 
  4. Reyes H, Palma J, Andresen M (2007) [Ethics in articles published in medical journals]. Revista Medica de Chile 135: 529–533.

Wednesday, October 29, 2008

Paper by former vice-president of Iran retracted

Nature reports that a "review paper by Massoumeh Ebtekar, the former vice-president of Iran and an immunologist at Tarbiat Modares University in Tehran, is to be retracted from an Iranian journal following allegations that it was almost entirely stitched together from other scientists' papers."

NatureNews: Butler, Declan. Iranian paper sparks sense of deja vu - Allegations of plagiarism prompt journal to retract report. Published online 22 October 2008 | Nature 455, 1019 (2008) | doi:10.1038/4551019a (http://www.nature.com/news/2008/081022/full/4551019a.html)
The plagiarism is one of more than 70,000 entries in the Deja Vu database. Powered by a tool called eTBLAST, it collects similar articles from the various scientific journals indexed by Medline. It takes an abstract, searches for similar ones, and then compares them, determining which one was published first. This blog noted a previous case in January 2008.
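
To make the three steps (take an abstract, find similar ones, work out which came first) a bit more concrete, here is a minimal sketch in Python. It is not eTBLAST's algorithm: I have swapped in a plain word-overlap (Jaccard) score, and the `Abstract` record, the function names, and the 0.8 threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
import re


@dataclass(frozen=True)
class Abstract:
    paper_id: str      # hypothetical identifier, not a real Medline ID
    published: date
    text: str


def word_set(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap score: shared distinct words / all distinct words."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def candidate_pairs(abstracts, threshold=0.8):
    """Return (earlier, later, score) for every pair at or above threshold.

    The threshold is an arbitrary assumption. Hits are only candidates;
    each one still has to be inspected by a person.
    """
    hits = []
    for a, b in combinations(abstracts, 2):
        score = jaccard(a.text, b.text)
        if score >= threshold:
            # Order the pair by publication date, earliest first.
            earlier, later = sorted((a, b), key=lambda x: x.published)
            hits.append((earlier, later, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for this example.
    papers = [
        Abstract("B", date(2007, 6, 1), "Gene expression in yeast cells under heat stress"),
        Abstract("C", date(2006, 1, 1), "Plagiarism detection in student essays"),
        Abstract("A", date(2005, 3, 1), "Gene expression in yeast cells under heat stress"),
    ]
    for earlier, later, score in candidate_pairs(papers):
        print(f"{later.paper_id} resembles earlier paper {earlier.paper_id} (score {score:.2f})")
```

Comparing every pair like this would not scale to a database the size of Medline, which is why a real system needs a search step first. A score like this also says nothing about why two texts are similar, so a hit is only a reason to look at the two papers side by side.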

There are a shocking number of papers that are exact duplicates (but published in different journals), or have the same abstract but are published in different languages, or are identical but have different authors. Deja Vu is run by the University of Texas Southwestern Medical Center at Dallas and is funded by the Hudson Foundation and the National Institutes of Health.

This is a great service to the community!

Thursday, January 24, 2008

Duplicate papers

Nature reports (Errami, M. & Garner, H. Nature 451, 397-399 (2008)) on research done by two Texas researchers who investigated 62,000 papers available online in the Medline database. They were looking for plagiarism and autoplagiarism (publishing one paper in multiple journals) and found about 1% using their tool, eTBLAST, a "text similarity-based engine for searching literature collections".

As usual, any mention of the words "plagiarism" and "Internet" in the same paragraph causes journalists to suspect that plagiarism is "on the rise", and they call and try to get me to verify this, which I refuse to do. We can't measure the amount of plagiarism, only the amount of what we find. So if we can't measure it, we can't say if it is increasing or decreasing. At least this gave me a chance to spout off on some of my favorite topics, and they broadcast a large portion of my interview this afternoon on Deutschlandradio.

Duplicate papers are indeed a problem. Sometimes, one has a minor bit of new material, and wants to republish. I have even had a journal approach me and insist on paying for a translator to translate my paper on plagiarism into English to be published in their journal. I only permitted them to do this if they let me check the translation (it was not good; it would have been easier to do it myself) and if they included a footnote explicitly stating that this was a translation of a previous paper.

But apparently, in the quest for AMPAP (as many publications as possible), people submit multiple copies of papers to different journals in the hope that no one looks at them side by side and discovers them to be identical.

Is it "okay" to plagiarize oneself on the level of paragraphs or sentences? It also looks bad when a paper consists mostly of quotes of one's own work.

Another fine line between what is acceptable and what is not acceptable behavior.