Using textmining to spot innovation in biomedical sciences

What is the real novelty of a research paper? How do different researchers contribute to innovation? And does this change throughout their careers? Shubhanshu Mishra of the University of Illinois uses textmining techniques to study the novelty of biomedical articles.


“Novelty basically is this concept of being new or original. And a lot of times, people believe that being novel actually helps you become much more prominent in the community. And we feel that a systematic approach to quantifying the novelty of articles can help you identify what your publishing pattern is. And it can even help decision-making bodies identify ‘What are the different kinds of researchers?’ or ‘How is a field moving?’


So for that we used this large-scale collection of biomedical papers, provided by the National Library of Medicine in the US. And we used our techniques to find ‘What is the most novel thing about each paper?’ and ‘What is the youngest concept in each of the papers?’ and ‘How old are those concepts?’

We are sitting on this goldmine of scientific knowledge, which is being produced at an exponential rate every year. And it’s very hard for people to actually go and manually work out the patterns in these datasets, which are on the scale of millions and billions of records. And I think that’s where data mining can help, and especially textmining techniques can help a lot.

We finally created a web-based interface where every biomedical researcher can go and look at all the Medline articles, and the novelty scores of all of them. And also at how the authors’ novelty is changing across their careers. And how each of the concepts in Medline is growing through time. So we make that publicly available, and the underlying data as well.”
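
The idea of a paper's "youngest concept" and its age, as described in the interview, can be illustrated with a minimal sketch. This is not the authors' actual method or code. It assumes that each paper comes with a publication year and a non-empty list of concepts (for example, indexing terms), and that a concept's age is the number of years since it first appeared anywhere in the collection. The names Paper, first_appearance and youngest_concept, and the three sample papers, are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type: an identifier, a publication year, and a
# non-empty list of concepts attached to the paper.
Paper = namedtuple("Paper", ["paper_id", "year", "concepts"])


def first_appearance(papers):
    """Map each concept to the earliest year in which any paper uses it."""
    first_seen = {}
    for paper in papers:
        for concept in paper.concepts:
            if concept not in first_seen or paper.year < first_seen[concept]:
                first_seen[concept] = paper.year
    return first_seen


def youngest_concept(paper, first_seen):
    """Return (concept, age) for the most recently introduced concept.

    Age is the paper's year minus the concept's first-appearance year,
    so an age of 0 means the paper uses the concept in its first year.
    """
    concept = max(paper.concepts, key=lambda c: first_seen[c])
    return concept, paper.year - first_seen[concept]


# Made-up toy data, for illustration only.
papers = [
    Paper("p1", 1990, ["gene", "protein"]),
    Paper("p2", 2000, ["gene", "crispr"]),
    Paper("p3", 2005, ["protein", "crispr"]),
]

first_seen = first_appearance(papers)
ages = {p.paper_id: youngest_concept(p, first_seen) for p in papers}

for paper_id, (concept, age) in ages.items():
    print(paper_id, concept, age)
```

Under these assumptions, a lower age for the youngest concept suggests a more novel paper. Tracking that value over an author's papers, year by year, would give a rough picture of how their novelty changes across a career.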