We look at recommending articles to users

Daniel Kershaw works at Mendeley, where he uses text and data mining to recommend relevant articles to users. A lot of the issues in this work have to do with inconsistencies between data sources from different journals.

“I’m Daniel Kershaw, I work at Mendeley for Elsevier, and I focus on recommender systems based on text similarity comparisons.

So we look at recommending articles to end-users, based on their past reading history. One of the simple methods we use is a comparison of tf-idf vectors between an article that you’ve read and the articles in our library. We’re moving on from basic tf-idf vectors to more complex word embeddings and document embeddings within multidimensional space.
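As an editorial illustration of the tf-idf comparison described above, here is a minimal sketch in plain Python. It is not Mendeley's implementation: the whitespace tokenizer, the smoothed idf formula, the function names (`tfidf_vectors`, `cosine`, `recommend`) and the sample titles are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed idf, so that terms
            # found in every document still keep a small positive weight.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(read_article, library, top_n=3):
    """Rank library articles by similarity to an article the user has read."""
    vectors = tfidf_vectors([read_article] + library)
    query, candidates = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(library[i], score) for score, i in scored[:top_n]]


# Hypothetical titles, used only to exercise the functions.
library = [
    "protein folding prediction with neural networks",
    "medieval trade routes in northern europe",
    "neural networks for protein structure",
]
for title, score in recommend("protein structure prediction", library):
    print(f"{score:.3f}  {title}")
```

Articles that share no terms with the reading history score zero and fall to the bottom of the ranking. A production system would normally use a proper tokenizer and a sparse-matrix library rather than dictionaries.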

TDM is used within the pre-filtering and post-filtering stages of our systems, where we try to extract just the relevant bits of information, such as noun phrases and verb phrases, which include names of inventions people have come across, protein names, or new fields which come into existence, and use those as features when making recommendations to people.

As with anything, 80% is getting used to the data, and the remaining 20% is developing the system. So a lot of the issues we have, even though we work for large publishing houses, are inconsistencies between data sources from the different journals.

So part of it is dealing with the nuances of text formatting across different domains. In an ideal world there would be standard representations across data sources, which would allow people to understand, merely by looking at the metadata, how a document is structured and how it is encoded. From there it would allow us to analyse the datasets more easily and quickly, getting from an idea to a result, and ultimately a product for us, in a shorter time span.”