TDM story: scientists as TDM customers

Stephane Schneider is an IT project manager at the Institute for Scientific and Technical Information (INIST-CNRS). INIST holds one of the most important collections of scientific publications in Europe and provides a range of information search services for science and higher education. Stephane talks about his work and what he expects for the future of TDM.

When did you hear about text and data mining for the first time?

This was at school, where I got a graduate degree in computational linguistics. However, the tools and methods have evolved greatly since then, in particular with the advent of complex numerical methods for big data manipulation and the possibility of combining them with traditional NLP techniques.

What do you currently do with TDM?

I am an IT project manager specializing in the design of document applications that use text mining to analyse and search scientific documents. At the moment, we are about to develop a new application that will explore a corpus on the ISTEX platform. This platform (www.istex.fr) is hosted by INIST-CNRS and holds 21 million digital documents (journal articles, book chapters, text corpora, etc.) from all disciplines. We focus on geoscience and want to test the feasibility of using an OpenMinTeD workflow on this kind of scientific corpus.

Why do you think TDM is important?

Because of the technical advances in recent years, our modern world is flooded with information. TDM can help to sort and analyse information and in that way keep us from drowning.
I think researchers could be the first ‘TDM customers’, as they are by nature big consumers of information. Worldwide, more than 75 million articles are published (Univ Ottawa). Rapidly growing volumes of scientific information make it increasingly complex to find relevant knowledge. TDM can help scientists with this.
I like the idea that the computer contributes to scientific progress, by making sense of and better understanding language.

How do you see the future of TDM?

The context is favourable, even though there are still barriers. Look, for example, at the legal context in France: the conditions for using TDM make things very complex. Disseminating TDM tools is also a problem. We have to introduce a software management plan, in order to improve the access to and reuse of TDM software.
TDM is a highly technical domain, but most good results come from the work of a scientist who is also a TDM specialist. This cross-pollination has to be promoted more. INIST-CNRS would be a good meeting place for this.

What do you think is the biggest challenge?

On the one hand, text and data miners have to help researchers who do not know what to do with TDM. They can show what’s possible and how TDM can provide solutions. On the other hand, text and data miners need to pay more attention to users and offer services based on real user experiences.