Frederico Nanni was not always a text miner. He actually started out as a historian and then switched to digital humanities. During his PhD, he developed a method to detect interdisciplinary research, based on scientific abstracts. Now, he finds text mining fascinating and thinks more historians should learn how to do it.
It took some time for Drahomira Hermannova to see the value of her research topic, but now she thinks it is the best topic she could ever have chosen: using text and data mining to evaluate which research can change the world. Not only can this help scientists, it may also change the way research is done altogether.
The OpenMinTeD project involves partners from different scientific communities to make sure the OpenMinTeD infrastructure will address their needs. In the social sciences, a useful application of text mining is the improvement of literature search and information interlinking. To this end, three main challenges were identified: named entity recognition, automatic keyword assignment to texts and automatic detection of mentions of survey variables. This post gives an overview of these tasks and the progress of work so far.
Would you like to get more insight into the world of text and data miners? Daniel Duma is a PhD student at the Alan Turing Institute and the University of Edinburgh, and he shares his story in a short movie. He is working on software that will recommend relevant papers to scientists as they write their own.
If you want to do text and data mining in the EU, you run into a complex legal framework of copyright rules. During the OpenMinTeD webinar of November 23rd, this legal framework, its limits and its opportunities were discussed with legal as well as non-legal TDM experts. Recordings of the webinar and the discussion are available online.
Sometimes text miners struggle to obtain the textual data they want to mine in the first place. One problem for us is that most scientific publications, especially in the social sciences and humanities, are only available in PDF format, which is not well suited to being read and processed by computers. The OpenMinTeD social sciences work group accepted the challenge to work on this problem.