Things aren’t always what they seem: The PDF challenge (accepted)

Image CC-BY

Sometimes text miners struggle to obtain the textual data they want to mine in the first place. One problem for us is that most scientific publications, especially in the social sciences and humanities, are only available as PDF, a format that is not well suited to being read and processed by computers. The OpenMinTeD social sciences working group accepted the challenge of working on this problem.

White paper on community requirements for text and data mining

Image CC0

Text and data mining is important to different scientific communities, but what do these user communities need in order to mine successfully? One of the aims of work package 4 of the OpenMinTeD project is to collect these requirements. This was done using a combination of methods, including online surveys and focus groups. The results are summarized in the ‘White paper on OpenMinTeD Community Requirements’, which was finished last week.