Text miners sometimes struggle to obtain the textual data they want to mine in the first place. One problem for us is that most scientific publications, especially in the social sciences and humanities, are only available in PDF format, which is not well suited to being read and processed by computers. The OpenMinTeD social sciences work group accepted the challenge to work on this problem.
Are you looking for support or training for text and data mining? Then you’re in the right place! OpenMinTeD recently released a Knowledge Base that will host open access support and training material. We are still uploading content, but you can already have a look.
Text and data mining is important to different scientific communities, but what do these different user communities need to mine successfully? One of the aims of work package 4 of the OpenMinTeD project is to collect these requirements. This was done using a combination of methods, including online surveys and focus groups. The results are summarized in the ‘White paper on OpenMinTed Community Requirements’, which was finished last week.
CORE is an aggregation service that harvests open access journals and repositories, institutional and disciplinary, from around the world. It offers one of the largest collections of scientific content via its Datasets, ready to be text-mined. We encourage everyone to use it as part of OpenMinTeD and beyond.