At the end of last year, I presented a webinar to the American Medical Informatics Association on clinical text mining and text engineering – applying text mining to medical records. This is not an area that we are concentrating on in OpenMinTeD, but it is one on which we should keep a watchful eye. Text mining over medical records is growing rapidly, and it exposes issues and problems that we need to be aware of.
The OpenMinTeD project is divided into different tasks. Agroknow’s task is the important job of gathering TDM requirements from our stakeholders (OpenMinTeD’s future platform users and contributors), so that OpenMinTeD can build a TDM platform that meets their requirements as well as possible. We focus on gathering requirements from four different scientific domains, represented by the following communities.
Recent years have witnessed an upsurge in the quantity of available digital research data, offering new insights and opportunities for improved understanding. Following advances in Natural Language Processing (NLP), text and data mining (TDM) is emerging as an invaluable tool for harnessing the power of structured and unstructured content and data. Hidden and new knowledge can be discovered by using TDM at multiple levels and in multiple dimensions. However, text mining and NLP solutions are not easy for end users to discover, use, or combine.
OpenMinTeD aspires to create infrastructure that fosters and facilitates the use of text and data mining technologies in the scientific publications world.
But what does this mean in practice?
Take a look into the future with us, and discover some examples of what OpenMinTeD will make possible for scholarly communication!
On 7 December 2015, the text and data mining projects OpenMinTeD and FutureTDM organised a workshop about the text and data mining challenges for cultural heritage institutions. This workshop took place at the DISH conference, a biennial international conference on digital heritage and strategies for heritage institutions.
Text mining refers to “the process or practice of examining large collections of written resources in order to generate new information” (source). I am not an expert in text mining, but I understand that it is about applying specialized software, algorithms, and techniques to existing textual information so that machines can read and analyze it and extract more meaningful information for us humans.