Open Access & Text Mining: Moving Things Forward

Text mining refers to “the process or practice of examining large collections of written resources in order to generate new information” (source). I am not an expert in text mining, but as I understand it, it involves applying specialized software, algorithms and techniques to existing text so that machines can read and analyze it and extract more meaningful information for us humans. Text mining is nothing new to the research community: it seems to have started back in the ’80s with a methodology called CAVE (Content Analysis of Verbatim Explanations), although its history is beyond the scope of this article. What I can tell you is that it is a complex process, combining techniques from areas such as information retrieval, natural language processing, information extraction and data mining into a single workflow!
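To make the idea of a “single workflow” a little more concrete, here is a minimal Python sketch. It assumes a tiny hypothetical corpus and a hand-made vocabulary, both invented purely for illustration; a real system would work on large document collections with a proper domain thesaurus and far more robust NLP tools. Each function name is hypothetical and stands for one of the areas mentioned above: retrieving relevant documents, tokenizing them, extracting known concepts, and counting which concepts tend to appear together.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, invented for illustration only.
DOCS = [
    "Drought stress reduces wheat yield in semi-arid regions.",
    "Nitrogen fertilizer improves wheat yield under irrigation.",
    "Drought stress increases proline accumulation in maize.",
    "Rice paddies emit methane during flooding.",
]

# Hypothetical vocabulary standing in for a real domain thesaurus.
VOCABULARY = {"drought stress", "wheat", "maize", "yield", "nitrogen fertilizer", "proline"}


def retrieve(docs, query):
    """Information retrieval: keep only documents that mention the query term."""
    return [d for d in docs if query in d.lower()]


def tokenize(text):
    """NLP step: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_concepts(tokens, vocabulary):
    """Information extraction: match one- and two-word vocabulary terms."""
    found = set()
    for n in (1, 2):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in vocabulary:
                found.add(phrase)
    return found


def mine_cooccurrences(concept_sets):
    """Data mining: count how often each pair of concepts appears in the same document."""
    counts = Counter()
    for concepts in concept_sets:
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts


def text_mining_workflow(docs, query, vocabulary):
    """Chain the four steps into a single workflow."""
    hits = retrieve(docs, query)
    concept_sets = [extract_concepts(tokenize(d), vocabulary) for d in hits]
    return mine_cooccurrences(concept_sets)


if __name__ == "__main__":
    result = text_mining_workflow(DOCS, "wheat", VOCABULARY)
    print(result.most_common(1))
```

Here `text_mining_workflow(DOCS, "wheat", VOCABULARY).most_common(1)` returns `[(('wheat', 'yield'), 2)]`, because both toy documents that mention wheat also mention yield. A real pipeline would replace each step with much more capable components, but chaining retrieval, language processing, extraction and mining is the same basic idea.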

Is text mining useful for the research community?

You bet it is! Text mining can be applied in many research contexts. Just imagine how fast computers can parse and analyze huge amounts of text and retrieve the passages of interest to a researcher, saving him or her enormous amounts of time and effort. This matters even more nowadays, as the volume of published research data is growing exponentially and can no longer be analyzed with traditional methods alone. Text mining can be used for species disambiguation in biology, extraction of domain-specific concepts from texts, text normalization, annotation with tags and more; the National Centre for Text Mining (NaCTeM) is an excellent source of text mining tools. One extremely useful application is that text mining allows links to be discovered that would never have been noticed during manual searches, a vital benefit in drug discovery research. There are, of course, many more applications across disciplines, all of them aiming to make existing information more meaningful and accessible to everyone. In this context, Agro-Know is happy to be a part of the OpenMinTeD Horizon 2020 project, which aims to “enable the creation of an infrastructure that fosters and facilitates the use of text and data mining technologies in the scientific publications world and beyond, by both application domain users and text-mining experts.”

The role of Agro-Know in the OpenMinTeD project

One of our tasks is to define and develop the methodology for eliciting user requirements from the project's target user groups and transforming them into functional requirements that will drive the project's development work. We need to ensure that the methodology can extract requirements from stakeholder groups with different needs and applications, including but not limited to:
  • data providers (such as institutional repository managers and private publishers),
  • e-infra & aggregator operators (such as AGRIS, OpenAIRE and META-SHARE),
  • text mining researchers and
  • researcher application developers.
Just imagine how many different personas (users with common characteristics) may be involved in each of these groups, and how many different needs the platform developed by OpenMinTeD will have to meet. These requirements will have to be carefully collected from selected stakeholders involved in the corresponding project activities, then organized, analyzed and validated. Next, they will be visualized as envisaged user interfaces and meaningful workflows, showing how each of the many user types interacts with the service or platform. Only then will they be transformed into technical and functional requirements that the project's technical partners can use to build the corresponding solutions (in this case, the functionalities of the envisaged OpenMinTeD platform). This is complex and challenging work: all the effort and outcomes of the project will be based on the requirements elicited through this process, so we have to be really careful when developing the methodology! It involves defining different methods for eliciting requirements from different user types, organizing events and interviews, identifying the most appropriate methods for validating the requirements and, last but not least, passing meaningful specifications to the technical partners. We are sure that it will be worth every minute spent on its planning!

AGRIS use case

We are also responsible for involving the agri-food community through the AGRIS use case, which is expected to provide the requirements for enriching the existing AGRIS portal with text mining functionalities that will facilitate access to the research outcomes currently available through AGRIS.

This blog post was written by