On 1 December 2017, OpenMinTeD organized a session on Text and Data Mining in Open Science at the DI4R conference in Brussels (30 November to 1 December).

The Digital Infrastructures for Research (DI4R) conference attracted a wide audience from different backgrounds (researchers, innovators, data producers, scientific domain experts, librarians, data science practitioners, service providers, project leaders, policy makers and funders) to discuss the policies, processes, best practices, data and services needed to support the research process and, ultimately, to advance scientific knowledge and Open Science.

The OpenMinTeD session discussed ways of supporting the EOSC vision by fostering collaboration between infrastructures and by bringing Text and Data Mining (TDM) into the spotlight as a valuable research instrument that opens new avenues in multi- and cross-disciplinary research.

The session was attended by participants from various disciplines, with varying interests and engaged in a multitude of activities relevant to Open Science, as shown by the results of the voting tool we used for the interactive session.

In order to engage the participants and ensure a well-informed and fruitful discussion, the session was structured in two parts: the first part consisted of five enlightening talks that introduced the topic of TDM in the research workflow from the OpenMinTeD point of view; the second part was organized as an interactive session, with a discussion prompted by short statements from a panel and gradually involving all participants.

Natalia Manola (Athena RC) presented an overview of OpenMinTeD, focusing on the technical and legal challenges it has set out to overcome and the solutions it has adopted. These solutions are implemented as an open, service-oriented platform for TDM of scientific and scholarly content, where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of interoperable, text-based, science-related sources.

The next three presentations showcased how the OpenMinTeD platform can be used for conducting TDM-based research, presenting real use cases from three disciplinary areas: Agriculture & Biodiversity (Claire Nédellec, INRA), Life Sciences (Christian O’Reilly, EPFL) and Social Sciences (Peter Mutschke, GESIS).

The last presentation, by Giulia Dore (CREATe, Glasgow University), discussed legal issues in TDM, mainly licensing and Intellectual Property Rights (IPR), and the OpenMinTeD recommendations for overcoming legal barriers.

The panel brought together the expertise and views on TDM of representatives from three related infrastructures: Franciska de Jong from CLARIN, Paolo Manghi from OpenAIRE and Ron Dekker from CESSDA. The panelists presented how TDM is envisaged or integrated in these infrastructures, the role of TDM in the Open Science paradigm, and how they offer TDM services to their end-users. They also exchanged opinions on potential synergies with OpenMinTeD in several areas: promoting interoperability; raising awareness of, and providing training on, TDM, Open Access and legal policies; and supporting the reusability of resources, the reproducibility of research experiments and the publication of research outcomes.

Finally, the audience joined in the discussion and offered their own perspectives on the deployment of TDM in their research practices.

During the discussion, the participants were asked to rate, on a scale of 1 to 10, how well existing infrastructures facilitate the integration of TDM in their research. Their responses confirmed findings from other similar surveys, namely the difficulty of finding guidance on legal issues and the lack of, or difficulty in finding, TDM applications tailored to end-user needs.

These findings are encouraging, as they suggest that the OpenMinTeD approach towards TDM is on the right track: OpenMinTeD already offers solutions to these problems. First of all, it makes content and TDM software available together in one place. Most importantly, it facilitates the creation of domain-oriented end-user applications by setting up a pool of re-usable, interoperable resources, which TDM experts can combine with domain-specific ones to build workflows that meet end-user needs. Additionally, OpenMinTeD has presented a licensing compatibility matrix to help researchers, repository owners and many others use open access licences in the context of TDM.

Lastly, as regards user requirements for improvements, the discussion brought out a variety of important views touching upon different dimensions of TDM-enhanced research. In keeping with the DI4R spirit of collaboration, one need stands out: to “establish standards for data exchange TDM in metadata and content, make services effectively usable (users, data, methods policy-driven)”.




