Welcome to the Newsletter page of OpenMinTeD
Supported by the Ministry of Higher Education, Research and Innovation (MESRI) in the framework of the Digital Scientific Library (BSN) and the Committee for Open Science (CoSO), the Visa TM project (Towards Advanced Text-Mining Services Infrastructure), led by INRA, was launched in June 2017 for a two-year period.
This project aims to study the conditions of production of high value-added TDM services based on semantic analysis by synergizing the interests and complementarities of the various partners: an IST operator (Inist), a research establishment (INRA) and a university (University of Montpellier).
OpenMinTeD was present: three of the project's main partners (Institut National de la Recherche Agronomique, France; Athena Research & Innovation Center, Greece; and the Ubiquitous Knowledge Processing Lab (UKP) at Technische Universität Darmstadt, Germany) were invited and gave presentations, taking part in fruitful sessions and discussions.
Claire Nédellec presented VisaTM, focusing on bridging the gap between needs and solutions.
On the strategic side of TDM, applicable to broad communities, Natalia Manola from Athena Research & Innovation Center pointed out the need, within EOSC, to use TDM to make publications and data smart and actionable through OpenAIRE and OpenMinTeD.
From the point of view of TDM research developers, Dr.-Ing. Richard Eckart de Castilho gave a very interesting talk about Open Development Strategy and data mining tools such as DKPro Core (software components for natural language processing) and INCEpTION.
Finally, Sylvain Massip gave a talk imagining future TDM services, collecting all the outputs of the workshop dedicated to this topic at VisaTMDay. The ultimate dream: a service built on all the TDM services that can answer any natural language question!
You may consult and download from the VisaTM blog:
– the eight public reports of the project, which detail the various points discussed during the day,
– feedback from the afternoon workshop, with thanks to the facilitators and contributors.
OpenAIRE becomes a fully fledged organisation
An EU organisation to facilitate openness in scholarly communication
October 29, 2018
OpenAIRE is happy to announce today the formation of its legal entity, OpenAIRE A.M.K.Ε., a non-profit partnership, to ensure a permanent presence and structure for a European-wide national policy and open scholarly communication infrastructure.
“OpenAIRE has reached a milestone: for ten years we have spearheaded the principles of openness, and we have now emerged as a key player in the Open Science landscape in Europe with global ties. Open Science practices are gaining global momentum, and committed players are needed to support this shift. OpenAIRE as an organisation from now on, will provide a permanent platform to support tomorrow’s research for Europe. We can’t wait to make this work and to achieve this, we actively invite the contribution of the Open Science and research community.”
Prof. Yannis Ioannidis, OpenAIRE A.M.K.E Interim Head
About OpenAIRE: OpenAIRE (www.openaire.eu), funded by the EC since 2008, has led the shift to open scholarship in Europe and helped alignment with the rest of the world. An e-Infrastructure with a true EU footprint, OpenAIRE promotes open scholarship and improves the discoverability, accessibility, sharability, reusability, reproducibility and monitoring of data-driven research results, across scientific disciplines and thematic domains, cross-border in Europe and beyond.
We democratise the research life-cycle, by assisting the transition of how research is performed and knowledge is shared.
A community-driven organisation at heart, OpenAIRE addresses the “no one-size-fits-all” nature of Europe's diverse research community and cultural variety through its 34 National Open Access Desks (NOADs) in EU member states and associated countries, together with a service-driven architecture. This makes the infrastructure an integral part of, and a leading force behind, the development of the European Open Science Cloud (EOSC).
Structure: Following a hybrid model of member organisation and member state representation, the OpenAIRE A.M.K.E. aims to become the foundation for national coordination on Open Science in Europe, achieving long-term sustainability and economies of scale.
Becoming a member: OpenAIRE A.M.K.E. sets off with its current base. To accomplish a truly open and participatory modus operandi, it is open for other organisations to join from February 2019 onwards. Members of the organisation will apply their expertise in their national or thematic contexts to:
- Support reproducible research with technical services
- Align Open Science policies
- Provide support and training for Open Science
Our members are expected to actively contribute to shaping the European open scholarly communication infrastructure, capitalising on their collective experience in Open Science. In this new setting, we will continue and strengthen our efforts within the EOSC context to engage all EU and associated member states to commit to the alignment and implementation of Open Science and outreach to other organisations beyond the OpenAIRE project base.
Announcement video by Professor Yannis Ioannidis
Further information on OpenAIRE: https://www.openaire.eu/organization
Who to contact to learn how to join the OpenAIRE organisation: Prodromos Tsiavos at email@example.com
Type of legal entity: OpenAIRE has the legal form of a Non-Profit Partnership (NPP) incorporated under the provisions of Greek Law (articles 741 onwards of the Greek Civil Code) and Law No 4072/2012.
Background: the Open Science era and the sheer volume of scholarly works (about 2.5 million peer-reviewed publications every year in English alone)
What OpenMinTeD is about: Researchers, Open Access publishers, librarians, repository managers and SMEs can now easily harness the power of text and data mining (TDM) for scientific content. The recently launched OpenMinTeD infrastructure, funded by the European Commission (H2020 Grant 654021) and a preamble to the European Open Science Cloud, enables the registration and deployment of existing TDM tools and applications and their connection to OA scientific content, allowing researchers to seamlessly discover, share, analyse and re-use knowledge. All of this is well presented and operates on a cloud infrastructure. It is made possible by the OpenMinTeD Interoperability Guidelines, which address interoperability aspects for content and services.
Does your work involve supporting researchers who are interested in Text and Data Mining (TDM)? Do you have an interest in the topic, but no coding or computer skills? Then this course may be interesting to you: OpenMinTeD and the University of Cambridge developed a free online course on text and data mining for ‘non-tech people’.
The OpenMinTeD event titled ‘Paving the way for text and data mining in science’ was successfully organized in Brussels on May 24th, 2018. It was an open invitation to all TDM stakeholders in Europe (publishers as content providers, TDM experts, researchers and SMEs). The event’s agenda was carefully designed so as to provide a full TDM experience and only focus on OpenMinTeD. After all, OpenMinTeD is a “TDM Hub” of TDM applications and components combined with open access content from open access aggregators.
The event started with a brief welcome and a short introduction of what OpenMinTeD is by the OpenMinTeD coordinator and OpenAIRE Managing Director, Natalia Manola.
Next, two EC officers, Caroline Colin and Jean-François Dechamp, presented the EC perspective on Text and Data Mining and Open Science. Their presentation informed the audience of the main objectives of the new directive on copyright in the Digital Single Market:
- Modernising EU rules on key exceptions and limitations in the areas of research, education, and preservation of cultural heritage
- Facilitating licences in order to ensure wider access to content (out-of-commerce works, negotiation mechanism/VoD platforms)
- Introducing fairer rules for a better functioning copyright marketplace (press publisher’s rights, value gap, remuneration of authors and performers)
Furthermore, they explained why the EC cares about TDM:
- It’s different for science: authors usually give away their copyright, and license-based solutions for scientific papers do not seem to work
- The amount of digital content requires large-scale analysis with TDM, and almost all scientific journals are already available online, as are research library collections
- Open Science is supported by public funding, draws on multi-disciplinary sources from public and private owners, and allows reusability of data
Additionally, it was mentioned that the European Commission's proposal in the European Council concerning TDM was to set a mandatory exception allowing research organisations to carry out TDM, for scientific research purposes (commercial and non-commercial), on content they have lawful access to.
The next session followed a TDM storyline, “Making sense of Science”. Stelios Piperidis (Institute for Language and Speech Processing, Athena Research & Innovation Center) also presented the story of OpenMinTeD: how it started and how you can now process, share and discover TDM tools and content. The presentation pointed to the massive production of content in general and focused on scientific content (2.5 million articles/year). It stressed the need to make sense of all that data by using machine learning and an understanding of entities, relations and structures, and to extract meaningful insights that improve the ability to predict. Even though there are solutions out there, they focus on different text types, domains, tasks and languages, creating a complex landscape. This complexity triggered the launch of the OpenMinTeD project and its services, which focus on content providers, software providers, researchers and SMEs. The services and the overall operations of OpenMinTeD were explained.
In brief, the services of the OpenMinTeD platform are the following:
- The OpenMinTeD catalogue of corpora, mainly datasets of open access scholarly publications, registered in the OpenMinTeD platform. Users can view and browse publicly available corpora.
- The OpenMinTeD catalogue of TDM applications. The catalogue targets users with little or no prior text mining experience, who can search for, discover and easily use ready-to-run applications on content registered in the platform.
- The OpenMinTeD catalogue of TDM components, i.e. pieces of software that perform basic tasks and can be reused to build applications. It mainly targets TDM developers who know how to combine components into workflows with the OpenMinTeD workflow editor and then offer them to end-users as ready-to-use applications.
- The OpenMinTeD catalogue of ancillary knowledge resources includes Machine Learning (ML) models and computational grammars that can be combined with TDM software, as well as annotation resources (lexica, ontologies, etc.) that can be used for annotating content resources. Users can browse through the catalogue or discover resources according to specific criteria.
- OpenMinTeD TDM applications execution service. This service targets primarily researchers with little or no knowledge of text mining who need to find and run TDM applications on content without going through complicated processes.
- OpenMinTeD corpus builder of scholarly works. This service allows users to form a collection of open access scholarly and scientific content from major content aggregators (e.g. OpenAIRE, CORE) and create a “corpus” to mine.
- OpenMinTeD builder of TDM applications, where users can build new TDM applications by combining various TDM components. The service is intended for expert TDM developers who know how to configure the TDM components.
- OpenMinTeD TDM Support & Training services that aim to (a) raise awareness about TDM among researchers and instruct them on how to integrate it in their research activities and workflows, and (b) promote the OpenMinTeD platform. The OpenMinTeD services on TDM support & training include FAQs, webinars, tutorials, TDM stories, courses and guidelines. More can be found in the OpenMinTeD Knowledge Base on the FOSTER platform.
- Catering for legal interoperability, OpenMinTeD has elaborated a license compatibility matrix, a service whose use extends beyond OpenMinTeD. It shows the compatibility among the licenses available for content, software and services.
Lastly, Piperidis showed how OpenMinTeD has been reaching out to scientific communities since the very beginning of the project, in Scholarly Communication, Life Sciences, Agriculture and Social Sciences.
The next session was on TDM for scientific literature in practice, starting with the publishers and closing with the success stories of three winners of the OpenMinTeD Open Call 2 for software providers. The publishers who kindly accepted our invitation to participate in this discussion were: Elizabeth Crossick (RELX Group), Frederick Fenter (Frontiers) and Stuart Taylor (Royal Society). All three publisher representatives explained that a TDM approach to analysing many articles is crucial to assisting research.
The session started with the panelists making brief presentations on the barriers to and opportunities of TDM from their own perspectives and experiences. It was then followed by an open discussion between the panelists and the audience. Several key themes were touched upon, including technical and policy barriers to mining content from scientific publishers, expectations and trust from both the publisher and the miner perspective, opportunities for effective collaboration and mutual benefits, licensing, and the role of Open Access publishing in TDM.
Throughout the discussion, the collaborative aspect was emphasised, along with the TDM community's need to mine the corpora hosted on publisher platforms efficiently, without running into unnecessary technical barriers. Both the panelists and the audience agreed that it is extremely important to lower barriers to TDM as much as possible within the legal framework of copyright, and that only through thoughtful and practical conversations with the community will publishers be able to provide the best services in support of efficient and effective TDM practices.
The session was completed with the following presentations:
Three winners of the Open Calls were invited to present their work. Horacio Saggion (TALN Group, Universitat Pompeu Fabra, Barcelona) showed the “Scientific Summarization Services” tool that his team has integrated in the OpenMinTeD platform. It automatically identifies the most important information in a research article by analyzing, extracting and characterizing several aspects of each sentence. This information is used to compute different scores that rank each sentence of the article.
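For readers curious about what "scoring and ranking sentences" can look like in practice, here is a minimal Python sketch of generic extractive summarization. It is not the Scientific Summarization Services tool and does not use its features: the two scores (average word frequency and sentence position), the stopword list and all function names are assumptions chosen purely for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from any real tool).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "it", "that", "for", "on", "with", "as", "by", "this"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphabetic words, minus stopwords.
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def rank_sentences(text):
    """Return (score, index, sentence) tuples, best-scoring first."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))
    scored = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Score 1: average document-level frequency of the sentence's words.
        freq_score = sum(freqs[w] for w in words) / len(words) if words else 0.0
        # Score 2: position; earlier sentences get a small bonus.
        pos_score = 1.0 / (i + 1)
        scored.append((freq_score + pos_score, i, sentence))
    # Highest score first; ties broken by original order.
    return sorted(scored, key=lambda t: (-t[0], t[1]))


def summarize(text, n=1):
    """Return the n best sentences, restored to document order."""
    top = rank_sentences(text)[:n]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]


if __name__ == "__main__":
    sample = ("Text mining helps researchers. "
              "Text mining tools analyse text at scale. "
              "The weather was nice.")
    for score, _, sentence in rank_sentences(sample):
        print(f"{score:.2f}  {sentence}")
```

A real summarizer would combine many more sentence-level signals; the point here is only the overall shape: characterize each sentence, compute scores, then rank.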
Fabio Rinaldi (University of Zurich and Swiss Institute of Bioinformatics, Switzerland) presented the “BTH & OGER for OpenMinTeD” tool integrated in OpenMinTeD. OntoGene’s Biomedical Entity Recogniser (OGER) allows annotation of a collection of documents, while the Bio Term Hub is a one-stop site for obtaining up-to-date biomedical terminological resources.
Matthew Shardlow (Manchester Metropolitan University) presented a text mining application for journalism, integrated in the OpenMinTeD platform. “A journalist must be a temporary expert in a wide variety of topics”. Starting from this fact, the presentation showed how answers to the five W’s a journalist has to address (What, Where, When, Who, Why) can be found by searching the scientific literature and applying this text mining tool.
The legal session came next, with Maria Rehbinder (Aalto University) and Prodromos Tsiavos (Athena Research Center) accepting the invitation to join. The fact that the GDPR became applicable across Europe at almost the same time prompted an open discussion on the effect of GDPR on TDM. Would GDPR signal the death of TDM? Thomas Margoni (CREATe, University of Glasgow) explained how OpenMinTeD managed to overcome legal challenges and barriers and to inform researchers, TDM experts and content providers. The key element was the “Compatibility Matrix” created within the OpenMinTeD project to guide stakeholders on combining licenses for content, software and services.
At the end of this session, the winners of Open Call 2 discussed and commented on the unique features of OpenMinTeD in comparison to other platforms in this area. Unlike other TDM orchestration platforms, OpenMinTeD offers a very flexible way of integrating text and data mining components available in widely used TDM tools, including UIMA and GATE, as well as custom-built TDM components packaged as Docker images or exposed as external web services. Another feature of OpenMinTeD seen as powerful is the availability of large corpora and text processing tools within the same platform.
The legal session offered an overview of the main results of the project’s legal interoperability working group, led by Thomas Margoni from CREATe – University of Glasgow. The report started with a brief overview of the current EU legal framework in the field of TDM and of why the currently proposed text of Art. 3 (the TDM exception for research organisations), while underpinned by the right innovation policy goal, is not satisfactory. In addition to the already mentioned licence compatibility matrix, a set of supporting documents (e.g. the Open Science Fact Sheet and the Open Access FAQs) and a recent analysis of the legal implications of training models for natural language processing (NLP) applications (poster here) were showcased. These results and documents were presented in the format of an open discussion. Maria Rehbinder (Aalto University) kindly agreed to moderate, and Prodromos Tsiavos (Athena Research Center) offered a high-level perspective extending to privacy/data protection (very timely, as the GDPR became applicable the next day!) and Public Sector Information. He suggested that these latter pieces of EU law, which are or have recently been the object of reform or reform proposals, may offer a better source of inspiration for the future challenges of data governance.
The last session, 3YFN (3 years from now), was a panel discussion focusing on the potential use of TDM technologies, platforms and infrastructures in the near future. How does industry respond and move towards TDM adoption? What do researchers foresee? The panel was composed of: Alfonso Valencia (ELIXIR & Barcelona Supercomputing Center), Laurence El Khouri (ISTEX & National Center for Scientific Research (DIST/CNRS)), Sophia Ananiadou (NaCTeM, National Centre for Text Mining, University of Manchester), Claire Nédellec (INRA, Institut national de la recherche agronomique).
Presentation material here:
Ewoud Sanders is best known for his weekly column WoordHoek (‘Word Corner’) in the newspaper NRC Handelsblad, where he writes about the history of Dutch words and expressions.
He is on a quest to improve digital access to printed Dutch language resources and his pamphlet Eerste Hulp Bij e-Onderzoek (‘First Aid for e-Research’) has been reprinted 16 times and distributed free of charge to students by several Dutch institutes of higher learning. In 2011, Google gave him a grant of $15,000 to help improve internet searching in the Netherlands.
Stephane Schneider is IT project manager at the Institute for Scientific and Technical Information (INIST-CNRS). INIST has one of the most important collections of scientific publications in Europe and provides a range of information search services for science and higher education. Stephane talks about his work and what he expects for the future of TDM.
The Proposal for a Directive on Copyright in the Digital Single Market (the Proposal) contains a number of provisions intended to modernise EU copyright law and to make it “fit for the digital age”. Some of these provisions have been the object of a lively scholarly debate, either because of their controversial nature (the proposed adjustment of intermediary liability for copyright purposes contained in Art. 13, see here at p. 7) or because they propose to introduce a new right within the already variegated EU neighbouring rights landscape (i.e. the protection for press publishers contained in Art. 11).