Blog

Press Release: OpenMinTeD Paving the Way for Text and Data Mining in Science

Background: in the Open Science era, the sheer volume of scholarly work is enormous, with about 2.5 million peer-reviewed publications every year in English alone.

What OpenMinTeD is about: Researchers, Open Access publishers, librarians, repository managers and SMEs can now easily harness the power of text and data mining (TDM) for scientific content. The recently launched OpenMinTeD infrastructure, funded by the European Commission under H2020 Grant 654021 and a forerunner of the European Open Science Cloud, enables the registration and deployment of existing TDM tools and applications and connects them to Open Access (OA) scientific content, allowing researchers to seamlessly discover, share, analyse and re-use knowledge. All of this is well presented and runs on a cloud infrastructure. It is made possible by the OpenMinTeD Interoperability Guidelines, which address interoperability aspects for content and services.

Read more

New Online Course: Introduction to Text and Data Mining

Does your work involve supporting researchers who are interested in Text and Data Mining (TDM)? Do you have an interest in the topic, but no coding or computer skills? Then this course may interest you: OpenMinTeD and the University of Cambridge have developed a free online course on text and data mining for ‘non-tech people’.

Read more

Presenting OpenMinTeD Experience

The OpenMinTeD event titled ‘Paving the way for text and data mining in science’ was successfully organized in Brussels on May 24th, 2018. It was an open invitation to all TDM stakeholders in Europe (publishers as content providers, TDM experts, researchers and SMEs). The event’s agenda was carefully designed to provide a full TDM experience while focusing on OpenMinTeD. After all, OpenMinTeD is a “TDM Hub” of TDM applications and components combined with content from open access aggregators.

Welcoming speech by Natalia Manola

The event started with a brief welcome and a short introduction of what OpenMinTeD is by the OpenMinTeD coordinator and OpenAIRE Managing Director, Natalia Manola. 

Next, two EC officers, Caroline Colin and Jean-François Dechamp, presented the EC perspective on Text and Data Mining and Open Science. They informed the audience about the main objectives of the proposed directive on copyright in the Digital Single Market.

Caroline Colin

These are:

  • Modernising EU rules on key exceptions and limitations in the areas of research, education, and preservation of cultural heritage
  • Facilitating licences in order to ensure wider access to content (out-of-commerce works, negotiation mechanism/VoD platforms)
  • Introducing fairer rules for a better functioning copyright marketplace (press publisher’s rights, value gap, remuneration of authors and performers)

Caroline Colin and Jean-François Dechamp

They also explained why the EC cares about TDM:

  1. It’s different for science: authors usually give away their copyright, and license-based solutions for scientific papers do not seem to work.
  2. The sheer amount of digital content requires large-scale analysis with TDM, and almost all scientific journals, like research library collections, are already available online.
  3. Open Science is supported by public funding, draws on multidisciplinary sources from public and private owners, and allows data to be reused.

Additionally, it was mentioned that the European Commission’s proposal on TDM, as presented to the European Council, was to set a mandatory exception allowing research organisations to carry out TDM on content they have lawful access to, for scientific research purposes (commercial and non-commercial).

Stelios Piperidis

The next session presented a TDM storyline, “Making sense of Science”. Stelios Piperidis (Institute for Language and Speech Processing, Athena Research & Innovation Center) told the story of OpenMinTeD: how it started and how you can now use it to process, share and discover TDM tools and content. The presentation pointed out the massive production of content in general and focused on scientific content (2.5 million articles/year). It stressed the need to make sense of all that data by using machine learning, by understanding entities, relations and structures, and by extracting meaningful insights that improve our ability to predict. Existing solutions focus on different text types, domains, tasks and languages, creating a complex landscape. This complexity prompted the launch of the OpenMinTeD project and its services, which address content providers, software providers, researchers and SMEs. The services and the overall operation of OpenMinTeD were then explained.

The services of OpenMinTeD platform are briefly the following:

  • The OpenMinTeD catalogue of corpora, mainly datasets of open access scholarly publications, registered in the OpenMinTeD platform. Users can view and browse publicly available corpora.
  • The OpenMinTeD catalogue of TDM applications. The catalogue targets users with little or no prior text mining experience, who can search for, discover and easily use ready-to-run applications on content registered in the platform.
  • The OpenMinTeD catalogue of TDM components, i.e. pieces of software that perform basic tasks and can be reused to build applications. It mainly targets TDM developers who know how to combine components into workflows with the OpenMinTeD workflow editor and then offer them to end-users as ready-to-use applications.
  • The OpenMinTeD catalogue of ancillary knowledge resources includes Machine Learning (ML) models and computational grammars that can be combined with TDM software, as well as annotation resources (lexica, ontologies, etc.) that can be used for annotating content resources. Users can browse through the catalogue or discover resources according to specific criteria.
  • OpenMinTeD TDM applications execution service. This service primarily targets researchers with little or no knowledge of text mining who need to find and run TDM applications on content without going through complicated processes.
  • OpenMinTeD corpus builder of scholarly works. This service allows users to gather open access scholarly and scientific content from major content aggregators (e.g. OpenAIRE, CORE) and create a “corpus” to mine.
  • OpenMinTeD builder of TDM applications, where users can build new TDM applications by combining various TDM components. The service is intended for expert TDM developers who know how to configure the TDM components.
  • OpenMinTeD TDM Support & Training services, which aim to (a) raise awareness of TDM among researchers and show them how to integrate it into their research activities and workflows, and (b) promote the OpenMinTeD platform. These services include FAQs, webinars, tutorials, TDM stories, courses and guidelines. More can be found in the OpenMinTeD Knowledge Base on the FOSTER platform.
  • Catering for legal interoperability, OpenMinTeD has developed a license compatibility matrix, a service whose use extends beyond OpenMinTeD. It shows the compatibility among available licenses for content, software and services.

Lastly, Piperidis showed how OpenMinTeD has been reaching out to scientific communities since the very beginning of the project: scholarly communication, life sciences, agriculture and social sciences.

Elizabeth Crossick

The next session covered TDM for scientific literature in practice, starting with the publishers and closing with the success stories of three winners of OpenMinTeD Open Call 2 for software providers. The publishers that kindly accepted our invitation to take part in this discussion were Elizabeth Crossick (RELX Group), Frederick Fenter (Frontiers) and Stuart Taylor (Royal Society). All three publisher representatives explained that using TDM to analyse large numbers of articles is crucial to support research.

The session started with the panelists making brief presentations on the barriers to and opportunities of TDM from their own perspectives and experiences. It was followed by an open discussion between the panelists and the audience. Several key themes were touched upon, including technical and policy barriers to mining content from scientific publishers, expectations and trust from both the publisher and the miner perspective, opportunities for effective collaboration and mutual benefit, licensing, and the role of Open Access publishing in TDM.

Frederick Fenter

Throughout the discussion, two points were emphasised: the collaborative aspect, and the TDM community’s need to mine the corpora hosted on publisher platforms efficiently, without running into unnecessary technical barriers. Both the panelists and the audience agreed that it is extremely important to lower barriers to TDM as much as possible within the legal framework of copyright, and that only through thoughtful and practical conversations with the community would publishers be able to provide the best services in support of efficient and effective TDM practices.

Stuart Taylor

The session was completed with the following presentations:

Horacio Saggion

Three winners of the Open Calls were invited to present their work. Horacio Saggion (TALN Group, Universitat Pompeu Fabra, Barcelona) showed the “Scientific Summarization Services” tool that his team has integrated into the OpenMinTeD platform. It automatically identifies the most important information in a research article by analyzing, extracting and characterizing several aspects of each sentence. This information is used to compute different scores to rank each sentence of the article.
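To make the idea of scoring and ranking sentences a little more concrete, here is a minimal sketch in Python. It is not the UPF tool’s implementation: the two features it combines (how frequent a sentence’s content words are in the article, and where the sentence appears), the tiny stop-word list and all function names are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical, minimal stop-word list chosen for this sketch only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are",
              "we", "this", "that", "for", "on", "with"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-case alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def score_sentences(sentences, position_weight=0.1):
    """Combine two assumed features into one score per sentence:
    the average article-wide frequency of its content words, plus a small
    bonus that shrinks the later the sentence appears."""
    freq = Counter(w for s in sentences for w in content_words(s))
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        words = content_words(s)
        term_score = sum(freq[w] for w in words) / len(words) if words else 0.0
        position_score = position_weight * (1 - i / n)
        scores.append(term_score + position_score)
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]


article = ("Text mining extracts information from articles. "
           "Mining articles helps researchers. "
           "The weather was nice.")
print(summarize(article, k=2))
```

Here the off-topic third sentence shares no words with the rest and is ranked last; a real summarizer would use richer sentence features than these two.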

Fabio Rinaldi

Fabio Rinaldi (University of Zurich and Swiss Institute of Bioinformatics, Switzerland) presented the “BTH & OGER for OpenMinTeD” tool integrated in OpenMinTeD. OntoGene’s Biomedical Entity Recogniser (OGER) allows annotation of a collection of documents, while the Bio Term Hub is a one-stop site for obtaining up-to-date biomedical terminological resources.
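One common way to annotate documents with entities from a terminology is dictionary lookup. The Python sketch below illustrates that general technique under explicit assumptions: the three-entry terminology, the longest-match-first rule and the function name `annotate` are hypothetical, and this is not how OGER or the Bio Term Hub are actually implemented.

```python
import re

# Hypothetical miniature terminology: term -> entity type. A real system
# would load many thousands of entries from a terminology resource.
TERMINOLOGY = {
    "brca1": "gene",
    "breast cancer": "disease",
    "aspirin": "chemical",
}


def annotate(text, terminology=TERMINOLOGY):
    """Return sorted (start, end, surface_text, entity_type) tuples.

    Longer terms are matched first, and a match is skipped if it overlaps
    characters already claimed, so multi-word entries win over sub-strings.
    Matching is case-insensitive and restricted to word boundaries."""
    claimed = [False] * len(text)
    annotations = []
    for term in sorted(terminology, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if any(claimed[m.start():m.end()]):
                continue
            claimed[m.start():m.end()] = [True] * (m.end() - m.start())
            annotations.append((m.start(), m.end(), m.group(0), terminology[term]))
    return sorted(annotations)


sentence = "Mutations in BRCA1 raise breast cancer risk; aspirin was not tested."
for start, end, surface, etype in annotate(sentence):
    print(start, end, surface, etype)
```

Keeping character offsets alongside each match makes it straightforward to store the results as stand-off annotations on the original document.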

Matthew Shardlow

Matthew Shardlow (Manchester Metropolitan University) presented a text mining application for journalism integrated in the OpenMinTeD platform. “A journalist must be a temporary expert in a wide variety of topics”. Starting from this fact, the presentation showed how answers to the five W’s (What, Where, When, Who, Why) that a journalist has to address can be found by searching the scientific literature with this text mining tool.

Next came the legal session, with Maria Rehbinder (Aalto University) and Prodromos Tsiavos (Athena Research Center), who had accepted the invitation to join. As the GDPR came into effect across Europe almost on the same day, an open discussion arose on the effect of the GDPR on TDM: would the GDPR signal the death of TDM? Thomas Margoni (University of Glasgow, CREATe) explained how OpenMinTeD overcame legal challenges and barriers and informed researchers, TDM experts and content providers. The key element was the “Compatibility Matrix” created within the OpenMinTeD project to guide stakeholders on combining licenses for content, software and services.

At the end of this session, the winners of Open Call 2 discussed and commented on the unique features of OpenMinTeD compared with other platforms in this area. Unlike other TDM orchestration platforms, OpenMinTeD offers a very flexible way of integrating text and data mining components from widely used TDM tools, including UIMA and GATE, as well as custom-built TDM components packaged as Docker images or exposed as external web services. Another feature seen as powerful is the availability of large corpora and text processing tools within the same platform.

Thomas Margoni, Maria Rehbinder and Prodromos Tsiavos

The legal session offered an overview of the main results of the project’s legal interoperability working group, led by Thomas Margoni from CREATe – University of Glasgow. The report started with a brief overview of the current EU legal framework in the field of TDM and explained why the currently proposed text of Art. 3 (the TDM exception for research organisations), while underpinned by the right innovation policy goal, is not satisfactory. In addition to the licence compatibility matrix already mentioned, a set of supporting documents (e.g. the Open Science Fact Sheet and an Open Access FAQs) and a recent analysis of the legal implications of training models for natural language processing (NLP) applications (poster here) were showcased. These results and documents were presented as an open discussion. Maria Rehbinder (Aalto University) kindly agreed to moderate, and Prodromos Tsiavos (Athena Research Center) offered a high-level perspective extending to privacy and data protection (very timely, as the GDPR became applicable the next day!) and to Public Sector Information. He suggested that these latter pieces of EU law, which have also recently been reformed or proposed for reform, may offer a better source of inspiration for the future challenges of data governance.

Natalia Manola, Sophia Ananiadou, Claire Nédellec, Laurence El Khouri, Alfonso Valencia

The last session, 3YFN (3 Years From Now), was a panel discussion on the potential use of TDM technologies, platforms and infrastructures in the near future. How is industry responding and moving towards TDM adoption? What do researchers foresee? The panel was composed of Alfonso Valencia (ELIXIR & Barcelona Supercomputing Center), Laurence El Khouri (ISTEX & National Center for Scientific Research (DIST/CNRS)), Sophia Ananiadou (NaCTeM, National Centre for Text Mining, University of Manchester) and Claire Nédellec (INRA, Institut national de la recherche agronomique).

Presentation materials:

EC

Storyline of OpenMinTeD-S. Piperidis

OpenMinTeD-Stuart-Taylor

Open Calls-H. Saggion-Scientific Summarization, F. Rinaldi-BTH&OGER, M. Shardlow-TDM For Journalism

Legal-P. Tsiavos-4EU policy moments, T. Margoni-OpenMinTeD

Read more

TDM Story: Analysing Language

Ewoud Sanders is best known for his weekly column WoordHoek (‘Word Corner’) in the newspaper NRC Handelsblad where he writes about the history of Dutch words and expressions.

He is on a quest to improve digital access to printed Dutch language resources and his pamphlet Eerste Hulp Bij e-Onderzoek (‘First Aid for e-Research’) has been reprinted 16 times and distributed free of charge to students by several Dutch institutes of higher learning. In 2011, Google gave him a grant of $15,000 to help improve internet searching in the Netherlands.

Read more

Mapping Seed Development Thanks to TDM

The Bibliome group at the French National Institute for Agricultural Research (INRA) has developed a text-mining application that extracts fine-grained information about seed development from thousands of texts. It gives scientists better and quicker access to how molecules, genes and proteins interact when a seed starts to grow.

Good Seed Makes a Good Crop

Inside a seed are components such as molecules, genes and proteins. The presence of these components and how they interact determines if a particular seed can be used for human or animal consumption or by industry. A better understanding of seed biology and development is therefore important for both crop breeders and industrial companies. Finding out which genes interact with which protein in which tissue at which stage is a key question for researchers in plant breeding.

Read more

TDM story: scientists as TDM customers

Stephane Schneider is IT project manager at the Institute for Scientific and Technical Information (INIST-CNRS). INIST has one of the most important collections of scientific publications in Europe and provides a range of information search services for science and higher education. Stephane talks about his work and what he expects for the future of TDM.

Read more

Why the proposed Text and Data Mining exception is not what EU copyright law needs

Road signs blockade

Photo by Jamie Street on Unsplash

1) Introduction

The Proposal for a Directive on Copyright in the Digital Single Market (the Proposal) contains a number of provisions intended to modernise EU copyright law and to make it “fit for the digital age”.[1] Some of these provisions have been the object of a lively scholarly debate, either because of their controversial nature (the proposed adjustment of intermediary liability for copyright purposes contained in Art. 13, see here at p. 7) or because they propose to introduce a new right into the already varied EU landscape of neighbouring rights (i.e. the protection for press publishers contained in Art. 11).

Read more

OpenMinTeD invites you to an all around TDM experience

The event ‘OpenMinTeD: Paving the way for text and data mining in science’ marks the official launch of the OpenMinTeD platform (www.openminted.eu, services.openminted.eu). We would like to invite you to join us for a live discussion on the way forward.  

To join the event, please register via Eventbrite here.

Read more

Infographic: Text and Data Mining for Better Microbiology

As part of the OpenMinTeD project, INRA has been working on a text mining application dedicated to food microbiology. This infographic will tell you the story. 

Read more

What microorganisms live in my cheese?

With a tasty bite of cheese necessarily come some microbial strains. Some of them are well known, but the presence of others can puzzle researchers, who may want to investigate why they are there. A better understanding of microorganisms, their interactions and their adaptation to their environment is important for research and industry. It could help improve public health or develop innovative products.

Read more

Join our event ‘OpenMinTeD: Paving the way for text and data mining in science’

To make sense of the huge amount of scientific text and data available, we need text and data mining (TDM). The European project OpenMinTeD has spent the past three years paving the way for TDM in science by working on an infrastructure. We would like to invite you to join us for our event in Brussels on May 24th. Learn about best practices in TDM, the perspectives of different stakeholders, the GDPR and the future of TDM and OpenMinTeD.

Read more

Text mining 101

What is text mining, how does it work and why is it useful? This article will help you understand the basics in just a few minutes.

Read more

TDM Stories: How Linguamatics Uses TDM To Improve Healthcare

Dr Jane Reed is Head of Life Science Strategy at Linguamatics, a UK-based company which makes TDM tools to help companies in the healthcare and pharmaceutical industries. She spoke to OpenMinTeD about how TDM is being used to speed up drug discovery and treat patients, and shared her vision for the future of text and data mining.

Read the full interview below, or download a printable version to share with others.

Read more

TDM Stories: How Zalando Links Languages With TDM

Dr Alan Akbik is a Research Scientist at Zalando Research. He’s using text and data mining to create tools which can be developed in one language and then applied automatically to other languages. This is valuable for companies such as Zalando, which work in many different countries around the world.

Read the full interview below, or download a printable version to share with others.

Read more

TDM Stories: A Text & Data Miner Talks About Analysing The Recent Past

Federico Nanni is a researcher who uses TDM to build collections of materials from large archives which can be used to better understand recent, historically critical events such as the rise of Euroscepticism as a consequence of the recent economic crisis.

OpenMinTeD first spoke to Federico in early 2017 about his work. Recently we asked him for an update. Read the full interview below, or download a printable version to share with others.

Read more

Key concepts and areas in TDM explained – Part 6: Deep Learning

It’s time for the final episode of our series ‘Key concepts and areas in TDM explained’. This time Robert Patton of Oak Ridge National Laboratory introduces Deep Learning and discusses how it can be applied in practice.

Read more

Key concepts and areas in TDM explained – Part 5: Knowledge Discovery

Ron Daniel

Knowledge discovery is the process of discovering new information. In text and data mining, this happens, for example, by finding new connections or trends in large amounts of text and data. Ron Daniel is director at Elsevier Labs. He explains Knowledge Discovery and Knowledge Representation in three short videos.

Read more

GRNET presents OMTD workflow stack in the Athens Docker meetup

Presenting OpenMinTeD

It was a great honour and opportunity to interact with the Docker community during the meetup in Athens on November 29th. More than 30 people attended our talk ‘A scalable, virtual, flexible workflow infrastructure in OpenMinTeD stack’. The talk covered the software stack responsible for executing Text and Data Mining (TDM) workflows in a distributed cloud environment. The workflow setup relies heavily on (but is not limited to) modern containerization technologies, especially Docker.

Read more

OpenMinTeD at DI4R

On 1 December 2017, OpenMinTeD organized a session on Text and Data Mining in Open Science at the DI4R conference in Brussels (30 November to 1 December).

Read more

Key concepts and areas in TDM explained – Part 4: Semantic Search

In the old days, a search engine would return a lot of irrelevant hits that happened to contain the keyword you used. Nowadays search engines give you much better results, because they put the keyword into context. This way of searching is called ‘Semantic Search’. Waleed Ammar of the Allen Institute for Artificial Intelligence explains semantic search, its challenges and the state of the art in a few short video clips.

Read more

OpenMinTeD presents licence compatibility tools at IP summer summit

Thomas Margoni and Giulia Dore of the University of Glasgow have developed a matrix and two fact sheets on open science and licensing. They presented the tools at the IP summer summit in Glasgow last June. The tools can help researchers, repository owners and many others with how to use open access licences in the context of text and data mining. Curious? You can access the tools through the links in this blogpost.

Read more

Join the call for TDM software and Knowledge Resources

Are you ready to develop and share an application or software component for text and data mining (TDM)? Or do you have knowledge resources that you would like to share and integrate with our platform? OpenMinTeD is looking for service providers, innovators, SMEs and researchers who can join and build on the platform! You can apply for this call until 26 January 2018. Winners of the call will be awarded a sum of money to implement their plans. You will also be part of an online hackathon to help you along the way.

Read more

TDM Stories: A Start-Up Founder Talks About Processing Knowledge

Mads Rydahl is the founder of UNSILO, a Danish start-up that applies machine learning to scientific publishing.

OpenMinTeD first spoke to Mads in early 2017 about his work. Recently we asked him for an update. Read the full interview below, or download a printable version to share with others.

Read more

TDM Stories: The Structure of Papers

Iana Atanassova, Centre Tesnière – CRIT, University of Bourgogne Franche-Comté, is using Text and Data Mining (TDM) to study full-text scientific articles. Studying these papers can be a challenge, as they are usually in a format that is hard to process.

OpenMinTeD first spoke to Iana in 2016 about her work. Recently we asked her for an update. Read the full interview below, or download a printable version to share with others.

Read more

TDM Stories: I Help Scientists Do Science

Daniel Duma is a PhD candidate at the Alan Turing Institute and the University of Edinburgh. He’s creating software that will plug into your existing word processor or text editor and use text and data mining to recommend papers that you should be aware of, should read or would want to cite.

OpenMinTeD first spoke to Daniel in 2016 about his work. Recently we asked him for an update. Read the full interview below, or download a printable version to share with others.

Read more

Key concepts and areas in TDM explained – Part 3: Recommenders and filtering

One of the things you can do with text mining is discover conceptually related items within a collection of text and data. Want to know more? Anas Alzogbi is a research assistant and doctoral student at the University of Freiburg. He explains Recommenders and Filtering in four short videos.
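A common baseline for finding conceptually related items is content-based filtering: represent each text as a TF-IDF vector and recommend the items with the highest cosine similarity to the one a user is reading. The Python sketch below shows that baseline only; the toy abstracts, the smoothed IDF formula and the function names are assumptions for illustration and are not taken from the videos.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: rarer terms get higher weights, and no weight is zero.
        vectors.append({w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(docs, query_index, k=2):
    """Indices of the k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [(cosine(query, vectors[i]), i) for i in range(len(docs)) if i != query_index]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


abstracts = [
    "text mining of biomedical articles",
    "mining biomedical text for gene names",
    "cheese microbiology and bacteria",
    "seed development genes in plants",
]
print(recommend(abstracts, 0, k=1))
```

Because this relies on exact word overlap, "gene" and "genes" count as unrelated here; stemming or learned embeddings are the usual next step for capturing conceptual rather than literal similarity.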

Read more

Key concepts and areas in TDM explained – part 2: Knowledge Representation

It’s time for the second part of ‘Key concepts and areas in TDM explained’. This time, Jevin West tells us more about “Text and Data Mining” and “Knowledge Representation” in three short videos. Jevin West is Assistant Professor at the University of Washington and Co-ordinator of DataLab.

Read more

OpenMinTeD participates in FORCE2017 conference with two well-attended workshops

From 25 to 27 October, OpenMinTeD participated in the FORCE2017 Research Communication and e-Scholarship conference, which brings together a diverse group of people interested in changing the way in which scholarly and scientific information is communicated and shared.

Read more

Webinar on PoolParty Semantic Suite now available online

The OpenMinTeD project co-organized with Agroknow and the AIMS team a webinar entitled “The Text and Data mining functionalities of the PoolParty Semantic Suite”. The webinar took place on 21 September 2017.

Read more

Deadline for call for content extended

The deadline for submissions to the call for content has been extended by one week, to November 5th. Were you thinking about submitting a proposal, but too busy in recent weeks? This is your chance! All information is available on the OpenMinTeD Open Tenders blog.

Read more

Key concepts and areas in TDM explained – part 1

What are the benefits of text and data mining (TDM) and how can its practices be applied in science? We asked recognised experts in the field to introduce key areas and concepts in short videos. The videos will be released over the coming weeks in a series of blogposts. Today we start with part 1: an introduction to text and data mining. The videos will also be part of the TDM Knowledge Base.

Read more

How TDM can unlock a goldmine of information

From September 6th to 8th, over 200 people with an interest in open science came together in Athens for the Open Science Fair. OpenMinTeD was one of the co-organisers and also organised a workshop on text and data mining. The first part of the workshop showcased successful TDM initiatives. The second part focused on content providers and was more OpenMinTeD-specific.

Read more

Join the OpenMinTeD call for content

We are happy to announce that the OpenMinTeD platform for text and data mining is now ready to accept content. We invite publishers, repositories, libraries and other holders of scholarly publications to join the open call for content, by submitting a proposal by 29 October 2017 at the latest.

Read more

21 September: Join free webinar on Text Mining in PoolParty Semantic Suite

We are pleased to invite you to attend an upcoming Webinar on the Text and Data mining functionalities of the PoolParty Semantic Suite.

Read more

Textmining in the vineyard at Open Harvest 2017

In 2016, 30 people from important institutions all over the world came together for the first Open Harvest gathering. The goal was to set the stage for a global data infrastructure for agriculture and food. One year later, Agroknow presented the OpenMinTeD application VITIS at Open Harvest 2017.

Read more

FutureTDM and OpenMinTeD organise TDM workshop for research libraries

Many university and national libraries are exploring the best way to support researchers with text and data mining. That’s why, on July 5th 2017, OpenMinTeD and FutureTDM organised a workshop about text and data mining at the LIBER conference in Patras. Four speakers guided 16 participants through the various aspects of TDM.

Read more

The amount of information out there is staggering

Tom Potok works at the Oak Ridge National Laboratory in Tennessee. He has been working in text and data mining for twenty years, on a wide variety of topics. Some of the biggest challenges are the sheer amount of information out there, and trying to figure out how the mind works with text.

Read more

Helping researchers to find new articles and opportunities

Benj Pettit works at Mendeley on text and data mining tools that help researchers find new articles, collaborators, etc. One of the special things about the Mendeley catalogue is that it is built through crowdsourcing.

Read more

Upcoming: Open Science Fair in Athens in September

Open Science is a new research paradigm that faces many challenges. To improve the uptake of Open Science, four EU projects are joining forces to organise an event that will showcase critical elements, from infrastructures to policies and new types of activities. Join us for the Open Science FAIR, September 6-8 in Athens, and get inspired.

Read more

Proceedings of the BioCreative V.5 Challenge Evaluation Workshop

Last April 26-27 the BioCreative V.5 Challenge Evaluation Workshop took place in Barcelona. The goal of BioCreative V.5 was to address some of the major barriers to the adoption and use of text mining tools, related to assessment, accessibility, interoperability, robustness and integration.

Read more

A start-up’s perspective on TDM

Mads Rydahl has a small start-up that applies machine learning to scientific publishing. Thanks to their deep partnership with Springer Nature, they can build value added services inside their platform.

Read more

On the role of a university library in the TDM landscape

Leiden city

Twenty-five years ago, when Laurents Sesink was still a history student, his thesis on internal political relations involved a lot of reading and tally marks. Back then he already thought “There must be a better way to do this”, so he built a database and started to get into informatics and digitisation. Now he is the head of the Centre for Digital Scholarship at the library of Leiden University.

Read more

We look at recommending articles to users

Daniel Kershaw works at Mendeley, where he uses text and data mining to recommend relevant articles to users. A lot of the issues in this work have to do with inconsistencies between data sources from different journals.

Read more

OpenMinTeD Partner presents VITIS pilot application at RDA’s IGAD pre-meeting

Conference and presentation

The 9th Plenary Meeting of the Research Data Alliance (RDA) took place in Barcelona, Spain, from 5 to 7 April 2017. The RDA Plenary Meetings constitute a major event where more than 4000 members from 100 countries come together to discuss, develop and promote data-sharing and data-driven research infrastructure through Working and Interest Groups. The Interest Group on Agricultural Data (IGAD) pre-meeting took place just a couple of days before the 9th RDA plenary meeting, from 3 to 4 April 2017, and attracted more than 100 participants from all over the world.

Read more

A two-fold approach to measuring impact

Mike Lauruhn works at Elsevier and uses text and data mining to help researchers measure their impact factor. More specifically, he wants to know if there is a link between using a database for Arabidopsis data and the likelihood of being cited.

Read more

New publication: A Framework for Collaborative Curation of Neuroscientific Literature

Frontiers in Neuroinformatics has just released a new paper by O’Reilly, Iavarone and Hill. It describes a systematic framework to curate neuroscientific literature. This framework provides an easier and more reliable way to integrate published data into neuronal models. The work was done in the context of the OpenMinTeD and Blue Brain projects.

Read more

OpenMinTeD partner presents VITIS pilot application at Agricultural University of Athens

On February 20th 2017, Agroknow had the pleasure of hosting a workshop at the premises of the Agricultural University of Athens (AUA). The workshop was organized together with colleagues from the Laboratory of Viticulture.

Read more

TDM and the reading revolution

You will not catch Steven Claeyssens carrying a smartphone and he will always prefer a paper book to an e-reader. Yet he is the curator of digital collections at the National Library of the Netherlands. I interviewed him about his job, text and data mining (TDM) in the humanities and the role of libraries in the research landscape.

Read more

Providing insight into the structure of scientific papers

How is a scientific paper structured, and how is it related to other papers? These are some of the questions that Iana Atanassova of the University of Bourgogne Franche-Comte (Besancon, France) focuses on in her research. She uses text and data mining (TDM) to study full-text scientific articles. Studying these papers can be a challenge, as they are usually in a format that is hard to process.

Read more

From information society to knowledge society

Marc Bertin is an assistant professor at the University of Toulouse who uses text and data mining to study scientific papers. Text and data mining can help us move from an information society to a knowledge society, but not without open access to research papers.

Read more

Text mining for the discovery of small molecules

When scientists need information about the structure, name or properties of small molecules, they often turn to a high-quality database called ChEBI. This database is largely curated manually, and this process takes a lot of time. OpenMinTeD is working on a text mining application that can help speed up the process while maintaining the quality of the database.

Read more

Text and data mining in history

Joris van Eijnatten is professor of cultural history at Utrecht University, The Netherlands. He has a fascination for numbers that not many historians have. Last year he was the research fellow for digital humanities at the National Library of The Netherlands, where he applied text and data mining to study the image people have of Europe based on newspapers. I interviewed him about text and data mining in humanities, his work and his personal romance with numbers.

Read more

Using text mining to spot innovation in biomedical sciences

What is the real novelty of a research paper? How do different researchers contribute to innovation? And does this change throughout their career? Shubhanshu Mishra of the University of Illinois uses text mining techniques to study the novelty of biomedical articles.

Read more

Learning software to systematically review articles

Systematic review of medical research papers can lead to new knowledge and treatments of diseases. The existing software tools, however, are very limited, and often a lot of manual work is involved. Stephen Gilbert of Iowa State University uses artificial intelligence and machine learning to automate the process of systematic review.

Read more

FutureTDM’s policy recommendations

While discussions at the EU on copyright reform and an exception for text and data mining (TDM) are very much live, FutureTDM, a Commission-funded project of TDM experts, has for the past year already been gathering information, mapping the TDM landscape and listening to the wide variety of individuals and organisations involved in data analytics. The project has just produced the first in a series of reports, providing a range of stakeholders with recommendations to improve TDM uptake in the EU. This FutureTDM policy framework document sets out high-level principles and recommendations.

Read more

Studying interdisciplinarity

Frederico Nanni was not always a text miner. He actually started out as a historian and then switched to digital humanities. During his PhD, he developed a method to detect interdisciplinary research, based on scientific abstracts. Now, he finds text mining fascinating and thinks more historians should learn how to do it.

Read more

Evaluating the impact of research

It took some time for Drahomira Hermannova to see the value of her research topic, but now she thinks it is the best topic she could ever choose: using text and data mining to evaluate which research can change the world. Not only can this help scientists, it may change the way research is done altogether.

Read more

Text Mining for social sciences – tackling the challenges to make search systems smarter

In the OpenMinTeD project, partners from different scientific communities are involved to make sure the OpenMinTeD infrastructure will address their needs. As regards the social sciences, a useful application for text mining is the improvement of literature search and information interlinking. To this end, three main challenges were identified: named entity recognition, automatic keyword assignment to texts and automatic detection of mentions of survey variables. This post gives an overview of these tasks and the progress of work so far.

Read more

I’m trying to help scientists do science

Would you like to get more insight into the world of text and data miners? Daniel Duma is a PhD student at the Alan Turing Institute and the University of Edinburgh, and he shares his story in a short movie. He is working on software that will recommend relevant papers to scientists who are writing papers.

Read more

Webinar on Text and Data Mining interoperability at the legal level

If you want to do text and data mining in the EU, you run into a complex legal framework of copyright rules. During the OpenMinTeD webinar of November 23rd, this legal framework and its limits and opportunities were discussed with legal as well as non-legal TDM experts. Recordings of the webinar and the discussion are available online.

Read more

Things aren’t always what they seem: The PDF challenge (accepted)

There are situations where text miners might struggle with getting the textual data to perform the mining on in the first place. One problem for us is that most scientific publications, especially in the social sciences and humanities, are only available in PDF format, which is not well suited to being read and processed by computers. The OpenMinTeD social sciences work group accepted the challenge to work on this problem.

Read more

Sneak preview: the OpenMinTeD knowledge base for text and data mining

Are you looking for support or training for text and data mining? Then you’re in the right place! OpenMinTeD recently released a Knowledge Base that will host open access support and training material. At the moment we are still in the process of uploading content, but you can already have a look.

Read more

White paper on community requirements for text and data mining

Text and data mining is important to different scientific communities, but what do these different user communities need to mine successfully? One of the aims of work package 4 of the OpenMinTeD project is to collect these requirements. This was done using a combination of methods, including online surveys and focus groups. The results are summarized in the ‘White paper on OpenMinTeD Community Requirements’ that was finished last week.

Read more

Text mine millions of research papers with the CORE dataset

CORE is an aggregation service that harvests open access journals and repositories, institutional and disciplinary, from around the world. It offers one of the largest collections of scientific content via its Datasets, ready to be text-mined. We encourage everyone to use it as part of OpenMinTeD and beyond.

Read more

How the FutureTDM workshop highlighted that the draft exception must be improved for TDM to have a future in Europe

For the legal geeks among us, it is now old news that the European Commission, after promising to modernise copyright, issued a rather unhinged and disappointing copyright review proposal aimed at creating what it claims to be a ‘well-functioning marketplace’.

Read more

Can Europe lead a data revolution in agriculture and food?

Let’s take a step into the near future.

A shared global data space for agriculture and food will propel the industry forward. Information will become available to all actors producing innovation.

Read more

Why Text Mining is often not Legal, but how it could be in the Future

Hi there, I’m Lucie Guibault, Associate Professor at the Institute for Information Law of the University of Amsterdam.

Over the past few years, I became increasingly aware of TDM as a research method in all fields of science and humanities. With the increase of computational capacity, of born-digital information and the digitisation of collections, the use of TDM in research is on its way towards achieving tremendous societal and economic benefits. Think about all the new insights and cost savings that would otherwise not be possible. This means more scientific breakthroughs and a greater understanding of society.

Read more

Text and Data Mining Researchers present Studies at WOSP2016 workshop

On 22-23 June 2016, OpenMinTeD organised its third stakeholder workshop at the Joint Conference on Digital Libraries in Newark, just outside of New York City. The workshop, called “the International Workshop on Mining Scientific Publications,” was organised by The Open University for the fifth time (almost every time in conjunction with JCDL) and featured speakers from OpenMinTeD, as well as speakers who presented their text and data mining research results.

Read more

LREC Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability

Our efforts towards improving interoperability in the communities of Text Mining (TM) and Natural Language Processing (NLP) continue. OpenMinTeD organised a workshop on this subject at the International Conference on Language Resources and Evaluation (LREC) on 23 May 2016. Alessandro Di Bari (IBM) opened the workshop with a keynote on transferring ideas from the model-driven approaches of software engineering to enhance interoperability in TM and NLP.

Read more

Envisaging a Broader TDM Exception to Overcome the Pitfalls of Current Copyright Law in the EU

Conducting TDM activities in the current legal context is very difficult. This is due to the unclear and incoherent legal framework for copyright licences and to the highly fragmented landscape of copyright exceptions and limitations in the EU. In this blog post, we’ll discuss the current legal context and what needs to change to open the way for TDM in the EU.

Read more

Mining Repositories: Assisting Researchers in their Text and Data mining Needs

On 13 June 2016, the OpenMinTeD project organised its third stakeholder workshop titled “Mining Repositories: How to assist the research and academic community in their text and data mining needs”. The workshop took place at Trinity College Dublin as part of the Open Repositories Conference, and brought together repository managers from all over the world who are interested in text and data mining.

Read more

Berlin Buzzwords 2016: what was hot and what was not?

The seventh Berlin Buzzwords 2016, Germany’s leading conference on Open Source Big Data technologies, was held from 5-7 June at the Kulturbrauerei in Berlin. The Kulturbrauerei, a former brewery under national trust protection, is a spacious and very interesting venue for cultural events, with a lot of courtyards and buildings.

Read more

e-Infrastructures in the language technology community get together

On 22 May 2016, OpenMinTeD held its second stakeholder workshop at the LREC conference in Portorož, Slovenia. The workshop took the form of a roundtable, and brought together strategic players and stakeholders from the language technology community and neighboring areas. Stelios Piperidis (Athena Research Center / ILSP) led the discussion. Among the attendees were representatives from CLARIN-CZ, CLARIN-ERIC, OpenAIRE, ELDA and LAPPS Grid.

Read more

Text mining in Agriculture: The AgroTagger Keyword Extractor

The use of keywords is crucial for the description, organization, indexing, retrieval and sharing of research in every scientific field, and agriculture is no exception. However, manual annotation of research outcomes is time-consuming and error-prone, so automatic methods for metadata annotation are constantly being explored. AgroTagger is one of the tools facilitating the work of information and knowledge managers (among others) in the agri-food sector, by applying text mining to agri-food research outcomes.

Read more

Text Mining projects in the Agri-Food sector

Can you text mine agricultural content?

“Absolutely!” is the answer that AgroKnow will give you. And they can prove it! AgroKnow is one of the partners in the OpenMinTeD project and is already very active in projects that apply text mining technologies to the agricultural sector.

Read more

We are looking for researchers in frequent need of searching and accessing textual content

Are you a researcher in frequent need of searching and accessing textual content? Does your research involve looking for information in repositories of publications, reports, patents, and other textual content archives?

Then we are looking for your input!

Read more

We are looking for developers of TDM-powered applications

Does your company develop text-mining powered applications? Would you benefit from a platform that provides access to a variety of text mining tools and components, along with the possibility to examine their specifications and performance? Are you an application developer in need of integrating text-mining services in your software? Then we are looking for your input!

Read more

We are looking for organisations that want to make their data available for text mining

Does your organisation have tons of data that you want to make available for text and data mining? Would you benefit from an infrastructure that brings your data together with text and data mining tools? Are you a repository manager, a publisher, or do you represent any other type of content collection?

Then we are looking for your input!

Read more

We are looking for text and data miners

Are you a researcher in text and data mining? Would you benefit from making your mining software widely discoverable and interoperable, and would you like to easily explore and evaluate the work of other researchers in your field?

Then we are looking for your input!

Read more

Join us at the 5th International Workshop on Mining Scientific Publications

In association with the OpenMinTeD project, The Open University organises the 5th International Workshop on Mining Scientific Publications (WOSP) at JCDL 2016.

The workshop is organised by The Open University and aims to give a useful overview of Text and Data Mining (TDM). The topics of the workshop are organised around the following themes:

Read more

Training Course: Mining Social Media Content with GATE

The 9th GATE training course will be taught this June, at The University of Sheffield, and we are looking for you to join us! GATE, or the General Architecture for Text Engineering, is a mature, comprehensive suite of tools for information extraction, natural language processing and related tasks that has been developed continuously since 1995 at the University of Sheffield. The course is open to industrial and academic participants of any ability or experience level.

Read more

Open Science: What does it mean and how do text and data miners benefit?

On February 29th, researchers from around the world gathered in Tokyo for the data sharing symposium “Data-driven Science – The trigger of Scientific development”. It was a place of vibrant discussion of the opportunities and challenges brought by current trends such as open science, data-driven research and big data. OpenMinTeD, which sees openness as one of its basic principles, participated in this event.

Read more

Text Mining Patient Records: Extremely Complicated but Incredibly Rewarding

At the end of last year, I presented a webinar to the American Medical Informatics Association on clinical text mining and text engineering: applying text mining to medical records. This is not an area that we are concentrating on in OpenMinTeD, but it is still one on which we should keep a watchful eye. Text mining over medical records is growing rapidly, and it exposes issues and problems that we need to be aware of.

Read more

Behind the Scenes: Listening to our Stakeholders

The OpenMinTeD project is divided into different tasks. Agroknow carries out the important job of gathering TDM requirements from our stakeholders (OpenMinTeD’s future platform users and contributors), so that OpenMinTeD will build a TDM platform that meets their requirements as well as possible. We focus on gathering requirements from four different scientific domains, represented by the following communities.

Read more

Final Call for Submissions: Cross-Platform Text Mining and Natural Language Processing Interoperability

Recent years have witnessed an upsurge in the quantity of available digital research data, offering new insights and opportunities for improved understanding. Following advances in Natural Language Processing (NLP), text and data mining (TDM) is emerging as an invaluable tool for harnessing the power of structured and unstructured content and data. Hidden and new knowledge can be discovered by using TDM at multiple levels and in multiple dimensions. However, text mining and NLP solutions are not easy to discover and use, nor are they easy to combine for end users.

Read more

A Peek in the Future: OpenMinTeD’s possibilities for Scholarly Communication

OpenMinTeD aspires to create an infrastructure that fosters and facilitates the use of text and data mining technologies in the world of scientific publications. But what does this mean in practice? Take a look into the future with us, and discover some examples of what OpenMinTeD will make possible for scholarly communication!

Read more

The Future is All Mine: Mining Cultural Heritage Text and Data

On 7 December 2015, the text and data mining projects OpenMinTeD and FutureTDM organised a workshop about the text and data mining challenges for cultural heritage institutions. This workshop took place at the DISH conference, a biennial international conference on digital heritage and strategies for heritage institutions.

Read more

Open Access & Text Mining: Moving Things Forward

Text mining refers to “the process or practice of examining large collections of written resources in order to generate new information” (source). I am not an expert in text mining, but I understand that it is about applying specialized software, algorithms and techniques to existing textual information so that it can be read and analyzed by machines, allowing them to extract more meaningful information for us humans.

Read more

Text and Data Mining: Challenges and solutions from the publishers’ perspective

On 11 November, OpenMinTeD and Europeana organised a workshop titled “Text and Data Mining in Europe: Challenges and Action”. The goal of the workshop was to bring together content providers (publishers, data centers, museums and libraries) who are open to making their data available for text and data mining (TDM).

Read more

Planning the software engineering tools for OpenMinTeD

I’m Angus, and I lead the “Platform Integration, Testing and Deployment” work package for OpenMinTeD, or WP7 as it is affectionately known in Project-Speak. Our task in WP7 is to take the services that have been designed and created in OpenMinTeD, and to deliver these as a whole, so that they can be deployed as a running system. But what tools are needed for this?

Read more

Outcomes of the OpenMinTeD Interoperability Workshop

On 12 November, OpenMinTeD’s specification Working Groups (WP5; task 5.2) met in person for the first time. This one-day workshop was attended by 30 participants with wide-ranging expertise in the many faces of TDM interoperability (both project-internal participants and invited external experts).

Read more

Infographics of the 11 Nov OpenMinTeD workshop on Text and Data Mining

The most text-less debriefing: all outcomes of the 11 Nov OpenMinTeD Workshop on Text and Data Mining in just 5 infographics!

Read more

Search mission on researchers’ needs for TDM: The Social Sciences Community

For the envisaged creation of an open infrastructure for text and data mining, one of the essential first steps is to identify the target users who will eventually make use of the tools and services provided. We need to get to know the needs, benefits, challenges and barriers of text mining in each community. To this end, several use cases have been identified, which fall under 4 thematic areas of interest to the project.

Read more

Towards efficient sharing and discovery of foodborne diseases information

A high-level meeting on Open Data in Agriculture took place on 28 September 2015 in Amsterdam, the Netherlands. The participants of the event represented organisations like the Global Forum on Agricultural Research (GFAR), the Food and Agriculture Organisation of the UN (FAO), Land Portal Foundation, Wageningen UR, Open Data Institute (ODI) and Institute of Development Studies, UK (IDS).

Read more

From Mess to Machine: Making Text and Data Mining Services Interoperable

Hi, my name is Richard. I am leading the “Interoperability Framework” work package (WP 5) in OpenMinTeD. Today, I am blogging about our activity in the “Interoperability Specification” task, which is one of several tasks within the work package.

Read more

FutureTDM has kicked off

On 15-16 September, OpenMinTeD’s sister project FutureTDM kicked off during a two-day meeting in Vienna. FutureTDM represents an opportunity for stakeholders to shape the content mining landscape in the EU.

Read more
