Open Science: What does it mean and how do text and data miners benefit?


On February 29th, researchers from around the world gathered in Tokyo for the data sharing symposium “Data-driven Science – The trigger of Scientific development”. It was a place of vibrant discussion about the opportunities and challenges brought by current trends such as open science, data-driven research and big data. OpenMinTeD, which regards openness as one of its basic principles, participated in this event.

The meaning of Open Science

But what does “open science” really mean? The very idea of science assumes openness, as it can only progress through open communication of research results. In practice, however, wide access to research publications and data turns out to be problematic because of copyright, costs and additional conditions imposed by the authors. Fortunately, the Internet makes sharing information easier than ever before, and it opens a door for science as well. This, however, raises a question: what should be made available? And in what way could the public benefit from open science?

There are three major constituents of open science:

  • Open access to producing knowledge, enabling anyone to share the results of their research, even if they are not a full-time academic. This trend is also known as citizen science.
  • Open access to scientific publications, so that the wider public can read and benefit from scientific articles. This was problematic until recently, but is changing with the rise of open access.
  • Open access to scientific data, which lets other academics verify a researcher’s claims by analysing not only the final results, but also the data produced in the process. Making this possible requires facing many challenges (technical, legal, …), which is a purpose of the Research Data Alliance, whose plenary meeting took place after the data sharing symposium in Tokyo.

However, the way to achieve these goals is far from obvious. Which data should be openly available, and which should be restricted? How can large quantities of research data be stored permanently? How can their quality be assured? How can legal requirements be met? And where will the human resources for all these tasks come from?

OpenMinTeD working towards goals of open science for text and data miners

Within the area of text and data mining, the OpenMinTeD project works towards achieving the goals of open science. We not only benefit from its elements (such as open access to publications), but will also promote open standards, practices and tools that enable the text mining community to share their results with each other and with the wider public. The area is exceptionally challenging from this point of view, as textual data are frequently legally restricted, high in volume and difficult for non-specialists to use.

Barriers to Open Science

Scientists generally support the idea of open science, but in practice they are often reluctant to share their data. Among the strongest barriers are lack of trust (will my data be used correctly?), unnecessary burden (is sharing the data too much hassle?), lack of credit (will someone else benefit from my effort?) and lack of money (who pays for making my data available?).

Going forward, aiming for open

The issue of open science has already been recognised by policy-makers: among the speakers at the symposium were representatives of the European Commission, the OECD, the US National Science Foundation and the Japanese government. Their engagement gives hope that open science will receive the support necessary to bring its principles into the everyday practice of researchers. In OpenMinTeD we aim to join this effort by providing a platform and infrastructure that encourage openness in the domain of text and data mining.

This blog post was written by Piotr Przybyła, a research associate working on natural language processing at the National Centre for Text Mining at the University of Manchester.