Text and Data Mining Researchers Present Studies at WOSP2016 Workshop
On 22-23 June 2016, OpenMinTeD organised its third stakeholder workshop at the Joint Conference on Digital Libraries (JCDL) in Newark, just outside of New York City. The workshop, the “International Workshop on Mining Scientific Publications,” was organised by the Open University for the fifth time (almost every time in conjunction with JCDL) and featured speakers from OpenMinTeD, as well as speakers who presented their text and data mining research results.
Petr Knoth from the Open University started by welcoming everyone to Newark. With participants coming from all over the world, the word ‘International’ in the title of the workshop was very appropriate.
The first keynote was given by Yuxiao Dong (University of Notre Dame), who talked about his work on the AMiner system, a network and database of author profiles. AMiner claims to have over 130 million author profiles and over 230 million research paper records in its database. The researchers working on it have been developing ways to link, bridge, connect and compare profiles, publications and other research entities. One of their main research goals now is to figure out how to extract and integrate semantics from different sources. This was a very valuable talk, rich in technical detail. It triggered quite a lively discussion about the way AMiner measures the size of its scholarly dataset, which can be summarised by Michael Kurtz’s immediate comment: “There aren’t 230 million publications in the history of the human race.” This point is backed, for example, by the existing research of Khabsa & Giles (2014).
The second keynote talk was given by Michael J. Kurtz of the Harvard-Smithsonian Center for Astrophysics and addressed the work around the Smithsonian/NASA Astrophysics Data System (ADS), which is one of the oldest web-based scholarly information systems in the world. Today it contains metadata on more than 11 million articles, and the full text of 5 million articles, including nearly every refereed article in physics, astrophysics, or geophysics. Its roots go back almost a quarter of a century. This talk highlighted the technological and infrastructure gap between some of the scientific communities. For example, while connecting datasets to research papers is still an issue in the social sciences, Kurtz claims ADS has provided this functionality since the mid-1990s. Kurtz also discussed how ADS benefits from applications of text and data mining, such as the mining of usage logs; the development and implementation of new bibliometric measures for papers, people, and organizations; semantic tagging and the creation of links to external data sources; machine learning and text classification; recommender systems; real-time network analysis; and various related user interface issues.
Throughout the two days of the workshop, a large number of interesting papers were presented. One long paper was presented by Shubhanshu Mishra (University of Illinois), who carried out a data mining study on novelty in biomedical literature. He found that researchers publish fewer new concepts as they age. However, researchers might publish their most novel text at any time in their career.
Another long paper was presented by Drahomira Herrmannova (Open University), who presented her analysis of the strengths and limitations of the Microsoft Academic Graph, which contains over 120 million papers. She hopes her research will be valuable to those deciding whether to use the Graph in their (data mining) research. Another long paper was presented by Robert Patton (Oak Ridge National Laboratory), who talked about alternative ways of measuring the scientific impact of a research article. He emphasized the need for better metrics that leverage full content analysis of publications.
Invited talks were given by Stelios Piperidis (Athena Research Center) and Peter Mutschke (GESIS), who laid out the challenges and potential of the OpenMinTeD project.
During the workshop, the participants and OpenMinTeD partners were active on Twitter, resulting in lots of interaction with people not attending the workshop.
For a full list of speakers and their papers, please go to https://wosp.core.ac.uk/jcdl2016/
Unpublished versions of the papers can temporarily be found at https://drive.google.com/folderview?id=0Bz6QWs4w8jPUSWpGM1gtbDhvNTg&usp=sharing
Many of the speakers also participated in video interviews about their text and data mining research. These interviews will be published in the summer months on the OpenMinTeD website, so stay tuned to hear more about exciting text and data mining research from all over the world!
This blog post was written by Hege van Dijke (LIBER Europe) and Petr Knoth (Open University).