Corpus Builder for Scholarly Works

Dig into the biggest set of open access scientific publications, build your corpus and mine it!


OpenMinTeD has set up a mechanism that provides access to scholarly and scientific content from a wide range of sources (publishers, repositories, journals, etc.) and lets users search that content and select the documents they want to mine. Selection works through either a faceted search or a Google-like free-text query, both of which run against the harmonised metadata descriptions of the documents (e.g. publication year, keywords, domain). The selected documents together form a collection, or “corpus”. The OpenMinTeD registry provides content made available by two major content aggregators, OpenAIRE and CORE, and by other open access content providers.
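
The idea of faceted selection over harmonised metadata can be illustrated with a minimal Python sketch. It is not the OpenMinTeD implementation or API: the `Record` class, its field names, the `build_corpus` and `facet_counts` helpers, and the sample records are all hypothetical and exist only to show how one uniform metadata schema lets a single filter span several sources.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical harmonised metadata record; field names are assumptions."""
    doc_id: str
    source: str
    year: int
    domain: str
    keywords: frozenset = field(default_factory=frozenset)


# Invented sample data, for illustration only.
SAMPLE_RECORDS = [
    Record("doc-1", "OpenAIRE", 2017, "biology", frozenset({"genomics", "text mining"})),
    Record("doc-2", "CORE", 2016, "biology", frozenset({"proteins"})),
    Record("doc-3", "CORE", 2017, "biology", frozenset({"genomics"})),
    Record("doc-4", "OpenAIRE", 2017, "linguistics", frozenset({"corpora"})),
]


def build_corpus(records, *, year=None, domain=None, keyword=None):
    """Return the records matching every facet value that was supplied."""
    selected = []
    for rec in records:
        if year is not None and rec.year != year:
            continue
        if domain is not None and rec.domain != domain:
            continue
        if keyword is not None and keyword not in rec.keywords:
            continue
        selected.append(rec)
    return selected


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return dict(Counter(getattr(rec, facet) for rec in records))


if __name__ == "__main__":
    corpus = build_corpus(SAMPLE_RECORDS, year=2017, domain="biology")
    print([rec.doc_id for rec in corpus])
    print(facet_counts(SAMPLE_RECORDS, "source"))
```

Because every record shares the same fields regardless of where it came from, one call to `build_corpus` selects documents across both sample sources without any provider-specific query logic.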


Researchers, SMEs, citizen scientists, and anyone else interested in text and data mining (TDM) of scientific publications


The OpenMinTeD Corpus Builder is unique in that it exploits the largest available body of Open Access scholarly content, brought together in one source and described in a harmonised way. Users can therefore select subsets with a single query and get direct access to the full text of the selected publications. Without it, they would have to go through the APIs of the various content providers one by one, reformulating the query each time to match the provider’s system, in order to collect the set of publications that fits their research topic. They can then go on to process this dataset with one of the TDM applications offered by the OpenMinTeD platform.



EC funds (H2020 grant 654021 for the OpenMinTeD project) and national funds for the GRNET cloud infrastructure on which the platform operates