Maria Levchenko is a community manager for Europe PMC – a global database for life science and biomedical research literature, a partner in PubMed Central International, and the designated repository for the open access publication mandates of 27 life science funders.
Marta Hoffman-Sommer: Europe PubMed Central (Europe PMC) is a repository for publications from the life sciences domain. Could you briefly explain the main goals of Europe PMC? How does it differ from PubMed and PMC? Are there any specific advantages for a European-based researcher in using the Europe PMC search site?
Maria Levchenko: The mission of Europe PMC is to build open access, full-text scientific literature resources and deliver world-class literature services. We believe that novel tools and applications based on text and data mining will enrich the content that we host and improve the way we interact with the scientific literature. Therefore, we develop Europe PMC as an open innovation platform, enabling contributors, such as text miners and developers, to showcase the outputs of their work.
Europe PMC is part of the PubMed Central International (PMCI) initiative. Together with PMC USA and PMC Canada, it constitutes a network of digital archives that provide free access to published peer-reviewed biomedical and health research literature. All nodes share their locally deposited manuscripts within the network, while offering different functionalities to their users. Europe PMC combines the power of both PubMed and PubMed Central as a one-stop shop for both abstracts and full-text articles that can be accessed through a single search interface. Europe PMC also hosts a large variety of other content, such as books, patents, biomedical theses and clinical guidelines. Beyond the 27 million PubMed abstracts, Europe PMC covers further sources, such as Chinese Biological Abstracts and Agricola records, bringing the total number of abstracts to 32 million. What distinguishes Europe PMC from PubMed Central are several novel features, including advanced text and data mining tools, integrated ORCID IDs (unique identifiers from the ORCID foundation that distinguish academic authors), and a Grant Finder for accessing grant information from the 27 international life science funders supporting Europe PMC. New developments are highlighted on the Europe PMC blog.
The content of Europe PMC is not limited by geographical location and includes scientific literature from anywhere in the world. We hope that life science researchers all over the globe can benefit from our services.
Are there any connections between Europe PMC and OpenAIRE (or other European e-infrastructures)? Is there any exchange of data and/or metadata going on?
Europe PMC is the largest data provider for OpenAIRE, supplying more than 3.8 million documents. OpenAIRE also utilizes the open API and public web service from Europe PMC to identify FP7 and Horizon 2020 funded research and to gather the associated metadata. Finally, the External Links Service provided by Europe PMC enables OpenAIRE and similar infrastructures to link to Europe PMC records from related resources, such as full texts of articles in repositories harvested by OpenAIRE.
What is the content acquisition policy for Europe PMC? Can any author from the life sciences domain - who wishes to make their work more visible - individually deposit their article in Europe PMC? If not, what requirements should they meet and why?
There are several routes for content to be added to Europe PMC. Fully participating publishers deposit the complete contents of each issue or volume, while a selective deposit option is available for hybrid journals that publish a subset of articles open access.
Europe PMC Funders' Group organisations mandate that published research, arising from the research grants they award, must be made available through Europe PMC, typically within six months of being published. Any researcher supported by at least one of the Europe PMC Funders can submit the final peer-reviewed author manuscript for inclusion in Europe PMC using the Europe PMC plus deposition service. Some publishers will deposit the peer-reviewed manuscript for free on behalf of authors for the articles acknowledging funding from the Europe PMC Funders.
For all life science authors, the easiest way of ensuring visibility for their work is simply publishing in an open access journal participating in PMC, which automatically makes the full text of their article available in Europe PMC. A list of journals which signed a PMC participation agreement can be found here.
How does Europe PMC add value to the scientific literature it presents?
We see literature as a bridging mechanism for wider research infrastructure, combining all associated information and helping to transform it into knowledge. To actualize this vision we focus on three major directions: author services, data integration, and text-mining.
Europe PMC works closely with the ORCID foundation to ensure credit attribution for authors. You can search the resource by ORCID ID to find papers by a particular author. Europe PMC provides a tool for scientists to add their published works to their ORCID record, and to date almost 3.5 million articles have been claimed by more than 350,000 biomedical researchers. We also generate author profiles for researchers with ORCIDs, with citation and publication graphs, showing how many articles were published open access. Finally, we provide links to related resources – alternative metrics, post-publication peer reviews from Publons, or lay audience summaries on Kudos and Wikipedia, enabling researchers to show their impact in a number of ways.
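To illustrate the ORCID-based search mentioned above, here is a minimal Python sketch of how such a query might be sent to the Europe PMC RESTful search service. It is an illustration under assumptions, not an official client: the base URL, the `AUTHORID` search field, the `pageSize` parameter and the `resultList`/`result` response layout are taken to match the public API and should be checked against the current documentation. The function names are hypothetical, and the ORCID used below is only a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Europe PMC RESTful search service.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_orcid_query(orcid: str, page_size: int = 25) -> str:
    """Build a search URL for articles linked to one ORCID ID (hypothetical helper)."""
    params = {
        "query": f'AUTHORID:"{orcid}"',  # assumed search field for ORCID IDs
        "format": "json",
        "pageSize": page_size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_titles(orcid: str, page_size: int = 25) -> list:
    """Fetch one page of results and return the article titles.

    Needs network access; the response layout (resultList -> result -> title)
    is an assumption about the JSON returned by the service.
    """
    with urlopen(build_orcid_query(orcid, page_size)) as response:
        data = json.load(response)
    results = data.get("resultList", {}).get("result", [])
    return [record.get("title", "") for record in results]


if __name__ == "__main__":
    # Placeholder ORCID; only the URL is built here, no request is sent.
    print(build_orcid_query("0000-0002-1825-0097", page_size=5))
```

Building the URL separately from the network call makes it possible to inspect the query string without contacting the service.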
Fostering connections between scientific data and literature is a big part of our work. Publications in Europe PMC are programmatically linked to relevant records in a number of databases, including UniProt, the European Nucleotide Archive (ENA) and the Protein Data Bank in Europe (PDBe), with the list constantly growing. Information discovery is facilitated by linking out directly from biological entities and data citations in the text, which are identified with the help of text-mining approaches.
We have developed SciLite, a text-mining tool, to support scientists and database curators in their literature research. SciLite highlights text-mined biological terms, displaying those annotated entities as an overlay on scientific articles in Europe PMC. Annotations are linked to the corresponding data resources, allowing the user to locate the underlying data in a straightforward way. SciLite makes it easier to scan articles for key concepts and helps to quickly grasp the essence of an article.
Europe PMC links from the scientific literature to numerous curated medical and biological databases as well as through DOIs to cited datasets. These datasets may reside in uncurated data repositories, relying on author self-deposit (e.g. Zenodo, Dryad, Figshare). Do you know how often these kinds of datasets are cited? What are your predictions on the future of data repositories that accept uncurated datasets - will they play a significant role in life sciences?
Uncurated repositories are often tailored for data provenance, in contrast to curated databases that structure data for re-use. This can result in different citation rates for curated and uncurated data. Currently, there are significantly more articles in Europe PMC linked to PDBe records than articles containing Dryad links (98,000 vs 11,000). Only time will tell whether uncurated datasets will pick up speed with regard to citation.
However, uncurated repositories are indispensable when it comes to new data formats that cannot be easily accommodated by the current structured archives. This lack of structure comes at a price: as the amount of biological data keeps growing exponentially, it becomes increasingly fragmented and scattered across different places. One database addressing these challenges is BioStudies, a new data service at EMBL-EBI which acts as a data container, consolidating all the data from a particular study and making it easy to find and reuse. It links to datasets in established repositories, while also hosting unstructured data. Such an arrangement is especially useful for multi-omics experiments, where different types of data can be produced. Europe PMC links to BioStudies records from scientific papers and provides input in the form of text-mined supplemental information and accession numbers. We believe that BioStudies, due to its focus on life sciences and its flexible structure, can adapt to community requirements better than a repository that caters to all domains and disciplines.
Do you think the way PMC International functions would also be a good model for other research areas (e.g. humanities or social sciences), or does every research community need to develop its own model of functioning for literature databases?
PMC International has adopted a system initially designed for genomic data providers. For instance, the European Nucleotide Archive of the European Molecular Biology Laboratory (EMBL), DNA DataBank Japan (DDBJ), and GenBank at NCBI form the International Nucleotide Sequence Database Collaboration. All three databases routinely exchange deposited data, while offering different interfaces and functionality that is best suited for the scientific community they serve. Such a system ensures archive stability and safekeeping of the stored information, while allowing deposited content to be enriched with local guidelines and related resources. We believe that this arrangement has its benefits for the diverse stakeholders that are invested in the research cycle, providing flexibility of choice for a preferred resource.