It All Leads to Open Science

So, you are hopeful, then? I wanted to ask you about the reception of Open Access in academic circles in Poland. On the one hand, there are EU recommendations which seem a bit like bureaucratic nagging. Researchers work elsewhere, and may not want to implement the ideas of civil servants. On the other hand, you are a prime example of how opening one's own publications can bring advantages, and not just abstract ones, like intangible social profits on a very wide scale, either economic or academic, but very concrete opportunities for the researcher. The researcher's name and academic background become ever more recognised, and the citation index increases. If we applied specially designed mechanisms to make it all easier, it would completely change the landscape of science, both in terms of access to publications and in terms of evaluation. Do you think Polish scientists are aware of those potential advantages?

It depends on the person. There are areas in science which have no problems with that, and researchers already publish in the best journals, which are properly indexed. The opposite applies mostly to the social sciences and humanities. Researchers there are often not used to this, and mostly publish in book form. Their academic circle is pretty closed, watertight. As a result, we have no idea what is happening, how many people read those works, etc. Lately, I got a bit of a shock when I heard one of our most prominent philosophers, from one of the best universities, speak on the subject. When he heard that his book was to be published in electronic form, he immediately withdrew his consent. So now, I'm asking – does he only want the book to look good on a shelf? Or would he like people to read and discuss it? Because if it's the latter, then the book's form does matter. A printed text would not be read by quite as many people.

I'm interested in a lot of research myself – sociology, history, etc. I would gladly read a lot of papers, but I have no time to go to a library. Look at what's happening at universities – very rarely do you see a young researcher in a library. I remember an older professor from New Zealand who worked in our institute for a dozen years. He always talked about the importance of paper and libraries. Myself, I never saw him in a library; he was forever in front of a screen reading from arXiv. So – people say one thing and then do another. They read from the screen, because it's convenient. I read everything on my tablet, because it's convenient and because I wouldn't be able to carry that many books and papers. My shelves are still crammed with books and photocopies. Even if I have something there, finding it on the net is still faster. The speed of reaching information, the accessibility – that's the main difference. I think even the people who have great reservations can still see that it all leads to open science.

Moreover, there are several very far-reaching ideas concerning the social sciences and humanities. There are large European projects, such as DARIAH. The idea is to create a common, European digital space for the arts and humanities. There are a lot of interdisciplinary projects. People tend to get wrapped up in their own specialisations, while the opposite should be true. We should cooperate, drawing on various sources, texts, video documents, audio documents, etc. Various things in electronic form, which we cannot have in print. They will allow specialists to work together, creating a new quality. An initiative group in Poland has recently written a paper on how those platforms for the social sciences and humanities could help to integrate all areas of science at the European level. We really have much to be proud of, but cannot express it properly. There are foreigners writing on the history of Poland, because our history books are not published in foreign languages. The key goal is to share this heritage. If we start creating open journals, they can be written in Polish, but they should at least have abstracts in the major international languages, especially English, so that they are indexed properly.

The information on our science and whatever is happening with it should be widely available. Good search engines are the key. There are a number of ideas on how to approach this. One of the leading American universities, Rice University, had a project called Connexions. The idea was that people would write modules – like a single lecture – put them into a repository, and then anyone could use them, for instance, put them in a course book. It became a huge success, and they print books on demand. It is actually very popular now. This could revolutionize the whole approach to creating textbooks, journals and specialized works. But the key is a good search system that scales decently. It is easy to put a hundred works in a repository, but when the hundred becomes tens of thousands – then organisation is the key.

Open Access, in its basic understanding, is about abolishing the technical and financial barriers. So, at first glance, it's about making publications freely accessible on the Internet. But it is a very basic understanding of this concept. It can also be understood as abolishing legal barriers. What is it really about? When the legal barriers are removed, it becomes possible not only to absorb a given material passively, but also to reuse it, as we see fit. Text A could be included in a set of texts that are then automatically processed – what we call “text mining”. It could also be used for educational purposes. There are no limitations resulting from traditionally understood copyrights. Now I would like to ask this: does the Ministry consider both of these aspects – traditionally termed “gratis Open Access” and “libre Open Access” – equally important?

The Ministry cannot tell the academic circles what to do. We have to discuss it with a wider audience to gauge the mood. I think it is good to take a step forward based on existing materials. In some areas, before real-time communication, people kept doing the same things over and over. It all went in circles, instead of going forward. The idea is to use the material available. As Isaac Newton put it: “If I have seen further it is by standing on the shoulders of giants”. The “giants” might be an exaggeration, but we do use a lot of old material. For instance, the pictures I find on the web can later be used in publications. Quite recently I wanted to use a neurobiology atlas with pictures of the brain. I wrote to the authors, and they told me that each slide I used in my presentation would cost me $90. A single slide! This one lecture would have cost me thousands because I needed a lot of pictures. On the other hand, freely shared materials such as pictures are pretty common and make our life easier. People upload their video lectures, share their illustrations, etc. We build upon what we have and this gives us progress. It is sometimes a mixed blessing. I get a great idea, try to check it out, and learn that somebody else has already done it. In the past, I would have started to develop it myself and probably lost years of work. It's better to know immediately.

I think both roads have advantages, but sharing everything causes intellectual property to become blurred. I'm not that attached to my ideas, I have too many to keep track of. Sometimes I would toss some ideas out there, forget about them, and later discover people developing them, not remembering where they came from. Tough, I say. We are working for the common goal, not just for our personal credit. But I know that approaches tend to differ. People who are still developing their careers would like to get credit for their creative effort. So, I think the limited access – gratis Open Access – would also be popular. But everybody should specify how much of their work can be reused by others. But then the libre OA gives an opportunity to reuse works, with full credit to the author. There are benefits to this too, mainly citations. We toss ideas, someone develops them, and our pioneer work resurfaces. But then the citations stop, because the works become so advanced. Nobody reaches back to the original sources. There are people who are wrongly considered pioneers – and they aren't, it's just that nobody investigated far enough.

I think we can talk about Open Access as part of a wider initiative that is Open Science. In this area there are talks, not only of OA articles, but also open data, open peer review or even so-called “open notebook science”. While open data is ever more present in science funding, like in the pilot programme in Horizon 2020, the other two ideas – open peer review and open notebook science – are not widely practiced. How do you evaluate the chances of success for these initiatives?

I think the road will be long and rocky before they really catch on. It is discussed on many forums where data is collected. Open data is very useful if you want to hold a contest. A couple of years ago I held a contest myself. It was about eye movement analysis, conducted on people with various illnesses. The data was collected by psychiatrists, and we held a very interesting contest. The people analysing those signals were able to extract very compelling diagnostic information. But data processing is very tedious. Another thing we did with our colleagues from the US was to prepare hospital discharge cards, which could be generated automatically, so the patient would know what had been done to them, and how much it had cost. Three companies made their own annotations and we prepared a final version. Even processing textual data is very labour-intensive. Neuroimaging data is not easily shared. Every piece of equipment is specific – sometimes an electrode does not make proper contact, or a channel does not work.

There is a lot of talk on how this data should be shared. The idea of big data is becoming ever more popular. The problem is that most researchers are quite egoistical, they only ask: “What do I get out of this?”. If I create some data, then I get cited often. Now we have whole repositories for training algorithms, machine learning. A whole set of data is shared and it may contain some sort of information – perhaps, how to tell two illnesses apart. Two sorts of cancer, etc. Now, people will try to analyse this data, someone will have a good idea. A lot of people try, and the folks who collected the data get cited. There are some benefits, yes, but sometimes it's terribly arduous to prepare the data. So the whole issue is moving forward rather slowly. When it comes to, say, electroencephalography, in Warsaw, there is a group in neuroinformatics who have been doing this for years. Some of the data comes from contests, but not much. Accessing data is difficult for them – they need to know what equipment was used, was there much noise, artefacts, other things. But in broadly understood science, where things can be done together... If it's completely open, it might cause some trouble. Many people do not show their findings until they're published, because in some areas there's a lot of competition. Elsewhere, they're a bit more willing to share. But with areas involving innovation, the solutions close to being implemented by the industry... The industry would not like them to be distributed. In drug research, there is really fierce competition and all the findings are confidential. Even the authorities who later allow certain drugs to be put on the market have trouble getting full information on them, because drug companies are so determined to protect data from competitors.

There are a lot of issues stemming from intellectual property rights, but there is progress in this area. People are trying to cooperate. In 1994 I witnessed the final stage of the Fifth Generation project – large, intelligent computers made by the Japanese. And the conclusion was that the era of big projects, where a couple of companies create their own institute, must end, and that virtual, distributed research groups which cooperate using the right tools will do the same job more efficiently, without building any central structures. In our European projects, there are talks of opening virtual research institutes – maybe in Poland as well. The Foundation for Polish Science is engaged with this, and maybe the groups working in such units will use open notebooks. But they will more likely use them within their closed circle. Unless it is research requiring wide cooperation – then we'll try to open it as much as we can. There is room for improvement on both sides.

I would like to ask you one more question about infrastructure. Implementing Open Access is not only a question of passing regulations, is it? It is vital to provide an infrastructure that will allow researchers to work efficiently. That takes real action steps. Moreover, they should be consistent not only at a national level, but also at a European, or even global, level. So – how do you see the current status of this infrastructure in Poland? What needs to be done? What has been done?

Certainly a lot. There are a lot of projects which need coordinating, and that's a job for the Ministry. Many universities create repositories without agreeing on standards, and that may cause trouble later. It is an ailment of all big IT projects: everyone wants to have things immediately, and the end products are not always compatible. But we are building these systems on a larger and larger scale. We are building central systems, like POL-on, which contain information on all publications, on what's happening at universities, and on what equipment is used. There are large data repositories appearing that are to be maintained by the Ministry of Administration and Digitization. There are numerous platforms for repositories and journals, and we will try to encourage them to abide by certain standards. The biggest endeavour is SYNAT, a project coordinated by the Interdisciplinary Centre for Mathematical and Computational Modelling. I hope it can be used to achieve much on a central level, and the local efforts can be integrated top-down. It will require a lot of agreements between creators of local and central platforms. Luckily, there aren't that many. But the numbers are growing, and there are the European projects for the social sciences and humanities, like the DARIAH infrastructure.

The infrastructure will need to be pretty advanced, because its functionality is very innovative. It not only provides tools for cooperation and access to indexed and scanned sources, but also opportunities to do science. Like what Google did with culturomics. It's very interesting – with so many scanned resources, you can follow a lot of trends. When did certain phrases originate, how did they change, etc. It's very useful to linguists and historians. Broadly speaking – wide access to information appeared, because a lot of books had been scanned. And this includes old books, which are no longer copyrighted. It caused a completely new field of study to appear – a discipline called “culturomics”. The Institute of Literary Research of the Polish Academy of Sciences created its own Digital Humanities Centre – I think digital humanities have a great opportunity to develop now. It's not only repositories, there is also automatic analysis of various links and connections.

There are new opportunities that emerge thanks to artificial intelligence systems, such as speech-to-text tools that make it possible to archive radio broadcasts, TV programmes, etc. Also: new possibilities of analysing natural language, like the CLARIN project done by the Wrocław University of Technology. There are different ideas and projects in different circles, but if they are to cause some deeper change, they should be consistent with one another. I hope that the Centre for Open Science will play a leading part here, helping to integrate those local projects.

To wrap up, I would like to ask you about Open Access in a global context. Open Access is not limited to Europe or North America. In every part of the globe, there is at least one institution that implements Open Access. There are many solutions – different countries adopt different policies and do not always agree on how particular solutions should be implemented on a national level. It is different for Denmark and, say, Argentina. Do any of the international solutions appeal to you in particular?

There is a series of European projects on Open Access – PASTEUR4OA, Open Access Policy Alignment Strategies for European Union Research. These are various European endeavours. The project only started in February this year, and it is to last for three years only. There is SPARC, which works for Open Access in Europe, and we even have our own representative there. It's Bożena Bednarek-Michalska, Deputy Director of the UMK library. She's very active in the Open Science domain, and organises a lot of events. There are many ideas on how to unify the open structure on a large scale. We, as the Ministry, try to allocate our funds rationally, directing them for the maximum benefit of Polish science and society in general. But we are not omniscient – these issues need to be discussed by the academic circles, not just in Poland, but within all those other organisations. I hope we can reach a consensus and start building common platforms. Without them, everyone will end up with something incompatible. Then it would be hard to build a search system that would encompass it all.

Minister, I would like to thank you for a very interesting conversation.

I would like to thank you too.

 

The interview was held in Warsaw, 28th August 2014.
