Marginally Interesting: Peer Review and NoSQL

Disclaimer: This post definitely falls into the tl;dr category of posts. I’ve been collecting these ideas for quite some time now, and somehow this post got longer and longer. Anyway, it is a complex topic.

Ever since I started to work in an academic environment back in ‘96 (working as a student in Rolf Eckmiller’s neuroinformatics group), the peer review process has been a big topic. There were always some people complaining, discussing possible ways to improve it, or dismissing the idea of peer review altogether.

The interesting thing is that very little has changed since then. If we look not only at peer review but at the whole scientific publication landscape, we can see a few significant changes. For example, open access is much more real than it used to be, and there are more scientific journals which let authors keep their full copyright and the right to republish the papers on their webpages, etc.

These changes are all important, but what I find curious is that the general peer review process hasn’t changed at all. The process to get published or accepted at a conference is still the same: You submit your paper to some board where it gets handed to two or more reviewers whose identity is not revealed to you. Based on the verdict of the reviewers the action editor/program chair decides what to do with your paper.

I won’t repeat all the things which people perceive as being broken with this system. Let’s just say that the process has a high error rate, both false positives and false negatives (a.k.a. the “bad review problem”), it can take very long for a paper to get published, and the workload on the reviewers is pretty high.

Still, nothing has changed. Is this only because we sort of bring this problem upon ourselves (as opposed to the system being forced upon us by some external agency)? Or is there a deeper reason?

A Closer Look at Peer Review

I think the main reason why peer review is so resilient is that it is already a quite elegant partial solution to a number of interwoven requirements. In other words, just like democracy, it is an imperfect system but the best we have discovered so far.

In this whole scientific publication business, there are a number of stakeholders involved:

Science as a whole wants to progress to solve the smaller and bigger mysteries of life, the universe and all the rest. Science needs the publication system to be efficient, fair, and open, such that information can be distributed quickly and without bias.
Researchers want to have fun researching, but also need to build up a reputation to keep doing so. For this, they need the publication system to build up a track record of their work.
Researchers also need to have access to the works of others, to know what has already been done, which problems have been solved, and so on. The publication system is basically like an enormous, ever expanding library of knowledge.
Universities and funding agencies need the publication system to assess the scientific output of researchers for hiring decisions and to explain to taxpayers how and why the money has been spent. Peer review is a very handy way of assessing scientific output. You can use it to basically just say “I don’t know exactly what they have been doing (and it’s probably not even practically relevant for another decade or so), but at least these other researchers said that it’s good.”
Publishers mainly want/need to make money (and probably also have a name in the whole scientific endeavor).

What’s important to understand, however, is how peer review addresses all of these needs at least partially:

Exchange of information is more or less efficient, fair, and open. It could be more efficient, but the publication lag is still on the same order of magnitude as the actual work. It’s not like science is already five years ahead of a huge backlog of publications (at least I hope that’s not the case…). It is fair because a good reviewer is bound by a scientific code of ethics to be fair and unbiased, and it is open because everyone can submit something (as opposed to a closed club where you first have to become a member to get a chance to publish).
Researchers get an excellent standardized measure of scientific output. A published journal paper is something nobody can take away from you. Not only is a published paper an important step towards tenure, it is also something everyone agrees on. On the other hand, peer review gives you a level of filtering such that the amount of new results is just large enough to be manageable.
Universities and funding agencies are also happy, because they have a solid, generally accepted measure of scientific productivity, which even laymen can understand.
Publishers can get their share by building a strong brand, becoming a journal with a high impact (while having researchers doing the actual peer review work for free, but that is another problem).

In summary, peer review is an okay solution to a complex problem, and whatever solution you propose to replace it has to cover all of these aspects as well.

You can’t ignore the complexity of the problem

You’re probably wondering when I’ll come to the NoSQL bit of this post, but before we get there, let’s briefly discuss how common alternatives fail because they do not address all of the above aspects.

For example, a common approach is to say that we should replace this whole process with a social media site around publications. Let’s just call it SciNet for now (and I know that already exists, but it’s really hard to find something in that namespace which isn’t already taken). “Likes” or “Recommendations” would work as filtering, connections between users would give people structure to navigate or to form “Webs of Trust”, and so on.

This idea has some appeal, but it neglects the aspect of building a track record and giving an objective measure of scientific output, because you’ll have a hard time explaining to your funding agencies that a non-peer-reviewed paper of yours is a solid piece of work because it got 1.5M “likes” on SciNet. I’m not saying that this problem cannot be solved, but you can’t just copy existing concepts, and you would also need to invest quite a bit of lobbying to convince the universities which hire professors and the funding agencies which pay for your research to accept these measures.

Other approaches focus mainly on turn-around times and open access, proposing some central server which is a mixture of a preprint server and a perpetual archive. Such systems don’t really address the filtering aspect, and also don’t deal with the main problem of how to improve peer review.

Finally getting to the NoSQL part

So from a higher level, we have a situation which is pretty common in engineering: We have a well-tested and established piece of “technology” for a complex problem. It’s been around for quite some time now, and it shows. Somehow, it hasn’t kept up with the acceleration of communication which the Internet brought about. We’ve seen how fast information can be exchanged, and we’d like to have that kind of quality for our professional scientific exchange as well.

Of course, there is still room for improvement. People could just work harder to write better reviews on time, and action editors could press reviewers harder to give good reviews. Communities have already found ways around the long turn-around times by moving to conferences (like computer science) or preprint servers (like physics). Conferences are actually an interesting example, because they play quite different roles in computer science and mathematics. In CS, conferences have become as important as journals, which is problematic because the review process is quite different (there is really no opportunity for a revision). In mathematics, conferences are much more informal. Often you can apply with just an abstract. That way, conferences function mostly as a platform for exchange, and less as an outlet for publications.

But in order to change the system, you either have to find a solution which is uniformly better than the current system on all the aspects I’ve talked about earlier, or you have to put an equal amount of work into marketing to convince people that some of the aspects are not important anymore.

All of this reminds me of the NoSQL movement. Classical relational database systems were the standard till a few years ago. Like peer review they address a very sophisticated set of requirements, and have been around for quite some time. However, it also became more and more apparent that they aren’t good for certain applications.

The main contribution of the NoSQL movement was to understand that some of the requirements could be weakened because they really weren’t that important for certain kinds of applications, and to see how that changed set of requirements could be used to produce systems which scale more easily.

What does this mean for the scientific publication system? I think to find an alternative process, we need to be fully aware of all the requirements the current system addresses, but we also need to question these requirements and be ready to fight hard to make people do the same. Because otherwise we’re stuck with finding a system which is better in all the aspects than the current system.

Note that this approach is also different from just focusing on one aspect and ignoring the rest, as some of the approaches I’ve discussed above do. Saying “we’ve considered these requirements, but we think they aren’t important anymore” is really something different from saying “we just considered half of the problem for now.”

Rethinking why we have peer review

So the question is which parts of the problem can go? I think generally there is the tendency to believe that we don’t really need the publishers anymore. The Internet has made it very easy to publish something even in a permanent fashion, and most of the actual work has already been done by us anyway.

There is really no way around an efficient exchange of information and being able to find the information you look for. These are probably the core requirements.

Track records and objective measures of scientific output are of course important, but I think we might be able to find something new here eventually (and the current system doesn’t really work well anyway; Daniel Lemire has a number of posts on how papers as units of scientific work don’t make sense).

I think peer review is still very valuable, but its role probably needs to change. If we find more effective ways of filtering and measuring the impact, we no longer need peer review to be the first threshold to publication, and we no longer suffer from its errors or long turn-around times.

What can we do for now

So what can we do for now? Actually, I think you can do a lot. Don’t forget that we’re running this system ourselves. So whenever you are a reviewer, work hard to be an unbiased and fair reviewer. Never recommend rejecting a paper just because you somehow missed the point and didn’t like the overall approach. NEVER reject a paper simply because it hasn’t compared itself against method X (there are thousands of methods out there), unless there is a very good reason to do so. NEVER reject a paper because you believe it is similar to method Y, unless you are very certain that they are very similar. In all the cases where I got reviews like this, the claim was never true.

If you are an action editor or area chair, don’t accept bad reviews. If you organize a workshop, think about alternative ways to accept and review papers. Turn your blog into an informal journal, invite people to submit their work if they want to get the word out.

If you are in a position to discuss with decision makers in funding agencies, talk to them about alternative ways to measure scientific output. If you are on a committee to hire new faculty members, don’t just rely on impact factors to assess the scientific output of a candidate, but encourage the others to also look at the candidate’s contributions to the community besides peer-reviewed publications.

And if you want to develop something new, always be aware of the full complexity of the problem, and be ready to explain why you neglect some of its aspects.

For further reading, Marcio von Muhlen has an interesting post called “We Need a Github of Science” which covers a lot of ground, and also tries to take into account the whole problem.

A last piece of advice: First get tenure or some other kind of permanent position, then work on improving the system. Always remember that others are publishing papers in the old system while you fantasize about a better world.

Posted by Mikio L. Braun at 2011-09-20 10:03:00 +0000
