Curation and Collaboration in Science

While I have blogged about these topics before, I've tried to stay clear of them for some time, mostly because I've come to believe that while the peer review system is broken, there is little that can be done, and what can be done would take up too much of my time.

Then I got involved with a few conference reviews like NIPS, and with discussions within the JMLR MLOSS editorial board, and it got me thinking again.

As you might know, I've been an action editor for the JMLR machine learning open source software (MLOSS) track for some time now. Lately, I've begun to find our function there a bit odd. We recently discussed our policy for handling submissions relying on proprietary, closed-source software. I don't want to discuss the pros and cons here; you can probably read about that elsewhere.

But the point is, whatever decision we made, we would control the visibility and standing of an open source project in the machine learning community. Accepting a project will give the "JMLR seal of approval" to the project and make it much more known within the community, hopefully leading to wider usage and more interaction with users and possibly collaborators.

But is that really something we can or should decide on? Isn't the whole idea of open source that you put your project out somewhere and invite people to collaborate? Is it really our job as JMLR MLOSS action editors to decide which projects are worth collaborating on and which aren't? We wish to give a boost to projects we find good, but would we want to prevent projects from getting known?

Of course, I'm exaggerating a lot here. Even if projects are not accepted at JMLR, they can build a nice website, get engaged with users, and grow if there is interest in what they're doing. In a way, we're just curators, trying to pick out projects which we find worthwhile and using the reach of JMLR to make them better known (and giving them an entity which you can cite, but that is a different story).

But what about scientific papers? They form the basis of open, distributed collaboration. Once published, others can build upon the results, incorporate them in their own papers, improve upon them, and so on. This collaboration transcends social, geographical, and even temporal ties. You can extend the work of people you don't know, who are much higher in the hierarchy and probably wouldn't talk to you if you met them in person, who live on the other side of the planet, or who are no longer alive. I personally think this form of collaboration is one of the pillars of how science operates.

But how well does it work in practice today? The difference from the JMLR MLOSS case above is that you not only give a boost to papers you accept, but you also keep rejected papers from entering the open collaboration process. Not only does a rejected paper not become known to a wider audience, it also lacks the seal of approval of a peer-reviewed conference or journal. There are preprint servers like arXiv, but still, many rejected papers either end up discarded or are resubmitted to another conference after being significantly improved (and that work happens in a non-distributed, closed fashion).

When people discuss peer review, they mostly talk about its role in filtering out work which is below the threshold, and in reducing the amount of work to something you could in principle still keep track of. Those are the roles of a curator, and they are important. But people forget that, given the way it works right now, the curators also have a huge influence on who collaborates with whom on what, and that just doesn't seem right.

The end effect is that the scientific community just doesn't reach the level of "parallelism" which would be desirable. Instead of collaborating more or less openly on fresh and novel approaches beyond social ties, people are forced to work on new ideas in a more or less isolated fashion, and to do the marketing work until their results become sufficiently "mainstream" to be accepted at the major conferences and journals in their field.

What we need is a better separation between processes which curate and those which support open collaboration. We need both: venues which try to take a relevant snapshot of what is being worked on right now to make it more widely accessible to the whole community, but also ways to support the open collaboration that science is built upon.

There are attempts at this, like workshops which are focussed on a specific area and relatively open, such that everyone can at least get a poster. There are "negative results" workshops aimed at making known those results which are hard to publish. But these things don't have the same standing as a "real" publication, because we still let our work be measured in terms of success at curation-type venues, which shifts the focus in unhealthy ways away from doing actual proper scientific work.
