I think there is more to be learned from the open source software development process than just publishing the code from your papers. So far, we’ve mostly focused on making the software side more similar to publishing scientific papers, for example, through creating a special open source software track at JMLR.
However, there is more to be learned from the open source software development process:
Contrast this with the typical publication process in science where there lie months between your first idea, the submission of the paper, its publication, and the reactions through follow-up and response papers.
Again, contrast this with how you usually work in science, where it’s much more common to collaborate with people from your group or people within the same project only. Even if there were someone working on something which would be immensely useful for you, you wouldn’t know till months later when their work is finally published. The effect is that there is lots of duplicate work, research results from different groups don’t usually interact easily, and much potential for collaboration and synergy is wasted.
While there are certainly reasons while these two areas are different, I think there are ways to make research more interactive and open. And while probably most people aren’t willing to switch to open notebook science, I think there are a few things which you can try out now:
Communicate to people through your blog, or by Twitter or Facebook, and let them know what you’re working on, even before you have polished and published it. And if you don’t feel comfortable to disclose everything, how about some preliminary plots or performance numbers? To see how others are using social networks to communicate about their research, have a look at the machine learning twibe, or my (entirely non-authoritative) list of machine learning twitterers, or lists of machine learning people others have compiled, or another list of machine learning related blogs.
Release your software as early as possible, and make use of available infrastructure like blogs, mailing lists, issue trackers, or wikis. There are almost infinitely many options to go about this, either using some site like github, sourceforge, kenai, launchpad, savannah, or by setting up a private repository, for example using trac, or just a bare subversion repository. It doesn’t have to be that complicated, though. You can even just put a git repository on your static homepage and have people pull from there. And of course, register your project with mloss, such that others can find it and stay up to date on releases.
Turn your research project into a software project to create something others can readily reuse. This means making your software usable for others, interface it with existing software, and also, start reusing existing software as well. It doesn’t have to be large if it’s useful. Have a look at mloss for a huge list of already existing machine learning related software projects.
Posted by Mikio L. Braun at 2010-01-13 10:55:00 +0100blog comments powered by Disqus