Most people who decide to do a Ph.D. are well aware that it will mean a lot of work. You have to learn a lot of new stuff, possibly also outside of the topics you have studied so far. Taking machine learning as an example, you probably need to learn much more math than you’ve already been exposed to, including a mix of linear algebra, optimization theory, probability theory, statistics, and so on. But you also need to learn something about the area where you apply your methods, for example, bioinformatics, linguistics, and so on.
But at the same time, doing a Ph.D. also poses some psychological challenges, and from my experience I can say that many students are quite surprised by the level of problems they face. In contrast to a Bachelor’s or a Master’s degree, which requires you to learn some topic and be able to apply what you’ve learned to new, similar problems, doing a Ph.D. means doing something which hasn’t been done before. You need to solve a problem which hasn’t been solved before.
Now this may not sound that surprising because that’s what research is all about: exploring questions, solving problems, advancing the state of the art. But you only realize what this really means when you’re one or two years into your graduate studies, you have learned quite a lot and come to understand the nature of the problem, and you realize that you have no idea how to solve it.
There is of course a lot you can do to hedge the risk of failing. For example, you can start with simpler subproblems and work your way up towards the full problem. You can work on a number of smaller problems such that you build up a collection of work done. But at some point you will invariably find yourself in a situation where you have to admit that you really cannot know whether you’ll be able to solve the problem, or whether any of your usual strategies will help.
And this doesn’t even include the social aspects of doing a Ph.D., of getting published, getting cited, building up some form of reputation in the community.
I found myself in exactly this situation towards the end of my studies. I had to switch topics in between because the original idea didn’t quite turn out as expected. I wrote my thesis about convergence of eigenvalues and eigenvectors of the kernel matrix. But till the very end, a central proof was missing. I had run extensive numerical simulations so I was quite sure about what I wanted to prove, but only at the very end did I manage to put the proof together. So here I was, with a few months left before my position ended, trying to solve that problem every day but not knowing whether I would be able to do that in the end or not. To illustrate my state of mind, when I moved to a different town, I couldn’t rent the truck of the size I had reserved but only one which was about a meter shorter. All my friends told me “Mikio, forget it, we’ll never get all your stuff in there”, but I was just like “ah, impossible, well, yes…” In the end, everything except for one cupboard went in, which was ok, and showed that we both had been wrong.
Actually, I have come to believe that this experience is part of what it means to do a Ph.D. Eventually, you will succeed in one way or another, and you will have learned a very valuable lesson. You will see how the problem slowly sinks into your mind until your understanding of the problem leads you to a solution, or uncovers that it is not possible, in which case you will also have understood why.
In the end, doing a Ph.D. is exactly about this: learning to do what no one has done before and staying confident even when there is only a limited amount of time and you have no idea whether you will be able to solve the problem. And that is an important part of what science is about.
Posted by Mikio L. Braun at Tue Jan 24 10:23:00 +0100 2012.
These are the slides to our talk (joint work with Tammo Krüger and Danny Panknin) at the BigLearning workshop at NIPS 2011. You can also have a look at the paper and the appendix.
So what is it about? In a nutshell, we try to speed up cross-validation by starting with subsamples of the data and quickly identifying parameter configurations which are clearly suboptimal. Learning on subsets is of course much faster, so ideally you’ll save a lot of time because you will only have a handful of parameter candidates left on the full data set.
The method is based on the sequential analysis framework which deals with the problem of statistical hypothesis testing when the sample size isn’t fixed.
The main problem one faces is that the performance of parameter configurations changes significantly as the sample size increases. For a fixed parameter configuration (say a kernel width and a regularization parameter for an SVM), it is clear that the error converges, and usually becomes smaller as the number of samples increases. However, if one compares two configurations, one can often observe that one configuration is better for small sample sizes, while the other becomes better later on. This phenomenon is linked to the complexity of the model associated with a parameter choice. Generally speaking, more complex models require more data to fit correctly and will overfit on too few data points.
Our method accounts for this effect by adjusting the statistical tests to maximize the number of failures before a configuration is removed from the set of active configurations. Nevertheless, fast cross-validation is faster by a factor of 50-100 on our benchmark data sets.
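To make the general idea a bit more concrete, here is a minimal sketch in Python. It is not the method from the paper: the sequential statistical test is replaced by a crude rule (a configuration collects a "failure" whenever its hold-out error is more than a fixed factor above the best one, and it is dropped after a fixed number of failures). The toy data, the Gaussian kernel smoother, and all names and parameters (`make_data`, `holdout_error`, `prune_configurations`, `tolerance`, `max_failures`) are assumptions made up for this illustration.

```python
import math
import random


def make_data(n, seed=0):
    """Toy 1-D regression data: y = sin(2*pi*x) + small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]
    return list(zip(xs, ys))


def holdout_error(data, bandwidth):
    """Mean squared hold-out error of a Gaussian kernel smoother."""
    # Even-indexed points are used for fitting, odd-indexed for validation.
    train, valid = data[0::2], data[1::2]
    fallback = sum(y for _, y in train) / len(train)
    total = 0.0
    for x, y in valid:
        weights = [math.exp(-((x - xt) ** 2) / (2 * bandwidth ** 2))
                   for xt, _ in train]
        norm = sum(weights)
        if norm > 0.0:
            pred = sum(w * yt for w, (_, yt) in zip(weights, train)) / norm
        else:
            # All weights underflowed to zero: fall back to the training mean.
            pred = fallback
        total += (pred - y) ** 2
    return total / len(valid)


def prune_configurations(data, bandwidths, sizes, tolerance=0.5, max_failures=2):
    """Evaluate candidates on growing subsamples and drop repeat offenders.

    Simplified stand-in for a sequential test (an assumption of this sketch):
    a candidate fails a round if its error exceeds (1 + tolerance) times the
    best error of that round, and is removed after max_failures failures.
    """
    failures = {h: 0 for h in bandwidths}
    active = list(bandwidths)
    for n in sizes:
        # The data is already in random order, so a prefix is a random subsample.
        errors = {h: holdout_error(data[:n], h) for h in active}
        best = min(errors.values())
        for h in active:
            if errors[h] > (1 + tolerance) * best:
                failures[h] += 1
        active = [h for h in active if failures[h] < max_failures]
    return active
```

Calling `prune_configurations(make_data(160), [0.05, 0.2, 100.0], sizes=[40, 80, 160])` returns the bandwidths that are still active after the last round. Allowing several failures before removal is the sketch's crude nod to the effect described above: a configuration that looks bad on a small subsample gets a chance to recover on a larger one.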
Posted by Mikio L. Braun at Tue Dec 20 14:58:00 +0100 2011.
Apparently, the discussions about “Scala being too complex” are heating up, mostly due to two things. One is a leaked email from one of Yammer’s programmers to the Scala people in which he discusses some of his experiences with using Scala in a production environment. The other is a post on HN comparing Scala to Perl in the sense that both languages offer too much flexibility in solving a specific task, leading to a mix of different programming paradigms and styles which will make your code harder to read and maintain.
Now we’ve been using Scala as our main programming language for the last two and a half years for TWIMPACT, so I know what people are talking about. And the truth is, it is all true, sadly. On the one hand, Scala is a pretty awesome programming language which is very nicely designed. I’ve said this before, but normally you will eventually come across some feature of a programming language which is not designed well and you have to code your way around it, but I’ve yet to come across something like that in Scala.
On the other hand, it is also true that some of the libraries are not as fast as they should be. Although I like the idea of immutable collections a lot, every time I need performance, I’d rather put in a Java collection. Also, it’s true that the collection library is pretty complex. It all kind of makes sense to get a clean design of the classes, but it’s pretty complicated with all those classes like Seq, SeqLike, Traversable, TraversableOnce, etc. However, you’ll probably only need to know all the details if you want to write your own collections which integrate seamlessly with the existing collection classes.
It’s also true that upgrading to a new version is hard. For some reason, many libraries seem to be quite deeply interlocked with the Scala version. While our own code never had to be changed if Scala went to a new version, this wasn’t true for most libraries, unfortunately, meaning that you have to wait till all the libraries have been upgraded to the new version before you can do the update yourself. And frankly, I don’t see why this is necessary.
We’ve never bothered with sbt, but directly went for Maven due to its better integration in most IDEs. We’re using IntelliJ IDEA, whose Scala plugin has come a long way and gives pretty good support. There is also a lot to be improved in the basic tools like the compiler or the shell in terms of startup time. Scala seems to preload several megabytes of jar files on startup, probably in an attempt at optimization, but in the end, it only means that starting Scala takes anywhere between 5 and 10 seconds, which is really a lot if you’re working on the shell (and every other language starts up almost immediately). The guys behind JRuby have invested a lot of time to cut down on the startup time, and that was time well spent.
People are also often attacking Scala for its complexity. While it’s certainly true that it’s easier to hire some Java expert than someone who knows Scala, IMHO Scala is a big improvement in many ways over Java, which feels overly verbose once you’ve learned Scala. As with every language, there are more basic concepts and more advanced concepts, and usually you don’t have to master them all from the start. Also, people often argue as if the complexity of learning a programming language is all in the language itself, but you also have to consider the standard libraries and tools. For example, while the Java programming language is relatively simple in terms of concepts, the standard tools and frameworks are pretty intimidating to learn (all that XML, Maven, Spring, etc.).
Then people are also complaining about the community, which is supposedly not helpful enough, or too fragmented, or only consists of crazy people who are just thinking about how to implement everything in terms of category theory. I don’t think that is true. Scala is still young, and the community can still grow. We’ve uncovered a number of bugs (mostly Leo, who has a knack for finding bugs in libraries) and people were mostly as responsive as you’d expect them to be. One of the strengths of Scala is also that it is quite painless to reuse existing Java projects (as with any other programming language for the JVM). I never found using a Java library from Scala as repulsive as some seem to. The integration is quite painless, and if you really have to, you can add a bit of syntactic sugar on your side for the stuff you need most.
Finally, I really don’t get the argument of people who are saying “Scala is too complex, I switched to Python (or some other scripting language)”. To me, these are completely different sets of programming languages. While it’s true that there are some applications like writing medium sized web sites which you can nowadays do in either a scripting language or a compiled language, there are many applications where Python (or any other scripting language) just can’t compete. In scripting languages, it’s hard to add primitive data types which are really fast unless someone else already took care of implementing the most computing-intensive routines in C.
So in summary, Scala is both awesome and awful, just like almost every piece of sufficiently advanced technology. You can work with Scala, and it’s a lot of fun, or you can reject it for a number of reasons. Just acknowledge the complexity and don’t give in to hype and marketing.