Marginally Interesting: Machine Learning and Composability

One thing I find quite peculiar about machine learning research is that so much of it is about constantly finding new algorithms for a small set of well-known problems.

For example, if you consider supervised learning, there are the old statistically inspired linear methods, artificial neural networks, decision trees, support vector machines, Gaussian processes. The list goes on and on.

Of course, each of the above approaches has its own set of advantages and disadvantages, so there is no clear winner. Each of those algorithms also stands for a certain school of thought, a certain meta-framework for constructing learning algorithms. John Langford has a nice list of such approaches on his blog.

The reason why this strikes me as peculiar is that in general in computer science, there is a much larger emphasis on finding new kinds of abstractions leading to reusable components on which others can build. Powerful new abstractions allow you to do more complex things with less code, and might ultimately even completely change the way you approach a certain problem.

Everything from numerical libraries and GUI frameworks to web frameworks, collection libraries, file systems, and concurrency libraries provides a certain abstraction for some functionality, relieving the programmer of the burden of reinventing the wheel.

For some reason, this is not happening in machine learning, at least not to the extent it happens in general computer science. As a consequence, our overall ability to deal with complex problems has not increased significantly, and you often find that you have to design a learning system more or less from scratch for a new problem. So instead of taking several components and composing a new learner, you might have to start by modifying an existing cost function and implementing a specialized optimizer, or you have to start out with a certain graphical model and come up with appropriate approximations to make learning tractable.

However, there might be reasons why machine learning research simply isn’t as composable as normal computer science:

ML is actually a well-defined subset of computer science, such that there really are only a small number of problems to solve within the domain of ML. The same is probably true of other fields like optimization or solving differential equations.
It might be hard to design learning algorithms in terms of well-defined, loosely coupled components because they deal with inherently noisy data. Controlling the propagation of error might be difficult, such that it is hard to find good abstractions which are widely applicable independently of their
ML is also a lot about inventing new approaches to designing learning algorithms. Inspiration comes from a number of places, like statistics, physics, or biology. These are also abstractions, though not at the level of a piece of code, but rather at a meta-level.
ML is quite complex and the right abstractions haven’t been found yet.

There also exist some examples of machine learning methods which build on other algorithms:

There exist approaches to feature selection which treat an algorithm as a black box and evaluate the learner on subsets to find informative features.
On a very basic level, a procedure like cross-validation builds on the abstraction of a learning algorithm.
There exists a method, by me and others, for estimating the number of clusters based on the stability of solutions, which also takes a learner as a subroutine.
SimpleMKL, a method for multiple kernel learning, has a normal SVM solver as an inner loop.
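
To make the first two examples concrete, here is a minimal sketch in plain Python of what "a learner as a component" could look like. Everything in it is an assumption made for illustration: the `Learner` type (a function that takes training data and returns a predictor), the toy `nearest_centroid` learner, and the names `cross_validate` and `forward_select` are hypothetical and do not come from any particular library. The point is only that cross-validation needs nothing from the learner beyond that interface, and that wrapper-style feature selection can in turn be built on cross-validation.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], int]
# Assumed interface: a learner maps training data to a predictor.
Learner = Callable[[List[Row], List[int]], Predictor]


def nearest_centroid(X: List[Row], y: List[int]) -> Predictor:
    """A toy learner: classify a point by its closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Row) -> int:
        def sq_dist(c: Row) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda label: sq_dist(centroids[label]))

    return predict


def cross_validate(learner: Learner, X: List[Row], y: List[int], k: int = 4) -> float:
    """Held-out accuracy over k interleaved folds; the learner is a black box."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        predict = learner([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / len(X)


def forward_select(learner: Learner, X: List[Row], y: List[int],
                   k: int = 4) -> Tuple[List[int], float]:
    """Greedy wrapper feature selection, built on top of cross_validate."""
    chosen: List[int] = []
    best = 0.0
    while len(chosen) < len(X[0]):
        scores = {}
        for f in range(len(X[0])):
            if f in chosen:
                continue
            cols = chosen + [f]
            scores[f] = cross_validate(learner, [[x[c] for c in cols] for x in X], y, k)
        f, score = max(scores.items(), key=lambda item: item[1])
        if score <= best:  # stop once no remaining feature improves the score
            break
        chosen.append(f)
        best = score
    return chosen, best


# Toy data: feature 0 separates the classes, feature 1 is unrelated noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0], [0.3, 3.0],
     [1.0, 4.0], [1.2, 8.0], [0.9, 2.0], [1.1, 6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, score = forward_select(nearest_centroid, X, y)
print(selected, score)
```

Any other function with the same signature could be passed in place of `nearest_centroid` without touching `cross_validate` or `forward_select`, which is the kind of loose coupling the examples in this list rely on.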

In summary, I think that finding new abstractions is important because it gives you more power to build complex things while keeping the mental complexity at the same level. And to me it seems that this potential has not been fully explored in machine learning.

Posted by at 2010-03-25 12:17:00 +0000
