What is Machine Learning?

I'm often asked what it actually is that I do for a living. I'm not sure whether you can relate to this, but "we Germans" have a special relationship with anything that involves mathematics. There is a whole family of jokes about a mathematician who is on a date and is asked what his job is. They all end with the disclosure "I'm a mathematician", followed by "Oh... I was never good at math in school". And that is the end of it.

The next thing is that "Machine Learning" is somewhat close to "Artificial Intelligence" and, come on, who could hear that somebody is working on building "intelligent machines" and keep a straight face? Have you ever had to call a company and realized that they had replaced their already annoying menu-by-number scheme with incredibly more annoying "speech recognition" software? "If you are calling to ask about your contract, say 'contract', if you want to ask about your invoice, say 'invoice'..." - "CON-TRACT!!!" - "Sorry, I could not understand you. If you are calling to ask..." and repeat ad infinitum.

Where was I? Ah, so the question is what Machine Learning is about. In my Ph.D. thesis, I state that "machine learning is concerned with constructing algorithms which are able to learn from data". Well, this is certainly accurate, but it does not answer a few important questions: Why would you want to learn from data? And what?

I have lately come to realize that machine learning is nothing but an extension of how we write programs to solve complex problems. In fact, problems which are so complex that you cannot come up with a formal specification of what the program should accomplish. Classically, programs have been written to address problems which could be formalized well: basic arithmetic (but make it really fast), computing the shortest path in a graph, optimizing network flow, and so on. But there were always these problems whose solutions remained elusive: making computers see, understand natural language, or control robots which interact with the real world. Most of these problems are "easily" solved by humans, but maybe only because evolution has outfitted us with the right hardware, er, wetware.

For most of these problems, it is easy to find partial solutions. For example, for object recognition in images, it is clear that the size or position of an object in an image usually does not make a difference (well, unless the relative position of the objects in the image matters. Or since when can elephants fly?). But things quickly become less clear, and the "old way" of solving such problems, by first understanding the problem fully and then devising a list of basic operations which always results in the right solution, plainly does not work.

Enter Machine Learning! ML algorithms learn a mapping from examples of input-output pairs, and state-of-the-art ML algorithms can deal with data sets containing up to several million examples. But wait, this does not mean that the "old way" is obsolete. As it turns out, just taking a few million images and throwing some vanilla ML algorithm at the data won't work very well.
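To make "learning a mapping from input-output pairs" a bit more concrete, here is a minimal sketch of about the simplest learner I can think of: a nearest-neighbor rule, which answers a new input with the output of the most similar stored example. The function names and the toy data are made up for illustration; this is not meant as a description of any particular state-of-the-art method.

```python
from typing import Callable, List, Sequence, Tuple

# One training example: an input vector together with its desired output.
Example = Tuple[Sequence[float], str]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_neighbor(examples: List[Example]) -> Callable[[Sequence[float]], str]:
    """'Learn' a mapping by storing the examples and looking up the closest one."""
    stored = list(examples)

    def predict(x: Sequence[float]) -> str:
        _, label = min(stored, key=lambda ex: squared_distance(ex[0], x))
        return label

    return predict


# Hypothetical toy data: two clusters of 2-D points with made-up labels.
examples = [
    ((1.0, 1.0), "small"),
    ((1.5, 0.5), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

predict = fit_nearest_neighbor(examples)
print(predict((2.0, 1.0)))  # small
print(predict((7.5, 9.5)))  # large
```

Nobody wrote down a rule for what "small" or "large" means here; the mapping comes entirely from the examples.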

As almost every practitioner of ML knows, it is all in the preprocessing. In principle, the methods might eventually work on the raw data, but they work so much better (read: require less data) if you perform some form of preprocessing. For object recognition, it might help if the object is nicely centered, for example. ML people are often a bit annoyed that finding the right preprocessing is so important. They want to build machines which can do everything by themselves.

But if you look at it from a different angle, you see that the preprocessing is actually the place where the "old way" and the "ML way" nicely meet. The preprocessing roughly amounts to solving the problem partially, taking all available information into account. The remaining part of the problem is then handed over to the machine, which solves it the way it works best: by brute force, squeezing the required information out of several thousand to several million examples.
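Here is a toy sketch of that division of labor, under heavily simplified assumptions: the "images" are just rows of nine pixels, the two object classes ("bar" and "dots") and all function names are invented for this example, and the learner is a plain nearest-neighbor lookup. The hand-written part encodes what we already know (position does not matter) by centering the object; the learner takes care of the rest.

```python
from typing import Callable, List, Tuple

Image = List[int]  # a one-dimensional "image": a row of 0/1 pixels


def center(image: Image) -> Image:
    """Hand-written preprocessing: move the object to the middle of the row."""
    on = [i for i, pixel in enumerate(image) if pixel]
    if not on:
        return list(image)
    obj = image[on[0]:on[-1] + 1]
    left = (len(image) - len(obj)) // 2
    right = len(image) - len(obj) - left
    return [0] * left + list(obj) + [0] * right


def hamming(a: Image, b: Image) -> int:
    """Number of pixels in which two images differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_label(
    examples: List[Tuple[Image, str]],
    image: Image,
    preprocess: Callable[[Image], Image] = list,
) -> str:
    """Label of the training example closest to the (preprocessed) image."""
    query = preprocess(image)
    best = min(examples, key=lambda ex: hamming(preprocess(ex[0]), query))
    return best[1]


# Only one training example per class, each at a different position.
train = [
    ([1, 1, 1, 0, 0, 0, 0, 0, 0], "bar"),
    ([0, 0, 0, 0, 0, 0, 1, 0, 1], "dots"),
]

# A bar that shows up at the far right, where training only ever saw dots.
shifted_bar = [0, 0, 0, 0, 0, 0, 1, 1, 1]

print(nearest_label(train, shifted_bar))          # dots (fooled by the position)
print(nearest_label(train, shifted_bar, center))  # bar
```

Without centering, the learner would need to see bars at every position before it gets this right; with centering, a single example per class is enough in this toy setting. That is the "require less data" effect in miniature.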

So, the next time somebody asks me what I do for a living, I'll try to tell them nothing about intelligence or statistics, but that I solve problems which are so hard that nobody has found a solution yet, using the sheer computational power of modern processors. Let's see if I'll manage to circumvent the inevitable "I was bad at math" reaction. Or at least delay it for five minutes.
