Tuesday, January 04, 2011
Machine Learning: Beyond Prediction Accuracy
File under: seminar room
Okay, let’s start this year with something controversial. The other day I met up with some old friends from university and we were discussing machine learning. They were a bit surprised when I disclosed that I had become somewhat disillusioned with machine learning because I felt that any semblance of intelligence was missing from even the most successful methods.
The point I tried to make was that the “mind” of machine learning methods is just like a calculator. If you take a method like the support vector machine (and by extension practically all kernel methods), you will see that the machine in no way “understands” the data it is dealing with. Instead, millions of low-level features are cleverly combined linearly to achieve predictions which have remarkable accuracy.
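To make the “calculator” point a bit more concrete, here is a minimal sketch of how a trained kernel machine produces a prediction: a weighted sum of kernel evaluations plus a bias, and nothing else. The support vectors, coefficients, bias, and the choice of an RBF kernel are made-up toy values for illustration, not the output of any real training run.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def decision_function(x, support_vectors, coefs, bias, kernel=rbf_kernel):
    """Weighted sum of kernel evaluations: f(x) = sum_i coef_i * k(sv_i, x) + bias."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + bias

def predict(x, support_vectors, coefs, bias):
    """Class label is just the sign of the decision function."""
    return 1 if decision_function(x, support_vectors, coefs, bias) >= 0 else -1

# Hypothetical "trained" model: two support vectors with opposite signs.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coefs = [-1.0, 1.0]   # assumed values, stand-ins for alpha_i * y_i
bias = 0.0            # assumed value

label_near_origin = predict((0.1, 0.1), support_vectors, coefs, bias)
label_near_two = predict((1.9, 2.1), support_vectors, coefs, bias)
```

Everything the model “knows” sits in those coefficients; there is no step where the data gets decomposed into parts or where relations between parts are represented.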
In contrast, when we humans learn to predict something, it seems (and I know that there is a risk with introspection when it comes to mental processes) that we also begin to understand the data, decompose it into parts, and understand the relations between these parts. This allows us to reason about the parts, and also prepares us to transfer our knowledge to new tasks. Leaving prediction tasks aside, we can even just take in a whole set of data and understand its structure, and what it is.
I went on to explain that the same is true even of more complex, structured models like neural networks or graphical models. For neural networks, it is still unclear whether they really achieve some internal representation with semantic content (but see our NIPS paper where we try to understand better how representations change within a neural network). As for graphical models, the way I see it, they are very good at extracting a certain type of structure from the data, but this structure must be built into the structure of the network.
Actually, my complaint went further than this. Not only do even the most successful methods lack any sign of real intelligence and understanding of the data, but the community as a whole also seems to be content to just keep following that path.
Almost every time I have tried to argue that we should think about methods which are able to autonomously “understand” the data they are working with, people say one of these two things:
1. Why would that get me better prediction accuracy?
2. The Airplane Analogy: Airplanes, with their stiff wings and engines, are also very different from birds. So why should learning machines be modeled after the human mind?
People seem to be very much focused on prediction accuracy. And while there are many applications where high prediction accuracy is exactly what you want to achieve, from a more scientific point of view, this represents an extremely reductionistic view of human intelligence and the human mind.
Concerning the second argument, people overlook the fact that while airplanes and birds are quite differently designed, they nevertheless are based on the same principles of aerodynamics. And as far as I see it, SVMs and the human mind are based on entirely different principles.
Learning Tasks Beyond Prediction Accuracy
At this point, my friends said something which made me think: If everyone is so focused on benchmarks, and benchmarks apparently do not require us to build machines which really understand the data, it is probably time to look for learning tasks which do.
I think this is a very good question. Part of the problem is of course that we have a very limited understanding of what the human mind actually does. Or put differently, being able to formally define the problem is probably already more than half of the answer.
On the other hand, two problems come to mind which seem to be good candidates. Solutions exist, but they don’t really work well in all cases, probably due to a lack of “real” intelligence: computer vision (mainly object recognition) and machine translation.
If you look at current benchmark data sets in computer vision like the Pascal VOC challenges, you see that methods already achieve quite good predictions using a combination of biologically motivated features and “unintelligent” methods like SVMs. However, if you look closer, you also see that some categories are inherently much harder than others. In the classification results, airplanes are detected with 88% average precision, but bottles only with 44%, and dining tables with 57%. I’d say that the reason is that you cannot just “learn” each possible representation of a dining table with respect to low-level features, but you need to develop a better “understanding” of the world as seen in two-dimensional images.
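For readers unfamiliar with the metric, here is a rough sketch of what an average precision number summarizes: images are ranked by classifier score, and precision is averaged over the ranks at which a true positive appears. This is the plain, non-interpolated definition; as far as I know the challenge uses its own interpolated variant, which this sketch does not reproduce, and the example ranking is invented.

```python
def average_precision(ranked_labels):
    """Non-interpolated average precision for a ranking.

    ranked_labels: booleans ordered from highest to lowest classifier
    score; True means the item really belongs to the category.
    """
    hits = 0
    precisions = []
    for rank, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            # Precision among the top `rank` items at this positive.
            precisions.append(hits / rank)
    # With no positives at all, define the score as 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0

# Invented ranking: positives at ranks 1 and 3 out of 4 items.
example_ap = average_precision([True, False, True, False])
```

A score near 1 means almost all true instances of the category are ranked ahead of everything else, so a drop from 88% to 44% reflects a much noisier ranking, not just a few extra mistakes.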
I don’t have similar numbers for machine translation, but take Google Translate as an example (I think it is safe to say that they are using the most extensive training set for their translations). The results are often quite impressive, but only in the sense that a human reader can still work out what the text is about, even though the output is full of errors on every level.
These are two examples of problems which are potentially hard enough to require truly “intelligent” algorithms. I still have no idea what exactly that might mean, but it’s probably neither “symbolic” nor “connectionist” nor Bayesian; some mix of automatic feature generation and the EM algorithm, probably.
Note that I didn’t include the Turing test. I think it is OK (and probably even preferable) if the task is defined in a formal way (and without referring back to humans for that matter).
In any case, I very much enjoyed talking to “outsiders”. They are unaware of all the social implications of asking certain kinds of questions (What are the hot topics everyone is working on right now? Will this make the headlines at the next conference? What are senior researchers thinking about these matters?), and every once in a while we should also step back and ask ourselves which questions we are really interested in, independently of whether there is any support for them in the community or not.
Posted by Mikio L. Braun at 2011-01-04 17:15:00 +0100