MARGINALLY INTERESTING


MACHINE LEARNING, COMPUTER SCIENCE, JAZZ, AND ALL THAT

Short Review of Edward R. Tufte's "The Visual Display of Quantitative Information"

The bottom line: I found the book quite interesting to read, although the material probably would have fit into three to five blog posts (yes, that’s how we measure document lengths today). The book spends about a third of its pages reviewing the history of statistical plots. While it is quite fascinating that William Playfair was already producing modern-looking plots in the 18th century, I’m not sure this is the best way to approach the field.

Another third is spent on common mistakes and lies in statistics plots, including all kinds of exaggerations, visual noise (cross-hatching, moiré effects, and friends), and downright stupid plots, for example “executive summary”-style plots consisting of only three bars (one of which is the sum of the other two).

The strongest part of the book, IMHO, was the third part, which develops a number of design principles. Tufte’s main point is to spend as much “ink” (read “toner” or “pixels”) as possible on showing data, while reducing gridlines, axes, and other non-data elements to a minimum. Tufte also advocates data-rich plots, arguing that our visual system is quite capable of dealing with high information densities.

Like most machine learners, I’ve done most of my plots with MATLAB and, more recently, matplotlib, and I’m sort of used to the style they provide. Tufte’s approach is somewhat different, and cleaner, which is a nice change. JavaScript plotting libraries like Protovis or D3.js follow Tufte’s aesthetics more closely.
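Tufte’s data-ink principle translates quite directly into a few matplotlib calls. A minimal sketch of what such a cleanup might look like (the data here is synthetic, purely for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Some made-up noisy data, just to have something to plot.
x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * np.random.randn(50)

fig, ax = plt.subplots()
ax.plot(x, y, "k.", markersize=4)

# Tufte-style cleanup: increase the data-ink ratio by removing
# non-data elements -- the top/right frame lines and any gridlines.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(False)
ax.tick_params(direction="out")

fig.savefig("tufte_style.png")
```

This is just one possible interpretation of his principles in matplotlib; Tufte himself goes further (e.g. range-frames and dot-dash plots), which take a bit more manual work.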

What I particularly liked about his approach was the idea that visualizations can really help us understand data through our visual system. As he says, “Above all, show data”, meaning that you shouldn’t hesitate to put as much data as possible before your eyes (within reason) so that you can really start exploring the structure in your data visually.

Cross-post: Hey, Google+ is not world peace

Sorry for the lack of posts lately. Somehow I’ve spread myself a bit too thin across tumblr, the TWIMPACT Dev Blog, and now of course Google+.

I originally set up this blog to be the ideal place for everything, with jsMath to render some LaTeX, Markdown for editing, and Disqus for comments, but sometimes I find it easier to post something on tumblr. No idea whether this will eventually converge on a single platform.

In any case, I posted the following originally on the TWIMPACT Dev Blog:

Some people have started behaving like Google+ is the second coming of Christ, a cure for cancer, and world peace all rolled into one. Mike Elgan has gone on a “Google+ diet” and is redirecting all his communication (including email!) to Google+. Other bloggers (e.g. Kevin Rose) have shut down their blogs completely, redirecting their sites to their Google+ profiles. Others state that everything else starts to become boring once you’re exposed to Google+.

Now admittedly, Google+ is very nice and it’s definitely a big step forward, but there is still a lot that needs to be done. I’m not dismissing it in any way. It has a lot of potential, but as with every complex system out there, there are a lot of details that need further attention.

So here is the list:

  • RSS feeds. A lot of people are still using RSS readers for blogs. Google+ doesn’t have this feature yet. Switching your blog to Google+ currently means that all those people won’t get your updates anymore. Doesn’t sound like a nice move to me. The closest thing you can get right now is an (unofficial) hack at http://plusfeed.appspot.com/

  • Some form of bookmarking. Just as in Twitter, it is currently very hard to find interesting stuff in your stream again. +1 would be a nice way to bookmark posts, but currently they don’t show up under the +1 tab.

  • Private messages. Sometimes you want to have a small private exchange. You can always go back to email, but that would be quite disruptive. I know that you can have a private conversation if you share a post with just one person, but that’s not very obvious. What you need is a button or a menu entry next to the person’s icon or on their profile page.

  • Real-time search. Now that has me really baffled. On all other Google products (email, calendar, docs, etc.), search is an integral part of the experience. In fact, putting full text search on Gmail was one of the game changers back when it came out. Still, no real-time search on Google+, neither for all public posts nor on your private streams. IMHO, this is probably the most important feature that is missing right now.

  • Public API and third-party integration. To cross-post your stuff to Facebook and drive everyone there crazy.

Resist Holy Tech Wars

Only a Sith deals in absolutes.
Obi-Wan Kenobi, Star Wars Episode III: Revenge of the Sith

It might be due to my age (I recently turned thirty-six), but lately I’ve been noticing a certain aversion in myself to tech holy wars. It’s something between “C’mon, you can’t be serious” and “Not again”. I know they have a decades-old tradition in computer science, but still, I find them somewhat irritating.

  • Discussions about programming languages are full of these. For example, you have the dynamic vs. static typing debate, functional vs. object-oriented, what the right abstraction for concurrency is (locks, actors, STM, or something entirely different), immutable vs. mutable data structures, etc. Of course, most of this discussion is conducted in a reasonably rational tone, but every now and then, you meet people who categorically reject anything which isn’t X.

  • In database systems, there is the NoSQL vs. SQL debate, which breaks down to strong consistency vs. eventual consistency, “scaling up” vs. “scaling out”, and so on.

  • In machine learning, we have the frequentist vs. Bayesian divide. I often forget that people take this seriously, but then I run into people rejecting ideas just because they come from the other camp.

Note that most of these questions are pretty big ones: programming languages, databases, notions of probability and inference under uncertainty. There are few holy wars over smaller things.

I had an interesting exchange on Twitter today with David MacIver on this, and we agreed that in many cases, it is clear that there are arguments in favor of each side depending on the context. Different programming languages are fit for different things, and there is no language or programming paradigm which is universally superior. At the same time, the cost of mastering both alternatives is often quite large. You tend to grow attached to the programming paradigm you do most of your work in, and also to the tools, editors, libraries, the community, etc., and might over time become reluctant to switch.

David added that in many cases, however, it’s actually not that difficult to do both, but people suffer from the sunk cost fallacy. What this means is that people are averse to writing off investments they have already made. Basically, if you spent a lot of time learning a certain technology, you would feel like you’ve wasted all that time if you switched to a different one.

Of course, and this is the reason why it’s called a fallacy, this usually has nothing to do with which technology is actually better for the problem at hand.

As I said, holy wars are often about complex, high-level stuff. And unfortunately, things are never that easy. It’s usually quite complicated. Both sides have areas where they shine, and others where they fail. At the same time, problems usually also have rather complex requirements which seldom align with the available solutions.

I think that for many, holy wars are also rooted in a wish to simplify life, to get easy answers. “Which programming language should I learn?” For which problem? Number crunching? Distributed programming? Building a web site? “Which database technology is the best one?” Which will scale with your demands for the next ten years? There are no simple answers. But that’s life, basically.

I think what irritates me most about this is when smart people, who otherwise seem able to take in a huge amount of detail, give in to holy wars. In particular when they’re scientists, because our professionalism demands that we stay open to new things, and always remain suspicious of what we think we already know.

To close this rant, and to counter the argument that I’m basically waging a holy war against holy wars, I admit that holy wars (at least in tech) have their merits. They often force both sides to focus on what makes them special, and possibly to grow in the process. Given the huge number of technologies to learn today, it’s also a good thing to just focus on one thing for some time, and nothing helps like believing that you’ve found the greatest thing there is. At the end of the day, however, you have to put it all into perspective and admit that perfect solutions only exist in fairy tales.