Tuesday, May 21, 2013

A Trip to Silicon Valley: First impressions

At the end of April, Leo and I went on a one-week trip to the Valley. Over the years, we had built up a number of connections in the Valley and we thought that now was the time to go over there and meet face to face. We ended up having 20 meetings over 6 days, which made for quite a schedule.

It’s one thing to know abstractly that Silicon Valley is home to most computer-related companies, and quite another to drive down Highway 101 and see another well-known company every 30 seconds or so: “Oh look, there’s Evernote”, “There’s Intel”, “I think I just saw Salesforce”, and so on.

The situation gets even more intense once you get to San Francisco. SoMa in particular, an area of probably 2 by 4 kilometers, seems to host every Internet startup you have ever heard of, and some bigger companies, too. Twitter, Trulia, Flurry, Dropbox, Zendesk, and others are all in that area. It’s as if the whole Internet industry had its offices in Berlin Mitte, in the area between Unter den Linden and Leipziger Straße.

We spent a lot of time in coffee shops for the free Wi-Fi, especially in the Paris Baguette on University Avenue in Palo Alto, a Korean coffee shop which reminded Leo of his time in Seoul. To get Wi-Fi access, you had to check into the Paris Baguette on Facebook, something I found pretty neat. As it turned out, this wasn’t a new general feature of Facebook, but something being test-driven in a few coffee shops in the Valley first. We found a few more such examples, like being able to pay with bitcoins at the Coupa Cafe.

It sometimes felt as if the whole Valley was turned into one big sandbox to try out new business ideas and new pieces of technology. Already on our first day, when we sleepily sat in the sun trying to shake off our jetlag, we noticed that everyone seemed to be an entrepreneur. People were discussing business models, hacking on their websites, pitching, wherever we went. Later, people would complain that it’s so hard to hire anyone because everyone wants to be a founder.

As we started to talk to people, we also noticed that they were quite open and supportive. It’s probably our German bias, but when you talk to people in Germany about your business, they quickly get defensive and start to question the merit of your whole approach. “Hasn’t that been done before?” or “I think I still don’t understand what’s so great about that” are the kinder things friends of yours would say. In contrast, people in the Valley seemed much more open, as if there’s a general understanding that it doesn’t hurt to try. Even if people weren’t impressed by your approach, they’d offer some piece of advice. It was also very common for people to offer to connect you to others who might be interested.

Originally, we had no meetings scheduled for Friday, the day we were flying back to Germany in the evening, but in the end we had three meetings more or less back to back just because of these introductions. It felt as if we could have stayed for another week without getting bored. People later told us that they knew of people who came over for three months and still could have gone on.

As someone said: The funny thing about the Valley is that although it’s all about the Internet and being connected online, actually meeting face to face counts so much.

Posted by Mikio L. Braun at Tue May 21 15:41:00 +0200 2013.

Friday, April 05, 2013

Reclaim your data, own a piece of the cloud!

File under: Coffee Talk

Lately I’ve been discussing the current state of the ‘Net quite a bit with Leo. Sure, it’s nice to get all those services in the cloud for free, but in the end you have to worry either about what exactly happens with your data, or about what you can do when companies shut down cloud-based services like Google Reader, leaving you with a pile of useless XML files, a bit like letting you take home the remnants of your car after it has been crushed.

I used to say that the main problem is that the user is the product, not the customer, but this post by Derek Powazek convinced me that even paying for the service won’t ensure that you get decent support and control over your data.

In the end, the only thing that helps is to reclaim your data and the service itself. Just like your WordPress-powered blog on your own root server will stay around as long as you pay the bills, both the data and the software to make it come alive should run on a computer you control.

But how have we ended up in a situation like this anyway? Here is my little history of networked computing.

In the mainframe era, computers were huge bulky machines, and only large institutions could afford to have one. People invented time-sharing operating systems to make those computers usable by many people concurrently, who were usually connected through dumb text terminals. Those terminals mostly worked in a block-oriented manner, meaning that they presented you with a form which you would submit to the server to get the results, a bit like a form on a web page.

Obviously, services were hosted centrally, and you very much depended on the mainframe both for storage and for providing the service.

All this changed with the advent of the Internet and the home computer. Instead of a relatively small number of large mainframe computers you got a large network of small machines. Services like mail, FTP, and even HTTP were designed in a way that they could run in a decentralized manner. In principle, anyone could hook up a computer to the network and run the services they were interested in on their own server.

Of course, you had to solve a number of technical problems: getting good bandwidth to your home was difficult, you had to use a dynamic DNS service to map changing dial-up IP addresses to a DNS entry, and you had to know Linux or some other variant of UNIX. But it was possible.

Server virtualization made things a lot easier. People realized that most of the time, computers were sitting idle anyway, so why not run several virtual machines on one physical server? That also made it possible to host a large number of servers in data centers, where they had constant Internet access, for relatively small amounts of money. (BTW, virtualization already existed in the mainframe era.)

Server virtualization and the resulting technology of putting lots of PC-type servers into racks (which look a lot like the mainframes of old from the outside) allowed companies to create massive server farms for their data-intensive services.

It probably all started with Google search and Amazon: Google, because they needed to store an index of the whole web somewhere; Amazon, because millions of people wanted to use the website each day.

Led by this example, other companies followed, and nowadays it’s entirely normal to rent thousands of servers in the cloud (virtual or otherwise) and build services on that private armada of computers.

I’m not the first to point out that this is really just the same setup as in the mainframe era, only with different technological means. While your computer is in principle able to store enormous amounts of data, and could provide the same services as the machines in the cloud, it’s reduced to a screen that runs a web browser.

So we went full circle from centralization to decentralization and back, gaining and losing control over our data and the services we need.

But there is a way out. Now it’s easier than ever to rent a piece of the cloud. We already spend quite a few dollars per month on our smartphones, and probably also on some cloud-based services like cloud storage. Why not spend a bit more money and also own a small machine somewhere in the cloud?

If that seems odd to you, have you ever noticed that a smartphone is already a bit like a small server in the cloud? It runs Linux (well, at least some do), is always connected, and typically comes with a few GB of local storage. In principle, you could install some dynamic DNS program on it and turn it into a full Internet server.

Of course, managing virtual machines is still much too technical for the ordinary person. We would also need a new type of cloud-based service which keeps only the data that needs to be global in the company’s server farm, while offloading user-specific data to the users’ own servers.

But technically, it’s all possible. And wouldn’t it be cool? ;)

Posted by Mikio L. Braun at Fri Apr 05 16:45:00 +0200 2013.

Wednesday, March 20, 2013

Misconceptions about the CAP Theorem

File under: Machine Room

If you’ve ever listened to a NoSQL talk, you’ve probably come across the CAP theorem. The argument usually goes like this:

  • Traditional databases guarantee consistency.
  • The CAP theorem tells you that you cannot have consistency, availability, and partition tolerance at the same time.
  • But we want to build scalable databases, so we forget about consistency.
  • Oh and by the way, who needs consistency anyway?

To be honest, to me this always looked like some poor excuse to not really discuss the design decisions of some NoSQL database. It’s probably just me, but I much prefer at least an attempt at an unbiased analysis of the pros and cons so that I can make an informed decision whether it fits my needs or not. But pulling this theorem out of the hat is like saying “we don’t even need to discuss this, because this theorem says impossible, ok!”

While searching for discussions of the CAP theorem, I found this excellent (but lengthy) article by Eric Brewer, who originally formulated the CAP theorem: CAP Twelve Years Later: How the “Rules” Have Changed.

Here is my summary:

First of all, the interpretation that the CAP theorem says “you can only have 2 out of 3” is misleading. It’s not like the original proof discussed all possible choices and showed that you can have only 2 out of 3.

Instead, the original proof discusses the following situation: Say you have a distributed system which is in a consistent state (whatever that means), and now there is a Partition of the system, either an actual network failure, or some other way in which machines cannot talk to one another anymore.

Now consider what options you have when there is a write request. You could wait for the partition to end in order to make sure that your system stays in a consistent state (thereby sacrificing Availability), or you could do the update partially (thereby sacrificing Consistency). So in case of P you can have only C or A, but not both.
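To make that choice concrete, here is a minimal toy sketch in Python. It is not taken from the article or from any real database: the class TwoReplicaStore, the flag partitioned, and the prefer setting are made-up names for illustration, and a real system would also have to detect the partition in the first place. The store either refuses writes during a partition (keeping C, giving up A) or accepts them on one replica only (keeping A, giving up C).

```python
class PartitionError(Exception):
    """Raised when a write is refused because the replicas cannot talk."""


class TwoReplicaStore:
    """Toy key-value store with two replicas, "A" and "B" (hypothetical)."""

    def __init__(self, prefer):
        if prefer not in ("consistency", "availability"):
            raise ValueError("prefer must be 'consistency' or 'availability'")
        self.prefer = prefer
        self.replicas = {"A": {}, "B": {}}
        self.partitioned = False  # assumption: set by hand, no real detection

    def write(self, replica, key, value):
        """Handle a write request that arrives at one replica."""
        if not self.partitioned:
            # Normal operation: every replica receives the update.
            for data in self.replicas.values():
                data[key] = value
            return
        if self.prefer == "consistency":
            # Refuse the write: replicas stay in sync, availability is lost.
            raise PartitionError("write refused during partition")
        # Accept the write locally: still available, but replicas now differ.
        self.replicas[replica][key] = value

    def is_consistent(self):
        return self.replicas["A"] == self.replicas["B"]


def try_write(store, replica, key, value):
    """Return True if the store accepted the write, False if it refused."""
    try:
        store.write(replica, key, value)
        return True
    except PartitionError:
        return False


# Same scenario for both strategies: one write, a partition, a second write.
cp = TwoReplicaStore(prefer="consistency")
ap = TwoReplicaStore(prefer="availability")
for store in (cp, ap):
    store.write("A", "x", 1)
    store.partitioned = True

cp_accepted = try_write(cp, "A", "x", 2)  # refused, replicas still agree
ap_accepted = try_write(ap, "A", "x", 2)  # accepted, replicas now disagree
```

After running this, cp has rejected the second write but its replicas still agree, while ap has accepted it and now holds two different values for "x" that need to be reconciled once the partition ends.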

Note that there is really no way in which you could “choose P”. It was always about how to handle partitions (which are often not really partitions, but timeouts), and that includes how to detect partitions, how to behave when you are in a “partition state”, and how to bring the system back to a consistent state after a partition.

The article stresses that these are not binary decisions, but that there is rather a whole spectrum of possible actions and strategies to choose from. It’s not about saying “I can’t have consistency and availability, so I’ll just forget about consistency”, it’s about saying “in case of a failure, availability is more important to me, therefore I will accept temporary inconsistencies, and implement strategies to clean up afterwards”.

When you look at it that way, you get a much clearer picture of how a database like Cassandra fits into this, and how its read repair and hinted handoff features work to regain consistency, although in a very lax (and eventual) way.
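As a rough illustration of the two ideas, here is a small Python sketch. It is emphatically not Cassandra’s implementation: the Replica and Coordinator classes, the logical clock, and the last-write-wins rule are simplifying assumptions of mine. Hinted handoff means the coordinator remembers writes for a replica that was unreachable and replays them later; read repair means that a read compares the answers of the replicas and writes the newest version back to the stale ones.

```python
class Replica:
    """Hypothetical replica storing key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def put(self, key, ts, value):
        # Last write wins: only accept versions newer than the stored one.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)


class Coordinator:
    """Hypothetical coordinator in front of a list of replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, ts, value) for unreachable replicas
        self.clock = 0   # logical clock used as the write timestamp

    def write(self, key, value):
        self.clock += 1
        for r in self.replicas:
            if r.up:
                r.put(key, self.clock, value)
            else:
                # Hinted handoff: keep the write until the replica is back.
                self.hints.append((r, key, self.clock, value))

    def replay_hints(self):
        remaining = []
        for r, key, ts, value in self.hints:
            if r.up:
                r.put(key, ts, value)
            else:
                remaining.append((r, key, ts, value))
        self.hints = remaining

    def read(self, key):
        answers = [(r, r.data.get(key)) for r in self.replicas if r.up]
        found = [v for _, v in answers if v is not None]
        if not found:
            return None
        newest = max(found, key=lambda v: v[0])
        # Read repair: push the newest version to replicas that lag behind.
        for r, v in answers:
            if v != newest:
                r.put(key, *newest)
        return newest[1]


a, b = Replica("a"), Replica("b")
coordinator = Coordinator([a, b])
coordinator.write("x", "old")
b.up = False                    # b misses the next write
coordinator.write("x", "new")
b.up = True
value = coordinator.read("x")   # returns the newest value and repairs b
coordinator.replay_hints()      # the hint would have fixed b as well
```

Either mechanism alone brings replica b up to date in this toy scenario. Neither gives you consistency at the moment of the write, which is exactly the “temporary inconsistencies, cleaned up afterwards” trade-off described above.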

But it also becomes clear that it’s just not true that you cannot have distributed databases which are both highly available and come with consistency guarantees. The article goes on to discuss recent research results which try to achieve exactly that: strategies to minimize the impact of a partition on availability and consistency, and ways to re-establish consistency after a partition (also in the broader database sense of having consistent cross-references between tables and satisfying other invariants).

So the next time someone tells you he doesn’t care about consistency because of the CAP theorem, ask him how he chooses P, and how he deals with the detection, handling, and cleanup of partitions.

Posted by Mikio L. Braun at Wed Mar 20 21:35:00 +0100 2013.
