Statically Typed

Why you think databases don't scale

In a few words: you’re doing it wrong!

Since Google announced App Engine, people have been buzzing about it, and the conclusion most of them are coming to is that Bigtable is a huge limitation because it doesn’t give you the tools that MySQL or similar engines do. In a sense it really is limiting, but there’s a reason for that.

All of those joins you’re so fond of are expensive operations: without a suitable index, the work grows with the product of the sizes of the tables involved, and every extra table in the join multiplies it again. On sites like reddit, people are fond of saying that database engines just aren’t written to scale well. In reality, it’s what you’re trying to query that doesn’t scale well. There’s a reason why Bigtable doesn’t allow joins. With the amount of data that Google has to work through quickly (sometimes hundreds of terabytes in up to a trillion cells), they would simply be impractical.
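To put a number on that, here is a minimal sketch of the most naive way to join two tables, a nested loop with no index. The `users` and `comments` tables and their fields are made up for illustration, and real engines use indexes and smarter join strategies where they can, so treat this as the worst case rather than a description of how MySQL works.

```python
def nested_loop_join(users, comments):
    """Join users to comments on user id, counting row comparisons."""
    comparisons = 0
    joined = []
    for user in users:
        for comment in comments:
            comparisons += 1  # every pair of rows gets examined
            if comment["user_id"] == user["id"]:
                joined.append((user["name"], comment["text"]))
    return joined, comparisons


# Hypothetical data: 100 users, 1,000 comments spread evenly among them.
users = [{"id": i, "name": f"user{i}"} for i in range(100)]
comments = [{"user_id": i % 100, "text": f"comment {i}"} for i in range(1000)]

rows, comparisons = nested_loop_join(users, comments)
print(len(rows), comparisons)  # 1000 100000
```

A thousand result rows cost a hundred thousand comparisons here. Grow both tables tenfold and the comparison count grows a hundredfold, which is the part that doesn’t scale.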

So, for Google to set up a system that will auto-scale your application, they can’t give you access to things that simply don’t scale. From their perspective, they probably want Bigtable to be good at one thing: storing your data. Working with that data is the programming language’s domain. In fact, as the Digg people have said many times, the database is their bottleneck and they’ve been moving a lot of their data processing into PHP.
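Doing the work in the application might look something like the sketch below. It is not Bigtable’s or App Engine’s API; the plain dict `users_by_id` is a hypothetical stand-in for a table you can only read by key, and `attach_authors` is an invented helper name.

```python
# Hypothetical stand-in for a table that only supports lookups by key.
users_by_id = {1: {"name": "alice"}, 2: {"name": "bob"}}

# Hypothetical rows already fetched for one page of comments.
comments = [
    {"user_id": 1, "text": "first"},
    {"user_id": 2, "text": "second"},
    {"user_id": 1, "text": "third"},
]


def attach_authors(comments, users_by_id):
    """Combine comments with their authors in application code."""
    # Collect the distinct keys we need, then look each one up once.
    needed_ids = {c["user_id"] for c in comments}
    authors = {uid: users_by_id[uid] for uid in needed_ids}
    return [
        {"author": authors[c["user_id"]]["name"], "text": c["text"]}
        for c in comments
    ]


result = attach_authors(comments, users_by_id)
print(result[0])  # {'author': 'alice', 'text': 'first'}
```

The cost is tied to the number of comments on the page you’re rendering, not to the size of either table, and the datastore never does anything harder than fetch a row by its key.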

What does this mean for you? Nothing. It’s highly unlikely that you will create anything the size of Google or even Digg. You can just happily use joins. In fact, you have a distinct advantage over the likes of Google precisely because you can use them. You’re able to write software more easily because you don’t have to worry about hundreds of millions of users. Just remember that if you do hit their size, don’t blame the database. Refactor to get rid of those expensive joins.