Thursday, July 29, 2010

Multi-Tenancy and Virtualization

In a recent blog post on Gigaom, Simeon Simeonov argues that virtualization is on the way out, and discusses VMware's move toward platform-as-a-service computing. In a nutshell, his argument is that virtualization is inefficient, and is essentially a last resort when legacy applications can't play nicely together in the same sandbox. In other words, the real goal for IT shops and service providers is not virtualization per se, but multi-tenancy, cost-effective use of hardware, and high availability. Find any two servers in the average corporate data center and ask why they're not running on the same machine. It's a good bet you'll get one of the following four answers: (1) machine A is running a piece of software that misbehaves if run on the same machine as some piece of software running on machine B, (2) a single server couldn't handle the load, (3) one of those servers provides redundancy for the other, or (4) no particular reason, but we haven't gotten around to consolidating them yet. In my experience, the first answer is probably the most common. But as Simeonov points out, the ideal solution is not virtualization, but better software - specifically, platforms that can transparently service multiple customers.

PostgreSQL is very strong in this area. Hosting providers such as hub.org provision databases for multiple customers onto a single PostgreSQL instance; and here at EnterpriseDB, we support several customers who do much the same thing. Databases in PostgreSQL provide a high degree of isolation: many configuration parameters can be set on a per-database basis, extensions can be installed into a single database without affecting other databases that are part of the same instance, and each database can in turn contain multiple schemas. The ability to have multiple databases, each containing multiple schemas, makes the PostgreSQL model more flexible than Oracle or MySQL, which have only a single tier system. In the upcoming PostgreSQL 9.0 release, the new grant on all in schema and default privileges features will further simplify user administration in multi-user and multi-tenant environments. Behind the scenes, a PostgreSQL instance uses a single buffer pool which can be efficiently shared among any number of databases without excessive lock contention. This is critical. Fragmenting memory into many small buffer pools prevents databases from scaling up (using more memory) when under heavy load, and at the same time prevents databases from scaling down (using less memory) when not in use. By managing all databases out of a single pool, PostgreSQL can allow a single database to use every block in the buffer pool - if no other databases are in use - or no blocks at all - if the database is completely idle.

Simeonov seems to feel that virtualization has already nearly run its course, and predicts that the market will hit its peak within three years. That doesn't seem likely to me. I think there is an awful lot of crufty hardware and software out there that could benefit from virtualization, but it's working right now, so no one is eager to make changes that might break something. As the physical equipment starts to fail, IT administrators will think about virtualization, but hardware that isn't touched can sometimes run for a surprisingly long time, so I don't expect server consolidation projects to disappear any time soon. More importantly, Simeonov seems to assume that all new applications will be developed using platform-as-a-service architectures such as Google App Engine, Bungee, Engine Yard, and Heroku. While some certainly will be, it seems unlikely that the traditional model of application development, using a dedicated web server and a dedicated database running on a physical or virtual machine will disappear overnight. For one thing, choosing one of those vendors means being locked into that vendor's API - and choice of programming language. Bungee and Heroku are Ruby environments, for example, while Google App Engine offers Java and Python. Good luck making the switch!

So, if plain old virtual machines are going to be around for a while, how does PostgreSQL stack up in that environment? Not too bad. Of course, write-intensive workloads will suffer from the generalized slowness of virtualized I/O. But PostgreSQL is designed to run well even in a very small memory footprint, to take good advantage of the OS buffer cache and process scheduler, and to be portable across a wide variety of platforms. If your database is small enough to fit in memory, performance should be good. And if your database isn't small enough to fit in memory, there's not much point in virtualizing it: you're going to need a dedicated machine either way.

2 comments:

  1. Even though there are some benefits to virtualizations in some situations, I always thought they were not compelling enough for IT shops to adopt the technology on-premise for the same reasons mentioned in this article. As you mentioned, VMWare realizes the trends in the virtualizations marketplace and embracing Platform as a service model. Thanks for the great article.

    ReplyDelete
  2. This is an interesting point of view. May be the importance of the visualisation depends on your goals. practicality should be put on the first place if you seek for efficiency.

    ReplyDelete