
Sleevenotes Architecture

Hi, I’m Doug, and I’m the software architect on this project (and one of the coders too). Most of my posts are going to be painfully technical, but I’ll try to keep this one at least a bit readable. We’re using some pretty left-field technology on this project, and the purpose of this post is to introduce a few of the architectural aspects of the application and how they led us to select this software.

These left-field components are Twisted, Nevow & Axiom. All of these are built on the no-longer left-field Python.

What we are building is, in essence, a massive Mash Up. The majority of the data we display is going to be fetched from elsewhere, processed and passed on to the user. Each page view by a user might lead to a dozen or so queries going out to service providers, and then incremental asynchronous updating of the user’s display.

What this means is that the part of our application that needs the most thought is the part where it is doing nothing: when it’s waiting. We’re going to spend an awful lot of time waiting. The aim is to do all that waiting for as low a cost as possible.

Traditional architectures don’t handle waiting very well. A normal web application might have a server with 2-10 general-purpose processing threads. Each thread will have some associated in-memory caches, it’ll probably hold a database connection or two, and it’ll have some thread-local storage to handle context and all sorts of other stuff. These threads are expensive.

In an application like that, you do all your waiting on your own time. You connect to Amazon, for example, and then you sit there, blocking, until Amazon returns. We’re going to be issuing hundreds of these sorts of requests a second, possibly. They aren’t inherently resource intensive (they are only processing a few TCP packets after all), but the blocking is an absolute killer for a traditional architecture.
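To make the cost of blocking concrete, here is a minimal sketch. It uses the standard library's asyncio rather than Twisted, purely so that it runs without extra dependencies, and `fake_query`, `fetch_all` and `timed_fetch` are hypothetical names in which a short sleep stands in for a real round trip to a provider. A dozen waits issued one after another would take a dozen times as long as one; issued together on a single thread, they take roughly as long as the slowest.

```python
import asyncio
import time


async def fake_query(provider: str, delay: float = 0.05) -> str:
    # Hypothetical stand-in for a network round trip to a service provider.
    # While this coroutine sleeps, the single thread is free to run others.
    await asyncio.sleep(delay)
    return f"{provider}: ok"


async def fetch_all(providers):
    # Start every query at once and wait for all of them together.
    return await asyncio.gather(*(fake_query(p) for p in providers))


def timed_fetch(providers):
    # Run the whole batch and report how long the waiting took in total.
    start = time.monotonic()
    results = asyncio.run(fetch_all(providers))
    return results, time.monotonic() - start


if __name__ == "__main__":
    providers = [f"provider-{i}" for i in range(12)]
    results, elapsed = timed_fetch(providers)
    print(len(results), "replies in", round(elapsed, 3), "seconds")
```

Each pending query here costs one small coroutine object rather than a whole thread with its caches and database connections.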

There are two feasible alternatives here. Both have their merits, and both are approaches to cooperative multitasking. Twisted provides a single-threaded model built around a single select reactor. This is hugely efficient, although the style of programming takes some getting used to.
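The idea behind a select reactor can be sketched in a few lines of plain Python. This is not Twisted's implementation, only an illustration of the pattern: one thread, one `select` call watching every open connection, and a callback fired when a connection has data. The function name `run_reactor` and the use of local socket pairs in place of real remote services are assumptions made for the example.

```python
import selectors
import socket


def run_reactor(n_connections=3):
    sel = selectors.SelectSelector()  # thin wrapper around select()
    received = []
    peers = []

    for i in range(n_connections):
        # A connected local socket pair stands in for a remote service.
        ours, theirs = socket.socketpair()
        ours.setblocking(False)

        def on_readable(sock, i=i):
            # Called by the loop only when data is ready, so recv won't block.
            received.append((i, sock.recv(1024)))
            sel.unregister(sock)
            sock.close()

        sel.register(ours, selectors.EVENT_READ, on_readable)
        peers.append(theirs)

    # Simulate the replies arriving in a different order than the requests.
    for i, peer in reversed(list(enumerate(peers))):
        peer.sendall(f"reply {i}".encode())
        peer.close()

    # The event loop: sleep in select() until something is readable,
    # then dispatch to the callback registered for that socket.
    while sel.get_map():
        for key, _events in sel.select():
            key.data(key.fileobj)

    sel.close()
    return received


if __name__ == "__main__":
    print(sorted(run_reactor()))
```

The getting-used-to part is visible even here: the code that handles a reply lives in a callback, separate from the code that set up the connection.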

A valid alternative would be a lightweight threading environment, like Erlang or Stackless Python. If we used lightweight threads, we could run the thousands of concurrent threads we’re going to need for this application, and each would use virtually no resources while blocked waiting.
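For a feel of how cheap cooperative tasks can be, here is a toy round-robin scheduler built on ordinary Python generators. Generators are only a stand-in here: this is neither Stackless Python's tasklets nor Erlang's processes, and `worker`, `run_round_robin` and `demo` are hypothetical names. Each task runs until it yields, at which point the scheduler moves on to the next one.

```python
from collections import deque


def worker(name, steps, log):
    # A cooperative task: do a little work, then hand control back.
    for step in range(steps):
        log.append((name, step))
        yield  # the point where a real task would be waiting on I/O


def run_round_robin(tasks):
    # Give each task one turn at a time until all of them have finished.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, so drop it from the queue
        queue.append(task)


def demo(n_tasks=10000):
    # Thousands of tasks are fine: each is one small generator object.
    log = []
    run_round_robin(worker(i, 2, log) for i in range(n_tasks))
    return log


if __name__ == "__main__":
    print(len(demo()), "steps run across 10000 tasks")
```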

Twisted has a large and very capable toolset built around it, with Nevow providing an extremely elegant and effective web framework. Nevow also provides Athena, which is some pretty cunning wiring to hook up deferreds in MochiKit to deferreds in the Twisted server, providing end-to-end asynchronicity. Very smart.

All of these factors make Twisted a very useful, and interesting, choice. Hopefully that choice will be borne out by experience :)
