Sunday, March 09, 2008

ICP for Faster Web Apps

ICP (Internet Cache Protocol, RFC 2186) is a simple protocol intended to let one cache server ask its peers whether or not they have a non-stale copy of a particular object ("object" is cache-speak for the thing on the other end of a URL), but it can also be used to make the most of your web apps.

The RFC is pretty easy reading as such things go, but I'll summarize here: if a cache server doesn't have a local copy of the requested object, it can be configured to send out a request (usually over UDP) to all of its siblings. The request includes little more than the requested URL. All the sibling caches send back HIT, MISS, or MISS_NOFETCH (more or less; read the RFC for the details). Here are the definitions of those from the RFC:

ICP_OP_HIT

An ICP_OP_HIT response indicates that the requested URL exists in this cache and that the requester is allowed to retrieve it.

ICP_OP_MISS

An ICP_OP_MISS response indicates that the requested URL does not exist in this cache. The querying cache may still choose to fetch the URL from the replying cache.

ICP_OP_MISS_NOFETCH

An ICP_OP_MISS_NOFETCH response indicates that this cache is up, but is in a state where it does not want to handle cache misses.

The first server to come back with a HIT will be sent the request. If all servers reply with a MISS, the first server to respond with a MISS will get the request. Servers that send back a MISS_NOFETCH will not be sent this request. The response to any particular request has no bearing on any future request; for example, a server can reply with a MISS_NOFETCH for a URL one moment and a HIT for the same URL a split second later.
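
The selection rule just described can be sketched in a few lines of Python. This is only an illustration of the rule, not how any particular proxy implements it; the function name `choose_peer` and the (peer, reply) tuple shape are made up for this example.

```python
from typing import Iterable, Optional, Tuple

HIT, MISS, MISS_NOFETCH = "HIT", "MISS", "MISS_NOFETCH"


def choose_peer(replies: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Pick a peer from (peer, reply) pairs supplied in arrival order.

    Hypothetical helper: returns the peer that should get the request,
    or None if no peer is willing to take it.
    """
    first_miss = None
    for peer, reply in replies:
        if reply == HIT:
            return peer            # the first HIT wins immediately
        if reply == MISS and first_miss is None:
            first_miss = peer      # remember the fastest MISS as a fallback
        # MISS_NOFETCH peers are never candidates for this request
    return first_miss
```

Because the function returns as soon as it sees a HIT, it can be fed replies as they arrive rather than waiting for every peer to answer.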

This is all very nice for cache servers, but how does this help speed up my dynamic web app? Well, forward proxies can also use ICP to query "origin servers" (non-cache web servers). Why is that interesting? Let's say you have multiple origin servers and, being good app servers, they each have local caches (for data they request from databases, computed page fragments, etc.). If requests are randomly sent to the various origin servers, the app server cache hit rate will be pretty abysmal. The negative effects are intensified if there are many back-end servers and if the working data set is much larger than what will fit in any one cache, meaning you get a high rate of cache churn: otherwise useful data is evicted because there's not enough room to keep it.

It's likely that the total cache size is larger than the working set, but since the requests are randomly distributed, poor use is made of the caches. It would be nice if a request that needed a particular subset of the data went to a server that was likely to have the needed data already cached. In other words, you want some flavor of "request affinity".

How you decide to affinitize requests depends on how your data is structured. For the systems I'm working on, we're serving hundreds of web sites out of a single large database, so site affinity makes the most sense. This means that all requests for a particular site should tend to go to the same server or set of servers (if there is more load than a single origin server can handle).

The sequence of events goes something like this: a request comes in, an ICP request is sent to all the origin servers, and the results are used to direct the request. If a server has handled a request for that site before, it sends back a HIT; if not, it sends back a MISS.
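
To give a feel for what that ICP request looks like on the wire, here is a rough sketch of packing a query and unpacking a reply, following my reading of the message layout in RFC 2186 (a 20-byte header followed by a payload). The helper names `build_query` and `parse_reply` are invented for this example, and the sender and requester address fields are simply left as zeros.

```python
import struct

ICP_VERSION = 2
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_OP_MISS_NOFETCH = 21

# Header fields, in network byte order: opcode, version, message length,
# request number, options, option data, sender host address (20 bytes total).
HEADER = struct.Struct("!BBHIII4s")
NO_ADDRESS = b"\x00" * 4  # assumption: address fields left unset in this sketch


def build_query(url: str, request_number: int) -> bytes:
    """Build an ICP_OP_QUERY datagram for the given URL."""
    # A query payload is a 4-byte requester address, then a null-terminated URL.
    payload = NO_ADDRESS + url.encode("ascii") + b"\x00"
    header = HEADER.pack(
        ICP_OP_QUERY, ICP_VERSION, HEADER.size + len(payload),
        request_number, 0, 0, NO_ADDRESS,
    )
    return header + payload


def parse_reply(data: bytes):
    """Return (opcode, request_number, url) from a HIT/MISS style reply."""
    opcode, _version, length, request_number, _options, _option_data, _sender = (
        HEADER.unpack_from(data)
    )
    # For HIT, MISS and MISS_NOFETCH the payload is just the null-terminated URL.
    url = data[HEADER.size:length].rstrip(b"\x00").decode("ascii")
    return opcode, request_number, url
```

The request number is what lets the sender match replies to outstanding queries, since each reply echoes it back. The datagram itself would normally go out over a UDP socket to each peer's ICP port.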

If a server would otherwise send a HIT but is too busy at the moment, it'll send a MISS instead, hoping that another server HITs. If none do, then the fastest MISS will get the request, which may mean that a new server gains an affinity for that site.
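
On the origin-server side, the decision about what to answer might look something like the sketch below. It is not the zc.icp API; the class `SiteAffinityPolicy`, its method names, and the "too busy" test (a simple cap on in-flight requests) are all assumptions made for illustration, with the site taken to be the hostname in the URL.

```python
from urllib.parse import urlsplit


class SiteAffinityPolicy:
    """Hypothetical reply policy: HIT for known sites unless too busy."""

    def __init__(self, max_active_requests: int):
        self.max_active_requests = max_active_requests  # assumed busy threshold
        self.active_requests = 0
        self.seen_sites = set()

    def answer(self, url: str) -> str:
        """Decide how to reply to an ICP query for this URL."""
        site = urlsplit(url).hostname
        busy = self.active_requests >= self.max_active_requests
        if site in self.seen_sites and not busy:
            return "HIT"
        # Unknown site, or known site but too busy: let another server HIT.
        return "MISS"

    def request_started(self, url: str) -> None:
        """Record that this server is now handling a request for the site."""
        self.seen_sites.add(urlsplit(url).hostname)
        self.active_requests += 1

    def request_finished(self) -> None:
        self.active_requests -= 1
```

A server that answers MISS and still ends up with the request calls `request_started`, which is how it picks up an affinity for a site it had not served before.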

When used like this, ICP can provide dynamic load balancing and data set partitioning in a simple, scalable fashion. If you want to play with ICP, you'll need a forward proxy that supports ICP, like Squid, and an ICP server, like my Python implementation: zc.icp (developed at Zope Corporation, but not Zope-specific).