Thursday, March 13, 2008

Cache Everything

So I'm working on this Web Service project and its time to consider scalability. Which means its time to setup memcached. If you haven't looked at using memcached, you should.

Memcached is a totally stupid-simple distributed memory object cache. It's this super tiny application that slices off a chunk of RAM and listens on a port. Yeah, that's about it :)

It gets cool to talk about when you start talking about making it "distributed". When you start talking about "Distributed Caches" people start thinking about Coherence and other hugely complicated and hugely expensive enterprise applications. It really dosen't have to be so hard.

Here's what you do. Get a couple cheap boxes with one or two gigs of ram in them. Install the Linux flavor of your choise. Install memcached and run it with "memcached -d -m 1024". This will start memcached on it's default port (11211) and allocate 1gb of ram to the cache. If you do this on three boxes, you now have a 3gb distributed cache.

There are client libraries available for most popular languages. If you're using Java I'd highly recommend using spymemcached from Dustin Sallings.

So you give the clients a list of servers to connect to and they will basically treat that big distributed cache you created as a huge hashtable. Each get/set operation has a string key that is hashed and used to decide what "partition" of the cache to read/write to. If an instance goes down the client just begins distributing reads/writes to the other instances until the missing member comes back.

Using any of the client libraries, in just about any language, your usage looks like this:

  • Check the cache for your object.

  • If your object is not null, return the object.

  • Otherwise, read the object from the database or what have you.

  • Put the object in the cache.

  • Return the object.



Memcached uses a very light and fast binary protocol which makes it usable by just about any language. There are pre-built packages in apt if you're using Ubuntu, you can easily build it from source for just about any other *nux flavor. There's even a windows port. I've only used that for dev at this point though.

I'm working on writing a second level cache for hibernate that uses memcached. It is super simple, but I just started work on it last night (seriously). Check it out if you want. I plan to work on it over the next few nights and get it ready for prime time :)

2 comments:

Anonymous said...

- memcached does not use a binary protocol, it's ascii commands with arbitrary data
- if you lose a node, you lose the data which is a fundamental difference with coherence though there is work for adding re-distributions to memcached through clients with proprietary functionality

all in all though it is a killer application

Ray Krueger said...

Sorry, there is a binary protocol available, but by default most folks use the ascii protocol, you are correct. They are both very fast and efficient though.

Losing a node results in a cache-miss for the data that node was holding. In most client applications this just results in a cache-set on a different node (as a result of the hash algorithm result changing) and cache-gets from that same node.

Luckily nodes going down at random doesn't happen very often :)