16 7 / 2012

In my previous post I showed how you can squeeze more out of your datastores by using compression. Similar rules apply to bandwidth and data transmission over the wire. There are various serialization formats to choose from: Protobuf, Thrift, and Avro, just to name a few. I always prefer schema-free serialization formats for key/value stores, because they give me error-free data loading even when the data changes over time. The favorite schema-free format is obviously JSON. However, the new kid on the block is MsgPack. I've been playing around with MsgPack in a few projects of mine for a while, and it is actually good: compact, well documented, and supported on lots of platforms. But how does it compare to JSON (not in terms of performance or speed)? We all place huge trust in JSON; it's already a superhero who slayed XML despite its weaknesses. In this little adventure I set out to compare JSON vs MsgPack in terms of bytes when compressed! Let's get straight down to business; here is the source code I used:

I am simply loading about 200 random tweets, then encoding them as JSON and as MsgPack, each with Gzip and with LZ4 compression. The results are pretty disturbing in the case of Gzip.
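If you want to try this kind of measurement yourself, here is a minimal sketch in Python. It is not the code behind the numbers in this post: the `fake_tweets` generator and its fields are made-up stand-ins for real tweets, compressing each record on its own (as you would for values in a key/value store) is my assumption, and MsgPack is only measured if the third-party `msgpack` package happens to be installed.

```python
import gzip
import json
import random

try:
    import msgpack  # third-party package; optional in this sketch
except ImportError:
    msgpack = None


def fake_tweets(count, seed=0):
    """Build tweet-like dicts. Hypothetical stand-in for real tweet data."""
    rng = random.Random(seed)
    words = ["json", "msgpack", "gzip", "lz4", "redis", "tweet", "data", "store"]
    return [
        {
            "id": rng.getrandbits(60),
            "user": {
                "screen_name": "user%d" % rng.randrange(1000),
                "followers_count": rng.randrange(100000),
            },
            "text": " ".join(rng.choice(words) for _ in range(rng.randint(5, 20))),
            "retweeted": rng.random() < 0.2,
        }
        for _ in range(count)
    ]


def measure(records, encoders, compressors):
    """Return {(encoder, compressor): total bytes}, one blob per record."""
    sizes = {}
    for enc_name, encode in encoders.items():
        blobs = [encode(record) for record in records]
        sizes[(enc_name, "raw")] = sum(len(b) for b in blobs)
        for comp_name, compress in compressors.items():
            sizes[(enc_name, comp_name)] = sum(len(compress(b)) for b in blobs)
    return sizes


encoders = {
    "json": lambda r: json.dumps(r, separators=(",", ":")).encode("utf-8"),
}
if msgpack is not None:
    encoders["msgpack"] = msgpack.packb

# Any bytes -> bytes callable can be added here, e.g. an LZ4 compressor.
compressors = {"gzip": gzip.compress}

tweets = fake_tweets(200)
sizes = measure(tweets, encoders, compressors)
for (enc_name, comp_name), total in sorted(sizes.items()):
    print("%-8s %-5s %8d bytes" % (enc_name, comp_name, total))
```

Synthetic records like these will not reproduce my figures; feed in your own data to see how the two formats compare for your workload.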

LZ4 looks quite normal, just as we expected, but with Gzip, across just 200 tweets, MsgPack takes 189057 bytes while JSON takes only 177976 bytes (11081 bytes less). Bingo! Now this is what I call a smart combination. You get two standard components that are not only available for the native applications you write, but are also built into the modern browsers you use today. You can use them from JavaScript too, with no special decoder needed to load the data and simply use it.

Some of you may be wondering what the big deal is. Here is the deal: if you can detect that the browser supports Gzip Content-Encoding over XHR, you can serve gzipped JSON directly out of the data store to your clients (i.e. no fetch, encode to JSON, and gzip steps on every request). You can use a similar technique with cache systems like NGINX + Memcached [using HttpMemcachedModule] to serve some of your REST calls really quickly (user profile, user info, etc.).
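To make the "serve it straight from the store" idea concrete, here is a rough sketch in Python using only the standard library. It assumes the stored value is already gzip-compressed JSON; `accepts_gzip` and `build_response` are hypothetical helper names, and the `Accept-Encoding` parsing is deliberately simplified (it only looks for `gzip` and its `q` value).

```python
import gzip
import json


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def build_response(stored_gzip_json, accept_encoding):
    """Return (headers, body) for a value stored as gzip-compressed JSON."""
    headers = {"Content-Type": "application/json", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        # Fast path: hand the stored bytes to the client untouched.
        headers["Content-Encoding"] = "gzip"
        body = stored_gzip_json
    else:
        # Fallback for clients that cannot decode gzip.
        body = gzip.decompress(stored_gzip_json)
    headers["Content-Length"] = str(len(body))
    return headers, body


# What would normally come back from the data store (made-up profile data).
profile = {"id": 42, "screen_name": "example_user"}
stored = gzip.compress(json.dumps(profile).encode("utf-8"))

headers, body = build_response(stored, "gzip, deflate")
plain_headers, plain_body = build_response(stored, "identity")
```

On the fast path the server does no JSON encoding and no compression per request, and the browser decompresses the body itself before your JavaScript sees it.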