Saving memory in Redis with MessagePack and Lua scripts

Rationale for compression in Redis

Redis is an in-memory data structure store generally used for caching. Because it keeps its data in RAM rather than on disk, Redis can serve requests with very low latency. RAM is a quite expensive and scarce resource, but Redis mitigates this with an LRU eviction algorithm that discards old data to make room for more recent or important entries. Thanks to this behaviour you will usually only need a few GB of memory to cache most of your application's data, and beyond that point adding more memory yields little or no performance improvement. One could then ask: what is the point of compressing our data?

Although caching is the most common use case for Redis, it can also be used as a message broker or a database. In this post we will focus on the latter. Using Redis as a database can be useful when your data is accessed very frequently and you need near-real-time data processing.

Unlike the caching scenario, the database scenario implies that all the data is useful and needs to persist: we cannot just let Redis discard old values. Your database size is now bounded by the amount of RAM you have, and because RAM is expensive, it becomes quite interesting to introduce some kind of compression on the stored data.

In this post I will present how you can compress your data with MessagePack, how to solve race conditions, and how to do both at the same time using Lua scripts.

Nested structures in Redis

While Redis offers multiple built-in data structures (list, set, sorted set, hash) to easily manipulate your data, it is often useful to introduce a new level of representation to allow nested structures.

Let's take for instance a hash representing a user profile with two fields:

  • the name field holds the user's real name
  • the sessions field holds the list of their recent sessions
HSET username name "Adrien"  
HSET username sessions '[{"t":1455764204,"d":100},{"t":1455766103,"d":450}]'  

To represent our sessions array, we have no choice but to use an external format like JSON. Each session has two parameters: t, the timestamp, and d, the duration of the session.

In this example the user has a history of two sessions: one starting at "2016-02-18 10:56:44" and lasting 100 seconds, and another starting at "2016-02-18 11:28:23" and lasting 450 seconds.

MessagePack

JSON is a pretty easy format to read, which I guess is one of the reasons it gained so much traction after its introduction. However, when you think about it, it is a highly inefficient representation of data. We could argue that the Internet as we know it today is running in "debug mode": servers constantly talk to each other in a pseudo-human language that they need to parse and generate, while they could happily exchange binary data instead.

MessagePack is a project aimed at solving this problem. As its tagline puts it: "It's like JSON. but fast and small." It keeps the data types of JSON but uses an efficient binary representation.

JSON: '[{"t":1455764204,"d":100},{"t":1455766103,"d":450}]'  
MessagePack: 92 82 a1 74 ce 56 c5 32 ec a1 64 64 82 a1 74 ce 56 c5 3a 57 a1 64 cd 01 c2

JSON: 51 bytes  
MessagePack: 25 bytes  

Using MessagePack we are now able to save 50% of our precious memory. A nice side effect is that we also save the same proportion of bandwidth between Redis and our application.
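To make the binary representation above less magical, here is a minimal sketch of a MessagePack encoder covering only the types used in this post (fixarray, fixmap, fixstr, positive fixint, uint16, uint32). It is purely pedagogical; in practice you would use one of the official msgpack libraries.

```python
def pack(value):
    """Tiny MessagePack encoder (subset: small arrays/maps/strings,
    unsigned ints up to 32 bits). For illustration only."""
    if isinstance(value, list):
        out = bytes([0x90 | len(value)])            # fixarray header
        for item in value:
            out += pack(item)
        return out
    if isinstance(value, dict):
        out = bytes([0x80 | len(value)])            # fixmap header
        for k, v in value.items():
            out += pack(k) + pack(v)
        return out
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return bytes([0xA0 | len(raw)]) + raw       # fixstr header + bytes
    if isinstance(value, int) and value >= 0:
        if value < 0x80:
            return bytes([value])                       # positive fixint
        if value <= 0xFFFF:
            return b"\xcd" + value.to_bytes(2, "big")   # uint16
        if value <= 0xFFFFFFFF:
            return b"\xce" + value.to_bytes(4, "big")   # uint32
    raise TypeError("type not handled by this sketch: %r" % (value,))

sessions = [{"t": 1455764204, "d": 100}, {"t": 1455766103, "d": 450}]
packed = pack(sessions)
print(packed.hex())   # 9282a174ce56c532eca1646482a174ce56c53a57a164cd01c2
print(len(packed))    # 25
```

Running it reproduces exactly the 25 bytes shown above: the timestamps fit in a uint32 (5 bytes each), 100 fits in a single fixint byte, and 450 needs a uint16 (3 bytes).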

Nested structures introduce race conditions

Now that we are using our Redis server as a fast read/write database with complex nested structures, we face a very common database problem.

If my application is multi-threaded or running in cluster mode, two processes may want to update the same data at the same time. If I were using a basic Redis type such as a list, I could add a new element safely with the LPUSH command.

With our custom data representation, however, we need to fetch the data into our application, add the element, then send it back to Redis. This introduces a race condition: if two processes get the data at approximately the same time, update it locally and then save it to the database, one version will overwrite the other.

[Figure: example of a race condition, also known as a "shitty situation" in technical jargon. Note: application A1 could have set the data before application A2; it would not change anything.]

Optimistic Locking Transaction

To solve this race condition, we can use the optimistic locking transaction algorithm built on the Redis WATCH, MULTI and EXEC commands.

With WATCH, you indicate which keys need to be watched.
With MULTI and EXEC, you delimit the commands of your transaction.

update_sessions(username, session){  
  // Watch key and get its value
  WATCH username;
  sessions = HGET username "sessions";

  // Update the sessions, and set it
  sessions = add(sessions, session);
  result = MULTI, HSET username "sessions" sessions, EXEC;

  // If the transaction failed, try again.
  // A maximum number of retries is advised.
  if(result == FAILED) update_sessions(username, session);
}
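The retry loop above can be simulated without a Redis server. The sketch below uses a toy in-memory store whose "transaction" succeeds only if the key's version has not changed since it was read, which mimics the WATCH/MULTI/EXEC check-and-set semantics; it is a stand-in for a real Redis client, not actual Redis code.

```python
import json

class ToyStore:
    """In-memory key/value store with versioned check-and-set,
    mimicking Redis WATCH/MULTI/EXEC for illustration."""
    def __init__(self):
        self.data = {}      # key -> serialized value
        self.version = {}   # key -> write counter

    def watch_and_get(self, key):
        # WATCH + HGET: remember the version we read
        return self.version.get(key, 0), self.data.get(key, "[]")

    def exec_set(self, key, watched_version, value):
        # EXEC: abort if someone else wrote the key in between
        if self.version.get(key, 0) != watched_version:
            return False
        self.data[key] = value
        self.version[key] = watched_version + 1
        return True

def update_sessions(store, key, session, max_retries=10):
    for _ in range(max_retries):
        version, raw = store.watch_and_get(key)
        sessions = json.loads(raw)
        sessions.append(session)                 # update locally
        if store.exec_set(key, version, json.dumps(sessions)):
            return True                          # transaction succeeded
    return False                                 # gave up after max_retries

store = ToyStore()
update_sessions(store, "username:sessions", {"t": 1455764204, "d": 100})
update_sessions(store, "username:sessions", {"t": 1455766103, "d": 450})
print(store.data["username:sessions"])
```

If a concurrent writer bumps the version between `watch_and_get` and `exec_set`, the set fails and the loop retries with fresh data, exactly like the pseudocode above.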

[Figure: optimistic locking transaction. Application A1 failed to update the sessions of our user; we just have to retry until it works!]

We are now pretty happy with ourselves: we were able to save some memory using MessagePack and we support concurrent updates thanks to OLT! Nonetheless, it feels somewhat suboptimal:

  1. We need to implement this algorithm on the application side. We might have several applications, written in different languages, accessing this data; it would be better to have a common solution for all of them.
  2. In case of failure, we need to retry the transaction until it works. As the algorithm's name suggests, it works well in "optimistic" cases, meaning your applications update the same resources concurrently, but not too often.
  3. Because we are not using the native list type of Redis, we cannot use a command such as LPUSH. When updating the list, we need to get the whole list from Redis, update it, and then send it back. If a session is represented by 12 bytes and we have N sessions in the list, adding one session will cost us (2N+1)×12 bytes of transfer: N×12 to read the list, then (N+1)×12 to write it back. As the sessions object grows over time, the overall performance of our application will decrease.
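Point 3 is easy to quantify. A quick calculation (using the post's example figure of 12 bytes per packed session) shows how the cost of appending a single element grows linearly with the list size:

```python
SESSION_SIZE = 12  # bytes per packed session (example figure from the post)

def transfer_cost(n):
    """Bytes exchanged to append one session to a list of n sessions:
    read n sessions, then write back n + 1 of them."""
    return (2 * n + 1) * SESSION_SIZE

print(transfer_cost(10))    # 252
print(transfer_cost(1000))  # 24012
```

Appending 12 bytes to a 1000-element list costs about 24 KB of transfer: roughly 2000 times the size of the element itself.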

Lua scripting

Lua scripting was introduced in Redis 2.6 with the EVAL and EVALSHA commands.

EVAL script numkeys key [key ...] arg [arg ...]  

A Redis script is transactional by definition: it runs atomically, and it will usually be both simpler and faster than the equivalent MULTI/EXEC block. Furthermore, the Redis Lua interpreter ships with some useful libraries such as cmsgpack and cjson. Let's see how we can write an efficient script to add one element to a list stored in a hash field.

local packed_list = redis.call('HGET', KEYS[1], ARGV[1]);  
if not packed_list then packed_list = '\x90' end;  
local list = cmsgpack.unpack(packed_list);  
local element = cmsgpack.unpack(ARGV[2]);  
table.insert(list, element);  
packed_list = cmsgpack.pack(list);  
return redis.call('HSET', KEYS[1], ARGV[1], packed_list);  

The script uses HGET to retrieve the value of the hash field. If no value is found, we initialize the packed list as \x90 (MessagePack for []). We unpack the list with the cmsgpack library, insert the new element, then save the list back. We can now add a new element to our list using the following command:

EVAL "INSERT SCRIPT HERE" 1 username sessions "\x82\xa1\x74\xce\x56\xc5\x3a\x57\xa1\x64\xcd\x01\xc2"  

It seems we have suddenly solved all of our problems. First, the Lua script can be shared between applications written in different languages: no need to implement a complex optimistic locking transaction in each part of our stack. Second, there is no concurrency issue, as the update is executed atomically inside the script. Third, we only need to send the new element of the list to Redis, saving a lot of bandwidth.

Wait... we saved some bandwidth by sending only one element, but what about the script itself? If we need to send it for each update, it does not seem like a very efficient method. Fortunately, we can solve this problem using EVALSHA. Instead of taking a script as an argument, EVALSHA takes the SHA1 hash of the script.

EVALSHA "8d794e99f6e3523ec264e6563728b86910d4d1ea" 1 username sessions "\x82\xa1\x74\xce\x56\xc5\x3a\x57\xa1\x64\xcd\x01\xc2"  

To load a script, Redis provides the SCRIPT LOAD command. Once executed, this command returns the SHA1 hash of the loaded script.

SCRIPT LOAD "INSERT SCRIPT HERE"  
"8d794e99f6e3523ec264e6563728b86910d4d1ea"

The question now is: when should we load the script into Redis? Because the EVALSHA command returns a specific error (NOSCRIPT) when the script is not found, we can use EVALSHA in an optimistic manner and load the script only when that error is raised.

execute_script(script, hash, key, parameter){  
  result = EVALSHA hash 1 key parameter
  if(result == NOSCRIPT){
    SCRIPT LOAD script
    result = EVALSHA hash 1 key parameter
  }
}
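This "EVALSHA first, SCRIPT LOAD on NOSCRIPT" pattern can be sketched against a toy script cache, using only stdlib SHA1. The ToyScriptCache class below is a stand-in for the Redis script cache, not a real client; real clients (for instance redis-py's register_script helper) wrap the same logic around actual Redis commands.

```python
import hashlib

class ToyScriptCache:
    """Stand-in for the Redis server-side script cache."""
    def __init__(self):
        self.scripts = {}   # sha1 hex digest -> script body

    def script_load(self, script):
        # SCRIPT LOAD: store the script, return its SHA1
        sha = hashlib.sha1(script.encode()).hexdigest()
        self.scripts[sha] = script
        return sha

    def evalsha(self, sha):
        # EVALSHA: fail with NOSCRIPT if the script was never loaded
        if sha not in self.scripts:
            return "NOSCRIPT"
        return "ran:" + sha[:8]   # pretend we executed the script

def execute_script(cache, script):
    sha = hashlib.sha1(script.encode()).hexdigest()
    result = cache.evalsha(sha)        # optimistic attempt
    if result == "NOSCRIPT":           # cold cache: load, then retry once
        cache.script_load(script)
        result = cache.evalsha(sha)
    return result

cache = ToyScriptCache()
script = "return redis.call('HGET', KEYS[1], ARGV[1])"
first = execute_script(cache, script)   # triggers the SCRIPT LOAD branch
second = execute_script(cache, script)  # hits the cache directly
```

Only the very first call pays the cost of shipping the script body; every subsequent call sends just the 40-character SHA1.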

In 99.99% of cases, the if statement will never be executed. Loading the script at runtime gives us the flexibility to update the script without deployment headaches.

To finish, only part of the EVAL command's capabilities have been covered here. For more details, see the official Redis documentation.