Filtering Redis Hash Entries

Solution 1

Redis is a lock-free, non-blocking, async server, so there is no added contention when using pipelining. Redis hums along happily, processing each operation as soon as it receives it, and in practice can process multiple pipelined operations. In essence, redis-server really doesn't care whether an operation is pipelined or not; it just processes each operation as it receives it.

The benefit of pipelining is reduced client latency: instead of waiting for a response from redis-server after each operation before sending the next one, the client can pump all operations at once in a single write and then read back all the responses in a single read.
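
For illustration, here's a minimal sketch of that single-write/single-read pattern using redis-py (the client library, key, and field names here are just placeholders, not anything from the question):

```python
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Queue commands client-side; nothing is sent to the server yet.
pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC
for field in ('field:1', 'field:2', 'field:3'):
    pipe.hget('example:hash', field)

# One socket write for all commands, one blocking read for all replies.
values = pipe.execute()
```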

An example of this in action is in my Redis mini StackOverflow clone, where each click makes a call to ToQuestionResults(). Because the operations are pipelined, it sends all operations in 1 Socket write call and reads the results in 1 blocking Socket read, which is more efficient than a blocking read per call:

https://github.com/ServiceStack/ServiceStack.Examples/blob/master/src/RedisStackOverflow/RedisStackOverflow.ServiceInterface/IRepository.cs#L180

"My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command."

This is not a valid concern, and I wouldn't overthink how Redis works here: assume it's doing it the most efficient way, where pipelining doesn't block the processing of other clients' commands. Conceptually, you can think of redis-server as processing each command (pipelined or not) in FIFO order (i.e. no time is wasted waiting to read the entire pipeline).

You're describing something closer to MULTI/EXEC (i.e. Redis transactions), where all operations are done at once as soon as the Redis server reads EXEC (i.e. the end of the transaction). This is not a problem either; redis-server still doesn't waste any time waiting to receive your entire transaction, it just holds the partial command set in a temporary queue until it receives the final EXEC, at which point everything is processed at once.
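
To make that concrete, here's roughly what a transactional pipeline looks like in redis-py (a sketch; the key and field names are placeholders):

```python
import redis

r = redis.Redis(decode_responses=True)

pipe = r.pipeline(transaction=True)  # wraps queued commands in MULTI/EXEC
pipe.hset('example:hash', 'field:1', 'value')
pipe.hincrby('example:counters', 'field:1', 1)

# Sends MULTI, both commands, then EXEC; redis-server queues them
# and runs the whole set atomically when it reads EXEC.
pipe.execute()
```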

This is how Redis achieves atomicity: by processing each command, one at a time, as soon as it receives it. Since there are no other threads, there is no thread context switching, no locks, and no multi-threading issues. It basically achieves concurrency by processing each command really fast.

So in this case I would use Pipelining as it's always a win, more so the more commands you pipeline (as you reduce the blocking read count).

Solution 2

Individual operations do block, but it doesn't matter as they shouldn't be long running. It sounds like you are retrieving more information than you really need - HGETALL will return 100,000 items when you only need 500.

Sending 500 HGET operations may work (assuming the set stores both hash and key), though it's possible that using hashes at all is a case of premature optimization; you may be better off using regular keys and MGET.
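
A hypothetical sketch of that alternative in redis-py (the key-naming scheme here is made up for illustration):

```python
import redis

r = redis.Redis(decode_responses=True)

# Members of the filter's sorted set are the record ids for this page.
ids = r.zrange('filter:some-facet', 0, 499)

# One flat key per record instead of one field in a 100k-entry hash;
# MGET fetches all 500 records in a single command.
records = r.mget(['record:%s' % i for i in ids])
```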

Solution 3

I think you misunderstand what pipelining does. It doesn't block while all the commands are being sent. All it's doing is buffering the commands, then executing them all at once at the end, so they are executed as if they were one single command. At no time is blocking occurring. The same is true for Redis MULTI/EXEC. The closest thing you get to blocking/locking in Redis is optimistic locking via WATCH, which will cause EXEC to fail if the watched key has been written to since you called WATCH.
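
For illustration, the WATCH pattern looks roughly like this in redis-py (the key name is a placeholder):

```python
import redis
from redis.exceptions import WatchError

r = redis.Redis(decode_responses=True)

with r.pipeline() as pipe:
    while True:
        try:
            pipe.watch('example:key')             # watch for writes by others
            current = int(pipe.get('example:key') or 0)
            pipe.multi()                          # start buffering commands
            pipe.set('example:key', current + 1)
            pipe.execute()                        # fails if the key changed
            break
        except WatchError:
            continue                              # key was written to; retry
```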

Even more efficient than calling hget 500 times within a pipeline block is to just call hmget('hash-key', *keys), where keys is an array of the 500 hash keys you are looking up. This results in a single call to Redis, which is the same as if it were pipelined, but should be faster to execute since you aren't looping in Ruby.
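
That snippet is Ruby; the same idea in Python with redis-py looks roughly like this (key names are placeholders):

```python
import redis

r = redis.Redis(decode_responses=True)

# The 500 hash keys for the current page, e.g. from the filter's set.
keys = r.zrange('filter:some-facet', 0, 499)

# One HMGET fetches all 500 fields in a single command and round trip.
values = r.hmget('hash-key', keys)
```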

Comments

  • Paul
    Paul over 3 years

    I'm using Redis to store hashes with ~100k records per hash. I want to implement filtering (faceting) of the records within a given hash. Note that a hash entry can belong to n filters.

    After reading this and this it looks like I should:

    1. Implement a sorted SET per filter. The values within the SET correspond to the keys within a HASH.
    2. Retrieve the HASH keys from the given filter SET.
    3. Once I have the HASH keys from the SET, fetch the corresponding entries from the HASH (see the sketch below). This should give me all entries that belong to the filter.
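
    Roughly the shape I have in mind, sketched with redis-py purely for illustration (I'm actually on Predis and ServiceStack, and the key names are made up):

    ```python
    import redis

    r = redis.Redis(decode_responses=True)

    # 1. One sorted SET per filter; its members are keys within the HASH.
    r.zadd('filter:colour:red', {'entry:1': 1, 'entry:7': 2})

    # 2. Retrieve the HASH keys for the given filter SET.
    hash_keys = r.zrange('filter:colour:red', 0, -1)

    # 3. Fetch the corresponding entries from the HASH; batching these
    #    efficiently is exactly what I'm asking about below.
    entries = [r.hget('entries', k) for k in hash_keys]
    ```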

    Firstly is the above approach correct at a high level?

    Assuming the approach is OK, the bit I'm missing is: what's the most efficient way to retrieve the HASH entries? Am I right in thinking that once I have the HASH keys I should then use a PIPELINE to queue multiple HGETALL commands, passing through each HASH key? Is there a better approach?

    My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command. I'll be paging the filtered results with 500 results per page. With multiple browser-based clients performing filtering, not to mention the back-end processes that populate the SETs and HASHes, it sounds like there's potential for a lot of contention if PIPELINE does block. Could anyone provide a view on this?

    If it helps, I'm using Redis 2.2.4, Predis for the web clients, and ServiceStack for the back end.

    Thanks, Paul

  • Paul
    Paul over 12 years
    Thanks for replying, Tom. You're correct, I'd misunderstood the purpose of HGETALL. Whilst your answer was useful, I'm not going to accept it, since I don't feel it really gets me any closer to the original question. I hear what you're saying about premature optimization, but it seems that sorted sets are the accepted way of implementing filtering and hashes are the optimal way of storing "objects". I feel I'm only following best practice rather than doing anything unusual.
