Why is SQLite faster than Redis in this simple benchmark?


Solution 1

From the Redis documentation:

Redis is a server: all commands involve network or IPC roundtrips. It is meaningless to compare it to embedded data stores such as SQLite, Berkeley DB, Tokyo/Kyoto Cabinet, etc ... because the cost of most operations is precisely dominated by network/protocol management.

That makes sense, though it is also an acknowledgement that Redis has speed issues in certain cases. Redis might perform a lot better than SQLite under many parallel clients, for instance.

Use the right tool for the right job: sometimes that will be Redis, other times SQLite, other times something totally different. If this speed test is a fair representation of what your app will realistically do, then SQLite will serve you better, and it's good that you ran this benchmark.

Solution 2

The current answers explain why Redis loses this particular benchmark, i.e. the network overhead generated by every command executed against the server. However, no attempt has been made to refactor the benchmark code to improve Redis performance.

The problem with your code lies here:

for key in data:
    r.set(key, data[key])

You incur 100,000 round-trips to the Redis server, resulting in great I/O overhead.

This is unnecessary, because Redis provides batch-like variants of certain commands. For SET there is MSET, so you can refactor the above to:

r.mset(data)

From 100,000 server trips down to 1. You simply pass the Python dictionary as a single argument and Redis will atomically apply the update on the server.

This will make all the difference in your particular benchmark; you should see Redis perform at least on par with SQLite.
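For very large dictionaries a single MSET builds one very large request, so it may be worth sending the data in chunks. The following is only a sketch under stated assumptions: `load_with_mset` and `chunk_size` are hypothetical names that do not appear in the question, and `client` is assumed to be any object with a redis-py style `mset(mapping)` method.

```python
def load_with_mset(client, data, chunk_size=10000):
    """Write `data` (a dict) using one MSET per chunk.

    Returns the number of MSET calls, i.e. the number of round-trips.
    `chunk_size` is an illustrative value, not a tuned recommendation.
    """
    items = list(data.items())
    calls = 0
    for start in range(0, len(items), chunk_size):
        # Each call sends up to chunk_size key/value pairs in one request.
        client.mset(dict(items[start:start + chunk_size]))
        calls += 1
    return calls

# Usage against a local server (assumes redis-py is installed):
#   import redis
#   r = redis.Redis("localhost", db=1)
#   load_with_mset(r, data)  # 10 round-trips for 100,000 keys
```

With the default chunk size, the 100,000 keys of the benchmark would need 10 round-trips instead of 100,000. Note that atomicity then holds per chunk, not for the whole data set.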

Solution 3

SQLite is very fast, and you're only requiring one IO action (on the commit). Redis is doing significantly more IO since it's over the network. A more apples-to-apples comparison would involve a relational database accessed over a network (like MySQL or PostgreSQL).

You should also keep in mind that SQLite has been around for a long time and is very highly optimized. It's limited by ACID compliance, but you can actually turn that off (as some NoSQL solutions do), and get it even faster.
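As a rough illustration of relaxing durability, SQLite exposes `PRAGMA synchronous`, which controls how aggressively it waits for data to reach the disk. The sketch below times the same single-transaction insert under two settings. The helper name `timed_insert` and the temporary file location are assumptions made for this example; whether `OFF` gives a measurable gain depends on the machine and on the transaction size.

```python
import os
import sqlite3
import tempfile
import time

def timed_insert(path, rows, synchronous="FULL"):
    """Insert `rows` in one transaction; return (elapsed_seconds, row_count)."""
    # PRAGMA values cannot be bound as parameters, so validate before formatting.
    if synchronous not in ("OFF", "NORMAL", "FULL"):
        raise ValueError("unexpected synchronous mode: %r" % (synchronous,))
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA synchronous = %s" % synchronous)
        conn.execute("DROP TABLE IF EXISTS testTable")
        conn.execute("CREATE TABLE testTable(key VARCHAR(256), value TEXT)")
        conn.commit()
        start = time.perf_counter()
        for key, value in rows:
            conn.execute("INSERT INTO testTable VALUES(?,?)", (key, value))
        conn.commit()  # the only point where SQLite has to sync to disk
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM testTable").fetchone()[0]
        return elapsed, count
    finally:
        conn.close()

if __name__ == "__main__":
    rows = [("key-%d" % i, "value-%d" % i) for i in range(100000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testDB")
        for mode in ("FULL", "OFF"):
            elapsed, count = timed_insert(path, rows, mode)
            print("synchronous=%s: %d rows in %.3f s" % (mode, count, elapsed))
```

With `synchronous = OFF` a crash or power loss can corrupt the database, so this trades safety for speed.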

Solution 4

I just noticed that you did not pipeline the Redis commands. Using pipelines, the time drops:

[---Testing SQLITE---]

[Total time of sql: 0.669369935989]

[---Testing REDIS---]

[Total time of redis: 2.39369487762]
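The answer does not show the pipelined code, so the following is only a sketch of what it might look like, assuming redis-py's `pipeline()` API. The function name `load_with_pipeline` and the `batch_size` parameter are hypothetical and chosen for illustration.

```python
def load_with_pipeline(client, data, batch_size=10000):
    """Queue SET commands client-side and send them in batches.

    Returns the number of batches sent. `batch_size` is an illustrative
    value; it bounds how many commands are buffered before each send.
    """
    # transaction=False asks for plain pipelining without MULTI/EXEC.
    pipe = client.pipeline(transaction=False)
    batches = 0
    pending = 0
    for key, value in data.items():
        pipe.set(key, value)  # buffered locally, nothing is sent yet
        pending += 1
        if pending == batch_size:
            pipe.execute()    # send the buffered commands in one batch
            batches += 1
            pending = 0
    if pending:
        pipe.execute()        # flush the final partial batch
        batches += 1
    return batches

# Usage against a local server (assumes redis-py is installed):
#   import redis
#   r = redis.Redis("localhost", db=1)
#   load_with_pipeline(r, data)
```

Unlike MSET, a pipeline can batch arbitrary commands, which makes it the more general way to cut round-trips.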

Author by

torayeff

Updated on August 15, 2022

Comments

  • torayeff
    torayeff over 1 year

    I ran a simple performance test on my local machine. This is the Python script:

    import redis
    import sqlite3
    import time
    
    data = {}
    N = 100000
    
    # Build N key/value pairs to insert into both stores.
    for i in range(N):
        key = "key-"+str(i)
        value = "value-"+str(i)
        data[key] = value
    
    r = redis.Redis("localhost", db=1)
    s = sqlite3.connect("testDB")
    cs = s.cursor()
    
    try:
        cs.execute("CREATE TABLE testTable(key VARCHAR(256), value TEXT)")
    except Exception as excp:
        # The table is left over from a previous run: drop and recreate it.
        print(str(excp))
        cs.execute("DROP TABLE testTable")
        cs.execute("CREATE TABLE testTable(key VARCHAR(256), value TEXT)")
    
    print("[---Testing SQLITE---]")
    sts = time.time()
    for key in data:
        cs.execute("INSERT INTO testTable VALUES(?,?)", (key, data[key]))
        #s.commit()
    s.commit()  # one commit for all inserts
    ste = time.time()
    print("[Total time of sql: %s]"%str(ste-sts))
    
    print("[---Testing REDIS---]")
    rts = time.time()
    r.flushdb()  # start from an empty db
    for key in data:
        r.set(key, data[key])  # one round-trip per key
    rte = time.time()
    print("[Total time of redis: %s]"%str(rte-rts))
    

    I expected Redis to be faster, but the results show that it is much slower:

    [---Testing SQLITE---]
    [Total time of sql: 0.615846157074]
    [---Testing REDIS---]
    [Total time of redis: 10.9668009281]
    

    So, Redis is memory-based; what about SQLite? Why is Redis so slow here? When should I use Redis, and when should I use SQLite?

  • swasheck
    swasheck almost 12 years
    That's fair, but the overhead should be minimal since it's connecting on localhost. At least less overhead than across a network.
  • Voo
    Voo almost 12 years
    +1, although I certainly don't agree with the quote: I'm generally not interested how something works (ok I am, but not when benchmarking), but how fast it is for the job at hand - if one thing's noticeably slower because of some architectural decisions, that still doesn't make the comparison "meaningless"
  • Brendan Long
    Brendan Long almost 12 years
    @swasheck Yes, it's not nearly as bad as connecting to another machine, but it still involves system calls and more complicated communication (compared to just using your own process's memory directly).
  • torayeff
    torayeff almost 12 years
    What should I do if I want to check whether a URL has already been seen in a web crawler and update the database at the same time?
  • Admin
    Admin almost 12 years
    @torayeff Concurrent updates are actually SQLite's "Achilles heel" due to how the locking model works (it does not scale with many contending writers). Of course, that is not stressed/tested at all in the benchmark used, and just a single "web crawler" is hardly adding much contention, so...
  • Admin
    Admin almost 12 years
    Disabling "ACID" (e.g. flush settings) doesn't speed up SQLite much for reasonable transaction sizes... it's only the commit that is "really really important to remember". (Although there are other issues at play to determine transaction visibility.)
  • swasheck
    swasheck almost 12 years
    @pst those are both very good points which also serve to reinforce the need to truly know your project and select your tools appropriately.
  • Didier Spezia
    Didier Spezia almost 12 years
    I'm the original author of this quote, and I do not agree with your disagreement ;-) Benchmarking is comparing apples to apples, so you need to understand what an apple is to assess its performance.
  • noj
    noj over 9 years
    This is a true comparison.
  • kabirbaidhya
    kabirbaidhya almost 7 years
    Good point. But if you replace the loop of r.set with one single bulk operation using r.mset, then on the SQLite end you'll also need to replace the loop of multiple INSERT statements with one single bulk INSERT. IMO, only then would it be a truly reliable benchmark that compares bulk-vs-bulk operations on both ends.
  • ChaimG
    ChaimG over 3 years
    +1 for showing that sqlite is still faster after pipelining. BTW, you can make sqlite even faster as well by doing bulk inserts.
  • DIGI Byte
    DIGI Byte over 2 years
    Apples to oranges, a fruit is still a fruit: you have a goal to achieve, so they are comparable. Everything else about how that goal is achieved is semantics next to the core objective. The question was about performance and speed of one solution relative to another. If you look too closely at the details, or compare apples to apples as you say, the difference is almost negligible between on-par systems.
  • Justin Furuness
    Justin Furuness almost 2 years
    The "turn that off" link is broken
  • PowerAktar
    PowerAktar almost 2 years
    @kabirbaidhya I’ll second that. By using batch processing on Redis, you are creating an unfair benchmark.
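Several comments above ask for a bulk insert on the SQLite side so that the comparison is bulk-vs-bulk. One way this might look is `executemany` from Python's standard `sqlite3` module, sketched below. The helper name `bulk_insert` is hypothetical, and the example uses an in-memory database instead of the `testDB` file from the question, which is an assumption made to keep it self-contained.

```python
import sqlite3

def bulk_insert(conn, data):
    """Insert every key/value pair of `data` with one executemany call.

    Returns the number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS testTable(key VARCHAR(256), value TEXT)"
    )
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany("INSERT INTO testTable VALUES(?,?)", data.items())
    return conn.execute("SELECT COUNT(*) FROM testTable").fetchone()[0]

if __name__ == "__main__":
    data = {"key-%d" % i: "value-%d" % i for i in range(100000)}
    conn = sqlite3.connect(":memory:")  # assumption: no file on disk
    print("rows inserted:", bulk_insert(conn, data))
    conn.close()
```

This keeps the single commit of the original benchmark and moves the per-row loop out of Python code into the `sqlite3` module.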