Fastest, non-memory-based, multi-process key-value store for Node.js

Solution 1

I would suggest having a look at LMDB (which is the most efficient storage engine for OpenLDAP, and is used in a number of other open-source projects).

LMDB is an embedded key/value store with a Berkeley DB- or LevelDB-like API. It does not have to store everything in memory, and it can support access from multiple processes. There are Node.js bindings available.

Solution 2

You can try SSDB, a Redis-protocol-compatible database built on top of LevelDB.

https://github.com/ideawu/ssdb

You can use the existing node-redis client, though some of the commands may vary.

Benchmarks:

                  Redis (100.000x)
      13,540 op/s ⨠ set small
      13,289 op/s ⨠ set medium
      13,279 op/s ⨠ set large
      13,651 op/s ⨠ get large
      13,681 op/s ⨠ get medium
      14,428 op/s ⨠ get small

                  SSDB (100.000x)
      12,252 op/s ⨠ set small
      11,824 op/s ⨠ set medium
      11,720 op/s ⨠ set large
      13,810 op/s ⨠ get large
      13,593 op/s ⨠ get medium
      12,696 op/s ⨠ get small


                  lmdb (100.000x)
       4,616 op/s ⨠ set small
      11,104 op/s ⨠ set medium
      17,283 op/s ⨠ set large
      13,778 op/s ⨠ get large
      16,002 op/s ⨠ get medium
      50,562 op/s ⨠ get small

                  multilevel (100.000x)
       6,124 op/s ⨠ set small
       5,900 op/s ⨠ set medium
       5,944 op/s ⨠ set large
       6,215 op/s ⨠ get large
       6,125 op/s ⨠ get medium
       6,310 op/s ⨠ get small

As you can see, SSDB is almost as fast as Redis, and it is designed for persistent storage. LMDB, which @didier-spezia mentioned, is ultra fast at getting small values, but setting them is slow.

Solution 3

There is Facebook's RocksDB, which is supposed to be fast (especially on SSD storage), and there are also others such as LMDB (already mentioned) and WiredTiger.

You mentioned Redis: if you'd like to use the Redis API but with one of the above key/value databases as the storage backend instead of your RAM, there are two projects I know of (though I haven't tested them): LedisDB (written in Go) and Ardb (written in C++).

I've recently started testing a very promising though lesser-known (I'm sure that will change) key-value database library named CuttDB. It has very fast performance and is built to handle large amounts of data on the HDD. It even includes a Memcached server interface.

Solution 4

The problem you are going to run into is that "lightning fast" and disk don't mix, especially with the random-access reads you get in a key-value system. You need to get as much data into memory as possible, since reading from memory is many orders of magnitude faster than reading from disk.
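The point above is usually addressed by keeping hot keys in memory in front of the disk store. Here is a minimal, stdlib-only Node.js sketch of that idea; the `ReadThroughCache` class and the fake `disk` backend are illustrative stand-ins, not the API of any store mentioned in this thread:

```javascript
// Minimal read-through cache sketch: keep hot keys in memory in front of a
// slower disk-backed store. A Map doubles as an LRU via its insertion order.
class ReadThroughCache {
  constructor(backend, maxEntries = 1000) {
    this.backend = backend;       // any object with a get(key) method
    this.maxEntries = maxEntries; // cap on in-memory entries
    this.cache = new Map();
  }

  get(key) {
    if (this.cache.has(key)) {
      // Refresh recency: re-insert so the entry moves to the "newest" end.
      const value = this.cache.get(key);
      this.cache.delete(key);
      this.cache.set(key, value);
      return value;
    }
    const value = this.backend.get(key); // slow path: hits the disk store
    this.cache.set(key, value);
    if (this.cache.size > this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order).
      this.cache.delete(this.cache.keys().next().value);
    }
    return value;
  }
}

// Demo with a fake "disk" backend that counts how often it is actually hit.
let diskReads = 0;
const disk = { get: (k) => { diskReads++; return k.toUpperCase(); } };
const cache = new ReadThroughCache(disk, 2);
cache.get('a'); cache.get('a'); cache.get('b'); cache.get('a');
console.log(diskReads); // 2 — only the first 'a' and 'b' reached the backend
```

With a read-heavy workload like the one in the question, most lookups stay on the fast path; this is essentially what mmap-based stores like LMDB get from the OS page cache for free.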

Is the reason you want to minimize memory that this will be an embedded database? If so, you might want to look at Empress (http://www.empress.com). I've used it in a couple of projects, and you can configure how much gets loaded. However, it has the overhead of an RDBMS, so I'm not sure it will be as lean as you want.

You might also consider MySQL with the memcached add-on. This allows you to use MySQL as a key-value store, and it is much faster than regular MySQL since you skip the SQL-layer processing. Also, with MySQL you can turn the knobs to control how much memory is used.

Firebird is another low-memory-usage database - http://www.firebirdnews.org/docs/fb2min.html.

Anyway, hope this helps. Without a more in-depth explanation of your needs (is this embedded? why the need to save memory, and if memory is precious, what do you consider low memory consumption? do you need ACID? redundancy? what do you consider lightning fast?), it's difficult to provide more of an analysis.

Solution 5

Why don't you use MySQL (or MariaDB) with master-slave replication? Based on your requirements, MySQL's master-slave architecture is a good fit for you.

Basically, NoSQL setups need a lot of servers. For example, MongoDB's minimal setup needs three servers, and HBase needs four.

From this point of view, if you need more read capacity, just add a new slave server to the MySQL architecture.

Assume that one MySQL node's read performance is 2k TPS; then four MySQL nodes give 8k TPS of read performance.

It depends on your test results and service usage (read/write ratio).

Check the link below: "Marco Cecconi - The Architecture of StackOverflow". http://www.youtube.com/watch?v=t6kM2EM6so4
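The read-scaling estimate in this answer can be sketched in a few lines; note that the 2k-TPS-per-node figure is the answer's assumption, not a measured value, and only reads scale this way since every write must be replicated to all nodes:

```javascript
// Back-of-the-envelope read scaling for master-slave replication:
// reads fan out across all nodes, writes are applied on every node,
// so only read throughput grows with node count.
const perNodeReadTps = 2000; // assumed per-node read throughput (2k TPS)
const nodes = 4;             // e.g. one master + three slaves
const totalReadTps = perNodeReadTps * nodes;
console.log(totalReadTps); // 8000
```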

Author: Ruben Verborgh

I’m a professor of Semantic Web technology at IDLab, Ghent University – imec, and a research affiliate at the Decentralized Information Group of CSAIL at MIT. I’m also a technology advocate for Inrupt, supporting the Solid ecosystem that gives you back control and choice—online and offline. I love discussing the Web, Linked Data, decentralization, Web APIs, hypermedia clients, and much more.

Updated on August 17, 2020

Comments

  • Ruben Verborgh
    Ruben Verborgh over 3 years

    What is the fastest non-memory key-value store for Node.js supporting multiple processes?

    I need to store simple key-value string/string pairs (not documents or JSON, just strings).
    Here are some examples (there would be millions of those):

    • 12345678 – abcdefghijklmnopabcdefghijklmnop
    • 86358098 – ahijklmnopbcdefgahijklmnopbcdefg
    • abcdefghijklmnopabcdefghijklmnop – 12345678
    • ahijklmnopbcdefgahijklmnopbcdefg – 86358098

    I have tried:

    • Redis: it's really fast and does everything I need, but consumes too much RAM.
    • LevelDB: it's fast and not too heavy on RAM, but only single-process.

    A workaround for LevelDB is multilevel, which exposes a single LevelDB process through HTTP.
    But that of course comes at a cost; I need something fast.

    Is there any key-value store that:

    • supports Node.js or has bindings for it;
    • stores string/string pairs;
    • supports multiple processes;
    • does not entirely reside in memory;
    • is fast?

    I only care about reading. Fast multi-process reading is necessary, but not writing.
    I'm happy with the current speed of LevelDB, just not with the fact that it is single-process.


    Additional details:

    • I'm talking about some 50 million key/value pairs, with keys and values between 8 and 500 chars.
    • The code will run on a regular Linux server.
    • Memory usage should be limited to a few gigabytes (4GB is fine, 8GB is acceptable).
    • Reading will happen way more than writing; actually, I could do without writing.
    • Speed is more important than anything else (given the memory and multi-process constraints are respected).
  • Ruben Verborgh
    Ruben Verborgh over 10 years
    Well, “lightning fast” was probably exaggerated (I removed it from the question). I'm happy with LevelDB, but it's too bad LevelDB is only single-process. The reason I want to minimize memory is that we're talking about millions of entries, some of which are long. I have tried MySQL as a key/value store before and it was painfully slow, but I never tried memcached. I will try that now, and will also update the question with more details.
  • Pavel S.
    Pavel S. over 10 years
    Isn't Memcache also just another in-memory storage?
  • Ruben Verborgh
    Ruben Verborgh over 10 years
    This seems very relevant. I will check it out!
  • AlexGad
    AlexGad over 10 years
    I was actually referring to the InnoDB memcached plugin. It allows MySQL to operate efficiently as a key-value store. It does not make reads that much more efficient, though, so in retrospect I'm not sure it'll meet the question's needs if the requirement is absolutely blazing speed.
  • Ruben Verborgh
    Ruben Verborgh over 10 years
    I've tried MySQL with a 10 million key/value table, but reading values was painfully slow (even if the key was just numeric), even with proper indexes. Note that I don't need replication.
  • yinqiwen
    yinqiwen about 10 years
    Is the 'get' benchmark using enough random keys? Since LevelDB is disk-based, 'get' performance would be much slower if every 'get' operation had to seek on disk.
  • yinqiwen
    yinqiwen about 10 years
    And for LMDB: since it uses mmap, if the machine's memory is large enough, all data would stay in memory and never swap out.
  • Polor Beer
    Polor Beer about 10 years
    The keys are all serial numbers. I'll try implementing it with random keys. And may I ask what you mean by 'never swap out'?
  • Polor Beer
    Polor Beer about 10 years
    github.com/ktmud/multilevel-bench/tree/master/reports - here are the results for random keys. I don't see much difference, though that might be because I wrote the test cases badly.
  • yinqiwen
    yinqiwen about 10 years
    30,000 iterations per benchmark is not enough, and the data sets in the benchmark have only 90,000 keys, which is too small for a benchmark test. LevelDB/LMDB may cache all the data in memory in your benchmark (for LevelDB, all data may still be in the memtable after the 'set' benchmark, which is in memory). I suggest each benchmark iterate at least 10,000,000 times.
  • yinqiwen
    yinqiwen about 10 years
    Since LevelDB/LMDB use a tree-like structure to store data, lookup complexity is O(log n); in theory, the more data stored, the slower they get. Redis, on the other hand, stores data in an in-memory hash map, so it has O(1) complexity.
  • Polor Beer
    Polor Beer about 10 years
    Thanks for the explanation, it's really helpful
  • JP Richardson
    JP Richardson almost 10 years
    @RubenVerborgh did LMDB work alright? Did you ultimately choose LMDB or something else?
  • hyc
    hyc over 9 years
    Once the hash map and data get large enough, it is no longer O(1); performance will drop off a cliff and be much worse than O(log n). Hashes are extremely inefficient space-wise and cache-unfriendly.
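The complexity discussion in the comments above (hash map O(1) vs. tree O(log n) lookups) can be sketched with a stdlib-only comparison: a `Map` standing in for Redis's hash, and binary search over sorted keys standing in for LevelDB/LMDB's tree. This is an illustration of the asymptotics, not either engine's actual code:

```javascript
// Hash lookup (O(1) average) vs. binary search over sorted keys (O(log n)).
const n = 100000;
const keys = Array.from({ length: n }, (_, i) => `key${String(i).padStart(7, '0')}`);
const hash = new Map(keys.map((k) => [k, k.length])); // hash-map stand-in
const sorted = [...keys].sort();                      // tree stand-in

// Classic binary search: returns the index of target, or -1 if absent.
function bsearch(arr, target) {
  let lo = 0, hi = arr.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (arr[mid] === target) return mid;
    if (arr[mid] < target) lo = mid + 1; else hi = mid - 1;
  }
  return -1;
}

// Both find the same key; the hash does it in one probe on average, the
// binary search in about log2(n) comparisons.
console.log(hash.has('key0042000'));             // true
console.log(bsearch(sorted, 'key0042000') >= 0); // true
console.log(Math.ceil(Math.log2(n)));            // 17 comparisons worst case
```

This also matches the last comment's caveat: the O(1) figure is an average that assumes the hash table still fits comfortably in memory and cache.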