How is Aerospike different from other key-value NoSQL databases?


Solution 1

If it has to be answered in one word, it's "performance". Aerospike's performance is much better than that of other clustered NoSQL solutions. Higher per-node performance means a smaller cluster, which lowers TCO (Total Cost of Ownership) and maintenance effort. Aerospike does auto-clustering, auto-sharding, and auto-rebalancing (when the cluster state changes), most of which need manual steps in other databases.

I said "clustered" because I don't want to put Redis in that group (though Redis clustering is in beta). The pure in-memory performance of Aerospike and Redis is comparable. But Redis expects a lot to be handled at the application layer, such as sharding and request redirection. Even though Redis has ways to persist data (snapshots or AOF), persistence has its own problems because it is designed more like an add-on. Aerospike was developed natively with persistence in mind. Clustering Redis also involves setting up master-slave replication and so on. You may want to take a look at this talk comparing and contrasting Redis and Aerospike.

Solution 2

I have used Redis for years, and have just started using Aerospike as a substitute for it, for many reasons.

Although Redis and Aerospike both have great performance, Redis's main problem is that it only stores data in memory and does not have an officially released clustering solution. That limits the size of your database to the RAM size of your server, while Aerospike can be configured to store data on SSD with no speed loss at all. Aerospike's latency is incredibly low, even with high read/write throughput.

Aerospike suits me best because it scales without losing performance and without hard work, and, unlike Redis, it is designed to persist your data completely, minimizing data loss in any event. They have released a great video showing how easy it is to scale and manage an Aerospike cluster, and how it automatically reconfigures itself even in a "disaster" situation.

Solution 3

Surprisingly, Redis, one of the most popular in-memory databases, did not have auto-sharding until about 3 months ago. The feature was added recently: Redis 3.0 has auto-sharding.

Aerospike (AS) supports auto-clustering and has a fast restart feature: all the indexes are persisted without affecting throughput, so the database can be brought back up in a couple of minutes (e.g., a 50 TB database can be brought up in a couple of minutes). All of this can be achieved on commodity hardware. Adding capacity is just a matter of adding a new node to the cluster. It works across data centers and cloud environments and, most importantly, in any local environment.

Supports online match making (managing demand and supply).

A NoSQL database has to handle real-time use cases to meet the aggressive SLAs demanded by today's advertising world, online shopping portals, logistics service providers such as OLA cabs (identifying the nearest cab that is ready for a pickup and can reach the customer within 5 minutes is computed in <3 ms), online bidding applications (99.7% accuracy in finalizing an ad bid in <3 ms), fraud detection systems that need to identify a malicious user in <5 ms (milliseconds), and so on.

  • Aerospike is ACID compliant at the record level, which is true of most NoSQL databases.
  • Aerospike is designed for clustered environments.
  • Built for horizontal scaling.
  • Supports data balancing (automatic/manual).
  • Auto-sharding – at the application level or transparent to the end user.
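To make the idea of transparent sharding and automatic rebalancing concrete, here is a minimal sketch of how a fixed set of partitions can be assigned to nodes so that adding a node moves only a fraction of them. It uses rendezvous (highest-random-weight) hashing as a stand-in; this is not Aerospike's actual partition-assignment algorithm, and the node names, the partition count, and the `score`, `owner` and `partition_map` helpers are all assumptions made for illustration.

```python
import hashlib

N_PARTITIONS = 4096  # assumed partition count for this sketch


def score(node, partition):
    """Deterministic pseudo-random score for a (node, partition) pair."""
    digest = hashlib.sha1(f"{node}:{partition}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(nodes, partition):
    """Rendezvous hashing: the node with the highest score owns the partition."""
    return max(nodes, key=lambda node: score(node, partition))


def partition_map(nodes):
    """Owner of every partition for a given cluster membership."""
    return [owner(nodes, p) for p in range(N_PARTITIONS)]


# Hypothetical three-node cluster, then the same cluster with one node added.
before = partition_map(["node-a", "node-b", "node-c"])
after = partition_map(["node-a", "node-b", "node-c", "node-d"])

moved = [p for p in range(N_PARTITIONS) if before[p] != after[p]]
print(f"{len(moved)} of {N_PARTITIONS} partitions changed owner")
```

In this scheme every partition that changes owner moves to the new node, and the existing nodes never trade partitions among themselves, which is the property that makes "just add a node" cheap.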

Aerospike is an open-source, real-time NoSQL key-value store. It is built in C from scratch, so that the database can be written to take advantage of the hardware, networking, SSD, memory and kernel. It is optimized for SSD/flash storage, the reasoning being that SSDs are the future of storage devices; at the same time it works on HDDs (rotational disk drives). An SSD provides parallel channels; how many depends on the SSD vendor, who may choose to use 8, 16, 32 and so on. SSDs wear out if the same block location is written to and erased repeatedly. On an SSD you write in terms of blocks: Aerospike uses the SSD without a file system, as a raw block store, and treats it as a ring buffer. You write at the start of the ring buffer and keep adding data to the next block, and the next, until the end of the drive. Once you reach the end you come back to the first block and carry on in the same fashion, which ensures that the first location is not written more often than the others and that all locations are used equally.

Clustering, or let's call it auto-clustering: adding a node and bringing it into the cluster happens in <100 ms. It is implemented using the Paxos algorithm.
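The ring-buffer write pattern described above can be illustrated with a toy model. This is only a sketch of the idea (sequential block writes that wrap around, so every block is written about equally often); the `RingBlockStore` class and its fields are hypothetical and do not reflect Aerospike's storage code.

```python
class RingBlockStore:
    """Toy model of a device written as a circular sequence of fixed-size blocks."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks      # block contents
        self.write_counts = [0] * n_blocks   # how often each block was written
        self.head = 0                        # next block to write

    def write(self, data):
        """Write one block at the head, then advance, wrapping at the end."""
        index = self.head
        self.blocks[index] = data
        self.write_counts[index] += 1
        self.head = (self.head + 1) % len(self.blocks)
        return index


store = RingBlockStore(n_blocks=8)
positions = [store.write(f"record-{i}") for i in range(20)]

print(positions)           # block index used by each write
print(store.write_counts)  # per-block write counts
```

Because the head only ever moves forward and wraps, the write counts of any two blocks never differ by more than one, which is the even-wear property the paragraph describes.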

What is the Paxos algorithm?

http://www.quora.com/Distributed-Systems/What-is-a-simple-explanation-of-the-Paxos-algorithm

Each key is hashed with RIPEMD-160, which produces a 20-byte (160-bit) digest that is unique for all practical purposes.

The digests are distributed evenly across 4K (4,096) partitions.

Every namespace maintains its own partition trees: each partition has a partition ID, and each partition has its own b-tree.
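As a rough illustration of the digest-to-partition mapping described above, the sketch below hashes a key to a 20-byte digest and keeps 12 bits of it as the partition ID (2^12 = 4096). Two things here are assumptions: SHA-1 is used as a stand-in because it also yields 160 bits and is always available in Python's `hashlib`, whereas RIPEMD-160 may be missing on some OpenSSL builds; and the exact bits taken from the digest, as well as the `record_digest` and `partition_id` helpers, are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # "4K" partitions, i.e. a 12-bit partition ID


def record_digest(set_name, user_key):
    """20-byte digest of a key (SHA-1 stand-in for RIPEMD-160, see lead-in)."""
    return hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()


def partition_id(digest):
    """Keep 12 bits of the digest as the partition ID (assumed bit choice)."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


digest = record_digest("users", "user-42")
pid = partition_id(digest)
print(len(digest), pid)
```

Since the partition ID depends only on the key's digest, any client can compute where a record lives without asking a coordinator.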

Storage Model

In-memory database: everything is stored in DRAM, which gives high performance at a high cost.

Disk storage: primary and secondary indexes are stored in DRAM; data goes on SSD or HDD. This works best with SSD: slightly slower than DRAM, but at least ~10x cheaper.

Hybrid storage: everything is stored in DRAM, and data is also persisted on SSD or HDD. You get DRAM performance backed by SSD or HDD persistence, at a higher DRAM cost but without losing performance.

Benchmark

1.6 million TPS with YCSB (Yahoo! Cloud Serving Benchmark) on a 4-node, in-memory cluster.

SSD performance guarantee given by Aerospike:

ACT (Aerospike Certification Tool): it was defined and developed to test SSD performance. Today it serves as a standard, or certification, for SSDs. Intel published a blog post stating that they are the only SSD provider in the world that supports 1 million TPS using ACT.

Google Cloud has done some work to demonstrate the throughput of Google Compute Engine. Google posted on their blog that what takes Cassandra 300 nodes to produce, AS does with 50 nodes.

Aerospike deals with realtime problems in a very effective manner.

Solution 4

Lynn Langit just released a very detailed head-to-head benchmark of Aerospike vs. Redis running in different configurations on AWS cloud. Her summary: "TL; DR – at scale Aerospike wins".

How she reached that conclusion is very interesting. She gives step-by-step instructions on how she produced her results, for others looking to gain insight into doing their own performance benchmarks. The tests were set up both as a pure RAM datastore and as an SSD-persisted database.

Her key observations:

  • Aerospike is as fast as Redis with close to 1 MTPS for 100% read workloads on a single node on AWS R3.8xlarge with no persistence.
  • Aerospike is slightly faster than Redis for 100/0 and 80/20 read/write workloads against a single node backed by EBS SSD (gp2) storage for persistence.

Solution 5

When you account for failover and the way Aerospike self-heals when you yank the power plug out of any rack in the data center, while sustaining a million read ops per second per node with no traffic coordinator, so that you are always maxed out at the switch or other hardware (unless you are map-reducing aggregates), nothing else comes close for self-balanced, real-time dynamic analytics with secure data. All the other platforms require you to hybridize to get all your attributes right in the CAP triangle. With no buffering or queues and no cache for data, ghosting is no longer a category. So many benefits on top of being the best performer. We just need to admit it: Aerospike is deliciously ridiculous!

Author by

Salvador Dali

I am a Software Engineer in the Google Search Growth team. I use Tensorflow and TFX to analyze search data and Go to write data pipelines. This is my personal profile which has absolutely nothing to do with my employer.

Updated on July 09, 2022

Comments

  • Salvador Dali
    Salvador Dali almost 2 years

    Aerospike is a key-value, in-memory, operational NoSQL database with ACID properties, which supports complex objects and is easy to scale. But I have already used something which does exactly the same.

    Redis is also a key-value, in-memory (but persistent to disk) NoSQL database. It also supports different complex objects. But in comparison to Aerospike, Redis has been in use for a long time and already has an active community and a lot of projects built on it.

    So what is the difference between Aerospike and other NoSQL key-value databases like Redis? Is there a particular use case for which Aerospike is better suited?

    P.S. I am looking for an answer from people who have used at least one of these DBs (preferably both) in the real world and have real-life experience (not copy-pastes from the official website).

  • skan
    skan over 9 years
    How does it compare to Hyperdex, DynamoDB, FoundationDB, Hibari, VoltDB, MonetDB...?
  • sunil
    sunil over 9 years
    It is not officially compared with the databases that you asked about, but it's compared with Couchbase, Cassandra and MongoDB. You may refer to aerospike.com/benchmark. Now that Aerospike has a YCSB plugin (github.com/aerospike/ycsb), you can run the test yourself and see.
  • Itamar Haber
    Itamar Haber over 9 years
    @antirez's reply is at: antirez.com/news/85
  • Itamar Haber
    Itamar Haber over 9 years
    And our reply is at: redislabs.com/blog/…
  • Richard Grossman
    Richard Grossman about 9 years
    What are you talking about? I can't understand it: there are a lot of words, but all together = nothing !!!!
  • Michael
    Michael about 9 years
    Sorry Richard, I used some excited Architect speak. Visit the AeroSpike website and look at their documentation. For NoSQL in data centers nothing beats it. I also wrote another post on NoSQL on a more basic level that includes some discussion of AeroSpike. linkedin.com/pulse/…
  • Manuel Arwed Schmidt
    Manuel Arwed Schmidt almost 9 years
    Note that Couchbase achieves comparable results in HDD-persisted scenarios where you have a 'hot dataset' that fits in memory. Of course AS doesn't want to let you know, and some benchmarks out there seem to be unfair. Instead of showing CB's performance, they simply say it failed to complete the test. I would always evaluate both of them and decide on one based on the use case.
  • kapad
    kapad over 5 years
    This answer is outdated as of 2018-10-26. (Parts of it may still be relevant, but for one, Redis clustering is now out of beta.)