Fast Database to store 1 Billion records

15,719

Solution 1

This is not the kind of thing anyone can 100% 'nail' in 5 minutes but if you want to achieve that level of performance and functionality consistently then I strongly suspect you're going to have to spend money.

If you have budget for this then I'd look to combine Oracle's 'TimesTen' engine, possibly with either enterprise SSD or PCIe Flash (such as FusionIO) and if your going that far them I'd recommend going with Intel's Xeon E7-xxxx series as they have fantastic RRAS capabilities to keep your server up and running as long as possible in the event of failures.

Solution 2

If getting more features and beter performance was simply a matter of using different software, then everyone would be doing it (at least the few people still wotking in IT ;)

If you need ACID support - then why try mongoDB?

It's quite possible to get the throughput you want with innodb (but myisam is faster) - provided you spend enough on hardware and tune the system properly. But that your first constraint is to minimise space usage, this rather implies that you don't have the budget to go and buy a SSD RAID set.

can cause data loss on crashes

I've seen data losses on crashes in Oracle and Sybase - but yes, in theory, these should be more resillient. I would usually spend some time, effort and money on avoiding the crash in the first place (dedicated server, using reasonable quality hardware, UPS, stable software).

Even SQLITE3 DB is VERY VERY SLOW

SQLite is particularly sensitive to the underlying filesystem (but this will affect all databases). That you don't mention what filesystem (or even which Operating System!) this is running on makes me think you've not spent a lot of time looking at tuning the installation.

If it were me I'd have a look at MariaDB - it's a fork of mysql and includes an improved ISAM engine (Aria) which has some performance benefits and supports transactions. (running on linux, using xfs with nobarriers and the deadline scheduler).

But you'll need to look at your hardware to really get it flying.

Solution 3

From my experience InnoDB (if properly configured and used) is only a bit slower than MyISAM (certainly not an order of magnitude). A couple of suggestions come to my mind to improve performance:

  • You should be using the latest versions of MySQL InnoDB storage engine.
  • You should be using a large buffer pool size (~80% available RAM) and a large log file size.
  • You should be using transactions and batch inserts to get a high throughput. Batch size is an important performance factor (not too small but not too large either).
  • You should be using AUTOINCREMENT id as PRIMARY KEY. Any indexes will lower your insert speed, especially UNIQUE.
  • You should be using the lowest transaction isolation level that works for you.
  • Tune innodb_flush_method for your system.
  • If you don't require full ACID guarantees, look into changing the value for innodb_flush_logs_at_trx_commit
  • Use proper hardware, tune your OS/storage/file system

You will find good pointers here:

Share:
15,719

Related videos on Youtube

Raheel
Author by

Raheel

Updated on September 18, 2022

Comments

  • Raheel
    Raheel over 1 year

    I am writing a code to store data and want to store 1 Billion to 10 Billion records or even more. Most importantto this is speed (nearly 10000 - 20000 / records per second), SPACE usage and CRASH proofing. Now I have been playing with MySQLs MyISAM and it works pretty FAST and matches the speed required for us. MyISAM also has optimal SPACE usage and the database file in the MySQL data folder was nearly equal to the amount of data I inserted e.g. for 100000 records of 30 charaters each, it took only 3 MB. But the only problem is that it is not ACID and can cause data loss on crashes.

    So I wanted to know of any alternatives (I dont even mind changing the Database software to anything else) which can help us to store so much data and store them FAST.

    I have already tried MySQL INNODB and its inserts are very slow. I could only achieve 800 - 1000 / records per second. Also the amount of space to store data in INNODB is pretty high. It takes nearly 8 MB to store the same 100000 records of 30 chars each.

    I also tried MongoDB as an alternative, but it again requires TOO much space to store the same data.

    Even SQLITE3 DB is VERY VERY SLOW to write 100000 records. I tried this with PHP in a for loop. It also missed some 20 records and gave errors of database unavailable.

    So I wanted to know of any open source Database software which can meet my requirements. I heard that MariaDBs ARIA is ACID compliant. Is it true ? Or is PostgreSQL also a good fast, option with optimal space utilization.

    I am open to any solution which is FAST, utilizes optimal space and also is CRASH proof. Looking forward to replies.

    === UPDATE on MariaDB Aria === Its even bad than InnoDB when it comes to Insert speed and Space. It took 1.5 times the amount of time than InnoDB to insert the same data and nearly 2.5 times space used by MyISAM.

    • MadHatter
      MadHatter almost 12 years
      To me, this looks perilously close to a shopping question, which our FAQ opposes. To maximise the chance of getting a useful answer, you may wish to read blog.stackoverflow.com/2010/11/qa-is-hard-lets-go-shopping and consider revising your question.
    • MrGigu
      MrGigu almost 12 years
      Does it have to be a relational engine? There are some modern btrieve implementations that are fast.
    • Raheel
      Raheel almost 12 years
      @Mark, it doesnt have t be relational. I am open to any other solution.
    • Raheel
      Raheel almost 12 years
      @TomO'Connor the problem is that many servers want to dump data on one server. I have the option of only one server.
    • Boris Ivanov
      Boris Ivanov almost 12 years
      Well you your main goal is speed you should pay mo attention to NOSQL databases like Redis, MongoDb etc.
    • Raheel
      Raheel almost 12 years
      Do you mean MORE or NO. I didnt understand what do you mean by "mo"
    • David Schwartz
      David Schwartz almost 12 years
      You should definitely look into the Apache Project's Cassandra database. Twitter and Netflix use it.
    • gWaldo
      gWaldo almost 12 years
      You haven't addressed the hardware issue. Since speed is a primary concern for you, load your servers up with as much RAM as they can hold, and replace your disks with SSDs
    • Zac B
      Zac B over 11 years
      Crash-proof, fast, storage-efficient, and scalable, huh? I'd start with finding one of these: en.wikipedia.org/wiki/Holy_Grail
  • Raheel
    Raheel almost 12 years
    Prevayler is in JAVA (I cannot run Java) and MemSQL has licensing issues. I am trying MariaDB and PostgreSQL to see if that helps.
  • Janne Pikkarainen
    Janne Pikkarainen almost 12 years
    @mnmnc: The link you posted is from 2002 ...
  • Raheel
    Raheel almost 12 years
    I tried batching and it does speed up. I made a single INSERT query for 10 records and results were 10 times faster. But space is still the issue. Its taking 2-2.5 times the space MyISAM is taking.
  • mnmnc
    mnmnc almost 12 years
    :) You are correct. let me fix it..
  • pQd
    pQd almost 12 years
    @Raheel - try compression. although i think innodb will always take more space.
  • Raheel
    Raheel almost 12 years
    I am using linux ext3/4 (CentOS 5 / 6) and NTFS on windows 7 x64 as well. I have been researching on this for the past 2-3 days. And MariaDB does seem to be good. But I am not sure whether ARIA is the thing I require. I just need crash resilience and not all the ACID tests. There should be no loss of data when the server goes down. Also we do maintain UPS, good quality hardware, etc. But a crash can happen even despite these things.
  • Raheel
    Raheel almost 12 years
    Yes, compression does help but its still taking more space than it should.
  • pQd
    pQd almost 12 years
    and how much space SHOULD it take? maybe you dont need indexing and dumping data to the flat file is enough?
  • Raheel
    Raheel almost 12 years
    I dont need Indexing at all. Dumping to a FLAT file can cause data loss in case of a crash as well right ? I presume that it would be the same as MyISAM if I dump to a flat file ?
  • pQd
    pQd almost 12 years
    do you have ssd with supercapacitors or BBU flash? without that no solution will guarantee data loss. heck - one server? you WILL lose some data sooner or later. database safety means that you'll not be left with the inconsistent data. but you might lose last transaction[s] before the potential crash.
  • Raheel
    Raheel almost 12 years
    I will backup data from that server to another one. I have SATA drives. Can crashes even cause InnoDB to fail ?
  • symcbean
    symcbean almost 12 years
    From what you're saying here, aria looks even more appropriate. Do not use ext3 if write performance is important. Ext4 is IMHO more resillient than XFS, but probably not as fast with multiple concurrent writers.
  • symcbean
    symcbean almost 12 years
    I dont need Indexing at all - then why do you need a database? Indeed, why are you storing the data if you never intend to retrieve it? Dumping to a FLAT file can cause data loss in case of a crash as well right - saying this in this context makes me think you have a very different interpretation of data loss than is usually applied to the term.
  • pQd
    pQd almost 12 years
    @Raheel - yes, you can always encounter bugs in innodb or file system or kernel or hardware. if the data is so precious dont store it at one place. sooner or later you'll need to take the system down for maintenance or... just because ECC for your ram was not enough to handle multi-bit corruption.
  • Olavi Sau
    Olavi Sau almost 8 years
    What is RRAS in this context?
  • Chopper3
    Chopper3 almost 8 years
    This is really old - and they seem to call it RAS these days instead of RRAS but here you go intel.com/Assets/PDF/whitepaper/323479.pdf