To improve SQL performance, why not just put lots of RAM rather than having faster hard disks?

19,031

Solution 1

Your analysis is fine -- to a point -- in that it absolutely will make things faster. You still have to account for a couple of other issues though:

  1. Not everyone can afford enough memory; when you have multiple terabytes of data, you have to put it on disk some time. If you don't have much data, anything is fast enough.

  2. Write performance for your database is still going to be constrained by the disks, so that you can keep the promise that the data was actually stored.

If you have a small data set, or don't need to persist it on disk, there is nothing wrong with your idea. Tools like VoltDB are working to reduce the overheads that older assumptions in RDBMS implementations made which constrain pure in-memory performance.

(As an aside, people telling you to use RAID-5 for database performance are probably not great folks to listen to on the subject, since it is almost never the best choice - it has good read performance, but bad write performance, and writes are almost always the production constraint - because you can throw RAM into caching to solve most read-side performance issues.)

Solution 2

Short version: consider the working set size. Long version: How big is your data? If it can fit in memory of a modern server, yes, you're absolutely right. Unfortunately, the biggest Xeon can address 2TB of RAM right now, and that's not that big of a dataset any more. If you can't buy machine big enough to house your entire working set in RAM, you're forced to solve problems with your brain, not your wallet.

Solution 3

If you want speed:

  • Increase RAM so at least frequently used indexes can entirely fit into RAM (for example, on a system I work on, 32GB RAM is plenty for a 350GB database, because indexes are what you need in RAM, not raw data)
  • Use RAID10 with any disks (faster disks are better)
  • Avoid RAID5
  • Split mdf, ldf and temp DB onto discrete spindle sets (example: tempdb on its own RAID1 set, ldf on its own RAID1 or RAID10 spindle set, mdf on a RAID 10 set with at least 4 total disks)

Follow those steps, and SQL Server will fly.

Then if you want, add more RAM... but do the above first, and you may well find you are done.

Solution 4

RAM is the new disk, disk is the new tape.

In http://www.tbray.org/ongoing/When/200x/2006/05/24/On-Grids . Note that was six years ago. Yes, we have database systems that try (and try hard) to keep the entire dataset in RAM and rather shard to multiple machines than to use the disk because disk is magnitudes slower anyways. You need to write out the dataset to disk but as in the motto above, that's more akin to a background backup task than an online operation. Durability is achieved through append only logs with these databases (I am thinking MongoDB and Redis but there are tons more).

Solution 5

This question is similar to a basic one that has led to a lot of research and development in database architectures over the past 5-10 years. Now that it is feasible to store an entire database in RAM for many use cases, the database needs to be designed around working in RAM, rather than simply applying older inherited architectures to RAM-based storage.

Just as many smaller and more special-purpose languages have been widely adopted in recent years, we are entering an era more special-purpose databases will be needed.

For some further reading on this topic, I recommend the academic paper The End of an Architectural Era (It’s Time for a Complete Rewrite). It's not a difficult read.

It's unclear if this question was specifically about SQL Server. The original poster should clarify this.

Daniel Pittman wrote:

If you have a small data set, or don't need to persist it on disk, there is nothing wrong >with your idea. Tools like VoltDB are working to reduce the overheads that older assumptions >in RDBMS implementations made which constrain pure in-memory performance.

Reducing the overheads from older assumptions in RDBMS implementations was exactly the design goal of VoltDB, but it does scale horizontally with no architectural limit on the data size, and it can persist to disk for full durability using snapshotting and command-logging.

Share:
19,031

Related videos on Youtube

user1034912
Author by

user1034912

Updated on September 18, 2022

Comments

  • user1034912
    user1034912 almost 2 years

    People keep telling me that in order to improve an SQL server's performance, buy the fastest hard disks possible with RAID 5, etc.

    So I was thinking, instead of spending all the money for RAID 5 and super-duper fast hard disks (which isn't cheap by the way), why not just get tonnes of RAM? We know that an SQL server loads the database into memory. Memory is wayyyy faster than any hard disks.

    Why not stuff in like 100 GB of RAM on a server? Then just use a regular SCSI hard disk with RAID 1. Wouldn't that be a lot cheaper and faster?

    • Admin
      Admin over 12 years
      Whoever is telling you RAID 5 doesn't have a clue. If you really care about performance, use RAID 10
    • Admin
      Admin over 12 years
      What's the D in ACID stand for? Eventually, you're gonna need to write stuff down.
  • Daniel Pittman
    Daniel Pittman over 12 years
    @user1034912 - varies on the use case, and users. Generally, write performance issues are harder to resolve, and end up placing greater constraints on overall system performance, which means that when you solve the read problem they start complaining about the write problem...
  • John Gardeniers
    John Gardeniers over 12 years
    @user1034912, users don't normally see write delays, so are unaware of them. Most of what users see as read delays are due to slow queries, not slow disks.
  • Alex
    Alex over 12 years
    An excellent answer! @user1034912 they might complain of read issues which could of course be a knock-on effect of poor write-performance (and poor-scaling concurrency code).
  • adaptr
    adaptr over 12 years
    Wow, did you copy that from your (misunderstanding of) your textbooks ?
  • Jonesome Reinstate Monica
    Jonesome Reinstate Monica over 12 years
    -1 because nice as this stuff is, it is not really accessible or appropriate for most apps or most of us here. For up to 500gb of data (or even more), all you need are two SQL Servers (primary and backup), and you have a really fast using normal tools for hundreds or thousands of users. Very few of us need to scale to hundreds of thousands of concurrent users or multiple data centers, so the complexity of your proposed approach far outweighs the benefit for most of us. IOW: Vertical scaling is easy, cheap, and effective for everyone who is not facebook or google.
  • gWaldo
    gWaldo over 12 years
    RAID5 in Relational Databases: en.wikipedia.org/wiki/… - I'm not saying that you're wrong, but the conventional wisdom may be based on old info. Personally, I don't use RAID5 anymore; I use RAID6 unless it's too slow.
  • pkoch
    pkoch over 12 years
    +1 for the last sentence being extremely quotable. :D
  • Daniel Pittman
    Daniel Pittman over 12 years
    @gWaldo - "its complicated" is very fair statement there. I generally stand by the position, but it is true that controller improvements and big battery backed caching controllers help hide the extra cost of a RAID-5 device. On the large number of systems where performance just isn't an issue at all, too, it totally doesn't matter.
  • Cromulent
    Cromulent over 12 years
    Ugh. How many times do people have to be told that RAID is not a backup solution?
  • Jonesome Reinstate Monica
    Jonesome Reinstate Monica over 12 years
    -1 Why are you talking about voltdb or how much memory costs? This is a sql server question...
  • Daniel Pittman
    Daniel Pittman over 12 years
    @samsmith - because the answer is complex, and that points in the direction of why the solution of "throw more memory at it" might not deliver the performance expected. In the same way that both our answers WRT RAID 10 vs 5 are not strictly SQL Server related. :)
  • Jonesome Reinstate Monica
    Jonesome Reinstate Monica over 12 years
    @DanielPittman - the answer isn't complex, so why make it complex? The answer is: Use normal techniques to get all you can out of SQL Server -- which will almost certainly suffice. If that doesn't work, then (and only then) evaluate alternatives to SQL Server.... which usually includes things like putting a cache next to SQL Server long before dropping SQL Server entirely...
  • Jeremy Davidson
    Jeremy Davidson over 10 years
    @user1034912, slow write can hold locks for a long time that stop reads from responding