Database design: one huge table or separate tables?

Solution 1

Start out with one large table, and then apply 2008's table partitioning capabilities where appropriate, if performance becomes an issue.
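For reference, table partitioning in SQL Server 2008 (an Enterprise Edition feature) is declared with a partition function and a partition scheme, and the table is then created on that scheme. The sketch below is only illustrative; the table name, columns, and boundary values are assumptions, not taken from the question:

    -- Hypothetical sketch: range-partition a large fact table by an integer customer id.
    -- Table partitioning requires SQL Server 2008 Enterprise Edition.
    CREATE PARTITION FUNCTION pfCustomer (int)
        AS RANGE RIGHT FOR VALUES (10, 20, 30);      -- 4 partitions: <10, 10-19, 20-29, >=30

    CREATE PARTITION SCHEME psCustomer
        AS PARTITION pfCustomer ALL TO ([PRIMARY]);  -- one filegroup, for simplicity

    CREATE TABLE dbo.FactBenchmark
    (
        CustomerId   int           NOT NULL,
        DepartmentId int           NOT NULL,
        MeasuredOn   datetime      NOT NULL,
        Value        decimal(18,4) NOT NULL
    ) ON psCustomer (CustomerId);                    -- rows are mapped to partitions by CustomerId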

Solution 2

Data warehouses are supposed to be big (the clue is in the name). Twenty million rows is about medium by warehousing standards, although six hundred million would be considered large.

The thing to bear in mind is that such large tables have a different physics, like black holes, so tuning them takes a different set of techniques. The other thing is that users of a data warehouse must understand they are dealing with huge amounts of data, so they must not expect sub-second (or even sub-minute) response times for every query.

Partitioning can be useful, especially if you have clear demarcations such as, in your case, CUSTOMER. Be aware, though, that partitioning can degrade the performance of queries which cut across the grain of the partitioning key, so it is not a silver bullet.
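To make the "grain" point concrete, here are two example queries against the hypothetical FactBenchmark table sketched under Solution 1 (partitioned on CustomerId). The first aligns with the partition key and can be answered from a single partition; the second cuts across the grain and has to read every partition:

    -- Aligned with the partitioning key: partition elimination leaves one partition to scan.
    SELECT DepartmentId, AVG(Value) AS AvgValue
    FROM dbo.FactBenchmark
    WHERE CustomerId = 17
    GROUP BY DepartmentId;

    -- Cuts across the grain: every customer partition contains January rows, so all are read.
    SELECT CustomerId, AVG(Value) AS AvgValue
    FROM dbo.FactBenchmark
    WHERE MeasuredOn >= '20100101' AND MeasuredOn < '20100201'
    GROUP BY CustomerId;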

Solution 3

Splitting tables for performance reasons is called sharding. Separately, a database schema can be more or less normalized: a normalized schema keeps data in separate tables with relations between them, so that no data is duplicated.
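As a small illustration of what that means in practice (the table and column names here are hypothetical): customer attributes live once in their own table, and the large table only carries a key that references them:

    -- Hypothetical normalized pair of tables: customer details are stored once,
    -- and the big table refers to them by key instead of repeating them on every row.
    CREATE TABLE dbo.Customer
    (
        CustomerId   int           NOT NULL PRIMARY KEY,
        CustomerName nvarchar(100) NOT NULL
    );

    CREATE TABLE dbo.Measurement
    (
        MeasurementId bigint        NOT NULL PRIMARY KEY,
        CustomerId    int           NOT NULL
            REFERENCES dbo.Customer (CustomerId),    -- a relation instead of duplicated data
        MeasuredOn    datetime      NOT NULL,
        Value         decimal(18,4) NOT NULL
    );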

Solution 4

I am assuming you have your database properly normalized. It shouldn't be a problem to handle the data volume you refer to in a single table in SQL Server; what I think you need to do is review your indexes.
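Two things that typically come up in such a review, shown here only as a sketch (the table, index, and column names are assumptions, not anything from the question): asking SQL Server's missing-index DMVs what it thinks is missing, and adding a covering non-clustered index for the most common query shape:

    -- What does SQL Server itself think is missing? (DMVs available since SQL Server 2005.)
    SELECT d.statement AS table_name,
           d.equality_columns, d.inequality_columns, d.included_columns,
           s.user_seeks, s.avg_user_impact
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g      ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
    ORDER BY s.user_seeks * s.avg_user_impact DESC;

    -- Hypothetical covering index for a frequent "per customer and department" query shape.
    CREATE NONCLUSTERED INDEX IX_FactBenchmark_Customer_Department
        ON dbo.FactBenchmark (CustomerId, DepartmentId)
        INCLUDE (MeasuredOn, Value);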

Solution 5

Since you've tagged your question 'datawarehouse' as well, I assume you know something about the subject. Depending on your goals you could go for a star schema (a multidimensional model with a fact table and dimension tables). Store all fast-changing data in one fact table (per subject) and the slowly changing data in separate dimension/'snowflake' tables.
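A minimal star-schema sketch along those lines, with purely illustrative names: one fact table holding the fast-changing measures, joined to slowly changing dimension tables by surrogate keys:

    -- Illustrative star schema: a central fact table plus slowly changing dimensions.
    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey  int           NOT NULL PRIMARY KEY,
        CustomerName nvarchar(100) NOT NULL
    );

    CREATE TABLE dbo.DimDate
    (
        DateKey      int  NOT NULL PRIMARY KEY,      -- e.g. 20100131
        CalendarDate date NOT NULL
    );

    CREATE TABLE dbo.FactMeasure
    (
        CustomerKey int           NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
        DateKey     int           NOT NULL REFERENCES dbo.DimDate (DateKey),
        Value       decimal(18,4) NOT NULL
    );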

Another option is the Data Vault method by Dan Linstedt. It is a bit more complex, but it gives you full flexibility.

http://danlinstedt.com/category/datavault/

Comments

  • littlegreen
    littlegreen almost 2 years

    Currently I am designing a database for use in our company. We are using SQL Server 2008. The database will hold data gathered from several customers. The goal of the database is to acquire aggregate benchmark numbers over several customers.

    Recently, I have become worried that one table in particular will be getting very big. Each customer has approximately 20,000,000 rows of data, and there will soon be 30 customers in the database (if not more). A lot of queries will be run against this table. I am already noticing performance issues and users being temporarily locked out.

    My question is: will we be able to handle this table in the future, or is it better to split it up into smaller tables, one per customer?


    Update: It has now been about half a year since we first created the tables. Following the advice below, I created a handful of huge tables. Since then I have been experimenting with indexes and settled on a clustered index on the first two columns (Hospital code and Department code), on which we would have partitioned the table had we had Enterprise Edition. This setup worked fine until recently; as Galwegian predicted, performance issues are springing up. Rebuilding an index takes ages, users lock each other out, queries frequently take longer than they should, and for most queries it pays off to first copy the relevant part of the data into a temp table, create indexes on the temp table, and run the query there. This is not how it should be. Therefore, we are considering buying Enterprise Edition for the sake of partitioned tables. If the purchase cannot go through, I plan to use a workaround to accomplish partitioning in Standard Edition (a sketch of that workaround appears at the end of the comments below).

    • Neil Knight
      Neil Knight about 14 years
      For your locks, are you specifying the NOLOCK query hint on your SELECT statements?
    • littlegreen
      littlegreen about 14 years
      Not yet, but now I will. Thanks.
    • littlegreen
      littlegreen about 14 years
      On second thought, I probably won't, considering some info that I found on the subject, and the discussion below.
    • Channa
      Channa about 4 years
      Table partitioning will help. Also, rebuilding indexes won't take as much time, because indexes will now be specific to each partition.
  • littlegreen
    littlegreen about 14 years
    I have my data normalized; however, the table I am referring to is completely denormalized, since it will be queried a lot and will not change often.
  • Otávio Décio
    Otávio Décio about 14 years
    If you are not updating the table then I wonder why you are having users being locked out.
  • Martin Smith
    Martin Smith about 14 years
    Dirty reads, yes. It won't affect phantoms, though, as those occur under the default isolation level as well.
  • littlegreen
    littlegreen about 14 years
    Probably because we are still in a design phase where we are bulk loading data into the database quite often. But I get your point, the locking problem will disappear in a production situation. Thanks!
  • littlegreen
    littlegreen about 14 years
    Hehe.. I wish I knew even more about data warehousing. You aren't by any chance looking for a job, are you :)
  • littlegreen
    littlegreen about 14 years
    If I have to give points to someone... this answer is concise, and the table partitioning hint led me to a lot of specific SQL Server 2008 info that I can use. So thanks, Galwegian, and thanks to everyone else as well!
  • littlegreen
    littlegreen about 14 years
    I am not using a framework, I am using indexes, and we do have a kickass server. However, it is true that I am a newbie at the subject, and we are looking for a professional DBA to add to the team. I am also not yet using Profiler, so thanks for that tip.
  • Jé Queue
    Jé Queue over 13 years
    Is my nomenclature off? I call splitting tables partitioning. I call sharding the physical separation of data sets for particular purposes, no?
  • StockB
    StockB over 10 years
    What do you mean by black holes?
  • Joe C.
    Joe C. about 10 years
    @StockB: What he means is that large databases are an entirely different sort of thing than normal databases, much like black holes (in astrophysics) are an entirely different sort of thing than normal objects. They're so different that the "usual" rules we're used to when dealing with them simply don't apply. They have their own set of rules and assumptions you have to work with.
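
For completeness, the Standard Edition workaround mentioned in the question's update is usually a partitioned view: several physical tables, each with a CHECK constraint on the partitioning column, combined with UNION ALL so the optimizer can skip member tables that cannot match a filter. This is only a hypothetical sketch; the table names, columns, and customer ranges are assumptions:

    -- Hypothetical partitioned-view workaround for editions without table partitioning.
    -- Each member table carries a trusted CHECK constraint on the partitioning column.
    CREATE TABLE dbo.FactBenchmark_01_10
    (
        CustomerId   int           NOT NULL CHECK (CustomerId BETWEEN 1 AND 10),
        DepartmentId int           NOT NULL,
        MeasuredOn   datetime      NOT NULL,
        Value        decimal(18,4) NOT NULL,
        CONSTRAINT PK_FactBenchmark_01_10 PRIMARY KEY (CustomerId, DepartmentId, MeasuredOn)
    );

    CREATE TABLE dbo.FactBenchmark_11_20
    (
        CustomerId   int           NOT NULL CHECK (CustomerId BETWEEN 11 AND 20),
        DepartmentId int           NOT NULL,
        MeasuredOn   datetime      NOT NULL,
        Value        decimal(18,4) NOT NULL,
        CONSTRAINT PK_FactBenchmark_11_20 PRIMARY KEY (CustomerId, DepartmentId, MeasuredOn)
    );
    GO

    -- The view behaves like one big table; a filter such as WHERE CustomerId = 17
    -- only touches the member table whose CHECK constraint allows that value.
    CREATE VIEW dbo.FactBenchmarkAll
    AS
    SELECT CustomerId, DepartmentId, MeasuredOn, Value FROM dbo.FactBenchmark_01_10
    UNION ALL
    SELECT CustomerId, DepartmentId, MeasuredOn, Value FROM dbo.FactBenchmark_11_20;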