SQL Server 2008 Partitioned Table and Parallelism

11,163

Solution 1

Partitioning can increase performance--I have seen it many times. The reason partitioning was developed was and is performance, especially for inserts. Here is an example from the real world:

I have multiple tables on a SAN with one big ole honking disk as far as we can tell. The SAN administrators insist that the SAN knows all so will not optimize the distribution of data. How can a partition possibly help? Fact: it did and does.

We partitioned multiple tables using the same scheme (FileID%200) with 200 partitions ALL on primary. What use would that be if the only reason to have a partitioning scheme is for "swapping"? None, but the purpose of partitioning is performance. You see, each of those partitions has its own paging scheme. I can write data to all of them at once and there is no possibility of a deadlock. The pages cannot be locked because each writing process has an unique ID that equates to a partition. 200 partitions increased performance 2000x (fact) and deadlocks dropped from 7500 per hour to 3-4 per day. This for the simple reason that page lock escalation always occurs with large amounts of data and a high volume OLTP system and page locks are what cause deadlocks. Partitioning, even on the same volume and file group, places the partitioned data on different pages and lock escalation has no effect since processes are not attempting to access the same pages.

THe benefit is there, but not as great, for selecting data. But typically the partitioning scheme would be developed with the purpose of the DB in mind. I am betting Remus developed his scheme with incremental loading (such as daily loads) rather than transactional processing in mind. Now if one were frequently selecting rows with locking (read committed) then deadlocks could result if processes attempted to access the same page simultaneously.

But Remus is right--in your example I see no benefit, in fact there may be some overhead cost in finding the rows across different partitions.

Solution 2

Partitioning is never an option for improving performance. The best you can hope for is to have on-par performance with non-partitioned table. Usually you get a regression that increases with the number of partitions. For performance you need indexes, not partitions. Partitions are for data management operations: ETL, archival etc. Some claim that partition elimination is possible performance gain, but for anything partition elimination can give placing the leading index key on the same column as the partitioning column will give much better results.

Will all partitions be queried?

That query needs an index on State. Otherwise is a table scan, and will scan the entire table. A table scan over a partitioned table is always slower than a scan over the same size non-partitioned table. The index itself can be aligned on the same partition scheme, but the leading key must be State.

Will all 8 CPUs be used to execute the query?

Parallelism has nothing to do with partitioning, despite the common misconception of the contrary. Both partitioned and non-partitioned range scans can be use a parallel operator, it will be the Query Optimizer decision.

Will performance be better than querying a table that is not partitioned?

No

How would Partition Index help?

An index will help. If the index has to be aligned, then it must be partitioned. A non-partitioned index will be faster than a partitioned one, but the index alignment requirement for switch-in/switch-out operations cannot be circumvented.

If you're looking at partitioning, it should be because you need to do fast switch-in switch-out operations to delete old data past retention policy period or something similar. For performance, you need to look at indexes, not at partitioning.

Solution 3

the very first question i have is if your table has a clustered index on it. if not, you'll want one.

Also, you'll want a covering index for your queries. Covering Indexes

If you have a lot of historical data you might look into an archiving process to help speed up your oltp applications.

Share:
11,163
Mike Barlow - BarDev
Author by

Mike Barlow - BarDev

Updated on June 08, 2022

Comments

  • Mike Barlow - BarDev
    Mike Barlow - BarDev almost 2 years

    My company is moving to SQL Server 2008 R2. We have a table with tons of archive data. Majority of the queries that uses this table employs DateTime value in the where statement. For example:

    Query 1

    SELECT COUNT(*) 
    FROM TableA 
    WHERE 
         CreatedDate > '1/5/2010' 
         and CreatedDate < '6/20/2010'  
    

    I'm making the assumption that partitions are created on CreatedDate and each partition is spread out across multiple drives, we have 8 CPUs, and there are 500 million records in the database that are evenly spread out across the dates from 1/1/2008 to 2/24/2011 (38 partitions). This data could also be portioned in to quarters of a year or other time durations, but lets keep the assumptions to months.

    In this case I would believe that the 8 CPU's would be utilized, and only the 6 partitions would be queried for dates between 1/5/2010 and 6/20/2010.

    Now what if I ran the following query and my assumptions are the same as above.

    Query 2

    SELECT COUNT(*) 
    FROM TableA 
    WHERE State = 'Colorado'
    

    Questions?
    1. Will all partitions be queried? Yes
    2. Will all 8 CPUs be used to execute the query? Yes
    3. Will performance be better than querying a table that is not partitoned? Yes
    4. Is there anything else I'm missing?
    5. How would Partition Index help?

    I answer the first 3 questions above, base on my limited knowledge of SQL Server 2008 Partitioned Table & Parallelism. But if my answers are incorrect, can you provide feedback any why I'm incorrect.

    Resource:

    BarDev

  • RichardTheKiwi
    RichardTheKiwi about 13 years
    Partitioning can help performance, because you can locate partitions on separate disks. This allows I/O bandwidth to increase = better performance. Other than that there is a lot of "opinion" in there which I don't agree with unless backed up with facts. msdn.microsoft.com/en-us/library/ms177411.aspx
  • Remus Rusanu
    Remus Rusanu about 13 years
    non-partitioned tables can be located on filegroups containing multiple files (spread across LUNs/disks) which will leverage the increased IO bandwidth better than partitioning.
  • Mike Barlow - BarDev
    Mike Barlow - BarDev about 13 years
    I totally aggree with Richard . Our database is on a san, and multiple LUNs are available. Remus, did you watch the video that was in the orginal post? It all about better performance with Table Partition.
  • Mike Barlow - BarDev
    Mike Barlow - BarDev about 13 years
    Everything I've read, and talking with Microsoft consultants mentions using Partion Tables and spread those partions across multiple disks or LUNs.
  • Remus Rusanu
    Remus Rusanu about 13 years
    Partitioning has its role and is irreplaceable for specific scenarios. However, that role is not performance.
  • Remus Rusanu
    Remus Rusanu about 13 years
    @BarDev: in the video in your post Eric Hanson talks about improvements made to queries that touch multiple partitions. The improvement he mentions will improve performance when compared with another query that touch multiple partitions in SQL Server 2005. This by no means implies that the query performs better than a query that runs on a single partitioned table. The message of that video is by no means 'partitioning improves performance', is 'we made improvements to partitioning so that now is not as bad as it was'.
  • Dave Clausen
    Dave Clausen over 10 years
    Be careful in using the term "never". Partitioning can help performance not only because it can involve multiple disks and queries can better-leverage CPU cores, but also because it facilitates locking at the partition-level rather than the entire table-level.