SQL select from table - only including data in specific filegroup

sql sql-server sql-server-2008 tsql database-partitioning

10,328

Solution 1

To get at a specific filegroup, you'll always want to utilize partition elimination in your predicates to ensure minimal records get read. This is very important if you are to get any benefits from partitioning.

For archival, I think you're looking for how to split and merge ranges. You should always keep the first and last partitions empty, but this should give you an idea of how to use partitions for archiving. FYI, moving data from 1 filegroup to another is very resource intensive. Additionally, results will be slightly different if you use a range right pf. Since you are doing partitioning, hopefully you've read up on best practices.

DO NOT RUN ON PRODUCTION. THIS IS ONLY AN EXAMPLE TO LEARN FROM.

This example assumes you have 4 filegroups (FG1,FG2,FG3, & [PRIMARY]) defined.

IF EXISTS(SELECT NULL FROM sys.tables WHERE name = 'PartitionTest')
    DROP TABLE PartitionTest;
IF EXISTS(SELECT NULL FROM sys.partition_schemes WHERE name = 'PS')
    DROP PARTITION SCHEME PS;
IF EXISTS(SELECT NULL FROM sys.partition_functions WHERE name = 'PF')
    DROP PARTITION FUNCTION PF;
CREATE PARTITION FUNCTION PF (datetime) AS RANGE LEFT FOR VALUES ('2012-02-05', '2012-05-10','2013-01-01');
CREATE PARTITION SCHEME PS AS PARTITION PF TO (FG1,FG2,FG3,[PRIMARY]);
CREATE TABLE PartitionTest( Id int identity(1,1), DT datetime) ON PS(DT);

INSERT PartitionTest (DT)
    SELECT '2012-02-05' --FG1
    UNION ALL
    SELECT '2012-02-06' --FG2(This is the one 90 days old to archive into FG1)
    UNION ALL
    SELECT '2012-02-07' --FG2
    UNION ALL
    SELECT '2012-05-05' --FG2 (This represents a record entered recently)

Check the filegroup associated with each record:

SELECT O.name TableName, fg.name FileGroup, ps.name PartitionScheme,pf.name PartitionFunction, ISNULL(prv.value,'Undefined') RangeValue,p.rows
FROM sys.objects O
INNER JOIN sys.partitions p on P.object_id = O.object_id
INNER JOIN sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
INNER JOIN sys.data_spaces ds on i.data_space_id = ds.data_space_id
INNER JOIN sys.partition_schemes ps on ds.data_space_id = ps.data_space_id
INNER JOIN sys.partition_functions pf on ps.function_id = pf.function_id
LEFT OUTER JOIN sys.partition_range_values prv on prv.function_id = ps.function_id and p.partition_number = prv.boundary_id
INNER JOIN sys.allocation_units au on p.hobt_id = au.container_id
INNER JOIN sys.filegroups fg ON au.data_space_id = fg.data_space_id
WHERE o.name = 'PartitionTest' AND i.type IN (0,1) --Remove nonclustereds. 0 for heap, 1 for BTree
ORDER BY O.name, fg.name, prv.value

This proves that 2012-02-05 is in FG1 while the rest are in FG2.

In order to archive, your' first instinct is to move the data. When partitioning though, you actually have to slide the partition function range value.

Now let's move 2012-02-06 (90 days or older in your case) into FG1:

--Move 2012-02-06 from FG2 to FG1
ALTER PARTITION SCHEME PS NEXT USED FG1;
ALTER PARTITION FUNCTION PF() SPLIT RANGE ('2012-02-06');

Rerun the filegroup query to verify that 2012-02-06 got moved into FG1.

Solution 2

$PARTITION (Transact-SQL) should have what you want to do.

Run the following to know the size of your partitions and ID:

USE AdventureWorks2012;
GO
SELECT $PARTITION.TransactionRangePF1(TransactionDate) AS Partition, 
COUNT(*) AS [COUNT] FROM Production.TransactionHistory 
GROUP BY $PARTITION.TransactionRangePF1(TransactionDate)
ORDER BY Partition ;
GO

and the following should give you data from given partition id:

SELECT * FROM Production.TransactionHistory
WHERE $PARTITION.TransactionRangePF1(TransactionDate) = 5 ;

10,328

Author by

Hugo Forte

Just another c#/js/rails/node developer;-)

Updated on June 25, 2022

Comments

Hugo Forte almost 2 years
I followed this article: http://www.mssqltips.com/sqlservertip/1796/creating-a-table-with-horizontal-partitioning-in-sql-server/ Which in essence does the following:
1. Creates a database with three filegroups, call them A, B, and C
2. Creates a partition scheme, mapping to the three filegroups
3. Creates table - SalesArchival, using the partition scheme
4. Inserts a few rows into the table, split over the filegroups.
I'd like to perform a query like this (excuse my pseudo-code)
```
select * from SalesArchival
where data in filegroup('A')
```
Is there a way of doing this, or if not, how do I go about it.

What I want to accomplish is to have a batch run every day that moves data older than 90 days to a different file group, and perform my front end queries only on the 'current' file group.
Hugo Forte about 12 years

Thanks=) Any ideas as to how I would go about solving the problem of archiving data into a filegroup?
usr about 12 years

Regarding moving data: can't you move the partition to a different file group by rebuilding it there? Why are you doing partitioning anyway?
Hugo Forte about 12 years

Thanks a lot - I ran it on production and it wiped all the data! Just kidding, this was really helpful. One quick question though, what's the reason behind keeping the first and last partitions empty?
brian about 12 years

Keeping them empty helps because splitting (add new) and merging (consolidate) empty partitions is a meta data only operation. If they are not empty, the operations will increase the tlog size as you are physically inserting and deleting data. That way you can add a new partition to either end without consuming any resources or waiting for the transactions to complete.