SQL Server Partitioning 101
I’ve worked with partitioning for quite some time now, and although it has been around for years and much has been written about the concept, I recently presented an example of partitioning to a few people within a team and felt it was worth sharing on this blog. In today’s post I am going to demonstrate how to create partitions for a table, what happens when you insert data, how to split data out, how to switch partitions and, finally, how to merge partitions.
This is going to be demo heavy, and I will discuss what is happening along with the demos. I’ve provided links at the bottom to other sources that are more theoretical and less practical, should you wish to read more on the subject.
First let’s create the database.
USE [master]
GO
CREATE DATABASE [Examples]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'Examples', FILENAME = N'H:\SQLData\Sylph\Examples.mdf' , SIZE = 524288KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
LOG ON
( NAME = N'Examples_log', FILENAME = N'H:\SQLData\Sylph\Examples_log.ldf' , SIZE = 524288KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
ALTER DATABASE [Examples] SET RECOVERY SIMPLE
GO
Nothing much to see here other than creating a common database with one file, the default primary filegroup and a log file.
Then let’s create a new filegroup on which the partition scheme will be stored. When creating a partition scheme you can use the primary filegroup; there is nothing preventing us from doing so, or from specifying any number of filegroups to hold the partition ranges specified (if this isn’t totally clear, read on and I will explain further down.) How you define where the data is going to be stored really depends on the design of your database.
ALTER DATABASE [Examples] ADD FILEGROUP FGDexId
GO
Now we have the filegroup let’s create some files for it so that we can store data on the filegroup.
USE [master]
GO
ALTER DATABASE [Examples] ADD FILE ( NAME = N'DexId_F01', FILENAME = N'C:\SQLData\Examples_DexId_F01.ndf' , SIZE = 2GB , FILEGROWTH = 1GB ) TO FILEGROUP [FGDexId]
GO
ALTER DATABASE [Examples] ADD FILE ( NAME = N'DexId_F02', FILENAME = N'C:\SQLData\Examples_DexId_F02.ndf' , SIZE = 2GB , FILEGROWTH = 1GB ) TO FILEGROUP [FGDexId]
GO
Ok, let’s create the partition function and partition scheme.
USE [Examples]
GO
CREATE PARTITION FUNCTION pfnDexId (int)
AS RANGE LEFT FOR VALUES (1,2,3,4,5,6,7,8,9,10)
GO
CREATE PARTITION SCHEME pscDexId
AS PARTITION pfnDexId ALL TO ([FGDexId])
GO
So now let’s delve into what’s going on here. Creating a partition function is the first step in partitioning a table. It maps the rows of a table into partitions based on the values of a specified column. Note that we’re not specifying the table or the column here, just the values we will be partitioning by. We are calling the partition function pfnDexId so as to give some idea of which column (known as the partitioning column) we will be partitioning by (we have not created the table yet; that will come later.) The column will be of type INT. With regard to accepted data types for the partitioning column, some types are invalid (from MSDN: text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types.)
The LEFT|RIGHT option defines which side of the boundary each value is stored on by the database engine; LEFT is the default. So if the boundary values were 1, 100 and 1000, the ranges would be:
Partition 1: values <= 1
Partition 2: values > 1 and <= 100
Partition 3: values > 100 and <= 1000
Partition 4: values > 1000
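With the pfnDexId function created above (boundaries 1 through 10, RANGE LEFT), you can verify this behaviour yourself using the built-in $PARTITION function, which returns the partition number a given value would map to:

```sql
-- Check which partition a value maps to under RANGE LEFT boundaries 1..10.
SELECT $PARTITION.pfnDexId(1)  AS [value 1],   -- partition 1 (values <= 1)
       $PARTITION.pfnDexId(2)  AS [value 2],   -- partition 2 (> 1 and <= 2)
       $PARTITION.pfnDexId(11) AS [value 11];  -- partition 11 (> 10)
```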
The second command creates the partition scheme. This maps the partitions of a partitioned table to filegroups for storage. In this example I am storing them ALL on the filegroup we created earlier, FGDexId. If you include ALL, only one filegroup can be specified. So naturally this means we can store partitions across filegroups: if you wanted to, you could create more filegroups and include them in the TO clause. Partitions are assigned to the filegroups in the order in which they are specified. You can specify a filegroup more than once; however, if there are not enough filegroups for the partitions specified, the CREATE PARTITION SCHEME statement will fail with an error.
In SQL Server 2012, the maximum number of partitions is 15,000. This was increased significantly from SQL Server 2008, where the maximum was 1,000 partitions. I don’t want to create all 15,000 partitions now, but I do want to create some more with little effort. To create more partitions you have to split a range, that is, create a new boundary. As long as the partitions are empty this is a near-instantaneous process (and a headachingly long one if there is significant data to split… more on this later.) The SQL below will script out the SQL needed.
declare @partitionToAdd INT
declare @sql nvarchar (MAX)
set @partitionToAdd = 11
WHILE @partitionToAdd < 1000
BEGIN
SET @sql = N'ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId]; '
PRINT @sql
SET @sql = 'ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(' + CAST((@partitionToAdd) AS VARCHAR(50)) + ');'
PRINT @sql
set @partitionToAdd = @partitionToAdd + 1
END
--then take printout and run
You need to run the printout. It will look like this:
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(11);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(12);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(13);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(14);
So I’ve covered the split, but what is this NEXT USED? As stated earlier, for every partition created you define a filegroup to store the data on. Splitting is the same as creating a new partition as you define a new boundary, so you need to define which filegroup the NEXT partition created will be stored on, which is why you run the NEXT USED and then create/split the partition.
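After running the splits, you can confirm the boundaries that now exist on the function by querying the catalog views directly (a quick sanity check, not part of the original demo):

```sql
-- List the boundary values currently defined on the partition function.
SELECT prv.boundary_id, prv.value
FROM sys.partition_functions pf
INNER JOIN sys.partition_range_values prv
    ON prv.function_id = pf.function_id
WHERE pf.name = 'pfnDexId'
ORDER BY prv.boundary_id;
```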
Execute the SQL below to see the files and their sizes. Everything will be nearly empty.
use Examples
GO
select
[File Name] = left(sf.NAME,30),
[File Size in MB] = convert(decimal(12,2),round(sf.size/128.000,2)),
[Space Used in MB] = convert(decimal(12,2),round(fileproperty(sf.name,'SpaceUsed')/128.000,2)),
[Free Space in MB] = convert(decimal(12,2),round((sf.size-fileproperty(sf.name,'SpaceUsed'))/128.000,2)) ,
[Free Space in %] = convert(decimal(12,2),round(100*(sf.size- CAST(fileproperty(sf.name,'SpaceUsed') as float))/128.000,2)/(sf.size/128.000))
from dbo.sysfiles sf
order by fileid asc
(Throughout these examples we will be checking the size of files so that we can get an idea of where the data is being stored.)
So now we’ve got our partitions created and stored on a filegroup, and we know how they were created, but how can we monitor the partitions? BOL shows that there are some catalog views we can use, such as sys.partitions, sys.partition_schemes and sys.partition_functions, but there’s nothing out of the box we can use directly. Here are two stored procs that can be used for this example, and they can most likely be tweaked to work with most partitioned tables. There are two because the second is very informative but can take a very long time to run if you have many partitioned tables across many partition functions and schemes. The first runs very quickly and will be suitable for most enquiries.
Here’s the first, fast sproc. It only requires that you pass in one variable, the name of the partition; in our example it is “DexId”. It uses the name of a table that is partitioned (we’ve not got as far as creating the table yet). It also only returns the top 1,000 partitions, so you may want to alter this.
USE [Examples]
GO
CREATE PROCEDURE [dbo].[LatestPartitionPreview]
(@partitionName nvarchar(100))
AS
declare @sql nvarchar(max)
declare @sampleTable sysname
declare @pscName nvarchar(100)
declare @pfnName nvarchar(100)
SET @pscName = 'psc' + @partitionName
SET @pfnName = 'pfn' + @partitionName
set @sampleTable = 'PartitionTable'
declare @partitonedTableId int
SELECT @partitonedTableId = t.object_id
FROM sys.TABLES t with (nolock)
JOIN sys.indexes i with (nolock) ON t.object_id = i.object_id
JOIN sys.partition_schemes ps with (nolock) ON i.data_space_id = ps.data_space_id
WHERE i.index_id < 2 and ps.name = @pscName
and t.name like @sampleTable + '%'
declare @currentPartitions table(partitionNumber int, [rows] int, value sql_variant, ranking int)
;with ranges (boundaryId, value)
as
(
select prv.boundary_id, prv.value
from sys.partition_functions f with (nolock)
inner join sys.partition_schemes s with (nolock) on s.function_id = f.function_id
inner join sys.partition_range_values prv with (nolock) on prv.function_id = f.function_id
where f.name = @pfnName
)
INSERT INTO @currentPartitions
SELECT partition_number, [rows], [value], ROW_NUMBER() OVER (ORDER BY partition_number desc) ranking FROM
(select top 1000 p.partition_number, [rows], [value]
FROM sys.partitions p with (nolock)
left join
ranges on ranges.boundaryid = p.partition_number
where p.[object_id] = @partitonedTableId
order by p.partition_number desc
) latestPartitions order by partition_number desc
select * from @currentPartitions
GO
This is the code for the second, more informative, but much slower sproc. The last column will only be correct for tables whose ranges are created to the LEFT of the boundaries. This sproc does not require any variables. You may want to alter it to suit your needs on a database with more than one partition function/scheme, but for this demo I’ve hard-coded some of the variables you’d want to change (hint: @Schema, @partitionedTableName, @partitionScheme…)
--create the more informative, but much slower stored proc
USE [Examples]
GO
CREATE PROCEDURE [dbo].[PartitionView]
@Schema NVARCHAR(50) = 'dbo',
@partitionedTableName varchar(50) = 'PartitionTable'
AS
DECLARE @parts TABLE([object_id] INT, [schema] sysname, name sysname, schemeName sysname, partitionNumber int, fileGroupName sysname,
rangeValue int, [rows] int, boundaryOnRight bit UNIQUE CLUSTERED (name, partitionNumber) )
DECLARE @table_ids TABLE ([object_id] INT)
DECLARE @partitionScheme sysname
SET @partitionScheme = 'pscDexId'
INSERT INTO @table_ids
SELECT t.object_id as [objectid]
FROM sys.TABLES t
JOIN sys.indexes i ON t.object_id = i.object_id
JOIN sys.partition_schemes ps ON i.data_space_id = ps.data_space_id
JOIN sys.partition_functions pf ON ps.function_id = pf.function_id
WHERE i.index_id < 2 and ps.name = @partitionScheme
and (@partitionedTableName is null or t.name like @partitionedTableName + '%')
;with Parts([object_id], [schema], name, schemeName, partitionNumber, fileGroupName, rangeValue, rows, boundaryOnRight) AS
(
SELECT I.[object_id],
S.name AS [schema],
OBJECT_NAME(I.[object_id]) AS name,
PS.name AS schemeName,
DDS.destination_id AS partitionNumber,
FG.Name AS fileGroupName,
cast(PRV.value as int) AS rangeValue,
P.rows,
PF.boundary_value_on_right AS boundaryOnRight
FROM sys.indexes I with (nolock)
INNER JOIN sys.objects O with (nolock) ON O.object_id = I.object_id
INNER JOIN sys.schemas S with (nolock) ON O.schema_id = S.schema_id
INNER JOIN sys.partition_schemes PS with (nolock) ON PS.data_space_id = I.data_space_id
INNER JOIN sys.partition_functions PF with (nolock) ON PF.function_id = PS.function_id
INNER JOIN sys.destination_data_spaces DDS with (nolock) ON DDS.partition_scheme_id = PS.data_space_id
INNER JOIN sys.filegroups FG with (nolock) ON FG.data_space_id = DDS.data_space_id
INNER JOIN sys.partitions P with (nolock) ON P.object_id = i.object_id AND P.partition_number = DDS.destination_id
LEFT OUTER JOIN sys.partition_range_values PRV with (nolock) ON PRV.boundary_id = DDS.destination_id AND PS.function_id = PRV.function_id
WHERE i.[object_id] in (select [object_id] from @table_ids)
)
insert into @parts
select * from parts
SELECT P1.object_id, p1.[schema], p1.name, p1.schemeName ,p1.partitionNumber,p1.fileGroupName, p1.rangeValue, p1.rows, p1.boundaryOnRight,
CASE WHEN P2.rangeValue IS NULL THEN '' ELSE CASE P1.boundaryOnRight WHEN 0 THEN '>' ELSE '>=' END END +
COALESCE(CAST(P2.rangeValue AS VARCHAR) + CASE WHEN P1.rangeValue IS NULL THEN '' ELSE ' AND ' END, '') +
CASE WHEN P1.rangeValue IS NULL THEN '' ELSE CASE P1.boundaryOnRight WHEN 0 THEN '<=' ELSE '<' END END +
COALESCE(CAST(P1.rangeValue AS VARCHAR), '') AS [expr]
FROM @Parts P1
LEFT JOIN @Parts P2 ON P1.partitionNumber = P2.partitionNumber + 1
AND P1.schemeName = P2.schemeName AND P1.name = P2.name
order by p1.partitionNumber desc
GO
In the examples below I will call these sprocs several times, like so; when you see this SQL you must have created the sprocs above:
EXEC [dbo].[PartitionView]
EXEC [dbo].[LatestPartitionPreview] @partitionName = N'DexId'
If you were to execute those sprocs now you’d see that there’s nothing to report. This is correct: there are no partitioned objects stored on the partitions yet. Let’s start creating some objects and populating the database with data.
USE [Examples]
GO
CREATE SCHEMA [App] AUTHORIZATION [dbo]
GO
USE [Examples]
GO
CREATE TABLE [App].[Dex](
[DexId] [int] NOT NULL,
[DexName] [nvarchar](50) NULL,
[discoveredDate] [datetime] NOT NULL,
CONSTRAINT [PK_DexId] PRIMARY KEY CLUSTERED
(
[DexId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
declare @i as int;
set @i = 1;
begin tran
while @i <= 1004
begin
INSERT INTO Examples.[App].[Dex]
(
[DexId] ,
[DexName] ,
[discoveredDate]
)
values
( @i, CAST(NEWID() AS NVARCHAR(50)), DATEADD(MINUTE, @i, GETDATE()))
set @i = @i + 1;
end;
commit;
This table is stored on the primary filegroup, so we will expect to see that file fill up a little after adding the 1,004 rows.
Now let’s create a table on the partitions.
USE [Examples]
GO
CREATE SCHEMA [Bootstrap] AUTHORIZATION [dbo]
GO
USE [Examples]
GO
CREATE TABLE [Bootstrap].[PartitionTable](
[DexId] [int] NOT NULL,
[path] [int] NOT NULL,
[timestep] [int] NOT NULL,
[BootStrapName] varchar (4),
[Bootstrap] [float] NOT NULL,
CONSTRAINT [PK_PartitionTable] PRIMARY KEY CLUSTERED
(
[DexId] ASC,
[path] ASC,
[timestep] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE)
ON PscDexId (DexId)
)
ALTER TABLE [Bootstrap].[PartitionTable] WITH NOCHECK ADD CONSTRAINT [FK_PartitionTable_DexId] FOREIGN KEY([DexId])
REFERENCES [App].[Dex] ([DexId])
GO
ALTER TABLE [Bootstrap].[PartitionTable] CHECK CONSTRAINT [FK_PartitionTable_DexId]
GO
I created a clustered index as standard, and created a constraint. A constraint will have no impact on partitioning, but the same cannot be said for nonclustered indexes, or for clustered/nonclustered indexes on views that reference the partitioned column. This topic, called partition alignment, is a topic for another post. If you’re the impatient type and want to know more now, I recommend this post on SQLFool. For this post, however, I wanted to demonstrate that foreign key constraints will not impact partitioning tables.
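As a brief illustration only (partition alignment proper is for that other post): a nonclustered index is “aligned” when it is created on the same partition scheme and partitioning column, which keeps partition switching possible. The index name below is hypothetical:

```sql
-- Hypothetical aligned nonclustered index: same scheme, same partitioning column.
-- SQL Server adds the partitioning column (DexId) to the index automatically.
CREATE NONCLUSTERED INDEX IX_PartitionTable_BootStrapName
ON [Bootstrap].[PartitionTable] ([BootStrapName])
ON pscDexId ([DexId]);
```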
Unlike the previous example, where we stored the table on the primary filegroup, we are storing this table on the partition scheme “pscDexId”, and we are specifying the column to partition by as “(DexId)”. So without even adding any data, if we run the two sprocs we will see the table partitioned.
EXEC [dbo].[PartitionView]
EXEC [dbo].[LatestPartitionPreview] @partitionName = N'DexId'
The top sproc gives us all the info we need on a partition scheme: the name of the table, the partition scheme name, the number of the partition, which filegroup the partition is stored on (particularly useful if you are using more than one filegroup), its range value, and the expression that defines which partition the data will be stored in. This is important to appreciate: you define the boundaries, not the db engine, and you need to manage them, not the db engine. If you are not careful with your boundaries, you will store more data in a partition than you intend to, and other than your own checks there is nothing to prevent a switch from dropping more data than you intend. Keep this in mind as we go on.
The bottom sproc is an abridged version; however, you can infer each partition’s boundaries from the value of the previous partition.
Run the sql below a few times to populate a few thousand rows into the table
use Examples
go
declare @i as int;
set @i = 1;
begin tran
while @i <= 1000
begin
if @i in ( select DexId from app.Dex group by DexId)
begin
insert into [Bootstrap].[PartitionTable]
(
[DexId]
,[path]
,[timestep]
,[BootStrapName]
,[Bootstrap]
)
values
( @i, RAND ()*120, RAND ()*100 , 'FFOS', RAND()*250)
end
set @i = @i + 1;
end;
commit;
Running the stored procs shows us that the data has been split into its respective partitions by DexId. This is all fairly straightforward: we defined the boundaries when we created the partition function, we defined which column to partition by when we created the table, and with roughly 1,000 partitions for our 1,000 DexId values this all fits into the concept of partitioning very nicely.
EXEC [dbo].[PartitionView]
EXEC [dbo].[LatestPartitionPreview] @partitionName = N'DexId'
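As a lighter-weight alternative to the sprocs, the $PARTITION function can count the rows landing in each partition directly:

```sql
-- Row counts per partition, computed from the data itself.
SELECT $PARTITION.pfnDexId(DexId) AS partitionNumber,
       COUNT(*) AS [rows]
FROM [Bootstrap].[PartitionTable]
GROUP BY $PARTITION.pfnDexId(DexId)
ORDER BY partitionNumber;
```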
Truncate the table for the next example.
truncate table Examples.Bootstrap.PartitionTable
use Examples
go
declare @i as int;
set @i = 1;
begin tran
while @i <= 1004
begin
if @i in ( select DexId from app.Dex group by DexId)
begin
insert into [Bootstrap].[PartitionTable]
(
[DexId]
,[path]
,[timestep]
,[BootStrapName]
,[Bootstrap]
)
values
( @i, RAND ()*120, RAND ()*100 , 'FFOS', RAND()*250)
end
set @i = @i + 1;
end;
commit;
What we are doing here is inserting some more data into the table, still with the constraint enforced (remember we created 1,004 rows in the app.Dex table). This time, only run it once. When we run it, it inserts the data with no issues. But let’s run the stored procs again:
EXEC [dbo].[PartitionView]
EXEC [dbo].[LatestPartitionPreview] @partitionName = N'DexId'
Oh dear. The top partition now has 5 rows, whereas all the other partitions have the correct number of rows (1). Because we did not have partitions created for those DexId values, our expression shows us that anything with a value greater than 999 will go into that top partition. When this occurs we can either leave it (bad), delete it (not great) or split. You can split the top partition on boundary values even if the values within the top partition exceed the new boundary; the db engine will shift the data around. In our example this will take no time at all. But in the real world, where partitioning is used in data warehouses that may well have millions, or even billions, of rows per partition, this will be a very time-consuming task, with much locking taking place. This is why it’s really important to make sure that partitions are created before the data arrives.
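A quick way to confirm the spill (assuming the demo objects above): every DexId above the highest boundary reports the same, topmost partition number.

```sql
-- Rows past the highest boundary (999) all land in the top partition.
SELECT DexId, $PARTITION.pfnDexId(DexId) AS partitionNumber
FROM [Bootstrap].[PartitionTable]
WHERE DexId > 999
ORDER BY DexId;
```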
If you do find yourself in a situation where you have many runs within one partition and you want to split it out, you need to consider which direction to split from, depending on whether you store the data to the left or the right of the boundary. This might sound confusing, but logically it works like this: if you store values to the left of the boundary, then you want to split from the bottom range upwards. That way each split moves only one range out of the current partition into the new partition. The opposite applies when you store data to the right: splitting from the top downwards moves only the top range out of the current partition. And if each run is roughly the same size, then the time taken to split each range will be roughly the same.
Let’s take a little digression here to illustrate this point. (Run this example if you want to, but inserting this many rows like this can cause a few issues.) I inserted 25,000,000 rows into the top partition, with ranges that, to retain the integrity of our partitions, have to be split out.
use Examples
go
declare @i as int;
set @i = 1;
begin tran
while @i <= 5000000
begin
insert into [Bootstrap].[PartitionTable]
(
[DexId]
,[path]
,[timestep]
,[BootStrapName]
,[Bootstrap]
)
values
( 1000, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 1001, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 1002, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 1003, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 1004, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000)
set @i = @i + 1;
end;
commit;
I ran the stored proc midway through inserting the data to verify that all the data was being stored in the top partition. I then ran the splits from top to bottom. If you are following along at home, run one statement pair at a time. Each partition split will take minutes to run, so do not be alarmed if it takes a while.
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1004);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1003);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1002);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1001);
The results are below:
With each split of the partition the number of reads was reduced and it got faster. Now truncate the table, re-insert the data, and this time run the splits from bottom to top:
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1001);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1002);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1003);
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() SPLIT RANGE(1004);
All the splits took the same amount of time, because each one moved only a single range out of the top partition. Not only that, but they required half the logical reads of even the fastest top-to-bottom split. This means you really need to know which way to split partitions when data has ended up in the wrong partition.
So now, if you run the file size query and the stored procs, you’ll see that the files the partitioned table is stored on have grown, and that our partitions are separated out as designed.
use Examples
GO
select
[File Name] = left(sf.NAME,30),
[File Size in MB] = convert(decimal(12,2),round(sf.size/128.000,2)),
[Space Used in MB] = convert(decimal(12,2),round(fileproperty(sf.name,'SpaceUsed')/128.000,2)),
[Free Space in MB] = convert(decimal(12,2),round((sf.size-fileproperty(sf.name,'SpaceUsed'))/128.000,2)) ,
[Free Space in %] = convert(decimal(12,2),round(100*(sf.size- CAST(fileproperty(sf.name,'SpaceUsed') as float))/128.000,2)/(sf.size/128.000))
from dbo.sysfiles sf
order by fileid asc
EXEC [dbo].[PartitionView]
EXEC [dbo].[LatestPartitionPreview] @partitionName = N'DexId'
One of the big advantages of partitioning a table is that you can switch partitions in and out of a table almost instantly. So you can delete data from a table by switching a partition out to a temporary table and dropping that table. First, create an empty copy of the table we want to switch out of.
use Examples
select top 0 * into Bootstrap.PartitionTableDrop from Bootstrap.PartitionTable
However, this empty table is on the primary filegroup, and we need to rebuild it with page compression and give it the exact same clustered index for the switch to be accepted. If the table is not on the same filegroup, or the compression does not match, the switch will fail.
alter table Bootstrap.PartitionTableDrop rebuild with (data_compression = page)
As SELECT TOP 0 … INTO only creates the columns and not the index, we can add the clustered index and target the filegroup that the partitioned table is stored on. Note: this is not a partitioned table, nor does it need to be on the partition scheme; it just needs to be on the same filegroup.
ALTER TABLE [Bootstrap].[PartitionTableDrop] ADD CONSTRAINT [PK_PartitionTableDrop] PRIMARY KEY CLUSTERED
(
[DexId] ASC,
[path] ASC,
[timestep] ASC
)
on FGDexId
Now the table is ready to switch into.
ALTER TABLE [Bootstrap].[PartitionTable] SWITCH PARTITION $PARTITION.pfnDexId(1004) TO Bootstrap.PartitionTableDrop
select count (*) from Bootstrap.PartitionTableDrop
Now that temporary table has the entire partition switched into it.
use Examples
GO
select
[File Name] = left(sf.NAME,30),
[File Size in MB] = convert(decimal(12,2),round(sf.size/128.000,2)),
[Space Used in MB] = convert(decimal(12,2),round(fileproperty(sf.name,'SpaceUsed')/128.000,2)),
[Free Space in MB] = convert(decimal(12,2),round((sf.size-fileproperty(sf.name,'SpaceUsed'))/128.000,2)) ,
[Free Space in %] = convert(decimal(12,2),round(100*(sf.size- CAST(fileproperty(sf.name,'SpaceUsed') as float))/128.000,2)/(sf.size/128.000))
from dbo.sysfiles sf
order by fileid asc
And if you check the partitions using the sprocs you’ll see that the partition is now empty.
USE [Examples]
EXEC [dbo].[PartitionView]
EXEC [dbo].[LatestPartitionPreview] @partitionName = N'DexId'
You’ll also see that the filegroup has not yet regained any space, but once you drop the temporary table we will reclaim the space in the files.
drop table Bootstrap.PartitionTableDrop
use Examples
GO
select
[File Name] = left(sf.NAME,30),
[File Size in MB] = convert(decimal(12,2),round(sf.size/128.000,2)),
[Space Used in MB] = convert(decimal(12,2),round(fileproperty(sf.name,'SpaceUsed')/128.000,2)),
[Free Space in MB] = convert(decimal(12,2),round((sf.size-fileproperty(sf.name,'SpaceUsed'))/128.000,2)) ,
[Free Space in %] = convert(decimal(12,2),round(100*(sf.size- CAST(fileproperty(sf.name,'SpaceUsed') as float))/128.000,2)/(sf.size/128.000))
from dbo.sysfiles sf
order by fileid asc
Switching out and dropping the table does not get rid of the partition; the partition remains in the available ranges. If we want to remove the partition from the ranges we have to merge. Like splitting, it’s important to merge correctly to prevent data moving around. Run the script below to truncate the two tables and insert data back into them.
USE [Examples]
GO
ALTER TABLE [Bootstrap].[PartitionTable] DROP CONSTRAINT [FK_PartitionTable_DexId]
GO
truncate table app.dex
ALTER TABLE [Bootstrap].[PartitionTable] WITH NOCHECK ADD CONSTRAINT [FK_PartitionTable_DexId] FOREIGN KEY([DexId])
REFERENCES [App].[Dex] ([DexId])
ALTER TABLE [Bootstrap].[PartitionTable] CHECK CONSTRAINT [FK_PartitionTable_DexId]
GO
truncate table [Bootstrap].[PartitionTable]
declare @i as int;
set @i = 1;
begin tran
while @i <= 1004
begin
if (@i % 2) <> 0 and (@i % 3) <> 0
begin
INSERT INTO Examples.[App].[Dex]
(
[DexId] ,
[DexName] ,
[discoveredDate]
)
values
(@i, CAST(NEWID() AS NVARCHAR(50)), DATEADD(MINUTE, @i, GETDATE()))
end
set @i = @i + 1;
end;
commit;
select * from app.dex
use Examples
go
declare @i as int;
set @i = 1;
begin tran
while @i <= 1004
begin
if @i in ( select DexId from app.Dex group by DexId)
begin
insert into [Bootstrap].[PartitionTable]
(
[DexId]
,[path]
,[timestep]
,[BootStrapName]
,[Bootstrap]
)
values
( @i, RAND ()*120, RAND ()*100 , 'FFOS', RAND()*250)
end
set @i = @i + 1;
end;
commit;
use Examples
go
declare @i as int;
set @i = 1;
begin tran
while @i <= 1000000
begin
insert into [Bootstrap].[PartitionTable]
(
[DexId]
,[path]
,[timestep]
,[BootStrapName]
,[Bootstrap]
)
values
( 13, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 17, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 19, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 23, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000),
( 25, RAND ()*125000, RAND ()*100000 , 'FFOS', RAND()*200000)
set @i = @i + 1;
end;
commit;
The script above enters data only into partitions whose number is not divisible by 2 or 3.
EXEC [dbo].[PartitionView]
EXEC [dbo].[LatestPartitionPreview] @partitionName = N'DexId'
With regard to merging: if we merge empty partitions into partitions with data we will have to move data, and if we merge partitions with data into partitions that are empty we will also have to move data. This is because SQL Server drops the boundary and merges its range values into the neighbouring partition. To ensure that no data movement occurs, the safest thing to do is to merge empty partitions only into other empty partitions. Some of the partitions have large amounts of data in them; try merging and splitting with this data to see the time it takes.
ALTER PARTITION SCHEME pscDexId NEXT USED [FGDexId];
ALTER PARTITION FUNCTION pfnDexId() MERGE RANGE(993);
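Before running a merge like the one above, it’s worth confirming which partitions are actually empty and therefore safe to merge. A quick check against the demo table:

```sql
-- Partitions of the demo table that currently hold no rows (safe merge candidates).
SELECT p.partition_number, p.rows
FROM sys.partitions p
WHERE p.[object_id] = OBJECT_ID('Bootstrap.PartitionTable')
  AND p.index_id < 2
  AND p.rows = 0
ORDER BY p.partition_number;
```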
Summary
I named this post “Partitioning 101”; however, partitioning itself is quite an advanced topic, and it requires some reading, re-reading and testing in a development environment to ensure that what you think you are doing is the actual outcome. Getting it wrong can be a costly mistake, in terms of data lost or time wasted incorrectly merging, splitting, or switching and dropping partitions.
There’s quite a bit of resource on the Brent Ozar site for partitioning. I’ve not read all of it myself, but if it’s anything like the rest of the content on that site it’ll be worth reading.
http://www.brentozar.com/sql/table-partitioning-resources/
Read MSDN for creating partition schemes:
http://technet.microsoft.com/en-us/library/ms179854.aspx
and creating partition functions:
http://technet.microsoft.com/en-us/library/ms187802.aspx
and for altering partition functions:
http://technet.microsoft.com/en-us/library/ms186307.aspx