Find Largest Tables To Compress
In this post I’m going to share some T-SQL that shows the compression state of your tables and helps you quickly find the objects that are worth compressing.
Last week brentozar.com ran one of their Technology Triage Tuesdays on compression in SQL Server. The video does not appear to be up yet, but it provided good insight, certainly better than the TechNet pages on the same subject. We have a multi-terabyte data warehouse at work on an Enterprise-licensed instance of SQL Server 2012, so I’m familiar with data compression in SQL Server. Until recently, however, another of our SQL Servers was on a Standard license. That instance has now been upgraded to Enterprise, so I was able to compress its database. Although not as large as our data warehouse, it was well worth considering compressing some of its larger tables.
Below is a script to check for all objects that are currently compressed.
--script to identify compressed tables
SELECT st.name, st.object_id, ss.name AS [Schema Name],
    sp.partition_id, sp.partition_number,
    sp.data_compression, sp.data_compression_desc
FROM sys.partitions sp
INNER JOIN sys.tables st ON st.object_id = sp.object_id
INNER JOIN sys.schemas ss ON ss.schema_id = st.schema_id
WHERE sp.data_compression <> 0
If you want to see which objects are not compressed, just change the '<>' to '=':
--script to identify uncompressed tables
SELECT st.name, st.object_id, ss.name AS [Schema Name],
    sp.partition_id, sp.partition_number,
    sp.data_compression, sp.data_compression_desc
FROM sys.partitions sp
INNER JOIN sys.tables st ON st.object_id = sp.object_id
INNER JOIN sys.schemas ss ON ss.schema_id = st.schema_id
WHERE sp.data_compression = 0
But this still isn’t a great way to find out how much space a table uses, or what the most appropriate compression setting is. Fortunately there are two stored procedures that will help you out: sp_spaceused and sp_estimate_data_compression_savings. The names are fairly self-explanatory, so I won’t insult your intelligence by telling you what they do, other than to note that in the second one you can replace ‘PAGE’ with ‘ROW’.
EXEC sp_spaceused N'Value.Currency'
EXEC sp_estimate_data_compression_savings 'Value', 'Currency', NULL, NULL, 'PAGE' ;
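And since the second procedure lets you swap ‘PAGE’ for ‘ROW’, the row-compression estimate for the same table looks like this:
EXEC sp_estimate_data_compression_savings 'Value', 'Currency', NULL, NULL, 'ROW' ;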
Useful as these are, some databases contain a lot of tables. You may know your workload and the largest tables in your database, but it’s still laborious to go through them all, and some will save you more space than others. So I wrote a script that prints out the commands to run the first stored procedure for every table in your database. I prefer the print-out method because you can check the result, and you can batch the executes into as many or as few batches as you like; although sp_spaceused isn’t a taxing procedure, I felt it was worth checking first! Also, if you do choose to run the entire process in one go, you can then export the results to Excel.
The script also has a bit flag (@i) that switches the output to the second stored procedure. Again, this uses the print-out method, as sp_estimate_data_compression_savings is considerably more taxing than sp_spaceused, though still not as demanding as the actual act of compressing.
Both print-outs store their results in a temp table, and from there it’s possible to query the data further (more on this below). Notice the inline comments in the printed output for dropping the temp tables; don’t forget to run these when you are finished with them!
SET NOCOUNT ON
DECLARE
    @schema varchar (50), @table varchar (50), @tempTable varchar (500), @i bit, @tab nvarchar (12), @sql nvarchar (max)
--set @i = 0 to print sp_spaceused commands, or @i = 1 to print sp_estimate_data_compression_savings commands
SET @i = 0
if @i = 0
BEGIN
set @tempTable = 'CREATE TABLE #FOO (
[name] nvarchar (50)
,[rows] int
,[reserved] varchar (20)
,[data] varchar (20)
,[index_size] varchar (20)
,[unused] varchar (20)
)
GO
--SELECT * FROM #FOO
-- DROP TABLE #FOO
'
END
ELSE
set @tempTable = 'CREATE TABLE #BAR (
[object_name] nvarchar (100)
,[schema_name] nvarchar (100)
,[index_id] bigint
,[partition_number] bigint
,[size_with_current_compression_settings(KB)] bigint
,[size_with_requested_compression_settings(KB)] bigint
,[sample_size_with_current_compression_settings(KB)] bigint
,[sample_size_with_requested_compression_settings(KB)] bigint
)
GO
--SELECT * FROM #BAR
--DROP TABLE #BAR
'
PRINT @tempTable
--cursor over every table in the target database (change AdventureWorks to your own database)
DECLARE DB_cursor CURSOR FAST_FORWARD
FOR SELECT [name] FROM AdventureWorks.sys.tables
OPEN DB_cursor
FETCH NEXT FROM DB_cursor into @table
WHILE @@FETCH_STATUS = 0
BEGIN
    --look up the schema of the current table (only tables with uncompressed partitions)
    SELECT @schema = MAX (ss.name)
    FROM sys.partitions sp
    INNER JOIN sys.tables st ON st.object_id = sp.object_id
    INNER JOIN sys.schemas ss ON ss.schema_id = st.schema_id
    WHERE sp.data_compression = 0
    AND st.name = @table
    --@schema comes back NULL for tables that are already fully compressed, so skip those
    IF @schema IS NOT NULL
    BEGIN
        IF @i = 0
            SET @tab = 'INSERT #FOO'
        ELSE
            SET @tab = 'INSERT #BAR'
        IF @i = 0
            SET @sql = 'EXEC sp_spaceused N''' + @schema + '.' + @table + ''''
        ELSE
            SET @sql = 'EXEC sp_estimate_data_compression_savings ''' + @schema + ''', ''' + @table + ''', NULL, NULL, ''PAGE'';'
        PRINT @tab
        PRINT @sql
    END
FETCH NEXT FROM db_cursor
INTO @table
END
CLOSE db_cursor
DEALLOCATE db_cursor
SET NOCOUNT OFF
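To give an idea of what the print-out looks like with @i = 0, you copy the messages into a new query window and run them; the result (the table names here are purely illustrative) is a batch along these lines:
--generated batch, following the printed CREATE TABLE #FOO block
INSERT #FOO
EXEC sp_spaceused N'Value.Currency'
INSERT #FOO
EXEC sp_spaceused N'Value.CurrencyRate'
--...one INSERT/EXEC pair per table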
Once you have executed the stored procedures, you will have temp tables full of data that you can use to find the largest tables, as in the example below. It requires a bit of trimming and casting, because sp_spaceused returns sizes as strings such as '123456 KB', but then you’re able to see which of your tables are worth considering for compression.
SELECT name, CAST(LEFT([reserved], LEN([reserved]) - 2) AS decimal) AS size_KB
FROM #FOO
ORDER BY size_KB DESC
So let’s say, for the sake of argument, that we’re only interested in compressing objects larger than 100,000KB, roughly 100MB (if we want the database to fit entirely in memory, maybe we do have to think this small). With the second temp table you can run a query like the one below: it keeps any table at least as large as the smallest one you are interested in compressing, and divides the table’s size under its current compression by its projected size under the requested compression. The higher the ratio, the more valuable compressing that table is.
SELECT [object_name], [schema_name],
    CASE WHEN CAST([size_with_requested_compression_settings(KB)] AS int) = 0 THEN NULL
        ELSE CAST([size_with_current_compression_settings(KB)] AS decimal) / CAST([size_with_requested_compression_settings(KB)] AS int)
    END AS [space_savings]
FROM #BAR
WHERE [size_with_current_compression_settings(KB)] >= 106896
ORDER BY space_savings DESC
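If you prefer absolute figures to a ratio, a sketch along the lines below, which rolls up #BAR’s per-index, per-partition rows, will give you the estimated KB saved per table:
--estimated space saved per table, summed over indexes and partitions
SELECT [schema_name], [object_name],
    SUM([size_with_current_compression_settings(KB)]) AS current_KB,
    SUM([size_with_requested_compression_settings(KB)]) AS estimated_KB,
    SUM([size_with_current_compression_settings(KB)]) - SUM([size_with_requested_compression_settings(KB)]) AS estimated_saving_KB
FROM #BAR
GROUP BY [schema_name], [object_name]
ORDER BY estimated_saving_KB DESC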
With a few queries we have identified the objects that would benefit most from compression. I’m going to assume that you’ll want PAGE compression, as it includes everything ROW compression offers plus more; if CPU is the bottleneck on your server, you may not want to follow this strategy, and should check whether ROW compression alone gives you a win. Compression is applied with the ALTER TABLE ... REBUILD statement below. I was tempted to expand the script above to print out these commands as well, but compressing tables requires a considerable amount of effort on the part of the server, and you probably don’t want to batch the process. Besides, you probably don’t have hundreds of tables to compress, and if you do, you can simply modify the script provided above.
ALTER TABLE Value.Currency REBUILD WITH (DATA_COMPRESSION = PAGE);
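If you do have enough tables to make generating the commands worthwhile, a minimal sketch driven by the #BAR results (using the 100,000KB figure from earlier purely as an example threshold) might look like this; check the generated statements before running them:
--sketch: print ALTER TABLE commands for the larger tables in #BAR
SELECT 'ALTER TABLE [' + [schema_name] + '].[' + [object_name] + '] REBUILD WITH (DATA_COMPRESSION = PAGE);'
FROM #BAR
GROUP BY [schema_name], [object_name]
HAVING SUM([size_with_current_compression_settings(KB)]) >= 100000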
As mentioned in the webinar, it’s probably not a good idea to compress your larger objects, however big the space savings, during a busy period. So if you need a maintenance window, you can use that Excel export I mentioned earlier to show the boss, in a pretty chart, how much space you can save and justify the downtime :D