SQL Server Compression 101

The data compression feature in SQL Server can help reduce the size of the database as well as improve the performance of intensive I/O intensive workloads, especially on data warehouses. This performance boost in I/O is offset against the extra CPU resource that is required to compress/decompress the data whilst data is exchanged with the application. With data warehouses, the guideline is to compress all objects as there is typically CPU capacity whilst the data storage and memory capacity is at a premium, so Microsoft’s recommendation is to compress all objects to the highest level. However, if there are no compressed objects at all, it is better to take the cautious approach and to evaluate each database object individually and the effects on the workload, particularly if the CPU headroom is limited.

SQL Server 2008 introduced two levels of compression: row level compression and page level compression. (note: SQL Server 2014 introduced two levels of compression: columnstore and columnstore_archive, but we will not be discussing these as I blogged about them before) Row compression applies variable length to fixed length data types (eg an int type may not be using all the space reserved for it on disk because it is a small number) amongst other storage saving techniques. Page compression applies row compression as well as implementing prefix and dictionary compression (ie looking for patterns in the data and rather than repeating those patterns it marks how many times a pattern is repeated.)

Compression is applied at an object level (table/index/indexed view) rather than at the database or instance level. Compression is an Enterprise-only feature (also available in Developer), and as such any databases that have compression applied can only be restored to other Enterprise and Developer based SQL Server Instances. You can check what features are already enabled in a database by querying the dm_db_persisted_sku_features table:

sku_features_png

SELECT SCHEMA_NAME(sys.objects.schema_id) AS [Schema]
,OBJECT_NAME(sys.objects.object_id) AS [Object]
,[data_compression_desc] AS [COmpressionType]
FROM sys.partitions
INNER JOIN sys.objects ON sys.partitions.object_id = sys.objects.object_id
WHERE data_compression &amp;gt; 0
AND SCHEMA_NAME(sys.objects.schema_id) &amp;lt;&amp;gt; 'SYS'
GROUP BY SCHEMA_NAME(sys.objects.schema_id)
,OBJECT_NAME(sys.objects.object_id)
,[data_compression_desc]
ORDER BY [Schema]
,[Object]

There’s the option to default data compression on objects, but this can be risky because if you go compressing all objects without considering your workload you’ll kill CPU performance because it’s compressing objects that may not even bring any space saving benefits.

It’s also important that, if your database is in source control (and if it isn’t why on earth not), objects that are compressed must explicitly be declared with compression, otherwise when a dacpac is compared against the databaase any inconsistencies in compression are taken into account. So this means that not only are uncompressed objects compressed, but compressed objects are uncompressed. This can take a very long time, and could cause disks to run out of space, causing deployments to fail.

bzzzt!

What's Going On?

SQL Server Compression 101