Recently I needed to apply data compression to a particularly large table. One of the main reasons for applying compression was that the database was extremely low on space, in both the data and the log files. To make matters worse, the data and log files were nowhere near big enough to accommodate compressing the entire table in one go. If the table had been partitioned I could have compressed one partition at a time and all my problems would have gone away. No such luck.
So I had to think quite outside the box about how to compress the data. The solution I came up with was to create a new, empty copy of the table and move 50 rows at a time (that was the largest batch I could move at any one time; yes, I really was THAT low on space), deleting each batch of 50 rows from the source as it was copied. But instead of COPY/DELETE, which would seem the logical order, I ran the DELETE first with an OUTPUT clause, so the deleted rows were inserted into the new table by the same statement. This way I could be certain that exactly the rows that were deleted were the ones copied into the new table, and it meant I only scanned the original table once.
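The core of that trick is the DELETE ... OUTPUT ... INTO pattern. A minimal sketch is below; the table and column names here are illustrative, not the ones from the real job (the full script further down uses the AdventureWorks tables):

```sql
-- Delete a small batch and, in the same statement, capture the deleted
-- rows via the OUTPUT clause and insert them into the new table.
-- Because DELETE and INSERT happen in one atomic statement, the rows
-- removed are guaranteed to be the rows copied.
DELETE TOP (50)
FROM dbo.BigTable
OUTPUT deleted.Id
      ,deleted.Payload
INTO dbo.BigTableNew (Id, Payload);
```

Wrapping that statement in a loop that checks @@ROWCOUNT after each pass keeps going until the source table is empty.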
I had to build further checks into the script: the database was set to SIMPLE recovery, but the log file would still grow because of ETL processes etc. running against other parts of the database, so I needed to make sure I wasn't saturating the log file with each delete. The logic here therefore works for the SIMPLE, BULK_LOGGED, and FULL recovery models.
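The log check boils down to comparing the log file's allocated size against its used space. A sketch of the calculation, as used in the script below (dbo.sysfiles reports size in 8 KB pages, and fileid 2 is the transaction log; the 25% threshold is just the value I chose):

```sql
-- Percentage of free space in the transaction log.
-- FILEPROPERTY(..., 'SpaceUsed') returns used pages, sf.size is total pages,
-- so (size - used) / size gives the free fraction.
DECLARE @pct FLOAT;

SELECT @pct = 100.0 * (sf.size - CAST(FILEPROPERTY(sf.NAME, 'SpaceUsed') AS FLOAT)) / sf.size
FROM dbo.sysfiles sf
WHERE sf.fileid = 2;

-- If @pct falls below a threshold (I used 25), pause, CHECKPOINT,
-- and re-check before deleting the next batch.
```

You could get the same information from DBCC SQLPERF(LOGSPACE), but querying sysfiles makes it easy to use the value directly in a variable.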
I have provided the script below, with an example that uses the AdventureWorks database. I ran this on a table that was over 300 GB and it worked perfectly, and ran far faster than you would think!
One difference is that in the example I create a temp table and a new table for the data loading, so as not to delete any data in the AdventureWorks database. In the real world there is no need to copy all the data into the "temp" table first; you would just delete from the original table. Hope that is clear for everyone.
Any questions, let me know.
USE AdventureWorks2012
GO

-- Create an empty copy of the table to receive the rows
SELECT TOP 0 *
INTO Person.AddressNew
FROM Person.Address

SET IDENTITY_INSERT Person.AddressNew ON

-- Copy the source data so the original AdventureWorks table is untouched
SELECT *
INTO Person.AddressTemp
FROM Person.Address

SET NOCOUNT ON

DECLARE @pct FLOAT
DECLARE @rserror VARCHAR(1024)
DECLARE @startTime DATETIME
DECLARE @Rows INT
DECLARE @DelayLength CHAR(12) = '00:00:00:100'

SET @Rows = 1

WHILE (@Rows > 0)
BEGIN
    SET @startTime = GETDATE()

    -- Percentage of free space in the transaction log (fileid 2)
    SELECT @pct = CONVERT(DECIMAL(38, 8), ROUND(100 * (sf.size - CAST(FILEPROPERTY(sf.NAME, 'SpaceUsed') AS FLOAT)) / 128.000, 8) / (sf.size / 128.000))
    FROM dbo.sysfiles sf
    WHERE sf.fileid = 2

    SELECT @rserror = CONVERT(VARCHAR(24), @startTime, 121) + ' : Txn free: ' + CAST(@pct AS VARCHAR)

    RAISERROR (@rserror, 0, 1) WITH NOWAIT

    -- Wait until at least 25% of the log is free before deleting the next batch
    WHILE @pct < 25
    BEGIN
        RAISERROR ('Not enough txn log space', 0, 1) WITH NOWAIT;

        CHECKPOINT

        WAITFOR DELAY '00:05:00';

        SELECT @pct = CONVERT(DECIMAL(38, 8), ROUND(100 * (sf.size - CAST(FILEPROPERTY(sf.NAME, 'SpaceUsed') AS FLOAT)) / 128.000, 8) / (sf.size / 128.000))
        FROM dbo.sysfiles sf
        WHERE sf.fileid = 2
    END

    -- Delete a batch and route the deleted rows straight into the new table
    DELETE TOP (50)
    FROM Person.AddressTemp WITH (TABLOCK)
    OUTPUT deleted.AddressID
        ,deleted.AddressLine1
        ,deleted.AddressLine2
        ,deleted.City
        ,deleted.StateProvinceID
        ,deleted.PostalCode
        ,deleted.SpatialLocation
        ,deleted.rowguid
        ,deleted.ModifiedDate
    INTO Person.AddressNew (AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode, SpatialLocation, rowguid, ModifiedDate)

    SET @Rows = @@ROWCOUNT

    SELECT @rserror = CONVERT(VARCHAR(12), @Rows) + ' rows deleted'

    RAISERROR (@rserror, 0, 1) WITH NOWAIT;

    -- Small delay to take it easy
    WAITFOR DELAY @DelayLength;
END

SET IDENTITY_INSERT Person.AddressNew OFF

-- Clean up the demo tables
DROP TABLE Person.AddressNew
DROP TABLE Person.AddressTemp

SET NOCOUNT OFF