Hello!

Recently I needed to apply data compression to a particularly large table. One of the main reasons for compressing it was that the database was extremely low on space, in both the data and the log files. To make matters worse, neither file was anywhere near big enough to accommodate compressing the entire table in one go. If the table had been partitioned I could have done one partition at a time and all my problems would have gone away. No such luck.

So I had to think quite outside the box to compress the data. The solution I came up with was to create a new, empty copy of the table, copy 50 rows at a time into it (that was the largest batch I could move at any one time; yes, I really was THAT low on space) and then delete the 50 rows just copied. But instead of COPY/DELETE, which would seem the logical order, I ran the DELETE first and inserted its output into the new table. This way I could be certain that exactly the rows that were deleted were the rows copied into the new table, and it meant I only scanned the original table once.
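
The heart of the trick is the OUTPUT clause on DELETE, which hands back the deleted rows so they can be inserted in the same statement. A minimal sketch of the pattern (table and column names made up purely for illustration):

DELETE TOP (50)
FROM dbo.BigTable
OUTPUT deleted.Id, deleted.Payload
INTO dbo.BigTableNew (Id, Payload)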

I had to build further checks into the script: the database was in SIMPLE recovery, but the log file would still grow because of ETL processes etc. running against other parts of the database, so I needed to make sure I wasn't saturating the log file with each delete. The logic here will work for the SIMPLE, BULK_LOGGED and FULL recovery models.
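
The script below reads the free-space figure from the old dbo.sysfiles compatibility view, which still works but is deprecated; on SQL Server 2012 and later you can get the same number from a DMV. A rough equivalent, shown purely as an alternative:

SELECT 100 - used_log_space_in_percent AS PctFree
FROM sys.dm_db_log_space_usage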

I have provided the script below, with an example that uses the AdventureWorks database. For what it's worth, I ran this on a table that was over 300GB and it worked perfectly, and ran far faster than you would think!

One discrepancy: in the example I have created a temp table and a new table for the data loading, so as not to delete any data in the AdventureWorks database. In the real world there is no need to copy all the data into a “temp” table first; you would just delete from the original table. Hope that is clear for everyone!

Any questions, let me know.

USE AdventureWorks2012
GO

SELECT TOP 0 *
INTO Person.AddressNew
FROM Person.Address
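
-- Since the whole point of the exercise was compression, in the real run the
-- new (still empty) table would be rebuilt compressed before loading it.
-- PAGE is an illustrative choice here; ROW works the same way.
ALTER TABLE Person.AddressNew REBUILD WITH (DATA_COMPRESSION = PAGE)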

SET IDENTITY_INSERT Person.AddressNew ON -- SELECT INTO copies the IDENTITY property of AddressID, so inserting explicit values needs this

SELECT *
INTO Person.AddressTemp
FROM Person.Address

SET NOCOUNT ON

DECLARE @pct FLOAT
DECLARE @rserror VARCHAR(1024)
DECLARE @startTime DATETIME
DECLARE @Rows INT
DECLARE @DelayLength CHAR(12) = '00:00:00.100' -- hh:mm:ss.mss

SET @Rows = 1

WHILE (@Rows > 0)
BEGIN
	SET @startTime = GETDATE()

	-- Percentage of free space in the transaction log (file id 2 is the log file)
	SELECT @pct = CONVERT(DECIMAL(38, 8), ROUND(100 * (sf.size - CAST(FILEPROPERTY(sf.NAME, 'SpaceUsed') AS FLOAT)) / 128.000, 8) / (sf.size / 128.000))
	FROM dbo.sysfiles sf
	WHERE sf.fileid = 2

	SELECT @rserror = (CONVERT(VARCHAR(24), @startTime, 121)) + ' : Txn free: ' + CAST(@pct AS VARCHAR)

	RAISERROR (
			@rserror
			,0
			,1
			)
	WITH NOWAIT

	-- Less than 25% of the log free: checkpoint and wait until space is reclaimed
	WHILE @pct < 25
	BEGIN
		RAISERROR (
				'Not enough txn log space'
				,0
				,1
				)
		WITH NOWAIT;

		-- A CHECKPOINT lets the log truncate under SIMPLE recovery; under
		-- FULL/BULK_LOGGED the regular scheduled log backups will clear it
		CHECKPOINT

		WAITFOR DELAY '00:05:00';

		-- Re-check free log space after the wait
		SELECT @pct = CONVERT(DECIMAL(38, 8), ROUND(100 * (sf.size - CAST(FILEPROPERTY(sf.NAME, 'SpaceUsed') AS FLOAT)) / 128.000, 8) / (sf.size / 128.000))
		FROM dbo.sysfiles sf
		WHERE sf.fileid = 2
	END

	-- Delete a batch of 50 rows and, in the same statement, route exactly those
	-- rows into the new table via OUTPUT ... INTO. (Note the OUTPUT target must
	-- not have enabled triggers, foreign keys or CHECK constraints.)
	DELETE TOP (50)
	FROM Person.AddressTemp WITH (TABLOCK)
	OUTPUT deleted.AddressID
		,deleted.AddressLine1
		,deleted.AddressLine2
		,deleted.City
		,deleted.StateProvinceID
		,deleted.PostalCode
		,deleted.SpatialLocation
		,deleted.rowguid
		,deleted.ModifiedDate
	INTO Person.AddressNew(AddressID, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode, SpatialLocation, rowguid, ModifiedDate)

	SET @Rows = @@ROWCOUNT

	SELECT @rserror = CONVERT(VARCHAR(4), @Rows) + ' rows deleted'

	RAISERROR (
			@rserror
			,0
			,1
			)
	WITH NOWAIT;

	-- small delay to take it easy
	WAITFOR DELAY @DelayLength;
END

SET IDENTITY_INSERT Person.AddressNew OFF

DROP TABLE Person.AddressNew

DROP TABLE Person.AddressTemp

SET NOCOUNT OFF
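
In the real-world version, where you delete straight from the original table, the last step once the loop finishes would be to swap the compressed copy in; something along these lines (entirely hypothetical for the demo above, since the demo drops both tables):

DROP TABLE Person.Address
EXEC sp_rename 'Person.AddressNew', 'Address'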

Happy scripting!