Recently I needed to write a stored procedure that would update a row if it existed in a queue table, and if it did not then to insert the row. Thinking from a pseudo code thought process, you’d run a “IF EXISTS (SELECT Id FROM TABLE) UPDATE, ELSE INSERT INTO TABLE..” Whilst this approach does work, it would be inefficient. This will do a table/index scan twice: once for the SELECT statement and the UPDATE statement.
Depending on your version of SQL Server, there are more efficient ways to do this:
- In 2005, the efficient approach would be to attempt an UPDATE first, and if the @@rowcount returned from the previous statement is 0, then INSERT. This reduces the number of scans, however for inserts it is still two IO operations. As I was developing on a 2005 box, this is the approach I went for.
- In 2008 onwards you can reduce the INSERT to one operation by using the MERGE keyword.
Examples of both of these is below: lets create a queue table called quote and a Lookup table for the statuses so that the queue table does not store large amounts of text. We’ll create a clustered index on quoteId and populate the table with 50000 rows. Finally, we’ll add some lookups.
USE [slmnc] GO CREATE TABLE [dbo].[quoteStatus]( [Id] [tinyint] IDENTITY(1,1) NOT NULL, [QuoteStatusDescription] [varchar](24) NULL ) ON [PRIMARY] GO CREATE TABLE [dbo].[Quote]( [QuoteId] [int] NULL, [QuoteStatusId] [tinyint] NULL, [ErrorDescription] [nvarchar](300) NULL ) ON [PRIMARY] GO CREATE CLUSTERED INDEX IX_Quote_QuoteId ON [dbo].[Quote](QuoteId); GO declare @i as int; set @i = 1; begin tran while @i <= 50000 begin insert into dbo.quote values ( RAND ()*12, ABS(Checksum(NewID()) % 2) + 1, NEWID()) set @i = @i + 1; end; commit; insert into dbo.quotestatus values ('Complete'),('NotFound'),('Error')
Now we have out base tables, we’ll create a stored procedure using the pre-2008 method of UPSERT-ing data.
use slmnc; GO CREATE PROCEDURE [dbo].[SetQuoteStatus_2005] ( @QuoteID INT ,@QuoteStatus VARCHAR(24) ,@ErrorDescription VARCHAR(300) = NULL ) AS DECLARE @QuoteStatusId TINYINT SELECT @QuoteStatusId = ID FROM dbo.quoteStatus qs WHERE qs.QuoteStatusDescription = @QuoteStatus SET NOCOUNT ON UPDATE dbo.quote SET QuoteStatusId = @QuoteStatusId ,ErrorDescription = @ErrorDescription WHERE QuoteId = @QuoteID IF @@ROWCOUNT = 0 INSERT INTO dbo.Quote VALUES ( @QuoteID ,@QuoteStatusId ,@ErrorDescription ) GO
If we were to run this on values we knew had to be updated, and values we knew had to be inserted, we’d see an extra IO operation, just by getting the execution plan.
Using the merge keyword, we can reduce the number of inserts to one irrespective of whether we are inserting or updating. The updated stored procedure is below.
use slmnc; GO CREATE PROCEDURE [dbo].[SetQuoteStatus_2008] ( @QuoteID INT ,@QuoteStatus VARCHAR(24) ,@ErrorDescription VARCHAR(300) = NULL ) AS DECLARE @QuoteStatusId TINYINT SELECT @QuoteStatusId = ID FROM dbo.quoteStatus qs WHERE qs.QuoteStatusDescription = @QuoteStatus SET NOCOUNT ON merge dbo.quote as target using (values (@QuoteId, @QuoteStatusId,@ErrorDescription)) as source (field1, field2, field3) on target.QuoteId = @QuoteId when matched then update set QuoteId = source.field1, QuoteStatusId = source.field2, ErrorDescription = source.field3 when not matched then insert ( QuoteId, QuoteStatusId, ErrorDescription) values ( @QuoteId, source.field2, source.field3); GO
As you can see, the query plans are somewhat different from the 2005 stored procedure, but are pretty much identical to one another:
This second approach can cause race conditions amongst other strangeness. Irrespective of which approach you go for, I recommend reading through the linked post, as it looks under the hood of SQL.