(Update: Part 2 focuses on clearing out the SSISDB by creating new stored proc based on the one used by the maintenance job. If your Maintenance Job is taking hours to run, you need to check this out.)
No WiX Wednesday this week, owing to commitments in real life. Instead, here is something regarding SSIS 2012 Deployment. Enjoy!
As part of our CI and Test Builds we have automated the deployment of two SSIS Projects. One is fairly large and the other one contains only two dtsx packages. Recently we have been getting timeout issues with the deployment of the solutions.
xmltimeout expired. the timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream,
BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean&amp; dataReady)
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream,
Boolean async, Int32 timeout, Task&amp; task, Boolean asyncWrite)<br /> at System.Data.SqlClient.SqlCommand.RunExecuteReader
(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource1 completion,
Int32 timeout, Task& task, Boolean asyncWrite)
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout,
2 projects/solutions compiled[/code]
(We don’t use the SSIS Deployment Wizard (even if you do use the Wizard in silent mode or not, you will still hit this issue so keep reading…) as it does not work in Silent Mode when you have assembly references in the SSIS packages, it destroys them for custom tasks. We use a custom build task written in C#. The source of this version can be found on CodePlex. In case you missed it, the reason we have to run in Silent Mode is because it is part of our automated build. )
As we had recently added the 2nd, smaller project to the build we thought that perhaps the build was trying to deploy both projects at the same time. So we commented out one of the deployments. Still the build failed. So we put both projects back in. Sometimes the build passed, and sometimes it failed. I then felt that as it was only occurring on our CI Builds, and that the SSIS packages are run as scheduled jobs it was possible that long running jobs were blocking the deployment. As we turn off the jobs before we deploy manually in our Staging and Production environments I then added a step in the build to turn off the SQL Agent for the SQL Instance. It was pretty brute force, but if it was any job that was running that blocked the deployment, then stopping the Agent would stop any job from running. However, despite stopping SQL Agent the SSIS projects failed to deploy.
Seeing as deploying an .ispac is essentially loading it into SSISDB I fired up sql profiler and left it running filtered only to display messages on SSISDB where the login account was the build server account. When the build failed a few times I checked to see the last cmd, and both times it was trying to run a stored proc:
I ran a quick Google search on the sproc exec [internal].[sync_parameter_versions] and came up with this site. Although the deployment described here was with the Wizard the error still matched up. So I created the indexes and hoped for the best. Still the build failed intermittently I also acquired the query plan to check to see if there were any other indexes that I could apply, and sadly there was not. But looking at what the sproc actually does, which is update the product ID to the latest version, I wondered if it was to do with the number of versions of the project that we keep. As this can only be controlled at the base level for all projects, and not the projects individually, it was plausible that the number of versions of each project was causing the timeout.
To get to the properties, connect to SQL, expand out the Integration Services Catalog, and right click on your catalog.
From here you can see that the default for “Maximum Number of Versions Kept per Project” is 10. And that now we had added an extra project we had effectively doubled our retention policy. Seeing as this was our CI environment, if someone broke something, and that we keep good source control, we’d either fix and deploy or rollback and deploy. Either way, that retention policy is a waste on our CI and Test Environments. But also whilst I was here, I thought that I’d reduce the retention policy for those logs. A years worth is just a waste, and for a test env I’d rather keep two days of logs than a years worth of basic logging and then ramp up the verbosity when the jobs actually failed. So I made these changes.
But in order for the SSISDB to be cleared out, there is a job that is run periodically. This job is created on creation of the SSISDB catalog and is called SSIS Server Maintenance Job. I decided to check if this had been run. Strangely, despite being on a schedule, there was no history for it! Totally confused, I created my own schedule to run twice a day as the catalog will continue to retain old versions until this job is run, and disabled the old one. By this point I had taken the POV that this was test, not prod, and if the issue had occurred in Prod I would not have been so dismissive. I manually ran the job which cleared out the database using the retention policy and kicked off a build, crossing all my fingers and toes. Mercifully, the build passed.
A few lessons I learnt here:
- alter the retention policy to a reasonable level (relative to the env)
- if you have good source control and change control, that number can be lower as you have your version history already.
- check that the cleanup job actually runs, and maybe get it to run twice a day if your CI builds are frequent
- alter the logging duration and verbosity to a sensible level.
- creating indexes does not solve all the problems you encounter
- sql profiler is your friend