An old post from 2012 that I’ve moved from my old blog so that I can link to it in a new post I’m writing here. This is a little over-complicated, I know…
As part of our CI and Test Builds we have automated the deployment of two SSIS Projects. One is fairly large and the other one contains only two dtsx packages. Recently we have been getting timeout issues with the deployment of the solutions.
(We use a custom build task written in C# to deploy.)
As we had recently added the second, smaller project to the build, we thought that perhaps the build was trying to deploy both projects at the same time, so we commented out one of the deployments. Still the build failed. So we put both projects back in. Sometimes the build passed, and sometimes it failed. Because it was only occurring on our CI builds, and the SSIS packages are run as scheduled jobs, I then suspected that long-running jobs were blocking the deployment. We turn off the jobs before we deploy manually in our Staging and Production environments, so I added a step in the build to stop the SQL Agent for the SQL instance. It was pretty brute force, but if any running job was blocking the deployment, stopping the Agent would stop it. However, despite stopping SQL Agent, the SSIS projects still failed to deploy.
Seeing as deploying an .ispac is essentially loading it into SSISDB, I fired up SQL Profiler and left it running, filtered to display only activity on SSISDB where the login account was the build server account. After the build had failed a few times I checked the last command each time, and every time it was trying to run a stored proc: “exec [internal].[sync_parameter_versions]”
I ran a quick Google search on the sproc [internal].[sync_parameter_versions] and came up with this site. Although the deployment described there was done with the Wizard, the error still matched up. So I created the indexes and hoped for the best. Still the build failed intermittently. I also captured the query plan to see if there were any other indexes that I could apply, and sadly there were none. But looking at what the sproc actually does, which is update the product ID to the latest version, I wondered if it was to do with the number of versions of the project that we keep. As this can only be controlled at the catalog level for all projects, and not for individual projects, it was plausible that the number of versions of each project was causing the timeout.
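To test that hunch, it helps to see how many versions each project is actually holding. Here is a minimal Python sketch of how that could be checked. It assumes the rows come from the `catalog.object_versions` view in SSISDB (fetched however you like, for example with a DB-API cursor); the project names and counts in the sample data are made up for illustration and are not from our build.

```python
from collections import Counter

# T-SQL to list stored project versions. catalog.object_versions is the
# SSISDB view that records them; object_type 20 means "project".
VERSIONS_QUERY = """
SELECT object_id, object_name
FROM SSISDB.catalog.object_versions
WHERE object_type = 20;
"""


def versions_per_project(rows):
    """Count stored versions per project from rows of (object_id, object_name)."""
    return Counter((object_id, name) for object_id, name in rows)


def over_limit(counts, limit):
    """Return names of projects holding more than `limit` versions, sorted."""
    return sorted(name for (_, name), n in counts.items() if n > limit)


# Made-up rows standing in for cursor.fetchall() on VERSIONS_QUERY.
sample_rows = [(1, "BigProject")] * 12 + [(2, "SmallProject")] * 3
counts = versions_per_project(sample_rows)
flagged = over_limit(counts, 10)
```

Keying on the project ID as well as the name avoids merging two projects that share a name in different folders.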
To get to the properties, connect to SQL, expand out the Integration Services Catalog, and right click on your catalog.
From here you can see that the default for “Maximum Number of Versions Kept per Project” is 10, and now that we had added an extra project we had effectively doubled the number of versions being retained. This was our CI environment and we keep good source control, so if someone broke something we’d either fix and deploy or roll back and deploy. Either way, that retention policy is a waste on our CI and Test environments. Whilst I was here, I thought that I’d also reduce the retention period for the logs. A year’s worth is just a waste, and for a test env I’d rather keep two days of logs with basic logging, and ramp up the verbosity when jobs actually fail, than keep a year’s worth. So I made these changes.
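The same properties can be set in script rather than through the dialog, which is handy if you want the CI build to enforce them. The sketch below is only that: it builds the T-SQL calls to the `catalog.configure_catalog` stored procedure as strings, so you can run them with whatever SQL client you already use. The two-day retention window is the value I chose; the version limit of 3 is a placeholder for illustration, so pick your own.

```python
# Catalog properties this sketch is willing to touch.
ALLOWED_PROPERTIES = {"MAX_PROJECT_VERSIONS", "RETENTION_WINDOW"}


def configure_catalog_sql(property_name, value):
    """Build an EXEC of catalog.configure_catalog for one integer property."""
    if property_name not in ALLOWED_PROPERTIES:
        raise ValueError(f"unsupported catalog property: {property_name}")
    return (
        "EXEC SSISDB.catalog.configure_catalog "
        f"@property_name = N'{property_name}', @property_value = {int(value)};"
    )


# Hypothetical CI settings: 3 versions per project is a placeholder,
# RETENTION_WINDOW is the number of days of operation logs to keep.
CI_SETTINGS = {"MAX_PROJECT_VERSIONS": 3, "RETENTION_WINDOW": 2}

statements = [configure_catalog_sql(name, value) for name, value in CI_SETTINGS.items()]
```

Restricting the property names to a small allow-list keeps a typo from turning into an unexpected catalog change.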
But the SSISDB is only cleared out by a job that runs periodically. This job is created when the SSISDB catalog is created and is called SSIS Server Maintenance Job. I decided to check whether it had been running. Strangely, despite being on a schedule, there was no history for it! Totally confused, I disabled the old schedule and created my own to run twice a day, as the catalog will continue to retain old versions until this job is run. By this point I had taken the POV that this was test, not prod; if the issue had occurred in Prod I would not have been so dismissive. I manually ran the job, which cleared out the database according to the retention policy, and kicked off a build, crossing all my fingers and toes. Mercifully, the build passed.
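Checking the job history and kicking the job off can also be scripted. This is a rough sketch, assuming the job still has its default name and that you run the generated T-SQL against msdb yourself. It uses the standard `msdb.dbo.sysjobs` and `msdb.dbo.sysjobhistory` tables and the `msdb.dbo.sp_start_job` procedure; the helper names are my own.

```python
JOB_NAME = "SSIS Server Maintenance Job"


def quote_literal(text):
    """Quote a string as a T-SQL Unicode literal, doubling single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def job_history_sql(job_name):
    """Query the job-level outcome rows (step_id 0) for a SQL Agent job."""
    return (
        "SELECT h.run_date, h.run_time, h.run_status "
        "FROM msdb.dbo.sysjobs AS j "
        "JOIN msdb.dbo.sysjobhistory AS h ON h.job_id = j.job_id "
        f"WHERE j.name = {quote_literal(job_name)} AND h.step_id = 0 "
        "ORDER BY h.run_date DESC, h.run_time DESC;"
    )


def start_job_sql(job_name):
    """Build the call that asks SQL Agent to start the job now."""
    return f"EXEC msdb.dbo.sp_start_job @job_name = {quote_literal(job_name)};"


history_query = job_history_sql(JOB_NAME)
start_statement = start_job_sql(JOB_NAME)
```

If the history query comes back empty, as it did for me, the job has not run (or its history has been purged), and starting it by hand is the quickest way to find out whether the cleanup helps.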
A few lessons I learnt here:
- alter the retention policy to a reasonable level (relative to the env)
- if you have good source control and change control, that number can be lower as you have your version history already.
- check that the cleanup job actually runs, and maybe get it to run twice a day if your CI builds are frequent
- alter the logging duration and verbosity to a sensible level.
- creating indexes does not solve all the problems you encounter
- SQL Profiler is your friend
I’m sure glad that we got to the bottom of this. I guess that, as this is the first release of SSISDB, there are going to be teething problems. I hope that Microsoft do make some fixes, like being able to deploy in silent mode without having the referenced assemblies for custom tasks destroyed, and being able to set the retention policy per project rather than for the whole Integration Services Catalog.