Do Not Trust My GitHub Projects

Aloha!

Recently I’ve revisited one of my GitHub projects, TurboLogShip. I’ve been wanted to write a presentation around what it does and why it’s actually really important and super useful for people using log shipping in SQL Server. Essentially it speeds up log ship restoring to a readable secondary to the fastest it can be without upgrading hardware or software. And the idea of trying to do something for free when your hardware/software may not even be the problem is a pretty compelling reason to use it. Short of re-writing the log shipped database, it’s the most pragmatic and expedient way you can fix the problem of painfully slow log shipping.

I’ve approached a few user group leaders about me presenting on the topic of log shipping and how TurboLogShip can help. But funnily enough no one is really interested in me presenting about how I ’ve fixed a problem in a technology that is now at a venerable 12 years old and counting. People are excited about SQL Server on Linux and Azure, not transactional log shipping. It’s hardly the most in-depth or widely used technology anway.

“Do As I Say Not As I Do” Is a Lousy Motto

So rather than approach the subject as “TurboLogShip is great, and by extension I’m great, and also aren’t I smart for figuring this out”, I’m looking it more holistically as “TurboLogShip is on GitHub, this is what it does, and as a guy who talks about Database DevOps I want to talk about how I am practicing what I preach and am helping ensure that the project is quality.”

However, as late as last year, TurboLogShip was just a .sql script living on SQLServerCentral. Hardly a method to deliver something to run something as important as log shipped files… And even on my About page I claim that TurboLogShip is as complete as it’ll ever be. But it is, right? It does what it needs to do and no more.

Well, not quite. I’ve been saying this a lot really, but we all know that the only thing you ever have to do to deliver software is literally write the code. Sounds obvious. But what I mean by this is that things like proper planning and scoping requirements, source control, continuous integration, continuous delivery, and testing are completely optional. You don’t have to do any of things. Sure, your product will end up being pretty lousy, but these are the facts.

So, in terms of code complete, TurboLogShip is home and dry. But what about testing? I’m expecting someone to take this and run this (eventually) on Production databases, yet I’ve given no indication of what tests I’ve run against TurboLogShip. Plus, where are my builds to guarantee that the code at least compiles. And seriously,ever since I went through the whole debacle with Invoke-Sqlcmd [1] earlier this year, I’ve turned a skeptical eye to anything anyone has ever written. And but so why should I expect to treat my project with anything more than a high degree of skepticism?

So, with this in mind, I’ve put some effort into setting up source control, builds and tests, and created status checks when pull requests are created (more on this later). So I’m going to address each one of these below and what I’ve implemented, why, and what challenges it’s bought along. Consider this blog the rough notes for my pending presentation.

Source Control

This was an easy choice: GitHub. It’s completely open in that anyone who wants to join can sign up for free. It’s collaborative in that anyone who has passed the test of creating an account can clone, branch and raise a pull request. And if the name “GitHub” and the tasks listed haven’t already made it staggeringly obvious, GitHub is based on git, the de-facto choice of source control among developers in the 21st Century. I’m not going to sit here and wax lyrical about git, but to say the least it is popular because it is so flexible. I used to work as a TFS admin, it was my first job away from testing. So I’m forever grateful for Microsoft creating a tool that was seriously important but no one knew how to use. This gave me a niche to move into (and more importantly, out of testing.) And though TFS is still an option in VSTS, I think everyone has acknowledged that git has won the almighty source control wars (was there ever really a fight?)

On the project I have one branch which is protected, and also set up pull requests as a requirement: all work must be created in a local branch before merging into master can take place. These pull requests must be reviewed, and a few status checks met (I cover this below I promise.)

Builds/Continuous Integration

Back in October, when I released TurboLogShip, I decided to stick with what I know and create a SSDT solution to group the tvp and stored procedure together. So when it comes to builds, sadly Travis is not an option. Which means I’ve had to turn to VSTS and create a service endpoint between GitHub and VSTS to run the build. All I care about at this point is “does it compile?” At this point this is infinitely better than just having the source code up on GitHub. There’s a certain level of quality guaranteed here. The build is set up as a CI build, and one of the really neat features in VSTS is that it will compile pull requests, so you can have one of these as a status check when accepting a pull request. This may not sound like much, and I know I’m very rarely, if ever, going be the recipient of a pull request. But if I ever did, knowing that the change results in a green build is something.

One of the disadvantages about using VSTS is that the free projects have a 5 team member limit. So it’s not nearly as open as GitHub is. But realistically, I doubt anyone else will be interested enough to become an admin on this project and do enough to warrant access to the builds.

Status Checks

Status Checks are a fantastic addition to pull requests in GitHub. Status Checks must pass before a pull request can be merged. So in the case of my build compiling a pull request: were this to fail then I could set a pull request to be rejected. Neat.

I feel bad that very few words have been used to describe Status Checks considering how useful they are. But they do exactly what you need them to do (ensure a certain level of quality with the prerequisites) with very little effort.

Testing

So up to this point, I’ve got gold code, builds, and checks to ensure that at the very least the code compiles. But what I really need to focus on is the testing. I need to create some tests that can be run against any updated version of TurboLogShip.

To paraphrase something I read elsewhere: I like having tests, I love the idea of having tests, I hate having to write tests. Let’s also factor in that what I need here is a multiple databases set up with log shipping to actually test some positive and negative test scenarios, and my enthusiasm is all but spent. The idea of writing automated tests are right out. At this point, even one test is better than no test. So I have set up two VM’s in Azure, hacked away at them so that they can see each other, and set up log shipping across one database. Sadly I’ve yet to get any tests running as it took me an age to get the VM’s up and running.

One thing I also wanted to do was make the test list public. VSTS is great for tooling like this, but it’s not available in the free one, and there’s a list of 5 users so project. So rather than find another tool that’ll do what I want, I created a project in GitHub and have my tests sat there. I think at this point of the project, testing is the most important aspect and yet testing is the most poorly defined part.

Setting up the test servers and actually defining clear tests in my head has also brought up the age old problem of “how to test”: what I mean by this is that for code not written with testing in mind, the tests can be challenging to set up, difficult to implement, and even harder to verify that they have passed easily. So it looks like I’ll have to revisit the code to help the testing effort. I thought I had it cracked last night when I thought “the sproc runs in a database, just create a table and log there” when I remembered that this is supposed to be a composite database solution, so the sproc and tvp are shipped with the log-shipped database. All this means that the secondary, where the job will be run, will be in readonly mode….. bit difficult to write to a table on a readonly database. And no I don’t need a test to verify that one… Shame because writing to a table would have been a great way to troubleshoot any issues. So I need to find a way that will have zero impact on the code whilst making testing easier.

Monitoring

Chances are, most people have some sort of monitoring on their SQL Agent Jobs, and TurboLogShip is a wrapper around the SQL Agent jobs shipped with log shipping in SQL Server. So I have chickened out in writing anything that monitors the stored procedure.

However, as far as the delivery pipeline goes, I have integrated Slack to post when a build is completed (note both failed or succeeded statuses are monitored) and also when check-in is made. Far less irritating than email as I can mute when I wish, and if there’s any lost souls who want to help out with this project then they can join the group and follow the notifications if they wish.

Summary

I’m a big fan of free software: at the very least, you’ll never suffer from buyers remorse. But there’s always a risk in taking something someone is offering for free on GitHub, with no guarantee of support. And so as a custodian of several project on GitHub, it’s important to show that you have taken your project seriously in attempting to offer a quality solution.

In terms of hours of effort, it hasn’t taken me very long to create a decent pipeline that I hope helps assuage peoples skepticism in TurboLogShip actually working: To get from a sql script on a blog to a GitHub project with builds and alerting has taken me in total about 8 hours. To get the testing in a good place will take as much again. And it’s those tests that will really convince people that it actually works. But in terms of financial cost, it’s been free up to now. The Azure testing servers will make a minor dent in my pocket.But that’s not really an issue as I can switch them off when I don’t need them, and even when I do they only need a few gigs of RAM and 2 or so cores. It’s hardly big data stuff I’m doing here.

However, in terms of quality, I think there’s been an infinite improvement. And I know this works because the other night I was removing all the warnings that SSDT produces when it builds (3 part references to system databases) and I totally forgot to include the referenced dacpacs into the project. Locally it would and always would continue to build no issues. But once anyone else took the project compile, it would fail. So why would anyone ever use my project if it would never compile except on my machine?

What I will say about sharing the code on SQLServerCentral, it did give me the opportunity to write a pithy post about how it worked and got some quick feedback on it. This is important when working on your own, and also got some testing done in the process. But no way was it a realistic long terms solution in sharing your code with others..

In retrospect, it’s incredible how easily I could integrate Slack, VSTS and GitHub to do exactly what I wanted so easily, for free! I’ve still a way to go with testing, but it’s definitely heading in the right direction.

[1] I’m still not over this. I’ll tell anyone who has the vaguest of knowledge (note knowledge, not interest) in SQL and PowerShell this horror story of a car crash of a way of guaranteeing that I’ll never trust Invoke-Sqlcmd again.

bzzzt!

What's Going On?