Hello!

It is my birthday, and so I’ve decided to celebrate by treating myself and talking about software testing. I’ve spoken about testing before, but I’ve decided to revisit the subject as I give it a lot of thought and effort in my role as a consultant brought in to talk about “doing the DevOps”.

Testing is definitely one of those things that people want to do more of. And yet, conversely, it is one of the things that people do least, or put the least effort into doing. This is of course not unique to testing: we all know that exercise is good for us, even if we don’t fully appreciate all its benefits. And we’re all aware that ultra-processed foods are bad for us. Yet we still order food via Deliveroo and binge-watch the latest streaming sensation. So it is not unreasonable to say that the reasons for not adopting testing and healthy living are largely the same: we don’t need to change in order to continue doing what we do. Or, to rephrase: just like we can continue to live by eating junk food and never moving off the sofa, we can choose not to write tests and continue to ship code. And this is true: change is hard, it takes effort, and the results are not immediately apparent. In fact, things may get worse before they get better, because mistakes will be made along the way.

Lower Your Expectations

So if testing can make a positive difference, should we make testing mandatory for all the projects in our team/organisation/open source project from this point on? Taking this hardline approach will, in the long run, do massive harm to any effort to get people writing more tests. If we recognise that most people want to do better, but don’t always know how, and are probably overwhelmed by how to go about testing their software, then we can help them get there by encouraging them to do better rather than punishing them for not knowing how. To continue the healthy living analogy: if we want to get fitter, we might follow a plan like Couch to 5K, and it is not until the final week of the nine-week course that we attempt to run the full 5k in one session. What we don’t do is sign ourselves up for an Ironman Triathlon on the first day of healthy living, because we will definitely fail.

One Good Test Is An Infinite Improvement over Zero Tests

Likewise, if we want to get better at writing tests, we start small and build up our competency. For example, Python has both the unittest and pytest testing frameworks. Choosing a framework is one of the first things you have to do, so getting consensus on which one developers prefer might be one way to get them writing tests. And even if there is no consensus on which one is preferred, as long as a choice is made and tests are written then you’re on to a good thing.
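
To make the choice concrete, here is a minimal sketch of the same trivial check written for both frameworks (the add function is a hypothetical stand-in for your own code):

import unittest

# Hypothetical function under test, defined inline so the example runs.
def add(a, b):
    return a + b

# unittest style: tests are methods on a class deriving from unittest.TestCase.
class TestAdd(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 2), 4)

# pytest style: a plain function and a bare assert statement.
def test_add():
    assert add(2, 2) == 4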

Zero Tests Is Better Than An Infinite Amount of Flaky Tests

So if a framework is chosen that people are happy with, then it is vital that the results are taken seriously. And nothing undermines testing quicker than flaky tests. If a test proves to be flaky, you’ve got two choices:

  • comment it out
  • make it more robust

The first option is nearly always chosen, and with good reason: making a flaky test more robust takes time to understand what is wrong and how to resolve it, and delivers no value to the code being tested, unless it is the code underneath that is flaky, in which case you’ve got to address it.
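
If you do comment a test out, it is worth at least leaving a visible trace. As a hedged sketch in pytest, a skip marker with a reason keeps the flaky test in every run’s report rather than letting it vanish from history (the test name and reason are hypothetical):

import pytest

# Skipping keeps the test visible in reports, unlike commenting it out.
@pytest.mark.skip(reason="flaky: fails intermittently; needs a deterministic setup")
def test_upload_completes():
    ...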

If, however, you find yourself with many flaky tests being commented out, then maybe you need to change your approach to testing.

Mocking Makes a Mockery Of Testing

One such approach is using mocks. Mocking means you can make assumptions about interfaces that the code under test accesses. This means that rather than having to set up an environment with everything that I need, I can fake out things that would otherwise be fiddly to set up. For example, the test below removes the need for me to have a group called “FakeGroup” set up in an Azure Active Directory tenant with an Id of ‘9acd586e-688d-41f6-9dfb-d593941884a3’, because I am substituting the execution of Get-AzADGroup inside the function Get-FatCachedAdGroupId. This is a tremendous win in some respects, as I don’t even need an Azure subscription to run this test.

Describe "Get-FatCachedAdGroupId" -Tag 'Unit' {
    Context 'Get' {
        It "Mock Group Exists" {
            # Substitute Get-AzADGroup so that no Azure subscription is needed
            Mock Get-AzADGroup {
                Return @{Id = '9acd586e-688d-41f6-9dfb-d593941884a3' }
            }
            $groupName = Get-FatCachedAdGroupId -DisplayName 'FakeGroup'
            $groupName.Id | Should -BeExactly "9acd586e-688d-41f6-9dfb-d593941884a3"
            # Verify the mocked command was invoked exactly once
            Assert-MockCalled Get-AzADGroup -Exactly 1
        }
    }
}

Mocking can be useful because we can (and should) assert the number of times a mock is called, so that we can verify the behaviour expected for a given input. Below I create a CSV of five rows, but the inputs contain only four distinct values for ADGroup, so counting the number of times the mocked function is called verifies the behaviour, which is where the value of this test lies.

        It "Group Exists" {
            $csvPath = Join-Path $PSScriptRoot csvs/dummy.csv

            $csvEntries = @(
                [pscustomobject]@{ Container = 'lake'; Folder = 'output'; ADGroup = 'adlsRoot'; ADGroupID = '80024941-9710-47d2-8be9-f06f4389620f'; DefaultPermission = 'r-x'; AccessPermission = 'rwx'; Recurse = 'False' }
                [pscustomobject]@{ Container = 'lake'; Folder = 'output'; ADGroup = 'adlsOutput'; ADGroupID = '16050cad-cf12-4c2d-9ba8-57a7553184a5'; DefaultPermission = 'r-x'; AccessPermission = 'rwx'; Recurse = 'False' }
                [pscustomobject]@{ Container = 'lake'; Folder = 'output2/process'; ADGroup = 'adlsProcess'; ADGroupID = 'b8243406-018c-4129-9fcb-f965e916d835'; DefaultPermission = 'r-x'; AccessPermission = 'rwx'; Recurse = 'False' }
                [pscustomobject]@{ Container = 'lake'; Folder = 'output2/process2'; ADGroup = 'adlsProcess'; ADGroupID = 'b8243406-018c-4129-9fcb-f965e916d835'; DefaultPermission = 'r-x'; AccessPermission = 'rwx'; Recurse = 'False' }
                [pscustomobject]@{ Container = 'lake'; Folder = 'raw'; ADGroup = 'adlsRaw'; ADGroupID = '5b6fd483-9acc-4978-9b0f-352eebf234a7'; DefaultPermission = 'r-x'; AccessPermission = 'rwx'; Recurse = 'False' }
            )
            $csvEntries | Export-Csv -Path $csvPath -UseQuotes Never
            Mock Get-FatCachedAdGroupId {
                Return @{Id = '9acd586e-688d-41f6-9dfb-d593941884a3' }
            }
            Test-FatAADGroupsExist -csvPath $csvPath
            Assert-MockCalled Get-FatCachedAdGroupId -Exactly 4
        }

Mocking also helps us substitute a value that would otherwise be unknown with a static value. Below I am mocking the Spark context and also the date and time. With these static values in place I can create an expected test result and compare it to the actual result of the function being called.

import json

from runeatest import testreporter, nunitresults


def test_get_nunit_header(mocker):
    x = '{"tags": {"opId": "ServerBackend-f421e441fa310430","browserUserAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36","orgId": "1009391617598028","userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36","clusterId": "0216-124733-lone970","user": "[email protected]","principalIdpObjectId": "71b45910-e7b4-44d8-82f7-bf6fac4630d0","browserHostName": "uksouth.azuredatabricks.net","parentOpId": "RPCClient-bb9b9591c29c01f7","jettyRpcType": "InternalDriverBackendMessages$DriverBackendRequest"},"extraContext":{"notebook_path":"/Users/[email protected]/runeatest"}}'
    context = json.loads(x)
    t = ("2020-9-13", "13:20:16")
    mocker.patch("runeatest.pysparkconnect.get_context", return_value=context)
    mocker.patch("runeatest.utils.get_date_and_time", return_value=t)
    results = []
    results.append(testreporter.add_testcase("test name", False))
    results.append(testreporter.add_testcase("test name 2", True))
    expected = '<test-results name="/Users/[email protected]/runeatest" total="2" date="2020-9-13" time="13:20:16">\n<environment nunit-version="2.6.0.12035" clr-version="2.0.50727.4963" os-version="uksouth.azuredatabricks.net" platform="Win32NT" cwd="C:\\Program Files\\NUnit 2.6\\bin\\" machine-name="0216-124733-lone970" user="[email protected]" user-domain="1009391617598028"/>\n<culture-info current-culture="en-US" current-uiculture="en-US"/>'
    actual = nunitresults.get_nunit_header(results, context)
    assert expected == actual

However, it is not all fun and games with mocking. The above mocking of the result of get_context means I’m assuming that this structure is fixed and that nothing will be altered in subsequent versions of Spark. This means that I can’t reliably say which versions of Spark the function get_nunit_header will actually work with. In some cases, all you can ever rely on with mocking is that the function under test actually runs. But that is not necessarily a bad thing; if you know the limitations of the mocking in your unit tests, you can work to complement them with integration tests.
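
As a hedged sketch of what that complement might look like: an integration test that calls the real dependency and asserts only on the parts of the structure the code actually relies on. The “integration” marker below is a custom marker you would register yourself, and the shape checks are assumptions based on the context in the test above:

import pytest

from runeatest import pysparkconnect

# Runs only where a real Databricks/Spark context is available;
# "integration" is a custom marker you'd register in pytest.ini.
@pytest.mark.integration
def test_get_context_has_expected_shape():
    context = pysparkconnect.get_context()
    # Assert only on the keys that get_nunit_header actually depends on.
    assert "tags" in context
    assert "extraContext" in context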

Which Test Do I Write First?

This then brings us on to “but where do I really start with testing?”, and this is because unless you’re writing code with testing in mind (which is not the same thing as Test-Driven Development), your code is probably not written to make testing easy to implement after the fact. At this point you have three choices:

  • Rewrite your code. No one will make this choice, because the code is written and it works (presumably).
  • Find the easiest part of the project to write a test for, even if that test has limited value. This tackles the questions of “what framework should I use” and “how do I run tests in a pipeline”, and answering those questions will provide tremendous value as you continue to write tests.
  • Write an end-to-end test. In some respects both the easiest and the hardest choice, because no changes have to be made to the project, yet something has to be invoked and then the result checked. Doing this has its benefits: you can then begin to write tests as you find bugs, and refactor code to make it easier to test, and still have some confidence that your change does not break the system under test because you have this one end-to-end test.

Really, if you want to improve your testing over time then options 2 and 3 should be where you put your effort.
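
For option 3, a minimal sketch in Python: invoke the whole program as a black box, exactly as a user would, and check the result. The script name, arguments, and expected output below are hypothetical stand-ins for your own project:

import subprocess
import sys

def test_end_to_end():
    # Run the program as a subprocess so no internals are touched.
    result = subprocess.run(
        [sys.executable, "process_lake.py", "--csv", "dummy.csv"],
        capture_output=True,
        text=True,
    )
    # A passing exit code plus a sanity check on the output is enough
    # to give confidence that a refactor has not broken the system.
    assert result.returncode == 0
    assert "5 rows processed" in result.stdout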

Code Coverage Is A Metric And Not a Target

Let’s be pragmatic and acknowledge that unless you start writing tests the same day you start writing code, you’re very unlikely to achieve 100% code coverage. Again, people have differing thoughts on whether tracking code coverage offers any value. And it does, if you treat it as a way of understanding what is tested and what isn’t, and of prioritising the gaps (and only if you’re prepared to put effort into writing tests over new functionality). If you have a framework and a way of running tests on a build, then the next target is to write tests for new functionality or for changes to old functionality, and over time your code coverage will begin to increase. And as long as it is increasing rather than decreasing, you’re doing very well.
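
In Python, for example, the pytest-cov plugin will report exactly which lines your tests never touch; the package name below is a placeholder for your own:

pytest --cov=yourpackage --cov-report=term-missing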

Consider also that multiple test cases can test the same pieces of code in a variety of ways. In the adls2.folder.access.tools PowerShell module there is one function, Set-FatAdlsAccess, that sets the Access Control Lists (ACLs) on the folders in the Azure Gen 2 Data Lake Store. This one function is hugely important, and so, rightly, I wrote many different tests for the different scenarios that could take place. In terms of code coverage it makes very little difference, but code coverage cannot report on the fact that we can be very confident the functionality works as expected.

Meet The New Bugs, Just Like The Old Bugs

And more importantly, making a change to the function means that these tests continue to demonstrate the function works as required. A good set of tests will not discover bugs that already exist; rather, they will prevent new bugs from being added. Old bugs can really only be found by writing new tests.

Write Automated Tests over Documentation

One way to motivate people to write tests is to present tests as the documentation. Documentation and comments in code can go stale and become redundant at any point, and there is no way to check their validity. The same cannot be said of tests: by their binary nature they either pass or fail. If they pass, the absolute worst thing you can say is that the code works but does the wrong thing! And a test that is never run is as good as disregarded, whereas documentation or comments can be actively misleading and cause massive harm.
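
A hedged illustration of tests as documentation in pytest: a descriptive test name plus a one-line docstring states the rule, and the assert verifies it on every run. The function and the rule here are hypothetical, defined inline so the example runs:

# Hypothetical function under test, defined inline for illustration.
def normalise_path(path):
    return path.replace("\\", "/")

def test_backslashes_in_folder_paths_become_forward_slashes():
    """Documents, and enforces, the path normalisation rule."""
    assert normalise_path("lake\\output") == "lake/output"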

Writing tests over writing documentation can certainly be seen, or presented, as the lesser of two evils; a developer is far more likely to embrace the technical challenge of creating a good test suite than that of writing documents in a corporation’s wiki of choice. The challenge then becomes ensuring other developers can understand what the tests are trying to achieve.

Wrap It Up

I probably have a skewed view of how much testing is done in the wild, because I work mainly on data systems rather than app systems, and data, especially data at scale, makes testing that much harder. But testing is very rarely a waste of time, and I hope that this diatribe of my thoughts on testing has proved not to be a waste of time either.