Since August last year I've had to pick up a number of new technologies, one of them being Azure Data Factory (Version 2, to be clear). There's a clear correlation between how much I enjoy using a technology and how long I've used it, and when it comes to ADF I'd say that I've hit the point of not enjoying it at all. Aside from the fact that it is practically impossible to test, that the developer experience is a web page/drag and drop affair, and that the source control integration is rubbish (oh wow, a blob of JSON to compare manually, what fun), the thing that's really got me all angsty today is something totally unexpected, yet undoubtedly something someone somewhere would call “a feature.”
So, I have a pipeline that runs a stored procedure as its penultimate step. The final activity is another stored procedure that runs some custom logging. It therefore made sense to me to set the dependency condition on the last activity to “completed”, like the picture below.
That blue arrow means that the logging activity will always run, irrespective of the status of the previous activity. OK great, makes sense, it does exactly what I need it to do and all is right in the world, right?
Nope! Come to find out that if the second-to-last proc fails and the last proc succeeds, then the status of the pipeline run is “succeeded”. WTF. This feels wrong. I'd expect that if any activity in an ADF pipeline run fails, then the run should be marked as failed. So in the scenario above, what I need to change is for the logging step to happen on success or on failure of the previous step, exactly like below. Now when the first proc fails, the “on failure” activity runs and the status of the pipeline run is “failed”.
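For what it's worth, here's a minimal Python sketch of how I understand the run status to be worked out. It's a toy model, not ADF's actual implementation: it assumes the run status is decided by the leaf activities only, and that a skipped leaf is judged by its upstream activity instead. The names (`Activity`, `run_pipeline`, `proc`, `log`, `log_ok`, `log_fail`) are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    will_succeed: bool = True  # pretend outcome if the activity actually runs
    # (upstream activity name, dependency condition) pairs; all must hold
    depends_on: list = field(default_factory=list)


def condition_met(condition, upstream_status):
    # "Completed" fires whether the upstream activity succeeded or failed
    if condition == "Completed":
        return upstream_status in ("Succeeded", "Failed")
    return condition == upstream_status


def run_pipeline(activities):
    """Toy model; activities must be listed in dependency order."""
    by_name = {a.name: a for a in activities}
    status = {}
    for a in activities:
        if all(condition_met(c, status[u]) for u, c in a.depends_on):
            status[a.name] = "Succeeded" if a.will_succeed else "Failed"
        else:
            status[a.name] = "Skipped"

    def counts_as_success(name):
        # Assumption: a skipped activity is judged by its upstream activities
        if status[name] == "Skipped":
            return all(counts_as_success(u) for u, _ in by_name[name].depends_on)
        return status[name] == "Succeeded"

    # Leaves are activities that nothing else depends on
    upstream = {u for a in activities for u, _ in a.depends_on}
    leaves = [a.name for a in activities if a.name not in upstream]
    ok = all(counts_as_success(leaf) for leaf in leaves)
    return ("Succeeded" if ok else "Failed"), status


# Before: one logging activity hanging off "Completed"
before = [
    Activity("proc", will_succeed=False),
    Activity("log", depends_on=[("proc", "Completed")]),
]
print(run_pipeline(before)[0])  # Succeeded

# After: separate logging activities for the success and failure paths
after = [
    Activity("proc", will_succeed=False),
    Activity("log_ok", depends_on=[("proc", "Succeeded")]),
    Activity("log_fail", depends_on=[("proc", "Failed")]),
]
print(run_pipeline(after)[0])  # Failed
```

In the "before" case the only leaf is the logging activity, and it succeeded, so the model reports the run as succeeded. In the "after" case the success-path logger is skipped, so the failed proc gets counted and the run is reported as failed.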
Now for a relatively trivial pipeline this was a small change. For larger, more complicated pipelines, however, you always have to consider all paths to failure, otherwise you end up with a bunch of false positives: runs reported as succeeded even though an activity in them failed. There's a post on ADF activity dependencies which confirmed what I had learned today. I get that these are logical conditions, and that there can be a few gotchas, but the fact that a failed activity does not equate to a failed run being reported is a little odd.