Hello!

Since Microsoft declared that they heart open source, and with the move to Azure, there has been a massive increase in both what you can do and how you can do it. For example, off the top of my head, Azure Functions support PowerShell, C#, Python, and many more languages. This increase in choice is great; it is far better to have many resources/services that are each designed to do one thing really well than to bolt a feature onto an existing product. I think SQL Server is an amazing product, but 8 or 9 years ago you had Reporting Services, Analysis Services and Integration Services packaged as part of what is essentially a relational database engine, which wasn’t entirely necessary. Now, the Data Platform in Azure splits related products like these into constituent resources/technologies that can be deployed (and billed for) separately.

Anyway, this choice can also be problematic because it is sometimes not entirely obvious how to do something; both ADF and Azure Databricks can move data from one source to a sink and do some munging in the process, and the question “which is more suitable to my needs” can be difficult to answer.

To answer all these problems, all you need to do is take an Azure Architect exam; after all, that is what they’re there for, right?

Glibness aside, how does all this relate to downloading an Azure Artifact from a PyPI feed?

The challenge I faced was how to download a wheel that was being published to a PyPI feed in Azure DevOps. I didn’t want to do it manually (obviously), and I also wanted whatever script I wrote to be runnable in an Azure DevOps pipeline. At this point I could’ve chosen:

  • PowerShell and Az cmdlets
  • Bash and az cli
  • Python and Azure SDK
  • any one of the languages listed above and the REST API
  • etc

Clearly, if I listed them all out the choice would be overwhelming. Were it just me managing this pipeline I’d’ve chosen PowerShell and been done by teatime. However, other members of the team prefer Python, so out of deference to them, and not wanting to be solely responsible for every piece of PowerShell ever written, I decided to go for Python. Initially I chose the Azure SDK; however, the SDK leaves a little to be desired when it comes to interacting with Azure Artifacts. First off, the “download_package” method is not intended for automation. That’s not my opinion; that’s a fact stated in the code.

[Image: cannot automate downloading package]

So eventually I decided to resort to good old API calls and stream the result.

import os

import requests


def get_wheeltools_from_azure_artifact(package_version="latest"):
    feed_id = "MYFEED"
    package_name = "wheel-tools"
    project_name = "MYPROJECT"
    the_org = "organisationname"
    if package_version == "latest":
        print("no package version supplied; getting latest version number")
        package_version = get_latest_version_number_for_wheel(
            feed_id, package_name, project_name, the_org
        )
    # The wheel file name uses underscores where the package name uses hyphens.
    file_name = f"wheel_tools-{package_version}-py2.py3-none-any.whl"
    url = (
        f"https://pkgs.dev.azure.com/{the_org}/{project_name}/_apis/packaging/feeds/{feed_id}"
        f"/pypi/packages/{package_name}/versions/{package_version}/{file_name}/content"
        "?api-version=6.1-preview.1"
    )

    # The token goes in as the password of a basic-auth pair with an empty user name.
    wheel_tools = requests.get(url, auth=("", os.getenv("personal_access_token")))
    # Stop here on an HTTP error rather than saving an error page as a .whl file.
    wheel_tools.raise_for_status()

    wheel_tools_output_path = os.path.join(
        os.getcwd(), "tf", "wheel_tools_library", file_name
    )

    print(f"Downloading to {wheel_tools_output_path}")

    # Note: the target folder must already exist.
    with open(wheel_tools_output_path, "wb") as f:
        f.write(wheel_tools.content)
    return wheel_tools_output_path
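
A side note on "stream the result": the function above reads the whole wheel into memory via `.content` before writing it out. If you want the download to be streamed properly, a minimal sketch might look like the one below. It assumes the response was requested with `stream=True`; `save_streamed_response` and its parameters are names I've made up for illustration, and it only relies on the response having an `iter_content` method, as a `requests.Response` does.

```python
import os


def save_streamed_response(response, output_path, chunk_size=8192):
    """Write a streamed HTTP response to output_path chunk by chunk.

    `response` is assumed to be a requests.Response obtained with
    stream=True; anything exposing iter_content(chunk_size=...) will do.
    Returns the number of bytes written.
    """
    # Create the target folder if it is missing (a convenience added for this sketch).
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    bytes_written = 0
    with open(output_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip any empty chunks
                f.write(chunk)
                bytes_written += len(chunk)
    return bytes_written
```

In the download function this would take the place of the `open`/`write` pair, with the request becoming `requests.get(url, auth=..., stream=True)`. For a small wheel the difference is probably academic, but it keeps memory use flat for larger artifacts.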

But then, as you can see, I also wanted to be able to specify a version and, if no version was specified, grab the latest. Again, there is an API call to get a version, but you need the version number to get the version number. Again, not my opinion; it’s in the docs. I was hoping that if no version was specified you’d get the latest, but sadly that is not the case.

[Image: need to specify version number to get version number]

OK, so now we know that this choice has so far been rather unimpressive. What you need to do to get the latest version is grab all versions and then pick out the first one, like so:

def get_latest_version_number_for_wheel(feed_id, package_name, project_name, organisation):
    url = (
        f"https://feeds.dev.azure.com/{organisation}/{project_name}/_apis/packaging/Feeds/{feed_id}/packages"
        f"?includeAllVersions=true&packageNameQuery={package_name}&protocolType=PyPI"
        "&api-version=6.1-preview.1"
    )
    package_info = requests.get(url, auth=("", os.getenv("personal_access_token")))
    package_info.raise_for_status()
    # Take the first version of the first matching package as the latest.
    package_version = package_info.json()["value"][0]["versions"][0]["version"]
    print(f"Setting latest version as: {package_version}")
    return package_version
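
Picking index 0 relies on the API handing back the newest version first. If you'd rather not depend on that ordering, one option is to compare the version strings yourself. The sketch below is an illustration under assumptions, not what my pipeline runs: `version_key` and `pick_latest_version` are hypothetical helpers, the payload is assumed to have the same `value` / `versions` / `version` shape that the function above indexes into plus a `name` key on each package, and it only understands plain dotted numeric versions such as `1.2.10` (no pre-release suffixes).

```python
def version_key(version):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_latest_version(package_info, package_name):
    """Return the highest version listed for package_name.

    `package_info` is the decoded JSON from the packages call; the
    'name' key on each package entry is an assumption of this sketch.
    """
    for package in package_info["value"]:
        # Check the name in case the query matched more than one package.
        if package["name"] == package_name:
            versions = [entry["version"] for entry in package["versions"]]
            return max(versions, key=version_key)
    raise LookupError(f"{package_name} not found in feed response")


# Made-up sample payload, deliberately listed oldest first.
sample = {
    "value": [
        {
            "name": "wheel-tools",
            "versions": [{"version": "1.2.9"}, {"version": "1.2.10"}],
        }
    ]
}
print(pick_latest_version(sample, "wheel-tools"))  # 1.2.10
```

Comparing as tuples of integers matters here: as plain strings, "1.2.9" would sort above "1.2.10".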

And it all works fine. I’ve resorted to using REST API calls and munging the results to get what I want, which was not what I initially set out to do. Happily, the $(system.accesstoken) worked in the pipeline and I was able to get the version with no issues, so that was one thing that worked well.
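
For the same script to run both locally with a PAT and in the pipeline with $(system.accesstoken), the token has to arrive in the environment variable the functions read. As far as I know, the system token is not handed to scripts automatically, so it needs mapping into the step's environment under that name. Below is a small sketch of a guard that fails early with a readable message when the variable is missing; `get_access_token` is a hypothetical helper, and `personal_access_token` is simply the variable name used in the code above.

```python
import os


def get_access_token(variable_name="personal_access_token"):
    """Read the token from the environment, failing early if it is absent."""
    token = os.getenv(variable_name)
    if not token:
        raise RuntimeError(
            f"Environment variable '{variable_name}' is not set; "
            "export a PAT locally or map the pipeline token to it."
        )
    return token


def basic_auth_pair(token):
    """Build the (user, password) pair used above: empty user, token as password."""
    return ("", token)
```

The two `requests.get` calls could then take `auth=basic_auth_pair(get_access_token())`, so a missing token is reported up front rather than surfacing later as a failed request.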

So, to tie it all back to the opening thoughts on the what and the how of doing things: I think the moment you decide to do one thing one way, you’ll find many reasons why it’s not fit for purpose. It’s like naming conventions; I’ve never seen a naming convention that actually helped identify the resource succinctly. The answer here is to use tags, by the way, but you probably know that already! Deciding to use one specific tech and being dogmatic about it is a guaranteed way of making many bad decisions along the way.