Hello!

Here is a quick script to check the total size of a specific folder in an Azure Data Lake Storage Gen2 account. It can take some time to run. The page size ($MaxReturn, passed to -MaxCount) is set to 100,000 items, though chances are it could be increased to make the process faster. I haven't really had the opportunity to find the optimal value, and it works fine on my machine (Intel Core i7 7th Gen, 32 GB of RAM), but depending on your machine your mileage may vary.
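To get a feel for why the page size matters, here is a minimal simulation of the continuation-token loop the script uses. Get-SimulatedPage and Measure-PagedCalls are hypothetical helpers, not part of Az.Storage; Get-SimulatedPage only mimics the cmdlet by putting the token for the next page on the last item it returns. It models the number of calls the outer loop makes, not any paging the cmdlet or the service may do internally, so treat it as a sketch rather than a benchmark.

```powershell
# Hypothetical stand-in for Get-AzDataLakeGen2ChildItem: returns up to $MaxCount
# fake items starting at the offset encoded in $ContinuationToken, and sets the
# token for the next page on the LAST item only (no token when nothing is left).
function Get-SimulatedPage {
    param(
        [int]$TotalItems,
        [int]$MaxCount,
        [string]$ContinuationToken
    )
    $start = if ($ContinuationToken) { [int]$ContinuationToken } else { 0 }
    $end = [math]::Min($start + $MaxCount, $TotalItems)
    $page = [System.Collections.Generic.List[object]]::new()
    for ($i = $start; $i -lt $end; $i++) {
        $page.Add([pscustomobject]@{ Name = "file$i"; Length = 1KB; ContinuationToken = $null })
    }
    if ($page.Count -gt 0 -and $end -lt $TotalItems) {
        $page[$page.Count - 1].ContinuationToken = [string]$end
    }
    $page   # enumerated onto the pipeline; callers wrap the call in @()
}

# Hypothetical helper: walk every page the same way the folder-size script does
# and count how many calls that takes for a given page size.
function Measure-PagedCalls {
    param([int]$TotalItems, [int]$MaxCount)
    $calls = 0
    $count = 0
    $token = $null
    do {
        $items = @(Get-SimulatedPage -TotalItems $TotalItems -MaxCount $MaxCount -ContinuationToken $token)
        $calls++
        if ($items.Count -le 0) { break }
        $count += $items.Count
        $token = $items[-1].ContinuationToken
    } while ($null -ne $token)
    [pscustomobject]@{ MaxCount = $MaxCount; Calls = $calls; Items = $count }
}

Measure-PagedCalls -TotalItems 25 -MaxCount 10   # Calls = 3, Items = 25
Measure-PagedCalls -TotalItems 25 -MaxCount 5    # Calls = 5, Items = 25
```

Halving the page size roughly doubles the number of round trips, which is why a larger -MaxCount can help, at the cost of holding more objects in memory per page.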

Be warned: this is slow. Not a-few-minutes slow; slow as in “slow roasted” slow. Because of this, I print the running total after each page (each pass of the do loop), so that I can see the script is still running, and because I'm concerned it might fall over and leave us with no result for our effort.
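If raw byte counts are hard to eyeball in those progress lines, a small formatting helper can make them easier to read. Format-ByteSize below is a hypothetical helper, a minimal sketch that uses PowerShell's binary multipliers (1KB = 1024 bytes, 1GB = 1024^3 bytes) and invariant-culture number formatting so the output looks the same on any machine.

```powershell
# Hypothetical helper: turn a byte count into a short human-readable string,
# picking the largest binary unit (TB, GB, MB, KB) that the value reaches.
function Format-ByteSize {
    param([double]$Bytes)
    $inv = [System.Globalization.CultureInfo]::InvariantCulture
    $units = @(
        @{ Name = 'TB'; Size = 1TB },
        @{ Name = 'GB'; Size = 1GB },
        @{ Name = 'MB'; Size = 1MB },
        @{ Name = 'KB'; Size = 1KB }
    )
    foreach ($unit in $units) {
        if ($Bytes -ge $unit.Size) {
            # Two decimal places, formatted with the invariant culture.
            return '{0} {1}' -f ($Bytes / $unit.Size).ToString('N2', $inv), $unit.Name
        }
    }
    return '{0} bytes' -f $Bytes.ToString($inv)
}

# Example progress line, as it might be printed after each page:
Write-Host "Files so far: 120000; size so far: $(Format-ByteSize 5368709120)"
# prints "Files so far: 120000; size so far: 5.00 GB"
```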

EDIT: I have updated the script below to be a little more robust when running for a long time, or even in an Azure DevOps…
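One way to get that kind of robustness is to wrap each request in a retry helper that stops at the first success and rethrows once every attempt has failed. Invoke-WithRetry below is a hypothetical function, not an Az cmdlet, and the flaky script block in the example stands in for a transient network error; it is a sketch of the idea rather than the exact approach used in the script.

```powershell
# Hypothetical retry wrapper: runs $Action and returns its output on the first
# success; on failure it waits and tries again, up to $MaxAttempts in total,
# then rethrows the last error so the caller can handle it.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts = 3,
        [int]$SleepInSeconds = 5
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Cmdlets inside $Action need -ErrorAction Stop for their errors to land here.
            return & $Action
        }
        catch {
            Write-Host "Attempt $attempt of $MaxAttempts failed: $($_.Exception.Message)"
            if ($attempt -ge $MaxAttempts) { throw }
            if ($SleepInSeconds -gt 0) { Start-Sleep -Seconds $SleepInSeconds }
        }
    }
}

# Example: an action that fails twice before succeeding.
$script:calls = 0
$result = Invoke-WithRetry -SleepInSeconds 0 -Action {
    $script:calls++
    if ($script:calls -lt 3) { throw "transient failure $script:calls" }
    "succeeded on call $script:calls"
}
$result   # "succeeded on call 3"

# In the folder-size script, the page request could be wrapped like this:
# $items = @(Invoke-WithRetry { Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $myfilesystem -Path $mypath -Recurse -MaxCount $MaxReturn -ContinuationToken $Token -ErrorAction Stop })
```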

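$myfilesystem = "container"
$mypath = "pathoffolder"
# -UseConnectedAccount uses the account signed in with Connect-AzAccount.
$ctx = New-AzStorageContext -StorageAccountName "azuredatalakegen2name" -UseConnectedAccount
$MaxReturn = 100000     # items requested per call (-MaxCount)
$Total = 0              # number of files counted so far
$Token = $Null          # continuation token for the next page
$TotalFileSize = 0      # running total in bytes
try {
    do {
        $attempts = 3
        $sleepInSeconds = 5
        # Retry the page request, but stop as soon as one attempt succeeds.
        # After the last failed attempt, rethrow so the outer catch logs the partial total.
        while ($true) {
            try {
                $items = @(Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $myfilesystem -Path $mypath -Recurse -MaxCount $MaxReturn -ContinuationToken $Token -ErrorAction Stop)
                break
            }
            catch {
                $attempts--
                Write-Host $_.Exception.Message
                if ($attempts -le 0) { throw }
                Start-Sleep -Seconds $sleepInSeconds
            }
        }
        if ($items.Count -le 0) { break }
        # Read the continuation token from the last item of the unfiltered page;
        # filtering out directories first can drop the item that carries it.
        $Token = $items[-1].ContinuationToken
        $files = @($items | Where-Object IsDirectory -eq $false)
        $Total += $files.Count
        $TotalFileSize += ($files | Measure-Object -Property Length -Sum).Sum
        # Progress: show the running totals after each page.
        Write-Host "Files so far: $Total; size so far: $TotalFileSize bytes"
    }
    while ($Null -ne $Token)
}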
catch {
    Write-Host "An error occurred:"
    Write-Host $_
    # Report the partial total as an Azure DevOps error annotation so the effort isn't lost.
    $sizeSoFarGB = [math]::Round($TotalFileSize / 1GB, 2)
    Write-Host "##vso[task.logissue type=error]Size for container $myfilesystem logged so far is $TotalFileSize bytes ($sizeSoFarGB GB)"
    $failed = $true
}
finally {
    # Always print the (possibly partial) total in several units.
    $TotalFileSize | Select-Object @{Name = "SizeInBytes"; Expression = { $_ } },
    @{Name = "SizeInKB"; Expression = { $_ / 1KB } },
    @{Name = "SizeInGB"; Expression = { $_ / 1GB } },
    @{Name = "SizeInTB"; Expression = { $_ / 1TB } }
    if ($failed) {
        Write-Host "Something went wrong. Check logs for error."
        throw "Folder size calculation failed; the total above is partial."
    }
}
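Why read the continuation token before filtering out directories? Assuming, as the paging example in the Get-AzDataLakeGen2ChildItem documentation does, that the token is read from the last item of each page, filtering first throws it away whenever that last item is a directory; and a page made up only of directories would look empty and end the loop early. The page below is simulated with hand-built objects (not real cmdlet output) to show the difference.

```powershell
# Simulated page: the last item is a directory, and only that last item
# carries the continuation token for the next page.
$page = @(
    [pscustomobject]@{ Name = 'a.csv'; IsDirectory = $false; Length = 100; ContinuationToken = $null },
    [pscustomobject]@{ Name = 'b.csv'; IsDirectory = $false; Length = 200; ContinuationToken = $null },
    [pscustomobject]@{ Name = 'sub';   IsDirectory = $true;  Length = 0;   ContinuationToken = 'next-page' }
)

# Filtering first: the token-bearing directory is gone, so paging would stop here.
$filteredFirst = @($page | Where-Object IsDirectory -eq $false)
$tokenIfFilteredFirst = $filteredFirst[-1].ContinuationToken   # $null

# Reading the token from the raw page first, then filtering, keeps it.
$tokenFromRawPage = $page[-1].ContinuationToken                # 'next-page'
$files = @($page | Where-Object IsDirectory -eq $false)
$bytes = ($files | Measure-Object -Property Length -Sum).Sum   # 300
```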