Thursday, April 19, 2012

PowerShell: Using Exponents and Logs to Format Byte Sizes

In a recent Scripting Games event, I had to format some byte values into their most appropriate size according to how many bytes were present. Anyone who hangs around with me long enough will realize I love numbers and the preciseness that they provide. This is one of many reasons why I enjoy computer programming and especially working with SQL Server. Even more than my love of numbers is my hatred toward duplicating work.

This is one of my longer posts, but I promise some neat stuff if you stick with me on this…

When I started to code for the event, I headed down the path of using the PowerShell switch statement. I had never known that you could use an expression to evaluate a switch – so that was something really cool. This is the function I initially came up with (all the commenting and help removed for brevity):

switch ($a)
{
    #each step in the switch increases by a multiplier of 1024
    {$_ -lt 1024} {"{0:N2} Bytes" -f ($a)}
    {$_ -ge 1024 -and $_ -lt (1048576) } {"{0:N2} KiloBytes" -f ($a/1024)}
    {$_ -ge 1048576 -and $_ -lt (1073741824) } {"{0:N2} MegaBytes" -f ($a/1048576)}
    {$_ -ge 1073741824 -and $_ -lt (1099511627776) } {
        "{0:N2} GigaBytes" -f ($a/1073741824)}
    #Stop @ TeraBytes, could be more than 1024 TeraBytes, but that is acceptable
    default {"{0:N2} TeraBytes" -f ($a/1099511627776)}
}

This was a workable function, but it didn’t feel right. I had to do a lot of typing and we all know we only have so many keystrokes in our lives, so I stepped back and re-evaluated.

At each expression, I was looking at a multiplier of 1024. Like the veil was removed, I realized that these were exponents. Time for the .NET Math Class (pun intended). Among some of the other cool methods of this class is the Pow method. No, we’re not in the original Batman series fighting the super villain, it’s the equivalent of the T-SQL POWER function. I soon had a simple function and all those large nasty numbers replaced with their exponential equivalents. Here is version two:

function Get-Power
{
    #$RaiseMe is the exponent that 1024 is raised to
    param($RaiseMe)
    [Math]::Pow(1024, $RaiseMe)
}


switch ($a)
{
    #each step in the switch increases by a multiplier of 1024
    {$_ -lt (Get-Power 1)} {"{0:N2} Bytes" -f ($a)}
    {$_ -ge (Get-Power 1) -and $_ -lt (Get-Power 2) } {
        "{0:N2} KiloBytes" -f ($a/(Get-Power 1))}
    {$_ -ge (Get-Power 2) -and $_ -lt (Get-Power 3) } {
        "{0:N2} MegaBytes" -f ($a/(Get-Power 2))}
    {$_ -ge (Get-Power 3) -and $_ -lt (Get-Power 4) } {
        "{0:N2} GigaBytes" -f ($a/(Get-Power 3))}
    #Stop @ TeraBytes, could be more than 1024 TeraBytes, but that is acceptable
    default {"{0:N2} TeraBytes" -f ($a/(Get-Power 4))}
}

This worked the same and it was a little cleaner, but it still bothered me. There was a lot of copied and pasted code in there and it was all to determine what text to slap on the end of a quotient. I once again stepped back and realized that it was the switch’s fault. I then made it my goal to eliminate the switch.

In the process, I remembered that 1024 is 2^10… this meant that 1024^2 was 2^20, 1024^3 was 2^30 and so on. Believe it or not, this made my path clear.
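That equivalence is easy to confirm from the console. The snippet below is only a quick sanity check, not part of the original script, and its variable names are made up for illustration.

```powershell
# Compare 1024^n with 2^(10n) for the first few sizes (KB through TB).
$allEqual = $true
foreach ($n in 1..4) {
    $asPowerOf1024 = [Math]::Pow(1024, $n)
    $asPowerOfTwo  = [Math]::Pow(2, 10 * $n)
    if ($asPowerOf1024 -ne $asPowerOfTwo) { $allEqual = $false }
    "1024^$n = 2^$(10 * $n) = $asPowerOf1024"
}
# $allEqual stays $true only if every pair matched
$allEqual
```

Powers of two in this range are represented exactly as doubles, so a plain -eq comparison is safe here.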

A quick algebra refresher: exponents and logarithms express the same relationship from two directions, and each form is useful in different circumstances. For instance, 2^10 = 1024 is equivalent to log_2(1024) = 10. For our purposes, we want to know which power of 2 each supplied number corresponds to. To solve for that unknown exponent (the 10 above), the base of the logarithm does not matter, as long as the same base is applied to both sides of the equation. With that in mind, we can use the natural logarithm (ln) to determine what power of 2 any number represents.

2^x = 1024
ln(2^x) = ln(1024)
x * ln(2) = ln(1024)
x = ln(1024) / ln(2)
x = 6.931 / 0.6931
x = 10
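To see that the base really does drop out, the same ratio can be computed with two different logarithms. This is a small sketch for illustration; the variable names are assumptions, and the results are rounded to absorb floating-point noise.

```powershell
# The ratio of two logarithms is the same whichever base they share.
$value = 1024
$viaNaturalLog = [Math]::Log($value) / [Math]::Log(2)       # base e
$viaLog10      = [Math]::Log10($value) / [Math]::Log10(2)   # base 10

# Round before comparing so tiny floating-point differences do not matter.
$naturalRounded = [Math]::Round($viaNaturalLog, 10)
$log10Rounded   = [Math]::Round($viaLog10, 10)
"ln: $naturalRounded  log10: $log10Rounded"
```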

Before you think “that’s great, thanks for the math lesson” and stop reading: this information really does have some decent uses in PowerShell. Let’s see how this math can be applied to turn my function above into reusable and easily maintainable code.

The Math class has a method named Log. When provided with only a single value, this method uses “e” as the base: the natural logarithm, which is the functional equivalent of the “ln” used above. Armed with this method, we can determine which power of 2 any particular number corresponds to. More importantly, this information can eventually be used to format any size of number appropriately.

In the code below, I use PowerShell to extract the largest whole power of 2 that does not exceed the $byte value by applying the Floor method.

$PowerOfTwo = [Math]::Floor([Math]::Log($byte)/[Math]::Log(2))
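As an illustration, here is the same expression applied to a few sample sizes. The sample values and the $samples and $powers names are assumptions chosen for this sketch; the values deliberately sit well inside a range rather than exactly on a power of 2, where floating-point rounding could matter.

```powershell
# Sample byte counts: roughly 1.5 KB, exactly 5 MB, and roughly 115 GB.
$samples = 1500, 5242880, 123456789123

# Same Floor(Log(x)/Log(2)) expression as above, once per sample.
$powers = foreach ($byte in $samples) {
    [Math]::Floor([Math]::Log($byte) / [Math]::Log(2))
}

# Show each sample next to its whole power of 2.
for ($i = 0; $i -lt $samples.Count; $i++) {
    "{0} bytes -> 2^{1}" -f $samples[$i], $powers[$i]
}
```

The three results land in the 2^10, 2^20 and 2^30 bands, which is what the later steps rely on to pick a descriptor.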

If I now create an array of my “descriptors”, I can append the matching text to the end of each number to make it pretty.

$ByteDescriptors = ("B", "KB", "MB", "GB", "TB", "PB", 
                "EB", "ZB", "YB", "WYGTMSB")

With the array prepared, the $PowerOfTwo variable can be divided by 10 (and Floored) to provide the index into the descriptor array.

$DescriptorID = [Math]::Floor($PowerOfTwo/10)

Finally, we use the format operator (-f) to combine all this information into an output. The $Scale variable is set in the script to be 2. Not only is the $DescriptorID used to look up the description, it is also multiplied by 10 to give the power of 2 used as the divisor, with the total byte value as the dividend.

Write-Output ("{0:N$Scale} $($ByteDescriptors[$DescriptorID])" -f (
                $byte / [Math]::Pow(2, ($DescriptorID*10))))
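Put together, the pieces above can be run as a short standalone script. This is a minimal sketch using a sample value; the $Scaled and $Formatted variables are added here for illustration only and are not part of the original function.

```powershell
# Sample input (assumed for this sketch) and the scale used in the post.
$byte  = 123456789.123
$Scale = 2

$ByteDescriptors = ("B", "KB", "MB", "GB", "TB", "PB",
                "EB", "ZB", "YB", "WYGTMSB")

# Step 1: largest whole power of 2 that fits in the value.
$PowerOfTwo = [Math]::Floor([Math]::Log($byte)/[Math]::Log(2))

# Step 2: every 10 powers of 2 moves one position along the descriptor array.
$DescriptorID = [Math]::Floor($PowerOfTwo/10)

# Step 3: scale the value down and attach the descriptor.
$Scaled    = $byte / [Math]::Pow(2, ($DescriptorID*10))
$Formatted = "{0:N$Scale} $($ByteDescriptors[$DescriptorID])" -f $Scaled
$Formatted
```

On a system with en-US style number formatting this should print 117.74 MB; the decimal separator follows the current culture.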

I’ve included the full function below which includes all of these pieces as well as full comments and a few extra parameters. While I may have been able to use the first function, the flexibility that the most recent iteration of this script provides seems worth the effort. Not only do I think this is a neat function, I now have an answer for my kids when they ask “When am I ever going to use this stuff in real life?” :-)

function Format-Byte
{
    <#
    .SYNOPSIS
    Formats a number into the appropriate byte display format.

    .DESCRIPTION
    Uses the powers of 2 to determine what the appropriate byte descriptor
    should be and reduces the number to that appropriate descriptor.
    The LongDescriptor switch will switch from the generic "KB, MB, etc."
    to "KiloBytes, MegaBytes, etc."
    Returns valid values from byte (2^0) through YottaByte (2^80).
    .PARAMETER ByteValue
    Required double value that represents the number of bytes to convert.
    This value must be greater than or equal to zero or the function will error.

    This value can be passed as a positional, named, or pipeline parameter.
    .PARAMETER LongDescriptor
    Optional switch parameter that can be used to specify long names for
    byte descriptors (KiloBytes, MegaBytes, etc.) as compared to the default
    (KB, MB, etc.) Changes no other functionality.
    .PARAMETER Scale
    Optional parameter that specifies how many numbers to display after
    the decimal place.
    The default value for this parameter is 2.
    .EXAMPLE
    Format-Byte 123456789.123
    Uses the positional parameter and returns "117.74 MB"
    .EXAMPLE
    Format-Byte -ByteValue 123456789123 -Scale 0
    Uses the named parameter and specifies a Scale of 0 (whole numbers).
    Returns "115 GB"
    .EXAMPLE
    Format-Byte -ByteValue 123456789123 -LongDescriptor -Scale 4
    Uses the named parameter and specifies a scale of 4 (4 numbers after the
    decimal place) along with the LongDescriptor switch.
    Returns "114.9781 GigaBytes"
    .EXAMPLE
    (Get-ChildItem "E:\KyleScripts")|ForEach-Object{$_.Length}|Format-Byte
    Passes an array of the sizes of all the files in the E:\KyleScripts folder
    through the pipeline.
    .NOTES
    Author:    Kyle Neier
    Twitter: Kyle_Neier
    Because of the 14 significant digit issue, anything nearing 2^90
    will be marked as WYGTMSB aka WheredYouGetThatMuchStorageBytes. If you
    have that much storage, feel free to find a different function and or
    travel back in time a hundred years or so and slap me...
    #>
    [CmdletBinding()]
    param(
        [Parameter(
                Position = 0,
                Mandatory = $true,
                ValueFromPipeline= $true
            )]
             #make certain value won't break script
             [ValidateScript({$_ -ge 0 -and 
                 $_ -le ([Math]::Pow(2, 90))})] 
             [double[]]$ByteValue,
        [Parameter(
                ValueFromPipeline= $false
            )]
             [switch]$LongDescriptor,
        [Parameter(
                ValueFromPipeline= $false
            )]
             [int]$Scale = 2
    )
    #2^10 = KB, 2^20 = MB, 2^30=GB...
    begin
    {
        if($LongDescriptor)
        {
            Write-Verbose "LongDescriptor specified, using longer names."
            $ByteDescriptors = ("Bytes", "KiloBytes", "MegaBytes", "GigaBytes", 
                "TeraBytes", "PetaBytes", "ExaBytes", "ZettaBytes", 
                "YottaBytes", "WheredYouGetThatMuchStorageBytes")
        }
        else
        {
            Write-Verbose "LongDescriptor not specified, using short names."
            $ByteDescriptors = ("B", "KB", "MB", "GB", "TB", "PB", 
                "EB", "ZB", "YB", "WYGTMSB")
        }
    }
    process
    {
        foreach($byte in $ByteValue)
        {
            #Determine which power of 2 this value is based from
            #Values below 1 (including 0) have a negative or infinite log,
            #so clamp them to 0 and report them as plain bytes
            Write-Verbose "Determine which power of 2 the byte is based from."
            $PowerOfTwo = [Math]::Max(0,
                [Math]::Floor([Math]::Log($byte)/[Math]::Log(2)))
            #Determine position in descriptor array for the text value
            Write-Verbose "Determine position in descriptor array."
            $DescriptorID = [Math]::Floor($PowerOfTwo/10)

            #Determine appropriate number by rolling back up through powers of 2
            #format number with appropriate descriptor
            Write-Verbose ("Return the appropriate number with appropriate "+
                "scale and appropriate descriptor back to caller.")
            Write-Output ("{0:N$Scale} $($ByteDescriptors[$DescriptorID])" -f (
                $byte / [Math]::Pow(2, ($DescriptorID*10))))
        }
    }
}


SQLvariant said...

I must be missing something, why not just use /KB /MB /GB?

Kyle Neier said...

One of the iterations not on this post included the 1KB/1MB/1GB - but I did not show that. I felt it was easier to understand the value of the exponent when I showed the full numbers instead of the Powershell shorthand. Even when using that, I still had to have either an if or a switch, so I didn't feel it added to the intent of the post that much.


Kyle Neier said...

After reading with Log(1024) I realized that this could be simplified even further.

I got focused on the powers of 2 because I've used that for other purposes in different capacities. If I apply what Rob shared, this would be the two lines to get the file sizes:

$Powers = [Math]::Floor([Math]::Log($byte)/[Math]::Log(1024))

Write-Output ("{0:N$Scale} $($ByteDescriptors[$Powers])" -f ($byte / [Math]::Pow(1024, ($Powers))))

The intermediary steps of dividing by 10 and multiplying by 10 can be eliminated.