Thursday, April 19, 2012

PowerShell: Using Exponents and Logs to Format Byte Sizes

In a recent Scripting Games event, I had to format some byte values into their most appropriate size according to how many bytes were present. Anyone who hangs around with me long enough will realize I love numbers and the preciseness that they provide. This is one of many reasons why I enjoy computer programming and especially working with SQL Server. Even more than my love of numbers is my hatred toward duplicating work.

This is one of my longer posts, but I promise some neat stuff if you stick with me on this…

When I started to code for the event, I headed down the path of using the PowerShell switch statement. I had never known that you could use an expression to evaluate a switch – so that was something really cool. This is the function I initially came up with (all the commenting and help removed for brevity):

switch ($a)
{
    #each step in the switch increases by a multiplier of 1024
    {$_ -lt 1024} {"{0:N2} Bytes" -f ($a)}
    {$_ -ge 1024 -and $_ -lt (1048576) } {"{0:N2} KiloBytes" -f ($a/1024)}
    {$_ -ge 1048576 -and $_ -lt (1073741824) } {"{0:N2} MegaBytes" -f ($a/1048576)}
    {$_ -ge 1073741824 -and $_ -lt (1099511627776) } {
        "{0:N2} GigaBytes" -f ($a/1073741824)}
        
    #Stop @ TeraBytes, could be more than 1024 TeraBytes, but that is acceptable
    default {{"{0:N2} TeraBytes" -f ($a/1099511627776)}}
    
}

This was a workable function, but it didn’t feel right. I had to do a lot of typing and we all know we only have so many keystrokes in our lives, so I stepped back and re-evaluated.

At each expression, I was looking at a multiplier of 1024. Like the veil was removed, I realized that these were exponents. Time for the .NET Math Class (pun intended). Among some of the other cool methods of this class is the Pow method. No, we’re not in the original Batman series fighting the super villain, it’s the equivalent of the T-SQL POWER function. I soon had a simple function and all those large nasty numbers replaced with their exponential equivalents. Here is version two:

function Get-Power
{
    
    param([double]$RaiseMe)
    
    [Math]::Pow(1024, $RaiseMe)

}

switch ($a)
{
    #each step in the switch increases by a multiplier of 1024
    {$_ -lt (Get-Power 1)} {"{0:N2} Bytes" -f ($a)}
    {$_ -ge (Get-Power 1) -and $_ -lt (Get-Power 2) } {
        "{0:N2} KiloBytes" -f ($a/(Get-Power 1))}
        
    {$_ -ge (Get-Power 2) -and $_ -lt (Get-Power 3) } {
        "{0:N2} MegaBytes" -f ($a/(Get-Power 2))}
        
    {$_ -ge (Get-Power 3) -and $_ -lt (Get-Power 4) } {
        "{0:N2} GigaBytes" -f ($a/(Get-Power 3))}
        
    #Stop @ TeraBytes, could be more than 1024 TeraBytes, but that is acceptable
    default {"{0:N2} TeraBytes" -f ($a/(Get-Power 4))}
    
}

This worked the same and it was a little cleaner, but it still bothered me. There was a lot of copied and pasted code in there and it was all to determine what text to slap on the end of a quotient. I once again stepped back and realized that it was the switch’s fault. I then made it my goal to eliminate the switch.

In the process, I remembered that 1024 is 2^10… this meant that 1024^2 was 2^20, 1024^3 was 2^30 and so on. Believe it or not, this made my path clear.

A quick Algebra refresher – exponents and logs are functionally the same things, just expressed in a way that makes each valuable for different circumstances. For instance, 2^10=1024 is the functional equivalent of LOG_2(1024) = 10. For our purposes, we will want to know the power of 2 each number being provided is. To find the unknown “power” value in an equation, (the 10 above) the log base is technically irrelevant. With that in mind, we can use the natural logarithm (ln) to determine what power of 2 any number represents.

2^x = 1024 bytes
ln(2^x) = ln(1024 bytes)
ln(2)x = ln(1024 bytes)
x = ln(1024) / ln(2)
x = 6.93 / 0.693
x = 10

Before you think, that’s great, thanks for the math lesson and stop reading - this information really does have some decent uses in PowerShell. Let’s see how this math can be applied to make my function above re-usable and easily maintainable code.

The Math class has a method named Log . When only provided with a single value, this function will use “e” as the base – the natural algorithm which is the functional equivalent of “ln” used above. Armed with this function, we can determine which power of 2 any particular number is. More importantly, this information can eventually be used to format any size of number appropriately.

In the code below, I use PowerShell to extract the closest whole power of 2 that makes up the $byte value by implementing the Floor method.

$PowerOfTwo = [Math]::Floor([Math]::Log($byte)/[Math]::Log(2))

If I now create an array of my “descriptors”, I can use this text to be added to the end of each of the numbers to make it pretty.

$ByteDescriptors = ("B", "KB", "MB", "GB", "TB", "PB", 
                "EB", "ZB", "YB", "WYGTMSB")

With the array prepared, the $PowerOfTwo variable can be divided by 10 (and Floored) to provide the index into the descriptor array.

$DescriptorID = [Math]::Floor($PowerOfTwo/10)

Finally, we use the format method to combine all this information into an output. The $Scale variable is set in the script to be 2. Not only is the $DescriptorID used to determine the description, it is also used as a power of 2 in the divisor with the total byte value as the dividend.

Write-Output ("{0:N$Scale} $($ByteDescriptors[$DescriptorID])" -f (
                $byte / [Math]::Pow(2, ($DescriptorID*10))))

I’ve included the full function below which includes all of these pieces as well as full comments and a few extra parameters. While I may have been able to use the first function, the flexibility that the most recent iteration of this script provides seems worth the effort. Not only do I think this is a neat function, I now have an answer for my kids when they ask “When am I ever going to use this stuff in real life?” :-)


function Format-Byte
{
<#    
    .SYNOPSIS
    Formats a number into the appropriate byte display format.

    .DESCRIPTION
    Uses the powers of 2 to determine what the appropriate byte descriptor
    should be and reduces the number to that appropriate descriptor.
    
    The LongDescriptor switch will switch from the generic "KB, MB, etc."
    to "KiloBytes, MegaBytes, etc."
    
    Returns valid values from byte (2^0) through YottaByte (2^80).
    
    .PARAMETER ByteValue        
    Required double value that represents the number of bytes to convert.
    
    This value must be greater than or equal to zero or the function will error.

    This value can be passed as a positional, named, or pipeline parameter.    
    
    .PARAMETER LongDescriptor
    Optional switch parameter that can be used to specify long names for 
    byte descriptors (KiloBytes, MegaBytes, etc.) as compared to the default
    (KB, MB, etc.) Changes no other functionality.
    
    .PARAMETER Scale
    Optional parameter that specifies how many numbers to display after 
    the decimal place.
    
    The default value for this parameter is 2. 
    
    .EXAMPLE
    Format-Byte 123456789.123
    
    Uses the positional parameter and returns returns "117.74 MB"
    
    .EXAMPLE
    Format-Byte -ByteValue 123456789123 -Scale 0
    
    Uses the named parameter and specifies a Scale of 0 (whole numbers). 
    
    Returns "115 GB"
    
    .EXAMPLE
    Format-Byte -ByteValue 123456789123 -LongDescriptor -Scale 4
    
    Uses the named parameter and specifies a scale of 4 (4 numbers after the
    decimal) 
    
    Returns "114.9781 GigaBytes"
    
    .EXAMPLE
    (Get-ChildItem "E:\KyleScripts")|ForEach-Object{$_.Length}|Format-Byte
    
    Passes an array of the sizes of all the files in the E:\KyleScripts folder
    through the pipeline.
    
    .NOTES
    Author:    Kyle Neier
    Blog: http://sqldbamusings.blogspot.com
    Twitter: Kyle_Neier
    
    Because of the 14 significant digit issue, anything nearing 2^90
    will be marked as WYGTMSB aka WheredYouGetThatMuchStorageBytes. If you
    have that much storage, feel free to find a different function and or
    travel back in time a hundred years years or so and slap me...

#>    

    [CmdletBinding()]
    
    param(
        [parameter(
                Mandatory=$true, 
                Position=0,
                ValueFromPipeline= $true
                
            )]
             #make certain value won't break script
             [ValidateScript({$_ -ge 0 -and 
                 $_ -le ([Math]::Pow(2, 90))})] 
        [double[]]$ByteValue,
        
        [parameter(
                Mandatory=$false, 
                Position=1,
                ValueFromPipeline= $false
                
            )]
        [switch]$LongDescriptor,
        
        [parameter(
                Mandatory=$false, 
                Position=2,
                ValueFromPipeline= $false
                
            )]
            [ValidateRange(0,10)]
             [int]$Scale = 2
        
    )
    
    #2^10 = KB, 2^20 = MB, 2^30=GB...
    begin
    {
        if($LongDescriptor)
        {
            Write-Verbose "LongDescriptor specified, using longer names."
            $ByteDescriptors = ("Bytes", "KiloBytes", "MegaBytes", "GigaBytes", 
                "TeraBytes", "PetaBytes", "ExaBytes", "ZettaBytes", 
                "YottaBytes", "WheredYouGetThatMuchStorageBytes")
        }
        else
        {
            Write-Verbose "LongDescriptor not specified, using short names."
            $ByteDescriptors = ("B", "KB", "MB", "GB", "TB", "PB", 
                "EB", "ZB", "YB", "WYGTMSB")
        }
    }
    
    process
    {
        foreach($byte in $ByteValue)
        {
            #Determine which power of 2 this value is based from
            Write-Verbose "Determine which power of 2 the byte is based from."
            $PowerOfTwo = [Math]::Floor([Math]::Log($byte)/[Math]::Log(2))
            
            #Determine position in descriptor array for the text value
            Write-Verbose "Determine position in descriptor array."
            $DescriptorID = [Math]::Floor($PowerOfTwo/10)

            #Determine appropriate number by rolling back up through powers of 2
            #format number with appropriate descriptor
            Write-Verbose ("Return the appropriate number with appropriate "+
                "scale and appropriate desciptor back to caller.")
                
            Write-Output ("{0:N$Scale} $($ByteDescriptors[$DescriptorID])" -f (
                $byte / [Math]::Pow(2, ($DescriptorID*10))))
            
        }
    }
}

Thursday, April 12, 2012

PowerShell and T-SQL: Implementing UNION, INTERSECT, and EXCEPT



I’ve been thoroughly enjoying my experience this year with the Scripting Games. This is the first year I've competed and I decided to participate in the Advanced category just to push myself a little harder. I’ve been using Powershell for quite some time, however, I’ve never had the pleasure of using the platform while performing the duty of a Windows administrator. Those days were all about .bat and .vbs files. It’s been a real treat to be able to get my hands dirty with some of the cmdlets and functionality of Powershell that I have yet to use.

On a couple of events, I’ve found good use of some of my T-SQL set based thinking. Most database geeks understand that the database is designed to work in sets. However, PowerShell, similar to its scripting roots, is designed to think iteratively. However, there are times when understanding how different sets of data compare with each other could be a handy concept to understand. I’m going to spend some time on some very cool operators in T-SQL and show how this same concept can be implemented in PowerShell using arrays of objects.

The T-SQL operators UNION, INTERSECT, and EXCEPT and the Compare-Object cmdlet all operate on entire sets of data. We just came out of Easter, so I encourage you to forget about data for a moment – I know, it’s hard for me too – and think of two baskets of jelly beans.

In your left basket, you have seven jelly beans, one for each color of the rainbow (ROY G BIV). On your right side, your over-sugared children have eaten ROY, leaving you with G BIV and a half eaten Peep. In the middle, we have at this moment an empty basket.

Before we start moving jelly beans around, here are some initial set ups for the T-SQL and PowerShell discussions:

T-SQL:
CREATE TABLE LeftBasket
    (
     ID INT NOT NULL,
     Candy VARCHAR(10) NOT NULL
    )

CREATE TABLE RightBasket
    (
     ID INT NOT NULL,
     Candy VARCHAR(10) NOT NULL
    )

GO
INSERT  INTO LeftBasket
        (ID, Candy)
VALUES
        (1, 'Red'),
        (2, 'Orange'),
        (3, 'Yellow'),
        (4, 'Green'),
        (5, 'Blue'),
        (6, 'Indigo'),
        (7, 'Violet')

INSERT  INTO RightBasket
        (ID, Candy)
VALUES
        (4, 'Green'),
        (5, 'Blue'),
        (6, 'Indigo'),
        (7, 'Violet'),
        (8, 'Peep')

PowerShell:

$LeftBasket = ("Red", "Orange", "Yellow", "Green", "Blue", "Indigo", "Violet")
$RightBasket = ("Green", "Blue", "Indigo", "Violet", "Peep")

UNION

The UNION operator is like taking the ROY G BIV from the left basket, the peep from the right basket, and tossing the G BIV from the right basket across the room because you can’t look at any more jelly beans. UNION will give you the distinct results of the combination of the two baskets of jelly beans. In reality, it doesn’t really toss those from the right, but it made sense in the analogy. You end up with 7 jelly beans and a peep in your middle basket.

This can be represented in T-SQL with the following query (based upon the set up mentioned earlier:
SELECT ID, Candy FROM LeftBasket
UNION
SELECT ID, Candy FROM RightBasket
ORDER BY ID

In PowerShell, we can also represent this same concept using Compare-Object
Compare-Object -ReferenceObject $LeftBasket -DifferenceObject $RightBasket ` 
 -IncludeEqual

The important point of the above command is the –IncludeEqual switch. This switch modifies the behavior of Compare-Object to not just show differences, but to show you all of the values that are the same as well. The results are identical to the UNION operator in T-SQL.

UNION ALL

UNION ALL has a similar function as UNION, but it doesn’t take the time to determine which to throw out and just dumps all the candy in the middle basket. You end up with 11 jelly beans (ROY GG BB II VV) and a peep in your middle basket.

T-SQL:
SELECT ID, Candy FROM LeftBasket
UNION ALL
SELECT ID, Candy FROM RightBasket
ORDER BY ID

PowerShell:
Write-Output ($LeftBasket,$RightBasket)

You’ll notice that I did not use the Compare-Object for the UNION ALL equivalent. This is because no combination of switches for the Compare-Object will show you duplicates. Since all we really want is just the combination of the two string arrays, we can combine them and replicate the results of UNION ALL.

INTERSECT

If you apply the INTERSECT operator to your baskets of jelly beans, you end up with only four jelly beans (G BIV) in the middle. This is because INTERSECT represents the jelly beans that are in common in both baskets. Anything unique on either side is tossed out.

T-SQL:
SELECT ID, Candy FROM LeftBasket
INTERSECT
SELECT ID, Candy FROM RightBasket
ORDER BY ID

PowerShell:
Compare-Object -ReferenceObject $LeftBasket -DifferenceObject $RightBasket `
 -IncludeEqual -ExcludeDifferent

You’ll notice in the PowerShell command that there are two optional switches. We implement the –ExcludeDifferent and the –IncludeEqual switches to make the output not include differences and include any matches. We get only the matched items of the arrays.

EXCEPT

Finally, EXCEPT. This one will take any jelly beans on the left side that do not exist on the right side and put them in your middle basket. In this case, you would have ROY.

T-SQL:
SELECT ID, Candy FROM LeftBasket
EXCEPT
SELECT ID, Candy FROM RightBasket
ORDER BY ID

Powershell:
Compare-Object -ReferenceObject $LeftBasket -DifferenceObject $RightBasket | 
 Where-Object{$_.SideIndicator -eq "<="}

In this PowerShell command, I am adding no switches. Compare-Object is working with default behavior. However, in order to get only those items that exist on the left, we have to implement the Where-Object in the pipeline to get only those that exist on the left and not on the right.

Now, the coolest part about EXCEPT is that if you swap the baskets, you end up with different results. Starting from the right basket, you would end up with nothing more than a peep because all of the jelly beans on the right exist in the set of jelly beans on the left.

T-SQL:
SELECT ID, Candy FROM RightBasket
EXCEPT
SELECT ID, Candy FROM LeftBasket
ORDER BY ID

PowerShell:
Compare-Object -ReferenceObject $RightBasket -DifferenceObject $LeftBasket | 
 Where-Object{$_.SideIndicator -eq "<="}

Ok, if you’re not ready to throw up from all the sugar yet, there is a small gotcha with Compare-Object . If you did a Get-Help on this cmdlet, you may have noticed the –SyncWindow parameter. This parameter tells PowerShell how many array objects in either direction to look for similarities. In PowerShell 1.0, this was a default of 5. This meant that if your two arrays were not ordered by the same property and you had the same object stored more than five items away, you could end up with inaccurate results. Fortunately, in PowerShell 2.0, they have modified this default to [Int32]::MaxValue – so we should be covered.


About Kyle Neier
Husband of a magnificent woman, father of 5, SQL Server geek, IndyPASS Vice President and Food Guy, DBA automation zealot, amateur Powershell evangelist. Follow Me on Twitter

Tuesday, April 10, 2012

T-SQL Tuesday #029 – SQL 2012 and Treating TempDB like it’s really Temporary.


A while back, my young son thought it would be fun to use a CD as a virtual surfboard on our kitchen floor. While he was a very proud boy, all I could think of was how much that CD cost and how unlikely it was that we were going to ever be able to listen to it again.

As my kids got older, I found significant value in exercising my right to make a copy my children’s CDs for personal use. It became part of the process of buying a new CD to copy it and put the original in a safe place – far away from little hands. If the CD copy gets scratched or otherwise destroyed, it is as simple as pulling out the original and making another copy.

We use these copies extensively and, while we encourage treating them properly, life happens. They get left outside, in the car, under the couch, you get the picture. Similarly, we use TempDB extensively, and ever more so as more features of SQL Server use TempDB internally.

In SQL 2008R2 and older, TempDB had to be on a shared disk when implementing SQL Server in a cluster.

This may not seem like a big deal, but think back to the CD analogy. Our user databases are like the original CDs. We need to keep them protected and safe from the children. We keep the databases spread across redundant disks for the highest resiliency. We configure security to be certain that only those who are responsible enough can touch the databases. TempDB is like the copies. Anyone can use it and it gets used all the time by both users and by the system. We have little concern and generally little control over how TempDB gets used.

With the shared disk requirement for TempDB, we have to treat our CD copies in a similar fashion as the originals (the user databases). The data is redundantly protected and, in several cases, spread across the same disks as the user data. Getting the performance necessary out of TempDB in some cases was difficult.

In SQL 2012, this has been changed. A very useful feature of SQL 2012 is the ability to put TempDB on a local disk, even when implementing SQL Server on a cluster. This prevents the use of expensive SAN storage for something temporary and volatile. It also introduces the ability to significantly increase performance of some key functionality of SQL Server (Snapshot Isolation, sorts, hash, CheckDB, etc.) by opening up more high-performance options in the local box. SAN based SSD is possible for shared storage, however, enterprise class shared SSD is still incredibly expensive. Local SSD is not as cheap as spinning disk, but is cheaper, and in some cases, faster than SAN based SSD.

Moving TempDB local is not the ultimate solution for all environments and the significance of this feature may not be universal. However, it is another very useful tool in the DBA toolbox that has the potential to reduce the cost of and increase the performance of a SQL Server deployment.

Links to Microsoft concerning this new feature:
http://msdn.microsoft.com/en-us/library/bb500459%28v=SQL.110%29.aspx
http://msdn.microsoft.com/en-us/library/ms143506.aspx#storagetypes

About Kyle Neier
Husband of a magnificent woman, father of 5, SQL Server geek, IndyPASS Vice President and Food Guy, DBA automation zealot, amateur Powershell evangelist. Follow Me on Twitter

Tuesday, April 3, 2012

Powershell – Validating an IP Address

 

A task that I have had to do several times over the years is determine if an IP Address string entered from a prompt or imported from a file is indeed valid. Even more, determining if that IP Address is in use once verifying that the IP Address is well formed.

I’ve created the following Powershell function – Test-IPAddressString – to handle both of these tasks.

Either provide a single or several IP Address candidates to this function and it will return a $true if it is valid or a $false if it is not. As an added bonus, if you enable the –FailIfInUse swtich,the function will attempt to ping a valid IP Address. If the address responds to a ping, $false is returned.

function Test-IPAddressString
{
<#
        .SYNOPSIS
        Tests to see if an IP address string is a valid address and can determine
        if IP Address is responding on network.

        .DESCRIPTION
        Implements the .Net.IPAddress classes to validate that the string provided
        can parse into an IP Address. If the parse is successful, the function returns
        boolean $true unless the -FailIfInUse switch is provided.
        
        If -FailIfInUse is specified, a $true is only returned if the IP Address
        does not respond to a ping three times.

        This function supports the -Verbose switch as well.
        
        .PARAMETER IPaddressString
        This is a string value representing the address that you want to test.
        
        This value can be passed either from pipeline or as a parameter.
        
        .PARAMETER FailIfInUse 
        This is a switch parameter. Specifying this in the parameter list will
        cause the IP Address, if successfully parsed, to attempt to be pinged.
        
        A successful ping result will result in a $false value being returned
        if this switch is used.
        
        
        .EXAMPLE
        Test-IPAddressString -IPAddressString "192.168.1.1"
        
        Using full parameter name

        .EXAMPLE
        Test-IPAddressString "192.168.1.1" -FailIfInUse
        
        Using Positional parameters and specifiying to fail test if IP Address responds to ping.
        
        .EXAMPLE
        ("192.168.1.1", "192.123456", "IsThisAnIP") | Test-IPAddressString -Verbose
        
        Passing serveral potential IP Address strings to the function through the pipeline. 
        
        FailIfInUse is ommitted, so no ping attempts will be made on valid IP Addresses.
        
        This function supports the verbose switch. Using this switch will provide you with
        several indicators of progress through the process.
        
        .NOTES
        
        Author:    Kyle Neier
        Blog: http://sqldbamusings.blogspot.com
        Twitter: Kyle_Neier
        
        .LINK
        http://sqldbamusings.blogspot.com/2012/04/powershell-validating-ip-address.html
    #>

    param(
    [parameter(
            Mandatory=$true, 
            Position=0,
            ValueFromPipeline= $true
        )]
        [string]$IPaddressString,
    [parameter(
            Mandatory=$false, 
            Position=1,
            ValueFromPipeline= $false
        )]
        [switch]$FailIfInUse
    )

    process
    {
        [System.Net.IPAddress]$IPAddressObject = $null    
        if([System.Net.IPAddress]::tryparse($IPaddressString,[ref]$IPAddressObject) -and
             $IPaddressString -eq $IPAddressObject.tostring())
        {
            Write-Verbose "$IPaddressString successfully parsed."
            if($FailIfInUse -eq $true)
            {
                $Pinger = new-object System.Net.NetworkInformation.Ping
                $p = 1
                $p_max = 3
                do 
                {
                    Write-Verbose "Attempting to ping $IPaddressString - Attempt $p of $p_max"
                    $PingResult = $Pinger.Send("$IPaddressString")
                    Write-Verbose "Connection Result: $($PingResult.Status)"
                    $p++
                    Start-Sleep -Milliseconds 500

                } until ($PingResult.Status -eq "Success" -or $p -gt $p_max)

                if($PingResult.Status -eq "Success")
                {
                    Write-Verbose "The IP Address $IPAddressString parsed successfully but is responding to ping."
                    Write-Output $false
                }
                else
                {
                    Write-Verbose "The IP Address $IPAddressString parsed successfully and is not responding to ping."
                    Write-Output $true
                }
            }
            else
            {
                Write-Verbose "The IP Address $IPAddressString parsed successfull - No ping attempt made."
                Write-Output $true
            }
        
        }
        else
        {
            Write-Verbose "The IP Address $IPAddressString could not be parsed."
            Write-Output $false
        }
    }
}

About Kyle Neier
Husband of a magnificent woman, father of 5, SQL Server geek, IndyPASS Vice President and Food Guy, DBA automation zealot, amateur Powershell evangelist. Follow Me on Twitter