
PowerShell – fastest search in array


If you have ever worked with a large array of objects (thousands of records) and searched it many times, you probably know that this is slow.
I had this problem in many of my scripts until I found a simple solution that made them hundreds of times faster.
The only thing you have to do is convert the array to a hash table.

I created a small function that does it for us:

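function array2hash ($array, [string]$keyName) {
    $hash = @{}
    foreach ($element in $array) {
        $key = $element."$keyName"
        if ($null -eq $hash[$key]) {
            # First record for this key: store it directly
            $hash[$key] = $element
        }
        elseif ($hash[$key] -is [Collections.ArrayList]) {
            # Key already holds a list: append to it.
            # Add() returns the new index; assigning to $null keeps it out of the function output.
            $null = $hash[$key].Add($element)
        } else {
            # Second record for this key: replace the single record with a list
            $hash[$key] = [Collections.ArrayList]@(($hash[$key]), $element)
        }
    }
    return $hash
}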

How does it work?

This function converts an array (the first parameter) to a hash table keyed by the property named in the second parameter, KeyName.
If more than one record has the same key value, the function will create an ArrayList containing all of the array records with that key value.
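
To make the grouping behavior concrete, here is a small sketch on three made-up in-memory records (the sample values are for illustration only and are not read from any file). It repeats the logic of the function so the snippet runs on its own.

```powershell
# Same logic as the conversion function, repeated so this snippet is standalone.
function array2hash ($array, [string]$keyName) {
    $hash = @{}
    foreach ($element in $array) {
        $key = $element."$keyName"
        if ($null -eq $hash[$key]) {
            $hash[$key] = $element
        }
        elseif ($hash[$key] -is [Collections.ArrayList]) {
            $null = $hash[$key].Add($element)
        } else {
            $hash[$key] = [Collections.ArrayList]@(($hash[$key]), $element)
        }
    }
    return $hash
}

# Illustrative sample data: three records share a city, one does not.
$sample = @(
    [pscustomobject]@{ Zip = '10001'; City = 'New York' }
    [pscustomobject]@{ Zip = '10002'; City = 'New York' }
    [pscustomobject]@{ Zip = '10003'; City = 'New York' }
    [pscustomobject]@{ Zip = '33418'; City = 'Palm Beach Gardens' }
)

$byCity = array2hash $sample 'City'

$byCity['Palm Beach Gardens'] -is [Collections.ArrayList]   # False: a single record
$byCity['New York'] -is [Collections.ArrayList]             # True: grouped records
$byCity['New York'].Count                                   # 3
```

A key with one record maps straight to that record, while a key with several records maps to an ArrayList, so the type of the value depends on the data.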

How to use this?

Let’s look at an example. I will use the database of US zip codes, downloaded as a CSV file from:
https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/export/

As a first step, load the data into an array:
$arr = Import-Csv C:\tmp\us-zip-code-latitude-and-longitude.csv -Delimiter ";"

Now create a hash table for “zip” data (that will take less than 1 second):
$hsh = array2hash $arr "Zip"

Let’s find the zip code 33418.

To find the zip code in the array I use:
$arr | where {$_.Zip -eq "33418"}
It took 950 milliseconds on my laptop.

Now the same for the hash table:
$hsh."33418"
This operation took 0.2 milliseconds (4750 times faster).
When I repeated the same operations for 100 random zip codes, the array search took 103695 milliseconds (1 minute 43 seconds) and the hash table search took 148 milliseconds (about 0.15 seconds). As you can see, it is a huge improvement.
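
A comparison like this can be reproduced with Measure-Command. The following is only a sketch on synthetic data (5,000 generated records with hypothetical five-digit keys and 50 lookups), not the zip code file, so the timings will differ from the numbers above and from machine to machine.

```powershell
# Synthetic stand-in for the CSV data: 5,000 records with unique Zip values.
$arr = foreach ($i in 1..5000) {
    [pscustomobject]@{ Zip = '{0:D5}' -f $i; City = "City$i" }
}

# Keys are unique here, so a plain loop is enough to build the lookup table.
$hsh = @{}
foreach ($row in $arr) { $hsh[$row.Zip] = $row }

# Pick 50 random keys to look up (Get-Random's -Maximum is exclusive).
$keys = 1..50 | ForEach-Object { '{0:D5}' -f (Get-Random -Minimum 1 -Maximum 5001) }

# Time a full pipeline scan of the array for every key.
$arrayTime = Measure-Command {
    foreach ($k in $keys) { $null = $arr | Where-Object { $_.Zip -eq $k } }
}

# Time a direct hash table lookup for every key.
$hashTime = Measure-Command {
    foreach ($k in $keys) { $null = $hsh[$k] }
}

'Array search: {0:N1} ms' -f $arrayTime.TotalMilliseconds
'Hash lookup:  {0:N1} ms' -f $hashTime.TotalMilliseconds
```

The array search has to test every record for every key, while the hash table goes straight to the matching entry, which is why the gap grows with the size of the array.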

Now let’s build a hash table for cities:
$hsh = array2hash $arr "City"

This time the search $hsh."New York" returns an ArrayList with 167 records.
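
Because a key maps to either a single record or an ArrayList, calling code may want to treat both cases the same way. One possible approach, sketched here on a hand-built table with made-up entries, is to check for the key and wrap the result in @(). The helper name Get-HashRecords is hypothetical and not part of the function above.

```powershell
# Hypothetical helper: always returns an array (empty when the key is missing).
function Get-HashRecords ([hashtable]$hash, [string]$key) {
    # The leading comma stops PowerShell from unrolling the array on output.
    if (-not $hash.ContainsKey($key)) { return ,@() }
    # @() turns a single record into a one-element array and
    # copies the elements of an ArrayList into a plain array.
    return ,@($hash[$key])
}

# Illustrative table shaped like the output of the conversion function:
# one key holds an ArrayList, the other a single record.
$list = [Collections.ArrayList]@(
    [pscustomobject]@{ Zip = '10001'; City = 'New York' },
    [pscustomobject]@{ Zip = '10002'; City = 'New York' }
)
$lookup = @{
    'New York'           = $list
    'Palm Beach Gardens' = [pscustomobject]@{ Zip = '33418'; City = 'Palm Beach Gardens' }
}

(Get-HashRecords $lookup 'New York').Count             # 2
(Get-HashRecords $lookup 'Palm Beach Gardens').Count   # 1
(Get-HashRecords $lookup 'Nowhere').Count              # 0
```

The ContainsKey check matters because @($null) is an array with one empty element, so a missing key would otherwise look like one record.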

I hope that this post will help you to improve data processing.
