The other day I had a need to take data from hundreds of logs and transform it, do a few lookups, then finally flatten it into a single unified output file. OK no big deal right, just get each one in memory, do your calculations, and append to an object…
In my case, 1 of the lookups I had to do were to determine if a machine was still in Active Directory. As you may know you can simply run the following 1 liner:
$Machine = "MyComputer" Get-ADComputer $Machine
Well, in my case I had over 230,000 records to look up. This created a few concerns, not only did I not want to cause slow response time on a domain controller from so many queries, it would also take on average about 2-3 seconds for each query to complete.
Then I figured, “OK I can just get ALL computer info from AD in memory first then look them up from there when I need to”. This solved the concern of causing slow response time on the DC. Let’s say we have all our data in a variable called $ADData, to look up a machine we can do 1 of the following options:
$CurrentMachine = "MyComputer" #Option 1 $ADLookup = $ADData | Where-Object Name -eq $CurrentMachine #Option 2 $ADLookup = $ADData.where({$_.Name -eq "$CurrentMachine"})
Both of these options are legitimate code, however with that many records in memory, it was still taking about 3 seconds for either of those options to process. Doing some quick math:
3 seconds per machine x 230,000 machines = 690,000 seconds or 7.98 DAYS to process
It’s often stated that PowerShell isn’t as efficient as Python for processing data, and this is a perfect example of that. But what if it could be much more efficient…
Enter DataTable object types:
Data Tables are .NET based and are blazing fast, but there doesn’t seem to be a built-in option for converting data sets. In an effort to not reinvent the wheel here are a few options on the gallery that can do it for you:
https://www.powershellgallery.com/packages/Pipeworks/1.9.9.4/Content/ConvertTo-DataTable.ps1
https://www.powershellgallery.com/packages/PSSQLite/1.0.1/Content/Out-DataTable.ps1
Note: When converting your dataset this will take a minute or so up front, but the downstream performance gains are far worth it.
Now, I can utilize the .NET Select method from the DataTable object type to achieve a lookup which in my case was able to improve from ~3s to ~3ms. Insane!
$ADDataTable = ConvertTo-DataTable $ADData #Use the Select method with a filter to achieve insanely fast lookups $ADDataTable.Select("Name='$CurrentMachine'") #We can also see that this is now of the type DataTable $ADDataTable.GetType() IsPublic IsSerial Name BaseType -------- -------- ---- -------- True True DataTable System.ComponentModel.MarshalByValueComponent
With this approach, I was able to convert a job that would otherwise take nearly 8 days to run into a job that completes in about 3 minutes!
Who says PowerShell can’t be used to process data fast like Python?