The other day I had a need to take data from hundreds of logs and transform it, do a few lookups, then finally flatten it into a single unified output file. OK no big deal right, just get each one in memory, do your calculations, and append to an object…

In my case, 1 of the lookups I had to do were to determine if a machine was still in Active Directory. As you may know you can simply run the following 1 liner:

$Machine = "MyComputer"
Get-ADComputer $Machine

Well, in my case I had over 230,000 records to look up. This created a few concerns, not only did I not want to cause slow response time on a domain controller from so many queries, it would also take on average about 2-3 seconds for each query to complete.

Then I figured, “OK I can just get ALL computer info from AD in memory first then look them up from there when I need to”. This solved the concern of causing slow response time on the DC. Let’s say we have all our data in a variable called $ADData, to look up a machine we can do 1 of the following options:

$CurrentMachine = "MyComputer"

#Option 1
$ADLookup = $ADData | Where-Object Name -eq $CurrentMachine

#Option 2
$ADLookup = $ADData.where({$_.Name -eq "$CurrentMachine"})

Both of these options are legitimate code, however with that many records in memory, it was still taking about 3 seconds for either of those options to process. Doing some quick math:

3 seconds per machine x 230,000 machines = 690,000 seconds or 7.98 DAYS to process

It’s often stated that PowerShell isn’t as efficient as Python for processing data, and this is a perfect example of that. But what if it could be much more efficient…

Enter DataTable object types:

Data Tables are .NET based and are blazing fast, but there doesn’t seem to be a built-in option for converting data sets. In an effort to not reinvent the wheel here are a few options on the gallery that can do it for you:

https://www.powershellgallery.com/packages/Pipeworks/1.9.9.4/Content/ConvertTo-DataTable.ps1

https://www.powershellgallery.com/packages/PSSQLite/1.0.1/Content/Out-DataTable.ps1

Note: When converting your dataset this will take a minute or so up front, but the downstream performance gains are far worth it.

Now, I can utilize the .NET Select method from the DataTable object type to achieve a lookup which in my case was able to improve from ~3s to ~3ms. Insane!

$ADDataTable = ConvertTo-DataTable $ADData

#Use the Select method with a filter to achieve insanely fast lookups
$ADDataTable.Select("Name='$CurrentMachine'")

#We can also see that this is now of the type DataTable
$ADDataTable.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     DataTable                                System.ComponentModel.MarshalByValueComponent

With this approach, I was able to convert a job that would otherwise take nearly 8 days to run into a job that completes in about 3 minutes!

Who says PowerShell can’t be used to process data fast like Python?