Leverage Windows Analytics for Modern Ops – Part 1


Windows Analytics

Clients often ask me to stabilize their environment. Going from client to client and environment to environment, I have noticed a common theme: what is stability, and how do we define or measure it? Stability is often defined by ticket counts, perception, or whichever user is loudest. None of these is an objective measure. They can be important data sources, but they are not an absolute source of truth. The next question is usually how to measure systemic issues accurately. The answer, of course, is data, but which data? In this article we will review correlating data with Windows Analytics and, in Part 2, using Power BI to turn that data into a dashboard. By the end of this article you will be able to rely on Modern Ops to drive system stability.


Windows Analytics uses the Windows Telemetry process, which is native to Windows 10, to collect various data on the endpoint and upload it to Azure Storage for your enterprise tenant. Several of the resulting tables are useful. Below are examples of those tables and the fields we primarily look at in each.

  • DHDriverReliability
    • DriverName: The name of the driver that led to the crash
    • Version: The version of the driver
  • DHOSReliability
    • Manufacturer: Device manufacturer
    • Model Data: Device model (system model from BIOS or MSInfo32)
    • Computer: Computer name
    • KernelModeCrashFreePercentForIndustry: Percent of devices in other organizations with a similar OS version, build, manufacturer, model, etc. that have not crashed in the last two weeks
  • DHOSCrashData
    • Computer: Name of the computer that crashed
    • KernelModeCrashFailureId: Hash of the failure
    • KernelModeCrashBugCheckCode: The bug check code of the blue screen
    • KernelModeCrashCount: Number of times the device crashed with that failure ID

Some of this data looks redundant, but here are the key points from my observations:

  • DriverKernelModeCrashCount in DHDriverReliability seems to count driver errors/crashes recorded in the event log, not necessarily blue screens.
  • KernelModeCrashCount in DHOSCrashData seems to count blue screen crashes, and each record includes the failed driver, the bug check code, and the failure ID hash of the blue screen.

The reason for this theory, and this is important, is that driver crashes/errors in the event log generate a significantly larger volume than blue screens.

  1. For example, the video driver igdkmd64 shows 159 crash counts in DHOSCrashData over a 7-day period, whereas the same query against the DHDriverReliability table shows 352.

With that being said, let's do some crash hunting. We will start with drivers, using a very simple table view.

DHDriverReliability

We will be focusing on crash counts, so we will add a simple filter to this query:

DHDriverReliability | where DriverKernelModeCrashCount >= 1


Hmmm, it looks like igdkmd64.sys is having an issue. A quick Google search tells me this is a video driver. Let's investigate further by clicking Chart > Select Pie > Sort by Driver Name > Driver Kernel Mode Crash Count.


Wow, it looks like this environment is plagued with video driver issues.

What could be causing this? Using another table, we can dig a little deeper.

The query below shows the actual blue screen bug check code and the driver name that appears on the blue screen. This matters because a blue screen naming a driver such as indirectkmd.sys is related to video crashing, which could have a domino effect that gets igdkmd64.sys flagged.

DHOSCrashData | where KernelModeCrashCount >= 1

 

I will repeat the same pie chart steps as above.


 

The outcome shows that I need to collect some logs and analyze the dump files to find the faulting module behind the indirectkmd.sys crashes. With these simple steps, you are essentially using data to drive the resolution of systemic issues.

 

Here are some simple queries and business use cases:

1. DHOSCrashData | summarize number=count() by KernelModeCrashFailureId, DriverName, DriverVersion, KernelModeCrashBugCheckCode | sort by number desc | render table

A. Used for tracking the total number of blue screen crashes. This gives a good holistic view by summarizing crash counts per BSOD.

2. DHOSReliability | summarize MyOrgPercentCrashFreeDevices = avg(iff(KernelModeCrashCount >= 1 and KernelModeCrashCount <= 10000, 0, 1)), CommercialAvgPercentCrashFreeDevices = avg(KernelModeCrashFreePercentForIndustry), NumberDevices = dcount(ComputerID) by Manufacturer
| sort by NumberDevices desc | render table

A. This is by far one of my favorites. It shows device stability by taking the share of your devices that have not crashed and comparing it to devices at other enterprises with similar models, OS builds, and OS patch levels. The key business case is that this can disprove blaming the “image”: you can state that the commercial average is 96% and you are operating at 94%. If instead you are operating at 82% against a commercial average of 96%, that indicates you have issues that need to be resolved.

3. DHOSReliability | where Model contains "insertmodel" | summarize MyOrgPercentCrashFreeDevices = avg(iff(KernelModeCrashCount >= 1 and KernelModeCrashCount <= 10000, 0, 1)), CommercialAvgPercentCrashFreeDevices = avg(KernelModeCrashFreePercentForIndustry), NumberDevices = dcount(ComputerID) by Manufacturer, Model | sort by NumberDevices desc | render table

A. A breakdown of Query 2 for a specific model.

4. DHOSReliability | summarize MyOrgCrashRate = avg(KernelModeCrashCount > 0), CommercialAvgCrashRate = 1-avg(KernelModeCrashFreePercentForIndustry), NumberDevices = dcount(ComputerID) by OSVersion| sort by NumberDevices desc | render table

A. Org crash rates compared with commercial crash rates, broken down by Windows 10 build.

5. DHOSCrashData | where DriverName contains "insertdrivername" | summarize sum(KernelModeCrashCount > 0) by TimeGenerated | where sum_ >= 50 | order by TimeGenerated asc

A. Sum of crash counts for a specific driver name over a timeline (for example, 11 days). This is good for trending data.

6. DHDriverReliability | where TimeGenerated > ago(28d) | where DriverKernelModeCrashCount >= 1 | project-away ComputerID, DriverPercentCrashFreeDevicesForIndustry, HardwareType | join kind=leftouter (DHOSReliability | where Manufacturer == "Lenovo" | project-away OSRevisionNumber, ComputerID, ConfigMgrClientID, KernelModeCrashCount) on Computer

A. Similar to Query 5, but without filtering on a specific driver. It shows the last 28 days of driver crashes joined with the OS reliability data for a particular hardware manufacturer.
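
To make the arithmetic behind Query 2 concrete, here is a minimal Python sketch. It is not part of the original queries. It assumes the industry value is stored as a fraction between 0 and 1 (which the `1-avg(...)` in Query 4 suggests), and the sample devices and the helper name `crash_free_summary` are hypothetical.

```python
from statistics import mean

def crash_free_summary(rows):
    """Compare the org's crash-free share with the industry figure, as Query 2 does.

    Each row is a dict holding (hypothetical) sample values for
    KernelModeCrashCount and KernelModeCrashFreePercentForIndustry.
    """
    # A device scores 0 when it crashed (count between 1 and 10000) and 1
    # otherwise, mirroring the iff(...) expression in Query 2.
    org = mean(0 if 1 <= r["KernelModeCrashCount"] <= 10000 else 1 for r in rows)
    industry = mean(r["KernelModeCrashFreePercentForIndustry"] for r in rows)
    return {"org": org, "industry": industry, "gap": industry - org}

# Hypothetical data: four devices of one model, one of which blue screened.
sample = [
    {"KernelModeCrashCount": 0, "KernelModeCrashFreePercentForIndustry": 0.96},
    {"KernelModeCrashCount": 0, "KernelModeCrashFreePercentForIndustry": 0.96},
    {"KernelModeCrashCount": 3, "KernelModeCrashFreePercentForIndustry": 0.96},
    {"KernelModeCrashCount": 0, "KernelModeCrashFreePercentForIndustry": 0.96},
]

summary = crash_free_summary(sample)
print(f"org {summary['org']:.0%} vs industry {summary['industry']:.0%}")
```

A positive `gap` means the organization is less stable than comparable devices elsewhere, which is the signal Query 2 is meant to surface.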

The beauty of Windows Analytics is the simplicity of the query language and the power it gives you to hunt down systemic issues and resolve them in your environment. In Part 2 of this topic, I will review adding this data to Power BI for a self-service dashboard based on some of the queries above. I will post it sometime in mid-November. Check back for additional queries and data visualizations.

 

Chad Arvay, Chris Buck

 


Distribution Point Migration Tool-Kit

The toolkit can be downloaded from my Technet Gallery HERE
This post has been a long time coming, but creating something robust enough to work in most environments while staying user friendly (with associated documentation) takes a little time. In the course of one contract I worked, we realized that we needed a way to convert old SCCM Secondary Sites into Distribution Points, but we wouldn’t be given any new servers to migrate to. We also knew that the WAN links connecting these remote sites back to our headquarters were severely lacking. Our solution was to prestage all the content currently stored in the content libraries so we could strip off all the roles (which would clear the SCCM content library), remove unneeded programs and features, add the servers back as Distribution Points, and then reload the prestaged content so it wouldn’t have to transfer over our unspeakably slow WAN connection. You got a peek at this work in my last post on the SCCM Universal Prestage script; this post gives you the other pieces of the puzzle.

The Core Functions

Initialize-Toolkit
                This is the first function you call if you’re running the Migration Kit from a PowerShell window you didn’t summon up from inside the Configuration Manager console.  This function will verify that you have Administrator rights, will seek out and import the Configuration Manager module, and will map your CMSite PSDrive if you don’t already have it mapped. This function is also called within every other function after a quick check to make sure that the CMSite drive is mapped.  If it isn’t mapped, it calls the Initialize-Toolkit function and maps it. 
 Console run without admin rights
After the drive has been created
Get-DPContent
                The second function in the toolkit queries our primary site server and returns a list of all content assigned to the distribution point we provided. There are multiple ways to get this information. I’ve seen it done with the Get-CMDeploymentPackage cmdlet, since that also returns package type information that we’ll need later. However, I chose to do it via the SMS_DPContentInfo WMI class because I find that it returns the same level of information in roughly 1/3 the time. It also means that you can run the command without being connected to the CMSite drive if you don’t want to fully initialize everything.
 A simple report of package ID’s and names
An example of the data stored by SMS_DPContentInfo
Prestage-Content
                This is one of the ‘heavy lifters’ of the toolkit.  This function requires a package ID number, the Distribution Point containing the package, and the folder you want it dumped to after creation. What this creates is a PKGX file named with the package ID of whatever you prestaged.  The way it decides what to prestage is based on the PackageType value that comes from WMI’s SMS_PackageBaseClass. Again, you can get a package type identifier from Get-CMDeploymentPackage if you’d rather go that way, but I like WMI.  Once it’s pulled the PackageType value, it runs it through a SWITCH command and runs the appropriate Publish-CMPrestageContent command.  I don’t do any special logging with this function since Publish-CMPrestageContent already does a good job of it.
 Prestaging a single file
Prestaging multiple packages with a For Loop
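
The SWITCH described for Prestage-Content can be pictured as a lookup from PackageType to the matching ID parameter. The sketch below is Python rather than the toolkit's PowerShell and is not the toolkit's code. The type codes and parameter names reflect my reading of the SMS_PackageBaseclass and Publish-CMPrestageContent documentation and should be treated as assumptions to confirm with Get-Help. The function name, package ID, and server name are hypothetical.

```python
from pathlib import PureWindowsPath

# PackageType value -> ID parameter of Publish-CMPrestageContent.
# Assumption: these codes and parameter names come from my reading of the
# documentation, not from the toolkit itself; verify them on your own site.
PRESTAGE_PARAMETER = {
    0: "-PackageId",                     # standard software package
    3: "-DriverPackageId",               # driver package
    5: "-DeploymentPackageId",           # software update deployment package
    257: "-OperatingSystemImageId",      # OS image
    258: "-BootImageId",                 # boot image
    259: "-OperatingSystemInstallerId",  # OS upgrade/install package
}

def build_prestage_command(package_id, package_type, dp_name, out_folder):
    """Return the command line that would prestage one package to <PackageID>.pkgx."""
    try:
        id_parameter = PRESTAGE_PARAMETER[package_type]
    except KeyError:
        # Types not listed above (applications, for example) need their own handling.
        raise ValueError(f"No prestage mapping for PackageType {package_type}") from None
    pkgx_path = PureWindowsPath(out_folder) / f"{package_id}.pkgx"
    return (f"Publish-CMPrestageContent {id_parameter} {package_id} "
            f"-DistributionPointName {dp_name} -FileName {pkgx_path}")

print(build_prestage_command("ABC00012", 3, "dp01.example.com", r"C:\Prestage"))
```

Naming the output file after the package ID is what lets a later step work out which package a given PKGX file belongs to.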
Restage-Content
                This function is one of the main reasons I like to save my prestage files with the PackageID as the name.  You input the folder containing the prestage files as well as the name of the Distribution Point they need to be assigned to, and this will get the package type information for that package, run the same switch as Prestage-Content, and issue the Start-CMContentDistribution command with the appropriate flags.  Just to save time, it will also query the Get-DPContent function to make sure that it isn’t trying to reassign packages that are already assigned.
Packages were already assigned in SCCM
 Package distribution in progress
Extract-Content
                This function calls Microsoft’s ExtractContent.exe tool and is designed to be run locally on whatever DP you’re importing the package to. The only flag you need to specify is the location of the prestaged content folder. It takes the hostname of the computer it’s running on and makes a WMI query to find any packages assigned to it that aren’t in State 0. If a package shows as State 0, there’s no further work to be done, and we can just work on the others. There are multiple ways to run the extractcontent.exe tool, but I’ve found some to work better than others. Whether you run it against a single package or target an entire folder, I’ve found that when I check the Distribution Point Configuration Status in the SCCM console, there are always some packages that still show “waiting for prestage content.” In almost every case where that’s happened, just re-prestaging the content cleared it up. I don’t know if this is a limitation of the extractcontent.exe tool, my impatience, or what, but it works for me. Because of that, my Extract-Content function runs through the prestage content folder one item at a time, so you can re-run the function and it will re-query for unsuccessful packages and only attempt to extract the ones that didn’t make it the first time.
 
ExtractContent running

Example
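
The re-run behaviour of Extract-Content comes down to a small selection step: keep only the prestage files whose package is assigned to this DP and not yet in State 0. Here is a minimal Python sketch of that logic under stated assumptions. The real function gets its states from WMI and calls extractcontent.exe, neither of which is shown, and the package IDs, non-zero state values, and function name are hypothetical.

```python
from pathlib import PurePath

def packages_to_extract(assigned_states, prestage_files):
    """Return the prestage files that still need extracting on this DP.

    assigned_states maps package ID -> state; only state 0 counts as done.
    prestage_files are file names of the form <PackageID>.pkgx.
    """
    pending = {pkg_id.upper() for pkg_id, state in assigned_states.items() if state != 0}
    todo = []
    for name in sorted(prestage_files):
        package_id = PurePath(name).stem.upper()  # file name minus ".pkgx"
        if package_id in pending:
            todo.append(name)
    return todo

files = ["ABC00001.pkgx", "ABC00002.pkgx", "ABC00003.pkgx", "ABC00009.pkgx"]

# First pass: one package is already done, two are still waiting,
# and ABC00009 is not assigned to this DP at all.
first_pass = packages_to_extract({"ABC00001": 0, "ABC00002": 1, "ABC00003": 1}, files)

# Second pass: ABC00002 made it, so only ABC00003 is retried.
second_pass = packages_to_extract({"ABC00001": 0, "ABC00002": 0, "ABC00003": 1}, files)

print(first_pass, second_pass)
```

Because the list is rebuilt from the current states on every run, repeating the function naturally shrinks the work to whatever failed last time.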

Stage-LocalDPContent
                I put this together for our SCCM architect who wanted something that he could quickly and easily run while logged into our Secondary Site Server that was being migrated.  What this does is query the local DP for all assigned content, export it with the Prestage-Content function, and give you a progress bar to show you how far along you are. 

Importing drivers into SCCM in bulk


This is taken from my TechNet gallery here: https://goo.gl/n1QT89

     When you’re tasked with something like a Windows 10 upgrade, you’ll find yourself spending lots of time downloading and importing drivers into SCCM. While this script won’t go out and download them for you (like the Dell and HP Driver Import tools I’ve seen out there), it is manufacturer, model, and architecture agnostic, you don’t get caught up trying to negotiate your way past your firewall and proxy teams, and it runs in a bit under 50 lines of code (including comments). Rather than pasting in the entire thing, I’ll do a screenshot and walk through from there.

     For this script to work, there’s some groundwork required on your part. When you download the drivers, they need to be downloaded into a folder that has whatever name you want for your driver package later.  If you’re like me, you’re already doing this as you download. If I need drivers for an HP Z230 desktop, the folder they’re saved in is already called “HP Z230 Windows 10 x64” or something similar so I can find them later.  The way this script works, whatever your folders’ names are is what names your driver packages will end up with.
    Aside from that, all you need to do is plug in the path to the file share that has all your make/model folders in the root, as well as the location where you want to store your driver packages.
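
The folder convention described above can be sketched in a few lines. This Python sketch is not the script itself, which is PowerShell and uses the SCCM cmdlets. It only shows how each make/model folder in the share root maps to a driver package name and a storage location. The function name, the Dell folder, and the `DriverPackages` path are hypothetical.

```python
import tempfile
from pathlib import Path

def plan_driver_packages(source_root, package_root):
    """Pair every make/model folder with the driver package it will become."""
    plan = []
    for folder in sorted(Path(source_root).iterdir()):
        if not folder.is_dir():
            continue  # loose files in the share root are not make/model folders
        plan.append({
            "name": folder.name,                                 # package name = folder name
            "driver_source": folder,                             # where the drivers were downloaded
            "package_source": Path(package_root) / folder.name,  # where the package is stored
        })
    return plan

# Demo against a throwaway directory standing in for the driver file share.
with tempfile.TemporaryDirectory() as share:
    for model in ("HP Z230 Windows 10 x64", "Dell Latitude 7490 Windows 10 x64"):
        (Path(share) / model).mkdir()
    (Path(share) / "readme.txt").write_text("not a driver folder")
    plan = plan_driver_packages(share, "DriverPackages")
    names = [entry["name"] for entry in plan]

print(names)
```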
    Something you will notice in this script is that I bounce between my C: drive and my SCCM drive. This is because UNC paths don’t always work as expected when you’re on the SCCM drive, and SCCM cmdlets don’t play nice running from anything other than the SCCM drive.  To guarantee they both work when needed, I just switch between locations, and it’s no big deal. 
    This script can take a little while to run, but it will give you feedback as it goes, and it doesn’t lock you out of the SCCM GUI while it runs.