Policy Evaluation Errors – IIS Connections

In this scenario the customer is trying to image systems from within WinPE. They are able to select a task sequence, but once policy dependencies evaluate they receive the "An error occurred while retrieving policy for this computer (0x80004005)" message. This is a very generic error, and I've covered how to fix a dozen or so different causes in my Troubleshooting OSD Task Sequences document, which I've been meaning to publish for a few years but so far have only handed out to various customer accounts.

 

This error is reported right after the task sequence selection, when the system looks up policy for the machine. At this point the task sequence log file lives in RAM, so you will find it at X:\Windows\Temp\SMSTSLog\smsts.log.

 

We can see in the log file that there are problems communicating with the management point while retrieving policy during the lookup. The usual troubleshooting of making sure SCCM is functional, checking boundaries, etc. can be skipped since we know the specific problem and have the historical knowledge of the IIS connection limit. This environment recently had an SCCM outage, and there was a significant backlog of policy requests for the environment to catch up on. At the time of the customer's attempts to run an OSD task sequence the IIS connection limit was set to 500 and needed to be raised to a higher number, or to unlimited (depending on how quickly the backlog of policy requests progressed). That will be covered in another blog post, as the situation is still being monitored.

Access your SCCM site server (in this environment the MP is co-located with the site server), open IIS Manager, navigate to your Default Web Site, and select Configure Limits.
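If you prefer to script this instead of clicking through IIS Manager, here is a minimal sketch using the WebAdministration module; the site name and the target value are assumptions for illustration, so adjust them to your environment.

Import-Module WebAdministration
# Check the current connection limit on the site
(Get-ItemProperty 'IIS:\Sites\Default Web Site' -Name limits).maxConnections
# Raise the limit (the IIS default of 4294967295 is effectively unlimited)
Set-ItemProperty 'IIS:\Sites\Default Web Site' -Name limits.maxConnections -Value 2000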

 

This was originally set much lower, to 500, to let the CPU usage drop to a more stable rate so the backlog of policies could be processed. If you are trying to recover from high CPU usage, whatever the cause, I would recommend limiting the connections and then gradually increasing that number.

Once the number was increased, the customer was able to start imaging systems again. When you are trying to recover SCCM services, keep an eye on the CPU usage so you know whether you need to throttle the number back down or can keep increasing the connection limit.
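A quick way to watch both numbers while you tune the limit is to sample the performance counters; a minimal sketch, assuming the default site name for the counter instance:

# Sample current IIS connections and total CPU every 5 seconds for a minute
$counters = '\Web Service(Default Web Site)\Current Connections', '\Processor(_Total)\% Processor Time'
Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12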

WSUS still runs old code

As a consultant I've seen a fair number of environments, and the story is usually the same: most environments are not leveraging ConfigMgr to its fullest capabilities. I'm not going to talk about migrating to the cloud or Intune today, although some of that will be coming soon, along with Win 10 servicing, including custom actions and some automation scripts I use for record keeping. I'm a firm believer in data-driven decisions, and the reporting in the servicing model is a bit lackluster, so I'll give you the tools to help with that. In a few future blog posts I'll also be dropping some info on cyber threat hunting, identifying breaches earlier, incident response, and how to ******** *** ******** ******. That last part was super classified so I won't talk about it.

Anyways. Back when we started Win 10 to Win 10 migrations to match Microsoft's Win 10 upgrade cadence about five years ago, there were two options: the in-place upgrade task sequence and the servicing model. Many customers chose the task sequence route because there was a bit of familiarity to it, but also because of the amount of customization work that had to be done for their environments. The servicing model has its perks, but I would say it's not as robust in comparison to the task sequence. These days (1709 forward) I'm trying to move away from task sequences and more into the servicing model with dynamic updates. If you aren't aware, dynamic updates CAN ONLY BE PULLED FROM THE INTERNET. This will likely be several hundred MB of content, and with the servicing model this action is invisible to the end user, whereas during the IPU process the end user is fully impacted because of a big task sequence box on the screen. Since dynamic update content can only be obtained from the internet, you can't exactly service your IPU package with it.

If you are in an environment where you do not have servicing enabled, you will need to handle some of the prerequisites to allow for it. There are two links here to review: SystemCenterDudes and Prajwal Desai.

 

So you enable the Upgrades classification on your SUP, the pre-work is done, and you kick off a sync.

…but wait, what if nothing is populated and it looks like this:

Empty Servicing Node

Well, I've seen this frequently over the last 10 years, in at least half a dozen environments. Basically, when you check your wsyncmgr.log everything starts out fine. The normal process is: the notification file is found, the upstream server is found, categories are synced, and then updates start syncing. If an update is already synced it is skipped, and so on.
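If you want to follow the sync live without firing up CMTrace, something like this works; the log path is an assumption based on a default site server install, so adjust it to yours.

# Tail the SUP sync log on the site server
Get-Content 'C:\Program Files\Microsoft Configuration Manager\Logs\wsyncmgr.log' -Tail 50 -Wait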

All is good until we get to the feature upgrades we just enabled. We start seeing "The Microsoft Software License Terms have not completely downloaded and cannot be accepted." and "Too many consecutive failures. Aborting Sync."

You can also check your event viewer for some error code information.
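Something along these lines will pull the recent WSUS events; the provider name here is an assumption, so match it to whatever source your events actually show.

# Recent WSUS events from the Application log on the WSUS/SUP server
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'Windows Server Update Services' } -MaxEvents 50 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message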

I’ve seen this issue before, actually several times, in many environments. Ultimately the fix ends up being to remove all classifications from the SUP > WSUSUTIL RESET > Run Sync > Add Classifications > Run Sync > Monitor Logs.

Check out this Link for how to run WSUS Reset
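For reference, the reset itself boils down to a couple of commands run on the WSUS/SUP server once the classifications have been removed in the console; the path assumes a default WSUS install.

# Run from an elevated prompt on the WSUS server after removing the classifications in the console
cd "$env:ProgramFiles\Update Services\Tools"
.\wsusutil.exe reset
# Then kick off a sync from the console and watch wsyncmgr.log until it completes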

Once we run the reset and the sync completes we are now able to see the servicing node populated.

 

Ultimately I wrote this blog post to cite myself, to say I've seen this problem many times over many years and it is still not addressed. When you are working against timelines, this problem adds several hours or even days to the project because of having to perform the reset and then re-download content and metadata. But let me tell you a funny joke. Why did the chicken cross the road? I don't know, but WSUS runs on old code, like 2003 code from before I was even in high school.

 

NOTE: I hope David James sees this, and he takes over the WSUS stuff, and makes it better, like everything else he makes better.

 

AOVPN Deployment with SCCM – Lessons Learned

We recently completed an AOVPN deployment with SCCM and hit a few bumps along the way, so I thought I'd document them to help anyone else. One point to note: I had nothing to do with the AOVPN solution configuration, just the deployment with SCCM. The information below is a combination of our testing, troubleshooting, questions and answers from redditors, piloting, MS cases, etc.

Our environment

Azure AOVPN Gateways
IKEv2
Device Tunnel Profile (routes for AD services)
User Tunnel Profile (routes for everything else)
SCCM 1810
Win10 1703 – 1909

Microsoft provides the UserCert.ps1 and Devicecert.ps1 scripts. After lots of testing, bug finding, and troubleshooting, we made some changes to the install scripts (not to what MS does, but before and after). Please note, scripting is not my forte, so the snippets will be clunky (if it looks dumb but it works, it's not dumb).

We found that we had to make the following changes to the default install scripts:

Script Actions

  1. Uninstall Existing AOVPN Profiles (ensures no conflicting profiles)

  2. Change Regkey to change service dmwappushservice to automatic / start dmwappushservice

  3. Run the standard MS script to create AOVPN Profile

  4. Update the PBK file and change value of IPInterfaceMetric from 0 to 9

  5. Update the PBK file and change value of Ipv6InterfaceMetric from 0 to 9

  6. Update the PBK file and change the value of NetworkOutageTime from 1800 to 30.

  7. Write an XML version Regkey

  8. Set exit code to 1641 (DT only)

Uninstall Existing AOVPN Profiles

We use a consistent naming convention for our tunnels, so the first few lines of the install script look for any tunnel names and remove them, just to ensure there are no conflicting profiles; this function also helps us later for XML updates. We found that it wasn't simple to remove an active tunnel. You have to hang it up first…however it auto-connects almost immediately. To get around this, we first set the VPN connection to an incorrect authentication method and then disconnect it to prevent it re-dialling:

##Set the tunnel to an authentication method it can't satisfy so it won't auto-redial
Set-VpnConnection -AllUserConnection -Name "TunnelName01" -AuthenticationMethod EAP

##Disconnects the AlwaysOnVPN Device Tunnel
Rasdial.exe "TunnelName01" /disconnect

##Remove the profile
Remove-VpnConnection -AllUserConnection -Name "TunnelName01" -Force -ErrorAction SilentlyContinue

This method is also used for our uninstall scripts.

Change Regkey to change service dmwappushservice to automatic / start dmwappushservice

We had an issue on an increasing number of machines where the profile script ran, did not throw any errors, and stated that it was all successful, yet there was no trace of the tunnel profiles. After much hair-pulling, testing, and troubleshooting, we eventually found that the service dmwappushservice must be enabled and running for profile creation to occur, so we added this into our install scripts.

###Set registry key to allow the WAP Push service to start
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\dmwappushservice" -Name Start -Type DWORD -Value 2 -Force

Start-Sleep -Seconds 5

##Set startup type and start the service
Set-Service dmwappushservice -StartupType Automatic
Start-Service dmwappushservice

Run the standard MS script to create AOVPN Profile

This is either the Devicecert.ps1 or UserCert.ps1.
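In the install wrapper this step can be as simple as calling the Microsoft script packaged alongside it; a minimal sketch, assuming the script sits next to the wrapper in the package source:

# Devicecert.ps1 for the DT package; swap in UserCert.ps1 for the UT package
& "$PSScriptRoot\Devicecert.ps1"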

Update the PBK file and change value of IPInterfaceMetric from 0 to 9

We found that users at home connected to their router via cable had a different experience to those connected via wifi. This was because some data was attempting to go over the ethernet interface rather than the tunnel (despite having correct routes), and that was due to the ethernet connection having a higher priority than the tunnel connection. Lowering its priority resolved this.

Update the PBK file and change value of Ipv6InterfaceMetric from 0 to 9

As above, but for IPv6.

Update the PBK file and change the value of NetworkOutageTime from 1800 to 30

If a user's home ISP briefly dropped connection, it was taking up to 1800 seconds (30 mins) to reconnect. Changing this value reduced that to 30 seconds max. We left it at 30 seconds to cater for longer ISP drops.

PBK file update script snippet. To update the PBKs we used the following snippets for the DT and UT respectively:

#Load System PBK
$Syspbk = Get-Content "C:\ProgramData\Microsoft\Network\Connections\Pbk\Rasphone.pbk"
#Update System PBK Network Metric
$Syspbk | % { $_.Replace("IpInterfaceMetric=0", "IpInterfaceMetric=9") } | % { $_.Replace("Ipv6InterfaceMetric=0", "Ipv6InterfaceMetric=9") } | % { $_.Replace("NetworkOutageTime=1800", "NetworkOutageTime=30") } | Set-Content "C:\ProgramData\Microsoft\Network\Connections\Pbk\Rasphone.pbk"

##UserTunnel

#Enumerate currently logged on user
$Username = $((Get-WMIObject -class Win32_ComputerSystem | select username).username).Split("\")[1]
#Load User PBK
$userpath = "C:\Users\$Username\AppData\Roaming\Microsoft\Network\Connections\Pbk\_hiddenPbk"
$Userpbk = Get-Content "$userpath\Rasphone.pbk"
#Update User PBK

$Userpbk | % { $_.Replace("IpInterfaceMetric=0", "IpInterfaceMetric=9") } | % { $_.Replace("Ipv6InterfaceMetric=0", "Ipv6InterfaceMetric=9") } | % { $_.Replace("NetworkOutageTime=1800", "NetworkOutageTime=30") } | Set-Content "$userpath\Rasphone.pbk"

Write an XML version Regkey

This was used to track XML versions, but also to assist with the update process by utilising detection methods. Detailed further down.
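A minimal sketch of what that regkey write can look like, assuming the UT value sits under the RasMan Config key described in the detection section (the DT variant writes DTXML under the corresponding Device key):

# Stamp the ProfileXML version so the detection method can key off it
New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\RasMan\Config' -Name 'UTXML' -Value '1.0' -PropertyType String -Force | Out-Null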

Set exit code to 1641 (DT only)

This tells SCCM to force a reboot (with reboot times honoring the client settings). We found that the DT didn't always connect immediately; sometimes it took one reboot for the auto always-on connection to kick in. We had tight timelines and needed the DT to be deployed to allow a user cert to be deployed for the UT, so this may not be needed by everyone.
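The exit code itself is just the last line of the DT install wrapper; a minimal sketch:

# 1641 (ERROR_SUCCESS_REBOOT_INITIATED) tells SCCM the install succeeded and a restart was kicked off
exit 1641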

Detection Method

Originally, I tried using a PowerShell script:

if (Get-Command Get-VpnConnection -ErrorAction SilentlyContinue) # Check if the system supports this cmdlet first
{
    if (Get-VpnConnection -AllUserConnection | where { $_.Name -match "Tunnel01" })
    { Write-Host "Installed" }
    else {}
}
else {}

This worked great for the device tunnel, but not for the user tunnel. We are deploying our UT to user collections but running it as system. Even though it is set to run as system, the detection method runs as the user; this is a known bug in SCCM (thanks, reddit). In our environment users are blocked from running PowerShell, so this wasn't a suitable method.

We opted for simple regkey detection. Our AOVPN packages look for the presence of the following regkeys:

UT

1) HKLM\SYSTEM\CurrentControlSet\Services\RasMan\Config\AutoTriggerProfileEntryName = Tunnel01 (system created regkey)
2) HKLM\SYSTEM\CurrentControlSet\Services\RasMan\Config\UTXML = 1.0 (script created regkey, as noted above)

DT

1) HKLM\SYSTEM\CurrentControlSet\Services\RasMan\Device\AutoTriggerProfileEntryName = Tunnel02 (system created regkey)
2) HKLM\SYSTEM\CurrentControlSet\Services\RasMan\Device\DTXML = 1.0 (script created regkey, as noted above)
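To manually verify the values on a client before wiring them into the detection method, something like this works; the key paths are as reconstructed above, so double-check them on a test machine first.

# Check the UT and DT detection values on a client
Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Services\RasMan\Config' | Select-Object AutoTriggerProfileEntryName, UTXML
Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Services\RasMan\Device' | Select-Object AutoTriggerProfileEntryName, DTXML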

Updating the solution

To update the solution (add routes or modify the XMLs), we simply add the new XML to the package, update the install script to the new XML version, and update the detection method to the new XML version. This forces all devices/users with an old version of the tunnel profiles to reinstall.

Deployment

We deployed the DT to all devices (using an exclusion collection for devices below 1803, as the DT is not supported on those). The UT we deployed to user collections, but it installs as system.

Troubleshooting: The best way to troubleshoot is always to remove as many moving parts from the equation as possible. Most of our testing was done using PsExec (as system) and running the script with the same parameters we would use with SCCM.
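A typical invocation for that kind of testing looks something like this (the script name and path are just placeholders):

# Run the install script interactively as SYSTEM with Sysinternals PsExec
psexec.exe -s -i powershell.exe -ExecutionPolicy Bypass -File "C:\Temp\Install-AOVPN-DT.ps1"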
Event Log: In the event viewer, look in the Application log for anything from RasClient for further information. I created a script in SCCM to grab this information remotely and put it on a server share.
Get-EventLog -LogName Application -Newest 100 -Verbose | ft -Wrap > $Path\$Compname.EventLog.Application.csv
Get-EventLog -LogName Application -Source Rasclient -Newest 100 -Verbose | ft -Wrap > $Path\$Compname.EventLog.RASClient.log
Trace Logs: Advanced Rasclient logs can also be enabled by running the following command on the machine;
netsh ras set tracing * enabled
netsh ras diagnostics set loglevel all
This then writes log files to;
C:\Windows\Tracing

I really hope this helps somebody else out there and you don’t have to do as much head to wall banging.
Best of Luck