Cylance, PKI, and You


Maintaining PKI functionality through the upgrade to CylancePROTECT 1580

We recently deployed Cylance 1584 across our environment. Cylance implemented version 2 of their Memory and Script Controls with version 1580, and we were a few versions behind. We were expecting a few bumps in the road as a result, but none as significant as the loss of our PKI environment.

How it started

The button was clicked, and Cylance 1584 was deployed. Tickets were coming in from our SIEM, but that was to be expected. The first alert came in the next day. One of the support engineers reported that devices were failing to enroll into Intune through AutoPilot. The process failed at the Certificate section of Device Setup. She was able to continue AutoPilot to get to the desktop, but Wi-Fi and VPN weren't connecting. This meant the device had no device certificate from PKI.

Troubleshooting PKI

Event Viewer is always a good place to start for troubleshooting. There were a lot of error events in the Application log. These events had ID 29 with source NetworkDeviceEnrollmentService. The message read: "The password in the certificate request cannot be verified. It may have been used already. Obtain a new password to submit this request." Google had no definitive answers, but there were more logs to review.
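
If you want to pull these NDES events quickly rather than scroll through Event Viewer, the Application log can be queried from the command line. Here is a minimal sketch, Python wrapping wevtutil, run from an elevated prompt on the NDES server; the provider name is the Source shown above.

    import subprocess

    # Pull the newest NetworkDeviceEnrollmentService events with ID 29 from the
    # Application log. Assumes wevtutil (built into Windows) is on the PATH and
    # that this runs on the NDES server.
    query = "*[System[Provider[@Name='NetworkDeviceEnrollmentService'] and (EventID=29)]]"
    result = subprocess.run(
        ["wevtutil", "qe", "Application", f"/q:{query}", "/f:text", "/rd:true", "/c:10"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)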

The next logs I found always came in pairs.

Event ID 1000 | Source: Application Error
Faulting Application Name: w3wp.exe

Event ID 1023 | Source: .NET Runtime
Application: w3wp.exe

These logs show that the w3wp.exe executable is being blocked. The timestamps on the events line up with the Cylance update, so it seems obvious that Cylance is blocking the executable; oddly enough, there are no alerts in Cylance that it has done so. The w3wp.exe process seems to be the key to the certificate issue, but there isn't enough information in these logs alone to fix it.
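
Those pairs are easy to spot if you query both event IDs at once and filter on the process name. A rough sketch along the same lines as above (the "Event[n]:" header prefix in wevtutil's text output is assumed):

    import subprocess

    # Pull recent Application Error (1000) and .NET Runtime (1023) events and
    # print only the records that mention w3wp.exe.
    query = "*[System[(EventID=1000 or EventID=1023)]]"
    out = subprocess.run(
        ["wevtutil", "qe", "Application", f"/q:{query}", "/f:text", "/rd:true", "/c:50"],
        capture_output=True, text=True, check=True,
    ).stdout

    # wevtutil's text output starts each record with an "Event[n]:" header.
    for record in out.split("Event[")[1:]:
        if "w3wp.exe" in record:
            print("Event[" + record.strip() + "\n")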

The next place to investigate is the System log. Searching around the same timestamps as the w3wp.exe events, I found the next set of entries.

Event ID 5011 | Source: WAS | Level: Warning
A process serving application pool 'Microsoft Intune CRP Service Pool' suffered a fatal communication error with the Windows Process Activation Service. The process id was 'xxxxx'. The data field contains the error number.

Event ID 5002 | Source: WAS | Level: Error
Application pool 'Microsoft Intune CRP Service Pool' is being automatically disabled due to a series of failures in the process(es) serving that application pool.

These logs indicate that the w3wp.exe executable has failed enough times that the IIS Application Pool has been disabled.
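
One thing to remember here: WAS will not re-enable the pool on its own, so once the underlying blocking is resolved you have to start the application pool again yourself. A minimal sketch using appcmd, run elevated on the server hosting the pool; the pool name is the one from the events above.

    import subprocess

    # Start the application pool that WAS disabled, then confirm its state.
    APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"
    POOL = "Microsoft Intune CRP Service Pool"

    subprocess.run([APPCMD, "start", "apppool", f"/apppool.name:{POOL}"], check=True)
    subprocess.run([APPCMD, "list", "apppool", POOL, "/text:state"], check=True)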

Putting the pieces together

Cylance 1580 will occasionally block executables that depend on .NET runtimes. When this happens, you will not get alerts about the executable in the Admin Console. In the policy that you deploy to your web servers, you will need to add the following exclusions under Memory Actions:

  • \Windows\System32\inetsrv\w3wp.exe
  • \Windows\sysWOW64\inetsrv\w3wp.exe
  • \Program Files (x86)\IIS Express\iisexpress.exe

It is worth noting that, at the time of writing, Cylance can apply a hotfix to your account that adds those exclusions automatically; this upgrades your agent to version 1584.46. The fix will also be included in version 3 whenever that is released later this year.

You should also keep this in mind for troubleshooting additional applications in your environment. By quickly filtering your Application log by event ID 1000, you can gauge if there are any blocks on your system caused by Cylance. Not all ID 1000 events will be caused by Cylance 1580, but it can be a good starting point for troubleshooting.
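
A quick way to do that filtering without clicking through Event Viewer is to tally recent ID 1000 events by faulting application; anything that started crashing right after the agent update deserves a closer look. A rough sketch (the "Faulting application name" field is assumed from the standard Application Error message text):

    import re
    import subprocess
    from collections import Counter

    # Count recent Application Error (1000) events per faulting application name.
    out = subprocess.run(
        ["wevtutil", "qe", "Application", "/q:*[System[(EventID=1000)]]",
         "/f:text", "/rd:true", "/c:200"],
        capture_output=True, text=True, check=True,
    ).stdout

    apps = re.findall(r"Faulting application name:\s*([^,\s]+)", out, re.IGNORECASE)
    for name, count in Counter(apps).most_common():
        print(f"{count:4d}  {name}")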

Policy Evaluation Errors – IIS Connections

In this scenario the customer is trying to image systems from within WinPE. They are able to select a task sequence, but once policy dependencies are evaluated they receive the "An error occurred while retrieving policy for this computer (0x80004005)" message. This is a very generic error, and I've covered how to fix a dozen or so different causes of it in my Troubleshooting OSD Task Sequences document, which I have been meaning to publish for a few years but have so far only shared with various customer accounts.

 

This error is reported right after the task sequence is selected, when the system looks up policy for the machine. While in WinPE the task sequence log file lives on the RAM disk; you will find it at X:\Windows\Temp\SMSTSLog\smsts.log.
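
If you want to pull the relevant lines out of smsts.log without scrolling through CMTrace, a quick scan like the sketch below works once you have a copy of the log (WinPE itself won't have Python, so copy the file off first; the destination path here is just an example):

    from pathlib import Path

    # Scan a copy of smsts.log for the 0x80004005 policy error and other obvious
    # failure lines. Adjust the path to wherever you copied the log.
    LOG = Path(r"C:\Temp\smsts.log")

    for line in LOG.read_text(errors="ignore").splitlines():
        if "0x80004005" in line or "failed" in line.lower():
            print(line)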

 

We can see in the log file that there are problems communicating with the management point while retrieving policy during the lookup. The usual troubleshooting (making sure SCCM is functional, checking boundaries, and so on) can be skipped, since we know the specific problem and have the historical knowledge of the IIS connection limit. This environment had just recently suffered an SCCM outage, and there was a significant backlog of policy requests for the environment to catch up on. At the time of the customer's attempts to run an OSD task sequence, the IIS connection limit was set to 500 and needed to be raised to a higher number, or to unlimited, depending on how quickly the backlog of policy requests progressed. That will be covered in another blog post, as the situation is still being monitored.

To check this, access your SCCM site server (in this environment the management point is co-located with the site server). Open IIS Manager > navigate to your Default Web Site > Configure > Limits.
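
The same setting can be checked and changed from the command line, which is handy if you are adjusting it repeatedly while a backlog clears. A sketch using appcmd against the Default Web Site; the 5000 value below is only an example, not a recommendation.

    import subprocess

    # Dump the current Default Web Site configuration (including the <limits>
    # element), then raise the connection limit.
    APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"
    SITE = "Default Web Site"

    subprocess.run([APPCMD, "list", "site", SITE, "/config"], check=True)
    subprocess.run([APPCMD, "set", "site", SITE, "/limits.maxConnections:5000"], check=True)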

 

This had originally been lowered to 500 to let CPU usage drop to a more stable rate while the backlog of policies was processed. If you are trying to recover from high CPU usage, whatever the cause, I would recommend limiting the connections and then gradually increasing that number.

Once the number was increased, the customer was able to start imaging systems again. When you are trying to recover SCCM services, keep an eye on CPU usage so you know whether you need to throttle the connection limit back down or can keep increasing it.
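
Here is a small sketch of the kind of watching I mean: poll overall CPU load once a minute so you can decide when it is safe to bump the limit again. It uses wmic, which is deprecated on newer Windows builds, so substitute another counter source (typeperf, for example) if it is not available.

    import subprocess
    import time

    # Poll overall CPU load every 60 seconds while the connection limit is raised.
    while True:
        out = subprocess.run(
            ["wmic", "cpu", "get", "loadpercentage"],
            capture_output=True, text=True, check=True,
        ).stdout
        load = [l.strip() for l in out.splitlines() if l.strip().isdigit()]
        print(time.strftime("%H:%M:%S"), "CPU load %:", ", ".join(load))
        time.sleep(60)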