RDP Broke Every Morning After an Azure VM Start Task

I recently helped a new client with an existing Azure infrastructure analyze how they could reduce their Azure spend. This was a smaller organization without many resources, and their most expensive resource was an RDS server running 24/7, even though their working hours were only 8 AM – 5 PM. The compute for that server was costing them ~$330/month, so by deallocating it during non-working hours, they would save around $170/month.
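
As a rough sanity check on an estimate like that, here is a minimal sketch of the arithmetic. It assumes compute cost scales linearly with running hours and ignores disk and other charges that continue while a VM is deallocated. The function name and the 12-hour window are illustrative assumptions, not the client's actual schedule.

```powershell
# Hypothetical helper: estimates monthly compute savings from running a VM
# only part of the week. Assumes cost is linear in running hours.
function Get-EstimatedComputeSavings {
    param(
        [double]$MonthlyCost,      # 24/7 compute cost per month
        [double]$HoursOnPerDay,    # hours the VM runs on each "on" day
        [int]$DaysOnPerWeek = 7    # days per week the VM is started
    )
    $fractionOn = ($HoursOnPerDay * $DaysOnPerWeek) / (24 * 7)
    [math]::Round($MonthlyCost * (1 - $fractionOn), 2)
}

# Example: an assumed 12-hour daily window on the ~$330/month server
Get-EstimatedComputeSavings -MonthlyCost 330 -HoursOnPerDay 12
```

With a 12-hour daily window this returns 165, which is in the same range as the ~$170 estimate; a tighter window, or skipping weekends, would save more.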

Years ago, we had to do this with Azure Automation, but now with VM tasks, it is much simpler. If you don’t know where VM tasks are, they can be found on the VM itself, under the Automation menu. The pre-defined tasks make it simple to schedule VMs to deallocate and start.

However, this post is not about VM tasks, although using them certainly surfaced an interesting issue. After implementing the tasks to deallocate and then start the RDS server each day, users were unable to connect via RDP. This was a small environment without an RD Gateway: users connect over an Azure P2S VPN and then RDP to the RDS server. The server would show online, report as healthy in RMM, and had network connectivity, but all RDP connections would fail with this error:

Oddly, rebooting the server would fix the issue. Each morning, the first time the server came out of a deallocated state, users would hit this error, but a warm reboot after the VM had started would resolve it. After all the basic troubleshooting (firewall, network connectivity, can ping a DC, etc.), we needed to move on to other potential causes.

The error points to a certification authority, and we know that RDP connections rely on a certificate on the host device. That’s why, when you connect via RDP to a machine (assuming you have not installed a cert from a trusted authority, or you’re connecting using a name/IP that doesn’t match the CN on the cert), you will get this pop-up window:

The users were connecting by IP in this case, as they always had, but weren’t getting that pop-up message. That made me suspect an issue with the certificate, though I still wasn’t sure why a warm reboot would resolve it. I connected locally to the server while the issue was happening and did some poking around. Certificates for remote desktop connections on the host device are typically stored in the local computer certificate store under Remote Desktop > Certificates:

There should be some self-signed certs that are generated by the Remote Desktop Service on the host device. These are what you should see if you click the view certificate button when connecting to a device through RDP (in the previous screenshot). We can check to see which certificate the host device is using with PowerShell:

Get-WmiObject -Namespace root\cimv2\TerminalServices -Class Win32_TSGeneralSetting -Filter "TerminalName='RDP-Tcp'" | Select-Object SSLCertificateSHA1Hash

This gives us the certificate thumbprint, which, in my case, did not match the certificate in the Remote Desktop certificate store:

It seemed like we were onto something here, but I still wasn’t sure what that certificate was. After some more digging through the certificate stores, I located that thumbprint in the Personal store. This was a server authentication cert with the server name as its CN, so it had the same format as the self-signed Remote Desktop certificates that are generated automatically. The important detail is that this certificate had expired (a long time ago):
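
Rather than clicking through each store, the search can be scripted. The following is a minimal sketch, not the exact steps I ran: the function name is hypothetical, and it only filters whatever certificate objects you pass in, reporting the store each match lives in and whether it has expired.

```powershell
# Hypothetical helper: given a thumbprint and a set of certificate objects,
# report which store each match lives in and whether it has expired.
function Find-CertificateByThumbprint {
    param(
        [string]$Thumbprint,
        [object[]]$Certificates,
        [datetime]$Now = (Get-Date)
    )
    # Normalize: drop whitespace (e.g. when pasted from the certificate dialog)
    $wanted = ($Thumbprint -replace '\s', '').ToUpperInvariant()
    $Certificates |
        Where-Object { $_.Thumbprint -eq $wanted } |
        ForEach-Object {
            [pscustomobject]@{
                Store     = $_.PSParentPath
                Subject   = $_.Subject
                NotAfter  = $_.NotAfter
                IsExpired = $_.NotAfter -lt $Now
            }
        }
}

# On the RDS host (Windows only), feed it the thumbprint RDP is bound to
# and every certificate in the local machine stores:
# $bound = (Get-WmiObject -Namespace root\cimv2\TerminalServices -Class Win32_TSGeneralSetting -Filter "TerminalName='RDP-Tcp'").SSLCertificateSHA1Hash
# Find-CertificateByThumbprint -Thumbprint $bound -Certificates (Get-ChildItem Cert:\LocalMachine -Recurse)
```

A match whose Store path ends in \My is sitting in the Personal store, and IsExpired shows at a glance whether it is still valid.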

After a warm restart on the server, the certificate in use would change to properly match the unexpired certificate in the remote desktop certificate store:

So, what is happening here? I wasn’t exactly sure at first, and I am still not 100% certain, but the most likely cause seems to be that the server was defaulting to the first server authentication certificate it found, which happened to be the expired cert in the Personal store. Why a warm reboot would resolve this and bind the RD service to the correct certificate in the Remote Desktop certificate store, I am still not certain, but I suspect it has to do with the order in which services start and the delays that can occur during service startup on a cold boot versus a warm boot.

Technically, if this registry value is set –

HKLM\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp\SSLCertificateSHA1Hash

– Windows will automatically use the certificate with that thumbprint. In my case, and in several others I checked, this value was not set, so the RD service selects the certificate it thinks is correct. As much as I wanted to keep digging into the issue, this was such a fringe case that I had enough to go on to fix it. Plus, RDS is an old technology, and I always recommend moving from RDS to AVD wherever possible. The only RDS environments I am supporting or troubleshooting are those that I have inherited.
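
To check whether that registry value is set on a given host, something like the following sketch should work. It assumes the value, when present, holds the thumbprint as binary data, so it needs converting to the familiar hex string; the helper name is hypothetical.

```powershell
# Hypothetical helper: format a byte array (as read from the registry value)
# as an uppercase hex thumbprint string.
function ConvertTo-ThumbprintString {
    param([byte[]]$Bytes)
    ($Bytes | ForEach-Object { $_.ToString('X2') }) -join ''
}

# On the RDS host (Windows only):
# $key   = 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp'
# $value = (Get-ItemProperty -Path $key -Name SSLCertificateSHA1Hash -ErrorAction SilentlyContinue).SSLCertificateSHA1Hash
# if ($value) { ConvertTo-ThumbprintString -Bytes $value } else { 'SSLCertificateSHA1Hash is not set' }
```

If nothing is pinned, the commented check reports that the value is not set, which matches what I saw on this server.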

I don’t have hard evidence to support my suspicion of a service startup race, but it seems like the most plausible cause. The fix, in this case, was to delete the expired cert from the Personal store. Without the expired cert there, the RD service selected the correct (unexpired) cert, and connectivity functioned normally after the VM deallocation/start tasks.
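
I removed the cert by hand, but the same cleanup could be sketched in PowerShell. The helper below is hypothetical and only filters the certificate objects it is given for ones that are expired and carry the Server Authentication EKU (OID 1.3.6.1.5.5.7.3.1). The commented usage relies on -WhatIf, so nothing is deleted until you have reviewed the list.

```powershell
# Hypothetical helper: from a set of certificate objects, return those that
# are expired and carry the Server Authentication EKU (OID 1.3.6.1.5.5.7.3.1).
function Get-ExpiredServerAuthCertificate {
    param(
        [object[]]$Certificates,
        [datetime]$Now = (Get-Date)
    )
    $serverAuthOid = '1.3.6.1.5.5.7.3.1'
    $Certificates | Where-Object {
        $_.NotAfter -lt $Now -and
        ($_.EnhancedKeyUsageList.ObjectId -contains $serverAuthOid)
    }
}

# On the RDS host (Windows only); -WhatIf previews the deletion without performing it:
# Get-ExpiredServerAuthCertificate -Certificates (Get-ChildItem Cert:\LocalMachine\My) |
#     Remove-Item -WhatIf
```

Certificates with no EKU extension at all are not matched by this filter, and it is worth confirming that nothing else on the server still references an expired cert before removing it.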

This was an interesting issue I hadn’t encountered before, and also a good reminder that AVD or Windows 365 are better solutions with fewer moving parts that you, the IT admin, need to manage. To summarize, if you are getting the below error when trying to RDP to a device –

– check the certificates on your host device. In addition, if you are using RDP over a VPN, make sure your DNS is configured correctly and use the FQDN of the server when connecting, which should match the CN on its self-signed cert and help prevent any warnings or errors.
