'pskill \\hostname winlogon' might budge a server "stuck rebooting", but why?


What's happening is that Winlogon, the process (it isn't a service) that handles the logged-on environment for RDP and local logons, is getting hung on shutdown for some reason. It could be some odd locking that prevents something critical from shutting down, or maybe the registry is still open somewhere. Killing the winlogon process breaks the logjam so the reboot can proceed.
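One thing the tasklist output can tell you is whether winlogon.exe is still present and which sessions it belongs to. The Python sketch below is only an illustration: it assumes tasklist's /fi "IMAGENAME eq winlogon.exe", /fo csv and /nh options, and that you already have an authenticated connection to the target (for example the NET USE from the question). The function names are hypothetical. Winlogon normally runs once per interactive session, so seeing it listed does not by itself prove it is the process holding up the shutdown.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Parse 'tasklist /fo csv /nh' output into (image, pid, session name, session #).

    Lines that are not process rows, such as tasklist's "INFO: No tasks are
    running..." message, are skipped.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 4 and row[1].isdigit():
            rows.append((row[0], int(row[1]), row[2], row[3]))
    return rows


def winlogon_processes(machine):
    """List winlogon.exe instances on a remote machine (hypothetical helper)."""
    result = subprocess.run(
        ["tasklist", "/s", machine, "/fi", "IMAGENAME eq winlogon.exe",
         "/fo", "csv", "/nh"],
        capture_output=True, text=True,
    )
    return parse_tasklist_csv(result.stdout)
```

The session name and number in each tuple show which logon (the console or an RDP session) a given winlogon.exe belongs to.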

This will leave Event Log traces! In the System log (and possibly the Application log), right before the reboot, there will probably be some events describing why the machine couldn't go down just then.
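To pick those traces out of entries copied from Event Viewer, a minimal Python sketch could filter for event 1074 (a process initiated the restart) and 7043 (a service did not shut down properly after a preshutdown control), the two IDs that show up around the hung reboot in the question's logs. The regex assumes the flattened one-line field order shown in the question, and the helper names are hypothetical. Remotely, wevtutil qe System /r:machine can pull the same events, but its text output is laid out differently and would need its own parser.

```python
import re
from typing import NamedTuple, Optional

# Event IDs that appear in the question's logs around the hung reboot:
#   1074: a process initiated a restart on behalf of a user
#   7043: a service did not shut down properly after a preshutdown control
SHUTDOWN_EVENT_IDS = {1074, 7043}

# Assumes the flattened one-line layout copied from Event Viewer:
# "Log Name: ... Source: ... Date: ... Event ID: ... Level: ... Description: ..."
ENTRY_RE = re.compile(
    r"Source: (?P<source>.+?) Date: (?P<date>.+?) Event ID: (?P<event_id>\d+)"
    r".*?Level: (?P<level>\w+)"
    r".*?Description: (?P<description>.*)$"
)


class Entry(NamedTuple):
    source: str
    date: str
    event_id: int
    level: str
    description: str


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one flattened event line; return None if it doesn't match."""
    m = ENTRY_RE.search(line)
    if m is None:
        return None
    return Entry(m["source"], m["date"], int(m["event_id"]), m["level"],
                 m["description"].strip())


def shutdown_clues(lines):
    """Keep only entries whose Event ID points at a restart or a hung service."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e is not None and e.event_id in SHUTDOWN_EVENT_IDS]
```

Feeding it the pasted entries leaves the 7043 errors, which name the services (Windows Update, Windows Modules Installer in the question) that were still holding things up.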

The reason shutdown /m \\machine /r /f /t: 0 fails with that particular error message (1115, "A system shutdown is in progress") is that once a reboot has progressed past a certain point, Windows rejects further shutdown requests. The pskill method works because it isn't asking the machine to reboot; it's just knocking loose the process that's gumming up the reboot already in progress.
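The decision described above (try an ordinary forced restart, and only if Windows answers that a shutdown is already in progress, knock winlogon loose) can be sketched in Python. This is an illustration under assumptions: shutdown.exe and Sysinternals pskill are on the PATH, the caller already has admin rights on the target, and error 1115 is detected from the "(1115)" in the message text or, as a further assumption, from the exit code. force_reboot and its runner parameter are hypothetical names; the runner is injectable so the logic can be exercised without touching a real server.

```python
import subprocess

# Win32 error 1115: "A system shutdown is in progress."
SHUTDOWN_IN_PROGRESS = 1115


def run(cmd):
    """Run a command and return (exit code, combined stdout/stderr text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def shutdown_in_progress(code, output):
    # Match the "(1115)" shown in the error message; treating 1115 as the
    # exit code as well is an assumption about shutdown.exe.
    return code == SHUTDOWN_IN_PROGRESS or f"({SHUTDOWN_IN_PROGRESS})" in output


def force_reboot(machine, runner=run):
    """Ask for a normal forced restart; fall back to killing winlogon only
    when the target reports that a shutdown is already under way.
    Returns (step, exit code, output) for the last command run."""
    target = "\\\\" + machine
    code, out = runner(["shutdown", "/m", target, "/r", "/f", "/t", "0"])
    if code != 0 and shutdown_in_progress(code, out):
        code, out = runner(["pskill", target, "winlogon"])
        return "pskill", code, out
    return "shutdown", code, out
```

Calling force_reboot("sqlX") from an elevated prompt would mirror the manual sequence in the question: shutdown first, pskill only on the 1115 error, and any other failure (such as access denied) reported rather than escalated.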

Author: Snoi

Hosted Infrastructure Engineer

Updated on September 17, 2022

Comments

  • Snoi
    Snoi almost 2 years

    Question: Executing remote (Sysinternals) command... pskill \\machine winlogon ...can budge a server that is stuck rebooting, but how/why does this work? How do you know which service to kill?

    To recreate (e.g.): You run Windows Update, allow a reboot, and ...NOTHING! RDP gets cut off but the server does not reboot. Just about every other service seems to stay up.

    Further Background: I've faced this problem on VMs hosted around the planet for some years, and have used various sc.exe and shutdown commands to query the state of servers stuck like this and attempt remote reboots, with limited success. Most datacentres don't offer any way to see the true console or to power such machines off and on. They charge $$ for you to call them to do such simple things after hours, which is when you nearly always have to run your maintenance tasks.

    e.g.

    NET USE \\machine\IPC$ /USER:login password

    sc \\machine query RpcSs

    sc \\machine query TermService

    sc \\machine query wuauserv

    tasklist /s machine

    This occasionally works for me...

    shutdown /m \\machine /r /f /t: 0

    ...but more often than not it fails with: A system shutdown is in progress (1115).

    I found this question, and the answer by @Tweek, and it worked really well, but was I just lucky?

    Can not RDP to Win 2003 box or initiate remote restart

    @Tweek said to run: pskill \\hostname winlogon

    ...and that got me past this situation in a new way (Server 2008 R2 in my most recent case), which was really useful! I just need to understand whether I got lucky or there is more science here. What I'd like to know is: why the winlogon process?

    @Livne said to use "tasklist /s HostName" to see what the culprit is, but how do you tell from the listed output? It's just a list of running tasks. From that I would not know what to look for, nor could I see anything about the winlogon process that suggested to me it was the one to kill.

    Added to question later: Event log entries found on the target machine, from before and after executing pskill winlogon (remotely)...

    Log Name: System Source: USER32 Date: 4/02/2011 4:09:51 a.m. Event ID: 1074 Task Category: None Level: Information Keywords: Classic User: sqlX\joeblogs Computer: sqlX.example.org Description: The process Explorer.EXE has initiated the restart of computer sqlX on behalf of user sqlX\joeblogs for the following reason: Operating System: Recovery (Planned) Reason Code: 0x80020002 Shutdown Type: restart Comment:

    Log Name: System Source: USER32 Date: 4/02/2011 4:09:53 a.m. Event ID: 1074 Task Category: None Level: Information Keywords: Classic User: sqlX\joeblogs Computer: sqlX.example.org Description: The process C:\Windows\system32\winlogon.exe (sqlX) has initiated the restart of computer sqlX on behalf of user sqlX\joeblogs for the following reason: No title for this reason could be found Reason Code: 0x500ff Shutdown Type: restart Comment:

    Log Name: System Source: Service Control Manager Date: 4/02/2011 4:10:25 a.m. Event ID: 7043 Task Category: None Level: Error Keywords: Classic User: N/A Computer: sqlX.example.org Description: The Windows Update service did not shut down properly after receiving a preshutdown control.

    (then the following services shut down...) Group Policy Client, Shell Hardware Detection, Application Experience (started), Application Experience (stopped)

    Log Name: System Source: Service Control Manager Date: 4/02/2011 5:09:50 a.m. Event ID: 7045 Task Category: None Level: Information Keywords: Classic User: sqlX\Administrator Computer: sqlX.example.org Description: A service was installed in the system. Service Name: PsKill Service File Name: %SystemRoot%\PSKLLSVC.EXE Service Type: user mode service Service Start Type: demand start Service Account: LocalSystem

    Log Name: System Source: Service Control Manager Date: 4/02/2011 5:09:51 a.m. Event ID: 7036 Task Category: None Level: Information Keywords: Classic User: N/A Computer: sqlX.example.org Description: The PsKill service entered the running state.

    Log Name: System Source: Service Control Manager Date: 4/02/2011 5:09:51 a.m. Event ID: 7036 Task Category: None Level: Information Keywords: Classic User: N/A Computer: sqlX.example.org Description: The PsKill service entered the stopped state.

    Log Name: System Source: Service Control Manager Date: 4/02/2011 5:09:52 a.m. Event ID: 7036 Task Category: None Level: Information Keywords: Classic User: N/A Computer: sqlX.example.org Description: The Application Experience service entered the running state.

    Log Name: System Source: Service Control Manager Date: 4/02/2011 5:10:26 a.m. Event ID: 7043 Task Category: None Level: Error Keywords: Classic User: N/A Computer: sqlX.example.org Description: The Windows Modules Installer service did not shut down properly after receiving a preshutdown control.

    (other stops...)
    5:10:34 The Event log service was stopped.
    5:10:33 DHCPv6 client service is stopped. ShutDown Flag value is 1
    5:10:33 DHCPv4 client service is stopped. ShutDown Flag value is 1
    5:10:33 The DHCP Client service entered the stopped state.
    5:10:34 The Diagnostic Policy Service service entered the stopped state.
    5:10:34 The Application Host Helper Service service entered the stopped state.
    5:10:34 The Windows Event Log service entered the stopped state.
    5:10:35 The Cryptographic Services service entered the stopped state.
    5:12:54 Microsoft (R) Windows (R) 6.01. 7600 Multiprocessor Free.
    5:12:54 The Event log service was started.
    5:12:54 The system uptime is 34 seconds.

    • Snoi
      Snoi over 13 years
      As a final note, I just wanted to confirm that the machines I fixed this way last week had all failed to restart while applying Windows Updates; if you had looked at the console screen, it would have shown that there. pskill has proved an essential tool in the armoury.
  • Rob
    Rob over 13 years
    I thought it was because killing winlogon caused the kernel to take the system down? (blogs.msdn.com/b/oldnewthing/archive/2008/10/13/8969404.aspx)
  • user1686
    user1686 over 13 years
    @Rob: ...and if you want it to go down real fast, kill smss instead. Instant BSOD.
  • Snoi
    Snoi over 13 years
    Great help @sysadmin1138 and @grawity. I feel I have learned some really good knowledge here. Interesting how PSKill installs a service remotely (and did so without asking and without any issues).
  • Snoi
    Snoi over 13 years
    It is fairly clear from the reading you pointed me to, and from further thinking, that killing the winlogon process will force a reboot almost regardless of the state of the system. For reference, I will post the event log entries I found on either side of using pskill as an edit to my question above. They show how cleanly the system got restarted, which is very reassuring :-)
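The sc.exe queries in the question can be boiled down to one check during a hung reboot: is any service still stuck stopping? A minimal Python sketch, assuming the usual "STATE : <n>  <NAME>" line in sc query output; service_states, stuck and parse_sc_state are hypothetical helpers, and adding TrustedInstaller (the service name of Windows Modules Installer, which failed its preshutdown control in the question's event log) to the list is an assumption rather than part of the original advice.

```python
import re
import subprocess

# Services queried in the question, plus TrustedInstaller (Windows Modules
# Installer), added here as an assumption based on the 7043 event in the logs.
SERVICES = ["RpcSs", "TermService", "wuauserv", "TrustedInstaller"]

# Matches lines such as "        STATE              : 4  RUNNING"
STATE_RE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_sc_state(text):
    """Return the state name (e.g. 'RUNNING', 'STOP_PENDING') from sc query output, or None."""
    m = STATE_RE.search(text)
    return m.group(2) if m else None


def service_states(machine, services=SERVICES):
    """Query each service on the remote machine with 'sc \\\\machine query <name>'."""
    states = {}
    for name in services:
        out = subprocess.run(["sc", "\\\\" + machine, "query", name],
                             capture_output=True, text=True).stdout
        states[name] = parse_sc_state(out)
    return states


def stuck(states):
    """Services still pending a stop are likely suspects for a hung shutdown."""
    return sorted(name for name, state in states.items() if state == "STOP_PENDING")
```

Running stuck(service_states("sqlX")) after the NET USE from the question would list only the services that never finished stopping, which is more telling than scanning the full tasklist output.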