Citrix degrades with 100% CPU


Solution 1

Windows Server 2003 SP1 went out of support in April, so your OS no longer receives security patches. You need to upgrade to SP2 ASAP.

SP2 also includes lots of miscellaneous bug fixes, so your issue could simply go away.

If your OS is at that old a patch level, there is a good chance that some drivers, specifically print drivers, are out of date on the box too. Since drivers are a big source of system instability in general, I would check that they are all signed and up to date. A dodgy print driver would explain why the problem affects both virtual and physical boxes and appears to occur randomly regardless of load.

Oh, and FYI: Citrix 4 goes EOM (End of Maintenance, no more bug fixes) at the end of this month (June 2009), and EOL (End of Life, no more security patches or any other patches) at the end of December 2009. Enjoy your upgrade cycle!

Solution 2

You can try scheduling a script to run every minute or so that appends the process list to a file:

pslist >> whatever.txt

Something like this might at least give you a clue as to what's going on.

(pslist comes with the Sysinternals Suite)
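To keep that output from filling the drive, the snapshot can be wrapped in a small script that timestamps each run and rotates the file once it reaches a size cap. The sketch below is one way to do that, not part of the original answer: the function names, the 10 MB cap, and the five backups are arbitrary choices, and it assumes Python 3.7 or newer. Since that is newer than what runs on Windows Server 2003, it is probably easiest to run it from an admin workstation and point pslist at the server remotely (pslist accepts a `\\computer` argument).

```python
import datetime
import os
import subprocess


def rotate(path, backups):
    """Shift path -> path.1 -> path.2 ..., discarding the copy beyond `backups`."""
    oldest = f"{path}.{backups}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(backups - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    if os.path.exists(path):
        os.replace(path, f"{path}.1")


def append_snapshot(cmd, path, max_bytes=10 * 1024 * 1024, backups=5):
    """Run `cmd` once and append its output, under a timestamp header, to `path`."""
    # Rotate first so the live file never grows much past max_bytes.
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        rotate(path, backups)
    result = subprocess.run(cmd, capture_output=True, text=True)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"===== {stamp} =====\n")
        log.write(result.stdout)
        if result.returncode != 0:
            # Record failures too, e.g. when the server stops answering.
            log.write(f"[exit code {result.returncode}] {result.stderr}\n")
```

For example, a task scheduled every minute could call `append_snapshot(["pslist", r"\\CITRIX01"], r"C:\Logs\citrix01-pslist.txt")`, where `CITRIX01` and the log path are placeholders.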

Solution 3

The built-in Performance Logs and Alerts tool is a great way to get some data about what's going on. You're going to have to use some disk space to generate these logs, but if you stay on top of deleting old log files until the problem occurs, you shouldn't have a problem with running out of disk space.

I'd start up a counter log on each server, logging the Process and Processor objects to disk (I'd probably grab the Memory object, too).

  • Start / Run / PERFMON

  • Expand the Performance Logs and Alerts node and highlight the Counter Logs node.

  • Click Action and New Log Settings. Name the log however you'd like.

  • Click the Add Objects... button in the log properties window and add the objects to log.

  • Set an interval. I'd probably choose a 60-second or longer interval. High resolution probably isn't necessary since this is a gradual degradation.

  • On the Log Files tab, use the Configure button to choose a location and a base filename for the log file. I'd choose a Maximum log size of, say, 5 MB to 10 MB. This is going to generate a lot of small files, but you will be able to monitor the path where you're storing them and delete older files that pile up before the problem occurs.
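Deleting the older files can be automated as well. This minimal sketch, which could be run from a scheduled task, removes the oldest log files in a folder until their combined size is under a cap, and never touches the newest file because the counter log may still be writing to it. The `.blg` suffix assumes the binary log format, and the 500 MB cap and the function name are hypothetical.

```python
import os


def prune_logs(directory, suffix=".blg", max_total_bytes=500 * 1024 * 1024):
    """Delete the oldest matching log files until the total size fits under the cap.

    The newest file is never deleted, because Performance Logs and Alerts may
    still be writing to it. Returns the list of deleted paths.
    """
    files = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(suffix.lower())
    ]
    files.sort(key=os.path.getmtime)  # oldest first
    total = sum(os.path.getsize(path) for path in files)
    deleted = []
    for path in files[:-1]:  # skip the newest (current) log file
        if total <= max_total_bytes:
            break
        total -= os.path.getsize(path)
        os.remove(path)
        deleted.append(path)
    return deleted
```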

You can start the log by right-clicking the new log instance in the results pane and choosing "Start". By default, the log will run until you stop it or until you reboot the computer. (For information about starting a log at boot, see this question: How to Setup Perfmon to Automaticaly Start an "Alert" At System Startup? The question talks about starting an alert, but you can use the same command to start a log.)

You can analyze these logs by hand after the issue occurs. You might want to try Microsoft's Performance Analysis of Logs (PAL) tool (http://www.codeplex.com/PAL). I've been happy with the reports that tool has generated, and it's fairly easy to use.

Solution 4

If the servers have only one vCPU, try adding a second one. If a single-threaded application is eating up all the CPU, you'll at least be able to get in and kill it instead of resetting the server.
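The reasoning is simple arithmetic: a thread can keep at most one CPU busy at a time, so a runaway process with t spinning threads on a box with n vCPUs can use at most min(t, n)/n of the total CPU. This small helper (the name is made up for illustration) just makes that ceiling explicit; it assumes the runaway is single-threaded or close to it, which is the case this answer targets.

```python
def max_total_cpu_percent(busy_threads, vcpus):
    """Ceiling on overall CPU use when `busy_threads` threads spin flat out.

    A thread runs on at most one CPU at a time, so with more vCPUs than
    spinning threads some capacity is always left for other work.
    """
    if vcpus < 1 or busy_threads < 0:
        raise ValueError("vcpus must be >= 1 and busy_threads >= 0")
    return 100.0 * min(busy_threads, vcpus) / vcpus
```

With one vCPU, a single spinning thread takes 100% and leaves nothing for your logon session; with two vCPUs it tops out at 50%.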


Author: Kevin Kuphal

Updated on September 17, 2022

Comments

  • Kevin Kuphal, over 1 year

    We have a Citrix PS4.0 farm made up of 2 physical and 2 virtual Citrix servers. Any one of them at some point or another will eventually degrade in performance due to hitting 100% CPU usage. I can see the CPU usage spike in the Virtual Infrastructure Client when this happens on either of the VMware servers.

    This is not a load issue related to the number of users as it can happen at any time with any number of users.

    Users are running shared desktops, not applications. The desktop includes standard Office applications (Word, Excel, Outlook), limited Internet Explorer access through a Bluecoat Proxy, and a couple of industry-specific applications.

    What tools can be used to troubleshoot and diagnose the source of the problem? Once the server hits 100% CPU, it is impossible to log onto and see what process is consuming all the resources. The only recourse is to hard reset the machine. All servers restart at 4am each morning on a schedule.

    NOTE: I already have ThreadMaster installed on all Citrix servers using the default configuration options and logging activities. The logs do not reveal the source of the problem.

    EDIT

    • Citrix Presentation Server 4.0, Enterprise Edition
    • Hotfix PSE400W2K3R03
    • Windows 2003 Server Standard Edition Service Pack 1
    • Runs Symantec Client Security 10.0.0.359 configured per the recommendations from Citrix for file exclusions, etc.
    • Admin, almost 15 years
      Are you running with EdgeSight installed and configured, which can provide some additional details?
  • Kevin Kuphal, almost 15 years
    Unfortunately, as soon as the server hits 100% CPU you cannot view your session so this method is not usable.
  • Kevin Kuphal, almost 15 years
    This is fairly brute force. I was hoping for something more elegant. I'd also be afraid of that process filling the drive, since sometimes the problem doesn't occur for quite a while.
  • Kevin Kuphal, almost 15 years
    Editing the original question to include these details. We do have an SA agreement and are currently in the process of building a new farm with XenApp 5 but this continues to be a nagging issue for our current farm.
  • Ben Kohn, almost 15 years
    If you have the ability, I would strongly recommend testing SP2 for W2k3; a lot of improvements to TS and general OS stability were included in that release. There are also some post-SP2 hotfixes available that relate directly to TS+AV. What does the console look like when this happens? Can you see the SAS screen? Also, if you're current on your SA and have the Enterprise edition, you might consider setting up EdgeSight. It's not that difficult, and you'd get all the data you'd need to troubleshoot this further. And then some.
  • Kevin Kuphal, almost 15 years
    The console is unresponsive... well, I can enter my password to log on, but I never get to the desktop. Thanks for the EdgeSight suggestion. I'll take a look at that.
  • Steven, almost 15 years
    It's still the best way. I regularly see one of our TS boxes stop responding because of runaway processes: the program hits a bug and uses 100% CPU. If it's only one user, I can usually get in and kill it. Sometimes, though, it happens to several people and a reboot is needed. It's easy enough to have a script rotate the log to stop the disk from filling up.
  • Kevin Kuphal, almost 15 years
    Will this show process names? The issue isn't a gradual degradation, but a sudden spike to 100% and then it's too late to see what's going on inside the box.
  • Kevin Kuphal, almost 15 years
    How do you get pslist to generate a list that shows, like Task Manager, which process is using 100% of the CPU at a given moment? It shows this in "task manager" mode, but not from a plain pslist command.
  • Spence, almost 15 years
    Gradual was the wrong word to use. How about "intermittent"? Process names will absolutely be listed. As long as there's enough CPU for the Performance Logs and Alerts service to flush its logs to disk, you'll get info about the process that's going haywire. Fire up a copy of PERFMON on an XP or W2K3 machine, click the "+" in the toolbar, choose "Process" in the "Performance Object" list-box, and have a look at the counters that can be logged. Those counters will be logged for each process (and any new processes) during the log collection period. It's a very, very nice tool.
  • Spence, almost 15 years
    Since you're restarting the servers each day, you'll need to throw in a scheduled task to restart the performance log on startup. The question I linked above explains how to do that.
  • Neobyte, almost 15 years
    Bear in mind that if you approach Citrix, the very first thing they will say is "go install SP2 and come back when it's done and the problem still exists". We had a random issue with an external DNS server a few months ago. The answer? Install SP2. Some update in it fixed the problem, even though that problem wasn't listed among the issues SP2 fixed.
  • Palindrom, almost 15 years
    Unfortunately, pslist only shows the CPU time. I haven't found a tool yet that will show the CPU % (like in task manager).
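Task Manager's CPU column is essentially each process's change in CPU time over a sampling interval, divided by the wall-clock length of the interval times the number of CPUs. Two pslist snapshots taken a known interval apart are therefore enough to approximate it. The sketch below assumes pslist's CPU Time column uses an `H:MM:SS.fff` format and that the snapshots have already been parsed into dictionaries keyed by PID; the function names are hypothetical.

```python
def parse_cpu_time(text):
    """Convert an H:MM:SS.fff duration (assumed pslist CPU Time format) to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def cpu_percent(before, after, interval_seconds, cpu_count):
    """Per-process CPU % between two snapshots of cumulative CPU seconds.

    `before`/`after` map a PID to the CPU seconds it has used so far. The result
    is normalised so 100 means every CPU was saturated, as in Task Manager.
    Processes that exited during the interval are simply absent from the result.
    """
    result = {}
    for pid, end in after.items():
        start = before.get(pid, 0.0)  # new process: its CPU time started at zero
        if end < start:
            start = 0.0  # PID was reused by a new process
        result[pid] = 100.0 * (end - start) / (interval_seconds * cpu_count)
    return result
```

For example, a process whose CPU time grows by 30 seconds over a 30-second interval on a 2-CPU server is using 50% of the machine, which on that box means one CPU pegged at 100%.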
  • Kevin Kuphal, almost 15 years
    I am in the process of upgrading to SP2 tonight. Should also fix a STOP error we've been encountering that is listed in the KB as fixed in SP2.