Linux server became extremely slow

Solution 1

How are the disks set up in the server? I would guess that you aren't using any RAID implementation to speed up disk read/write (the output shows 33TB in a single LVM group), and this is your problem.

As the previous answers correctly state, 'wa' stands for disk I/O wait, which is extremely high here, practically locking up your server and placing any process that depends on disk I/O into uninterruptible sleep (processes marked "D" in top). The problem is that you cannot clear these processes by sending them SIGKILL; they will not respond until they get what they are waiting for from the disk, and only then will they exit.
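
For example, a quick way to list the processes stuck in uninterruptible sleep (a minimal sketch using standard procps tools; the awk filter is only an illustration):

    # show processes whose state starts with "D" (uninterruptible sleep)
    ps -eo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /^D/'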

Processes in the D state can be cleared either by reducing your disk I/O wait or by rebooting the server, and rebooting is the last thing you want to do: if a filesystem check (fsck) turns out to be needed, it will take a very long time to go through 33TB of data on slow disks; it takes long enough even on RAID setups.

Look into getting a server with a quality RAID card and set up your disks in RAID 6 if you're going with a high number of disks per server. This way you get both better speed and redundancy in case of a disk failure, since RAID 6 can withstand two failed HDDs, whereas most other RAID configurations can withstand only one.

Also, once you have the new server set up, look into RAID array health monitoring; einarc can be helpful for querying the RAID controller and showing whether there are any problems with the array. With RAID you sacrifice a bit of disk space, but you get much better read/write speeds compared to the current setup.
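
The exact health-check commands depend on your controller; as a minimal sketch, assuming Linux software RAID (mdadm) plus SMART-capable disks (hardware controllers need their own CLI, such as einarc or the vendor tool):

    # overall software-RAID status
    cat /proc/mdstat
    mdadm --detail /dev/md0      # /dev/md0 is an example array name
    # per-disk SMART health check
    smartctl -H /dev/sda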

If you have 12 x 3TB drives and place them into a RAID 6 configuration, the formula to calculate the usable disk space is (N - 2) x (smallest HDD size), so with 12 x 3TB disks you would have 30TB of usable space:

(12 - 2) x 3TB = 30TB

One more suggestion: if you will be using this server just to serve files, I would suggest switching to nginx. It is noticeably lighter on resources and performs better for static file serving than Apache, which scales well only up to a certain load. But this depends on your system's setup.
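
As a rough illustration of that switch, here is a minimal sketch of an nginx server block for serving static downloads; the server_name and root path are placeholders, not taken from your setup:

    # /etc/nginx/conf.d/files.conf -- example only, adjust to your layout
    server {
        listen 80;
        server_name files.example.com;

        root /home/files;    # directory holding the shared files
        sendfile on;         # kernel-level file copy, cheap for large downloads
        tcp_nopush on;

        location / {
            try_files $uri =404;
        }
    }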

Solution 2

'wa' stands for wait, which means your CPUs are waiting for I/O operations to complete. Most probably the problem is related to reading/writing files from/to the disks. You should consider tuning your system around its HDD I/O needs.
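
To confirm which devices are saturated before tuning anything, something like the following helps (assuming the sysstat package is installed); look for devices with high %util and long await values:

    # extended per-device I/O statistics, refreshed every 5 seconds
    iostat -x 5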

At some point you might want to distribute the HDD space across multiple machines (with lower CPU specs).

Another idea would be to use SSDs for caching (especially if you can predict that some files will be requested far more often than others). This would require extra development, but it might be the cheapest solution.
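
Since /home already sits on LVM, one possible implementation is lvmcache (dm-cache). This is only a sketch under the assumption that you add an SSD as /dev/sdb and that the volume group and logical volume names match the df output below; it also needs a reasonably recent kernel and LVM:

    # add the SSD to the existing volume group
    pvcreate /dev/sdb
    vgextend VolGroup01 /dev/sdb
    # create a cache pool on the SSD and attach it to the existing LV
    lvcreate --type cache-pool -L 200G -n ssdcache VolGroup01 /dev/sdb
    lvconvert --type cache --cachepool VolGroup01/ssdcache VolGroup01/LogVol00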


Comments

  • HTMHell over 1 year

    I have a file sharing website, and my files are hosted on a server with these specifications:

    32GB RAM
    12x3TB
    2x Intel Quad Core E5620
    

    The files on this server are up to 4GB each. 446GB is used (out of 36TB).

    [root@hosted-by ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda2              50G  2.7G   44G   6% /
    tmpfs                  16G     0   16G   0% /dev/shm
    /dev/sda1              97M   57M   36M  62% /boot
    /dev/mapper/VolGroup01-LogVol00
                           33T  494G   33T   2% /home
    

    And take a look at this:

    [screenshot of top output showing a very high wa% value]

    Why is the wa% so high? (I think that's what makes the server so slow.)

    • Tometzky over 10 years
      Most of your processes are in "D" state, which means that they're waiting for disk. Try to find out what files they are writing to or reading from; lsof -p PID could help.
    • sciurus over 10 years
      While wait is high, what is the output of `iostat -x 10 2`? That will print I/O metrics since boot, gather data for 10 seconds, and then print the metrics for that interval.