Run smartctl on all disks of a server


So what I'm trying to do is automate these commands.

This already exists: it is what smartd does.

You normally configure the behaviour you want in /etc/smartd.conf.

Example:

# DEVICESCAN: tells smartd to scan for all ATA and SCSI devices
# Alternative setting to ignore temperature and power-on hours reports in syslog.
DEVICESCAN -I 194 -I 231 -I 9

Alternatively, you can list your disks explicitly, for example:

/dev/sdc -d 3ware,0 -a -s L/../../7/01

If smartd discovers an error, it can send you an email (the -m directive):

/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m [email protected]

There are many other options and switches; read the smartd.conf manpage for the details.
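If you prefer explicit entries over DEVICESCAN, the list can be generated instead of typed by hand. The following is only a minimal sketch: `scan_to_smartd_conf` is a hypothetical helper name (it is not part of smartmontools), and it assumes that `smartctl --scan` prints one `DEVICE -d TYPE # comment` line per disk.

```shell
#!/bin/bash
# Hypothetical helper: convert `smartctl --scan` output (read on stdin) into
# explicit smartd.conf entries by dropping the trailing "# ..." comment and
# appending the directives given as the first argument (default: -a).
scan_to_smartd_conf() {
    local directives="${1:--a}"
    local line
    while IFS= read -r line; do
        line="${line%%#*}"                         # strip the trailing comment
        line="${line%"${line##*[![:space:]]}"}"    # trim trailing whitespace
        if [ -n "$line" ]; then
            printf '%s %s\n' "$line" "$directives"
        fi
    done
}

# Usage (review the output before adding it to /etc/smartd.conf):
#   smartctl --scan | scan_to_smartd_conf '-a -s L/../../7/01'
```

The directives are passed in as one string so that the same helper can produce monitoring-only entries or entries with scheduled self-tests.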


Author: ZeroNegative

Updated on November 26, 2022

Comments

  • ZeroNegative
    ZeroNegative over 1 year

    My question is quite simple: I want to run smartctl -i -A on every disk in a server. I have many servers with different numbers of disks and different RAID controllers, so I need to scan all drives for diagnosis. I'm thinking of running smartctl --scan | awk '{print $1}' >> test.log, so that test.log contains the path of every drive.
    After that I need some if or do constructs to run smartctl on every drive. I don't know if this is the best way to do it, since I also need to identify the RAID controller. Am I heading in the right direction?
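The idea described above can be sketched without an intermediate log file by reading the scan output directly. This is a sketch under assumptions, not a definitive implementation: `scan_all_disks` is a hypothetical function name, and it relies on `smartctl --scan` printing `DEVICE -d TYPE # comment` lines. Whether disks behind a RAID controller show up in the scan depends on the smartmontools version and the controller.

```shell
#!/bin/bash
# Run "smartctl -i -A" on every device listed by "smartctl --scan",
# reusing the "-d TYPE" hint that the scan prints for each device.
scan_all_disks() {
    smartctl --scan | while read -r dev flag type rest; do
        [ -n "$dev" ] || continue
        echo "=== $dev ==="
        if [ "$flag" = "-d" ] && [ -n "$type" ]; then
            smartctl -i -A -d "$type" "$dev" </dev/null
        else
            smartctl -i -A "$dev" </dev/null
        fi
    done
}

# Usage:
#   scan_all_disks > Troubleshoot.log
```

Passing the scanned `-d` type along means the same loop can cover plain disks and disks the scan reports behind a controller.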

    Edit:

    I usually use these commands to troubleshoot:

    Without RAID Controller

    for i in {c..d}; do
        echo "Disk sd$i" $SN $MD
        smartctl -i -A "/dev/sd$i" | grep -iE '^  5|^197|^198|FAILING_NOW|SERIAL'
    done
    

    PERC Controller

    for i in {0..12}; do
        echo "$i" $SN $MD
        smartctl -i -A -T permissive /dev/sda -d "megaraid,$i" | grep -iE '^  5|^197|^198|FAILING_NOW|SERIAL'
    done
    /usr/sbin/megastatus --physical
    /usr/sbin/megastatus --logical
    

    3ware Controller

    for i in {0..10}; do
        echo "Disk $i" $SN $MD
        smartctl -i -A /dev/twa0 -d "3ware,$i" | grep -iE '^  5|^197|^198|FAILING_NOW|SERIAL'
    done
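
The three loops above end in the same grep filter, so it can be factored into one function. The sketch below is an assumption about what the filter is meant to keep (the serial number line, attributes 5, 197 and 198, and anything flagged FAILING_NOW); `smart_filter` is a hypothetical name.

```shell
#!/bin/bash
# Keep only the serial number line, lines flagged FAILING_NOW, and
# attributes 5 (reallocated), 197 (pending) and 198 (offline uncorrectable)
# from "smartctl -i -A" output read on stdin.
smart_filter() {
    grep -E -i '^ *(5|197|198) |FAILING_NOW|^Serial number'
}

# Usage:
#   smartctl -i -A /dev/sdc | smart_filter
#   smartctl -i -A /dev/twa0 -d 3ware,0 | smart_filter
```

Matching is case-insensitive because smartctl prints the serial line in mixed case rather than as "SERIAL".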
    

    SmartArray & Megaraid Controller:

    smartctl -a -d cciss,0 /dev/cciss/c0d0
    /opt/3ware/9500/tw_cli show
    cd /tmp
    

    DD (Rewrite disk block (DESTROY DATA)):

    dd if=/dev/zero of=/dev/HD* bs=4M
    HD*: sda, sdb…
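
Because this command is destructive, a script that wraps it would normally ask for confirmation first. A minimal sketch of such a guard, assuming the operator is asked to retype the device path; `confirm_destroy` is a hypothetical helper, and the dd call is deliberately left out of the example.

```shell
#!/bin/bash
# Hypothetical guard: succeed only if the operator retypes the exact
# device path. The prompt goes to stderr; the answer is read from stdin.
confirm_destroy() {
    local target="$1"
    local answer
    printf 'This will DESTROY ALL DATA on %s. Retype the device path to continue: ' "$target" >&2
    IFS= read -r answer || return 1
    [ "$answer" = "$target" ]
}

# Usage (replace the echo with the real command only once you trust the guard):
#   confirm_destroy /dev/sdX && echo "would run: dd if=/dev/zero of=/dev/sdX bs=4M"
```

Requiring the full path rather than a plain "y" makes it harder to wipe the wrong disk by reflex.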
    

    Burning (Stress test (DESTROY DATA)):

    /opt/systems/bin/vs-burnin --destructive --time=<hours> /tmp/burninlog.txt
    

    Dmesg & kernerrors:

    tail /var/log/kernerrors
    dmesg | grep -i -E "ata|fault|error"
    

    So what I'm trying to do is automate these commands.
    I want the script to find all disks in the host and run the appropriate smartctl command for each case.
    Something like a menu that lets me choose between running smartctl and running a destructive command. If I choose smartctl,
    the script scans all disks and runs the command according to the host configuration (with or without a RAID controller);
    if I choose a destructive command, the script asks me for the number of the disk to run it on.
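
Choosing the command "according to the host configuration" first requires knowing which controller is present. One way to sketch that step is to map lspci output to the smartctl `-d` type used in the loops above. `controller_type` is a hypothetical function, and the vendor strings it matches are assumptions about what lspci prints on typical PERC/MegaRAID, 3ware and Smart Array hosts; they would need checking against real hardware.

```shell
#!/bin/bash
# Guess the smartctl "-d" device type from lspci output read on stdin.
# Prints megaraid, 3ware, cciss, or none. The patterns are assumptions;
# extend them for the controllers you actually have.
controller_type() {
    local line
    while IFS= read -r line; do
        case "$line" in
            *MegaRAID*|*PERC*|*"PowerEdge Expandable RAID"*)
                echo megaraid; return 0 ;;
            *3ware*)
                echo 3ware; return 0 ;;
            *"Smart Array"*)
                echo cciss; return 0 ;;
        esac
    done
    echo none
}

# Usage:
#   type=$(lspci | controller_type)
#   echo "Detected controller type: $type"
```

The result can then select which of the per-controller loops to run.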


    Edit 2:

    I resolved my problem with the following script:

    #!/bin/bash
    # Troubleshoot.sh
    # Runs "smartctl -i -A" on every disk reported by "smartctl --scan".

    # Start from empty log files on every run.
    rm -f Troubleshoot.log HDs.log

    # One device path per line, followed by any RAID controllers that lspci reports.
    smartctl --scan | awk '{print $1}' >> HDs.log
    lspci | grep -i raid >> HDs.log

    # Read the file named by $1 into the global array "array", one line per element.
    getArray ()
    {
        array=()
        while IFS= read -r line # Read a line
        do
            array+=("$line") # Append it to the array
        done < "$1"
    }

    getArray "HDs.log"

    for e in "${array[@]}"
    do
        # Only /dev/sdX and /dev/hdX entries are scanned; the lspci lines are skipped.
        if [[ $e =~ ^/dev/[sh]d ]]
        then
            echo "smartctl -i -A $e" >> Troubleshoot.log
            smartctl -i -A "$e" >> Troubleshoot.log # Run smartctl on every disk in the host
        fi
    done
    exit $?   # Exit with the status of the loop above.
    

    I don't know if this solution is the right one or the best one, but it works for me!
    Appreciate all help!!

    • Admin
      Admin about 10 years
      Welcome to this site. Your question is good, but to give it the best chance of success you should describe your goal precisely, say what you've tried so far, describe your problem if there is one, and then ask your question. Following this simple advice should get you much better answers.
    • Admin
      Admin about 10 years
      @Kiwy: I added some more information; see if it's good now :)
    • Admin
      Admin about 10 years
      @Ouki: I can't give too many details because of company policy; all I can share is the information I wrote in the Edit.
    • Admin
      Admin about 10 years
      @ZeroNegative what you basically want is for someone to develop a script for you, and you're very unlikely to find someone who will do that on U&L or on any other Stack Exchange site. But yes, your question is much more complete now. Also note that I've edited your question twice; please look at what I changed so you can format your questions and answers correctly next time ;-)
    • Admin
      Admin about 10 years
      @Kiwi sure, I will! I just need someone to give me a starting point. I had thought of running smartctl --scan | awk '{print $1}' >> test.log to write a log with the disk paths, and then picking up each path and putting it into the command. For example, if test.log has two lines, /dev/sda and /dev/sdb, the script would pick up these two paths and run smartctl -i -A /dev/sda && smartctl -i -A /dev/sdb. Am I looking for help in the wrong place? I appreciate the help!