Extracting an IP address from text and storing it in a variable


Solution 1

You almost had it right the first time. The awk answer is good for your specific case, but you were receiving an error because grep treats the argument after the pattern as the name of a file to search, so the contents of $line were being used as file names rather than as the text to search.

Also, when using regular expressions, I always use grep -E just to be safe. I have also heard that backticks are deprecated and should be replaced with $().

The correct way to grep a variable on shells that support here-strings is to use input redirection with three less-than signs (<<<), so the command that sets your $ip variable should read as follows:

ip="$(grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' <<< "$line")"
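On shells without here-strings (dash, for example), the same result can be had with a pipe. This is only a minimal sketch, assuming a grep that supports -o (GNU and BSD grep do, but POSIX does not require it); the sample line is copied from the question's file:

```shell
# Sample line taken from the question's file.
line='48878 128.206.6.136'

# Feed the variable to grep on standard input through a pipe.
# printf is used instead of echo so the content is passed through unchanged.
ip=$(printf '%s\n' "$line" |
     grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

echo "$ip"
```

The pipe costs one extra process compared with the here-string, but it does not depend on bash or zsh.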

If it is a file you are searching, I always use a while loop, since it is guaranteed to go line by line, whereas a for loop over the output of cat splits on every space, tab, and newline (which is why your loop saw 34782 and 128.206.6.137 as separate items). You are also making a useless use of cat, which can be replaced by input redirection as well. Try this:

while IFS= read -r line; do
  ip="$(grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' <<< "$line")"
  echo "$ip"
done < "abd"

Also, I don't know what OS or version of grep you are using, but the escape character you had before the curly braces is usually not required whenever I have used this command in the past. It could be from using grep -E or because I use it in quotes and without backticks -- I don't know. You can try it with or without and just see what happens.
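The difference comes from grep -E: without it grep uses basic regular expressions, where the interval braces must be written as \{ and \}; with -E it uses extended regular expressions, where plain { and } work. A small sketch to check that both spellings match the same text (again assuming a grep with -o; the grouped form on the last line is my own shorthand, not something from the question):

```shell
line='34782 128.206.6.137'

# Basic regular expression: braces need backslashes.
bre=$(printf '%s\n' "$line" |
      grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}')

# Extended regular expression (-E): braces are written plainly.
ere=$(printf '%s\n' "$line" |
      grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

# With -E the repeated "digits followed by a dot" part can also be grouped.
grouped=$(printf '%s\n' "$line" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}')

printf '%s\n' "$bre" "$ere" "$grouped"
```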

Whether you use a for loop or a while loop, that is based on which one works for you in your specific situation and if execution time is of utmost importance. It doesn't appear to me as if OP is trying to assign separate variables to each IP address, but that he wants to assign a variable to each IP address within the line so that he can use it within the loop itself -- in which case he only needs a single $ip variable per iteration. I'm sticking to my guns on this one.

Solution 2

If the IP address is always the second field of that file, you can use awk or cut to extract it.

awk '{print $2}' abd

or

cut -d' ' -f2 abd

If you need to iterate through the IP addresses, the usual for or while loops can be used. For example:

for ip in $(cut -d' ' -f2 abd) ; do ... ; done

or

awk '{print $2}' abd | while read ip ; do ... ; done

Or you can read all the IP addresses into an array:

$ IPAddresses=($(awk '{print $2}' abd))
$ echo "${IPAddresses[@]}"
128.206.6.136 128.206.6.137 23.234.22.106

Solution 3

grep searches files or standard input for the patterns. You cannot pass data strings to match on the grep command line. Try this:

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' abd

If you need to get each IP address in a variable:

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' abd |
while read IP
do
    echo "$IP"
done
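One caveat with the piped loop: in bash (by default) and dash, the while loop on the right-hand side of the pipe runs in a subshell, so variables assigned inside it are gone once the loop finishes. If the values are needed afterwards, one option is to feed the loop from a here-document instead. This is a rough sketch only; the temporary file and the count and last variables are made up for the illustration:

```shell
# Recreate the question's sample file in a temporary location (illustrative only).
abd=$(mktemp)
printf '%s\n' '48878 128.206.6.136' '34782 128.206.6.137' '12817 23.234.22.106' > "$abd"

count=0
last=''
# grep still runs only once, inside the command substitution;
# the loop itself runs in the current shell, not in a pipeline.
while read -r IP
do
    count=$((count + 1))
    last=$IP
done <<EOF
$(grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$abd")
EOF

rm -f "$abd"
echo "$count addresses, last one: $last"
```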

Comparative Performance Testing of the accepted answer

The answer recommends executing a separate invocation of grep on each line of the input file. Let's see how that works out with files of 1000 to 5000 lines. The files abd.1000 and abd.5000 were created by simply replicating the original example file in the question. The original code was changed only to take the filename as a command line argument (${1:?}) instead of the hardcoded "abd".

$ wc -l abd.1000 abd.5000
  1000 abd.1000
  5000 abd.5000
  6000 total

Test the example code in this answer on a 1000 line file:

$ cat ip-example.sh
#!/bin/sh
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "${1:?}" |
while read IP
do
    echo "$IP"
done

$ time sh ip-example.sh abd.1000 > /dev/null

real    0m0.021s
user    0m0.007s
sys     0m0.017s
$

The above shows that the example in this answer processed a 1000 line file in less than 1/40 second. Now let's see how the example in the accepted answer performs:

$ cat accepted.sh
#!/bin/bash
while read line; do
  ip="$(grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' <<< "$line")"
  echo "$ip"
done < "${1:?}"

$ time bash accepted.sh abd.1000 > /dev/null

real    0m3.565s
user    0m0.739s
sys     0m2.936s
$

Hmmm. The example in the accepted answer executes in about 3 1/2 seconds, roughly 169 times slower than the 0.021 seconds (less than 1/40 second) taken by the example in this answer.

Let's up the ante and test with 5000 lines:

$ time sh ip-example.sh abd.5000 > /dev/null

real    0m0.052s
user    0m0.051s
sys     0m0.029s

About two and a half times as long to process 5 times more data.

$ time bash accepted.sh abd.5000 > /dev/null

real    0m17.561s
user    0m3.817s
sys     0m14.333s

The example code in the accepted answer takes almost 5 times as long to process 5 times more data than to process 1000 lines of data.

Conclusions

The example in the accepted answer takes 337 times longer to process a 5000 line file than the ip-example.sh code in this answer (the other answers on this page should perform similarly to ip-example.sh).
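For anyone who wants to check the arithmetic, the ratios can be recomputed from the "real" times reported above. A small sketch using awk for the floating-point division (the ratio helper and the variable names are just for this illustration):

```shell
# Wall-clock ("real") times in seconds, copied from the runs above.
fast_1000=0.021;  slow_1000=3.565
fast_5000=0.052;  slow_5000=17.561

# Hypothetical helper: print a/b rounded to two decimals.
ratio() { LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

r1000=$(ratio "$slow_1000" "$fast_1000")   # accepted.sh vs ip-example.sh, 1000 lines
r5000=$(ratio "$slow_5000" "$fast_5000")   # accepted.sh vs ip-example.sh, 5000 lines
gfast=$(ratio "$fast_5000" "$fast_1000")   # ip-example.sh growth, 1000 -> 5000 lines
gslow=$(ratio "$slow_5000" "$slow_1000")   # accepted.sh growth, 1000 -> 5000 lines

echo "1000 lines: ${r1000}x slower; 5000 lines: ${r5000}x slower"
echo "growth from 1000 to 5000 lines: ip-example.sh ${gfast}x, accepted.sh ${gslow}x"
```

The exact figures will differ from machine to machine; only the orders of magnitude matter here.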

Solution 4

I suggest you use AWK for that purpose. It's a much more appropriate tool for processing columns.

xieerqi:$ vi ipAddresses

xieerqi:$ awk '{printf $2" "}' ipAddresses                                     
128.206.6.136 128.206.6.137 23.234.22.106 
xieerqi:$ ARRAY=($(awk '{printf $2" "}' ipAddresses))                          

xieerqi:$ echo ${ARRAY[@]}
128.206.6.136 128.206.6.137 23.234.22.106

xieerqi:$ echo ${ARRAY[1]} ${ARRAY[2]}
128.206.6.137 23.234.22.106

xieerqi:$ cat ipAddresses                                                      
48878 128.206.6.136
34782 128.206.6.137
12817 23.234.22.106

Solution 5

See the first question in the Bash FAQ:

while read -r _ ip; do printf "%s\n" "${ip[@]}"; done < abd
128.206.6.136
128.206.6.137
23.234.22.106
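
This works because read splits each line on whitespace: the first field goes into the throwaway variable _ and everything that remains goes into ip, so no external command is started at all. A self-contained sketch of the same loop, assuming the file has exactly the two-column layout shown in the question (the temporary file and the ips variable exist only for this illustration):

```shell
# Recreate the question's sample file in a temporary location (illustrative only).
abd=$(mktemp)
printf '%s\n' '48878 128.206.6.136' '34782 128.206.6.137' '12817 23.234.22.106' > "$abd"

ips=''
# First field -> _ (ignored), rest of the line -> ip.
while read -r _ ip; do
    printf '%s\n' "$ip"
    ips="$ips$ip,"      # collected only to show the value survives the loop
done < "$abd"

rm -f "$abd"
```

If a line had more than two fields, ip would receive all of the remaining fields, so an extra placeholder variable after ip would be needed in that case.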

Author: Swatesh Pakhare

Updated on September 18, 2022

Comments

  • Swatesh Pakhare
    Swatesh Pakhare over 1 year

    I have a text file named abd shown below.

    48878 128.206.6.136
    34782 128.206.6.137
    12817 23.234.22.106
    

    I want to extract only IP address from the text and store it in a variable and use for other purpose.

    I have tried this.

    for line in `cat abd`
    do
    ip=`grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' $line`
    echo $ip
    done

    I am getting an error as follows

    grep: 34782: No such file or directory
    grep: 128.206.6.137: No such file or directory
    grep: 12817: No such file or directory
    grep: 23.234.22.106: No such file or directory

    I don't know what is going wrong here. Any help would be appreciated.

    • Admin
      Admin over 8 years
      Will the input file follow the same pattern?
    • Admin
      Admin over 8 years
      @heemayl Yes. There are loads of other IPs.
    • Admin
      Admin over 8 years
      Change the first line of your loop to while read line and add < abd after the done
    • Admin
      Admin over 8 years
      If there are tons of other IPs, then I think my answer best answers what it appeared as if you were actually trying to do, despite other users' negative votes and comments toward my answer. Can you clarify your question? Are you wanting to go through each IP in order and say something about it or do something with it, or are you going to reference each IP individually with a separate variable? If you are wanting to go in order (within the loop) you only need a single $ip variable per iteration, and there is no need for an array or to reference a specific IP address outside the loop.
  • Swatesh Pakhare
    Swatesh Pakhare over 8 years
    Yaa man it worked. Thank you. Appreciated.
  • rubynorails
    rubynorails over 8 years
    You can absolutely pass strings (and variables) to grep by using <<< input redirection.
  • Swatesh Pakhare
    Swatesh Pakhare over 8 years
    Can you explain the second line of the code to me? What does the $ before grep mean?
  • rubynorails
    rubynorails over 8 years
    @SwateshPakhare It is basically the same thing as the backticks. It sets the $ip variable to the output of the command inside $(). You could actually even say echo "$(grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' <<< "$line")" instead of setting it as a variable beforehand.
  • RobertL
    RobertL over 8 years
    <<< is a shell extension and won't work under many shells, for example Debian based systems, unless the script is run by bash or zsh or etc. The default system shell on these systems is POSIX compliant and does not recognize <<<.
  • RobertL
    RobertL over 8 years
    The loop in this answer executes a separate grep process for each line of the input file. Even with files of moderate size this loop will take seconds, instead of fractions of seconds, to execute. The larger the file the bigger the performance hit.
  • Rui F Ribeiro
    Rui F Ribeiro over 8 years
    I second the awk, seems much more intuitive in Unix
  • rubynorails
    rubynorails over 8 years
    @RobertL No, it's actually the only way unless you echo the string or variable and then pipe it to grep. So with it being the only way to grep it directly, then I would say that would, at least by most people, be considered the correct way. And by the way, I use <<< on Debian-based systems all the time. Most default shells these days are Bash.
  • rubynorails
    rubynorails over 8 years
    @RobertL I meant it was the only way to grep a string or variable, not the only way to solve the problem. You specifically stated that grep had to be executed on a file, which is absolutely false. Also, nowadays, /bin/sh is usually just a symlink to Bash in order to maintain compatibility since Bash can do everything SH can do, plus so much more. There is no reason to run SH. It has been updated, outdated, and deprecated for years. If someone has a question about SH, they should specify that, and I will tailor my answers to fit their needs. Otherwise, my answers will default to Bash.
  • rubynorails
    rubynorails over 8 years
    I feel like I should also comment on this answer since it has been updated to call me out. Let me once more state, that I never said to know the "true way of Unix" or that "my answer was the only correct way." @RobertL is twisting my words because I called him out on the fact that it is actually possible to grep a variable. I already said the awk solution posed by cas is the best answer to assign a different variable to each IP address. The only thing I defined as "correct" and "the only way" are the 2 different ways of grepping a string or variable, which @RobertL said was not possible.
  • mikeserv
    mikeserv over 8 years
    though i think it is pretty funny, pointing out the hurt butts probably isn't making you a lot of friends, either.
  • terdon
    terdon over 8 years
    Don't use answers to take potshots at another user. If you must, take it to chat.
  • terdon
    terdon over 8 years
    Don't use answers to take potshots at another user. If you must, take it to chat. @RobertL follow your own advice. Keep your answers technical and be nice.
  • Alessio
    Alessio over 8 years
    bash behaves differently when called as /bin/sh. It runs in --posix mode and disables all non-POSIX extensions.
  • Alessio
    Alessio over 6 years
    Everything you say about the relative performance is correct, except that it's at least 10 times worse than you say (and 0.021 seconds is not only less than 1/4 second, it's less than 1/40th of a second). The first example shows that running the grep in the loop is 169 times slower, not 7-8 times (3.565 / 0.021 = 169.76), while the second shows that the loop is 337 times slower, not 34 times (17.561 / 0.052 = 337.71)