Extract HTML tag data with sed


Solution 1

Give this a try (the \n in the replacement requires GNU sed):

sed -n 's|[^<]*<i>\([^<]*\)</i>[^<]*|\1\n|gp'

Also, your example is missing a "/" in the first closing tag. It should read:

Hello, <i>I</i> am <i>very</i> glad to meet you.
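
As a quick check, the command from this answer can be run against the corrected sample line. This is a minimal sketch that assumes GNU sed; the variable names line and result are only illustrative.

```shell
# Sample line from the question, with the first closing tag corrected.
line='Hello, <i>I</i> am <i>very</i> glad to meet you.'

# GNU sed assumed: \n in the replacement inserts a newline.
# Each match swallows the text around one <i>...</i> pair and keeps
# only the captured contents, followed by a newline.
result=$(printf '%s\n' "$line" | sed -n 's|[^<]*<i>\([^<]*\)</i>[^<]*|\1\n|gp')

printf '%s\n' "$result"
# prints:
# I
# very
```

Run directly (without the command substitution), the output ends with an extra blank line, because the last replacement also appends a newline.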

Solution 2

Try this, which strips every tag and leaves the surrounding text:

$ sed 's/<[^>]*>//g' file.html
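
To see what this command does with the sample line from the question, here is a small sketch that feeds the line on standard input instead of reading file.html. The variable names are illustrative.

```shell
# Sample line from the question, with the first closing tag corrected.
line='Hello, <i>I</i> am <i>very</i> glad to meet you.'

# Delete everything from each "<" up to the next ">".
stripped=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

printf '%s\n' "$stripped"
# prints: Hello, I am very glad to meet you.
```

Note that this removes the tags but keeps all of the text, so it does not isolate the tagged words on its own.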
Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin almost 2 years

    I wish to extract data between known HTML tags. For example:

    Hello, <i>I<i> am <i>very</i> glad to meet you.
    

    Should become:

    'I
    
    very'
    

    I have found something that nearly does this. Unfortunately, it extracts only the last entry.

    sed -n -e 's/.*<i>\(.*\)<\/i>.*/\1/p'
    

    I can first append a newline character to every </i> end tag, and then this works fine. But is there a way to do it with just one sed command?
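
The behaviour described in the question can be reproduced with the sketch below. The leading .* is greedy, so it consumes everything up to the last <i>, which is why only the last entry survives. The second pipeline is one possible reading of the two-step workaround mentioned above. It assumes GNU sed for \n in the replacement, and the variable names are illustrative.

```shell
# Sample line with the first closing tag corrected.
line='Hello, <i>I</i> am <i>very</i> glad to meet you.'

# The attempt from the question: the greedy .* skips to the last <i>.
last_only=$(printf '%s\n' "$line" | sed -n -e 's/.*<i>\(.*\)<\/i>.*/\1/p')
printf '%s\n' "$last_only"
# prints: very

# Two-step workaround (GNU sed assumed): put a newline after each </i>
# so every tag pair sits on its own line, then extract per line.
two_step=$(printf '%s\n' "$line" \
  | sed 's|</i>|&\n|g' \
  | sed -n -e 's/.*<i>\(.*\)<\/i>.*/\1/p')
printf '%s\n' "$two_step"
# prints:
# I
# very
```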