Searching through webpage

Solution 1

You could do something simple like:


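import re
import urllib.request  # Python 3; on Python 2 this module was urllib2

# Download the page and decode the bytes to text (UTF-8 is assumed here)
html_content = urllib.request.urlopen('http://www.domain.com').read().decode('utf-8', errors='replace')

# Collect every match of the pattern in the raw HTML
matches = re.findall('regex of string to find', html_content)

if not matches:
    print('I did not find anything')
else:
    print('My string is in the html')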
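
If the thing you are looking for is a literal string rather than a pattern, it probably makes sense to escape it first so characters like `.` or `$` are not treated as regex syntax. A minimal sketch, assuming Python 3; the helper name `page_contains` and the sample HTML are made up for illustration and it runs without any network access:

```python
import re

def page_contains(html_content, text, ignore_case=False):
    """Return True if the literal string `text` occurs in `html_content`."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes regex metacharacters such as '.' or '$' match literally
    return re.search(re.escape(text), html_content, flags) is not None

# Hypothetical page content standing in for a downloaded page
sample = "<html><body><p>Price: $9.99 (sale)</p></body></html>"

if page_contains(sample, "$9.99"):
    print("My string is in the html")
else:
    print("I did not find anything")
```

For a plain, case-sensitive substring check, `text in html_content` does the same job without `re` at all.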

Solution 2

lxml is awesome: http://lxml.de/parsing.html

I use it regularly with XPath to extract data from HTML.

The other option is BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/), which is great as well.
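
To give a rough idea of what a parser buys you over matching the raw markup, here is a sketch that pulls out only the visible text and then searches it. Note that this uses the standard library's `html.parser` as a stand-in so it runs without installing anything; it is not the lxml or BeautifulSoup API. The names `TextExtractor` and `visible_text` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def visible_text(html_content):
    """Return the page's text nodes joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html_content)
    parser.close()
    return " ".join(parser.chunks)

# Hypothetical page content standing in for a downloaded page
sample = ("<html><head><script>var s = 'hidden';</script></head>"
          "<body><h1>Hello</h1><p>My string is here</p></body></html>")

text = visible_text(sample)
if "My string" in text:
    print("My string is in the html")
else:
    print("I did not find anything")
```

Searching the extracted text means a match inside a script, a tag name, or an attribute value does not count as a hit, which is the usual argument for parsing first.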

Author by

AustinM

Updated on November 13, 2020

Comments

  • AustinM over 3 years

    Hey, I'm working on a Python project that requires me to search a webpage. I want to look for a specific piece of text; if it is found, the script should print something out, and if not, it should print an error message. I've already tried different modules such as libxml, but I can't figure out how to do it.

    Could anybody lend some help?

  • snippsat over 13 years
    Regex is not the right tool when it comes to searching/parsing (X)HTML. stackoverflow.com/questions/1732348/…
  • dplouffe over 13 years
    If you want to parse the DOM, sure, I agree that regex is not the correct approach. That said, if you want to find a snippet of text in any text blob, I suggest using regular expressions. Whether the text is HTML or not doesn't really matter if you're looking for a specific pattern.
  • Azurespot almost 4 years
    @dplouffe this post is many years old. Would you know if this is still the best option for Python?