Searching through webpage

python search text find webpage

36,631

Solution 1

You could do something simple like:


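import re
from urllib.request import urlopen  # Python 3; this module was urllib2 in Python 2

# Download the page and decode the raw bytes to text (UTF-8 is assumed here)
html_content = urlopen('http://www.domain.com').read().decode('utf-8', errors='replace')

# Collect every match of the pattern in the page source
matches = re.findall('regex of string to find', html_content)

if not matches:
    print('I did not find anything')
else:
    print('My string is in the html')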
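
If the thing you are looking for is a literal string rather than a pattern, characters such as `.`, `$` or `(` would otherwise be interpreted as regex syntax. A minimal sketch of one way to handle that, assuming a hypothetical helper named `page_contains` and a made-up `sample_html` string standing in for the downloaded page:

```python
import re


def page_contains(html_content, text, ignore_case=True):
    """Return True if the literal `text` occurs anywhere in `html_content`."""
    # re.escape makes characters like '.', '$' or '(' match literally
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(re.escape(text), html_content, flags) is not None


# Hypothetical page source used only for illustration
sample_html = "<html><body><p>Price: $9.99 (sale!)</p></body></html>"

if page_contains(sample_html, "$9.99 (sale!)"):
    print("My string is in the html")
else:
    print("I did not find anything")
```

For a case-sensitive literal search, plain `text in html_content` does the same job without `re` at all.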

Solution 2

lxml is awesome: http://lxml.de/parsing.html

I use it regularly with XPath to extract data from HTML.

Another option is BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/), which is great as well.
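
To show what a parser buys you over searching the raw markup, here is a rough sketch that searches only the visible text of a page. It deliberately swaps in the standard library's `html.parser` so that it runs without installing anything; lxml and BeautifulSoup offer richer APIs for the same job. The class `VisibleTextParser`, the function `visible_text`, and the `sample_html` string are all hypothetical names made up for this example:

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html_content):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = VisibleTextParser()
    parser.feed(html_content)
    parser.close()
    # Chunks are joined without a separator so inline markup such as
    # <b>...</b> does not split a phrase; whitespace is then normalised.
    return " ".join("".join(parser.chunks).split())


# Hypothetical page: "needle" appears only in a script and an attribute
sample_html = """
<html><head><script>var secret = "needle";</script></head>
<body><p class="needle">Hello, <b>world</b>!</p></body></html>
"""

text = visible_text(sample_html)
print("Hello, world!" in text)  # True: the phrase spans a <b> tag
print("needle" in text)         # False: only in script and attribute
```

A search of the raw HTML would miss "Hello, world!" because of the `<b>` tag in the middle, and would report a false hit for "needle".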


Author by

AustinM

Updated on November 13, 2020

Comments

AustinM over 3 years

Hey, I'm working on a Python project that requires me to look through a webpage. I want to search the page for a specific piece of text: if the text is found, the program prints something out; if not, it prints an error message. I've already tried different modules such as libxml, but I can't figure out how to do it.

Could anybody lend some help?

snippsat over 13 years

Regex is not the right tool when it comes to searching or parsing (X)HTML: stackoverflow.com/questions/1732348/…

dplouffe over 13 years

If you want to parse the DOM, sure, I agree that regex is not the correct approach. That said, if you just want to find a snippet of text in an arbitrary text blob, I suggest using regular expressions. Whether the text is HTML or not doesn't really matter if you're looking for a specific pattern.

Azurespot almost 4 years

@dplouffe This post is many years old. Would you know if this is still the best option for Python?