Recursive grep using Python


Solution 1

Use the os.walk function to traverse the directory tree, and string methods or a regex to filter the results. See http://docs.python.org/library/os.html for details on how to use os.walk.

import os
import re

def findfiles(path, regex):
    regObj = re.compile(regex)
    res = []
    for root, dirs, fnames in os.walk(path):
        for fname in fnames:
            if regObj.match(fname):
                res.append(os.path.join(root, fname))
    return res

print(findfiles('.', r'my?(reg|ex)'))

For the grep part, open each file with the open function and loop over its lines:

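def grep(filepath, regex):
    regObj = re.compile(regex)
    res = []
    with open(filepath) as f:
        for line in f:
            # search() finds a match anywhere in the line, like grep;
            # match() would only match at the start of the line
            if regObj.search(line):
                res.append(line)
    return res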

If you want to get the line numbers, you may want to look into the enumerate function.
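As a rough sketch of how the two pieces could be combined, with enumerate supplying the line numbers: the names grep_numbered, recursive_grep and name_regex below are made up for illustration and are not part of the answer above. The sketch assumes Python 3 and text files, and it simply skips files it cannot open.

```python
import os
import re


def grep_numbered(filepath, regex):
    """Return (line_number, line) pairs for lines that contain a match."""
    pattern = re.compile(regex)
    hits = []
    # errors='replace' keeps undecodable bytes from raising an exception
    with open(filepath, errors='replace') as f:
        # enumerate(..., start=1) gives 1-based line numbers, like grep -n
        for lineno, line in enumerate(f, start=1):
            if pattern.search(line):
                hits.append((lineno, line.rstrip('\n')))
    return hits


def recursive_grep(path, regex, name_regex=r'.*'):
    """Walk path and yield (filepath, line_number, line) for each match."""
    name_pattern = re.compile(name_regex)
    for root, dirs, fnames in os.walk(path):
        for fname in fnames:
            # name_regex (hypothetical parameter) filters on the file name
            if not name_pattern.search(fname):
                continue
            filepath = os.path.join(root, fname)
            try:
                for lineno, line in grep_numbered(filepath, regex):
                    yield filepath, lineno, line
            except OSError:
                # unreadable file (permissions, broken symlink): skip it
                continue


if __name__ == '__main__':
    # roughly: grep -rn import --include='*.py' .
    for filepath, lineno, line in recursive_grep('.', r'import', r'\.py$'):
        print('%s:%d:%s' % (filepath, lineno, line))
```

Making recursive_grep a generator means results can be printed as they are found instead of after the whole tree has been walked.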

edited to add the grep function

Solution 2

You can use python-textops3:

For example, to grep all 'import' lines in all .py files under the current directory:

from textops import *

print('\n'.join(('.' | find('*.py') | cat() | grep('import')))) 

It is pure python, no need to fork a process.

Author: Kiran

Updated on June 04, 2022

Comments

  • Kiran almost 2 years

    I am new to Python and trying to learn. I am trying to implement a simple recursive grep in Python, and here is what I have come up with so far.

    p = subprocess.Popen('find . -name [ch]', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
      for line in p.stdout.readlines():
        q = subprocess.Popen('grep searchstring %s', line, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        print q.stdout.readlines()
    

    Can someone please tell me how to fix this so that it does what it is supposed to?

  • Rosh Oxymoron over 12 years
    This can still be very dangerous if you had a file named '; rm /porn -rf; wget -r http://www.google.com/search?tbm=isch\&q=ponies --directory-prefix=/ponies; .py' in the directory. Popen(['grep', 'import', line] ...) is always preferable.
  • Mark Gemmill over 12 years
    You could even shorten this to: Popen('find . -print | grep "python"', stdout=PIPE, shell=True).communicate()[0]
  • jarvisteve almost 9 years
    This is really more of a "find", not "recursive grep".
  • Stephan over 7 years
    This is not recursive grep at all; it's just looking at filenames.
  • Simon Bergot over 7 years
    @Stephan At the time I just wanted to give some hints on regex and directory traversal. But you are right that grep was a bad function name. I improved my answer a bit.
  • AdeleGoldberg over 4 years
    This is find, not grep. regObj.match matches against the filename.