Replace strings in files by Python, recursively in given directory and its subdirectories?

25,551

Solution 1

Put all this code into a file called mass_replace. Under Linux or Mac OS X, you can do chmod +x mass_replace and then just run this. Under Windows, you can run it with python mass_replace followed by the appropriate arguments.

#!/usr/bin/python

import os
import re
import sys

# list of extensions to replace
DEFAULT_REPLACE_EXTENSIONS = None
# example: uncomment next line to only replace *.c, *.h, and/or *.txt
# DEFAULT_REPLACE_EXTENSIONS = (".c", ".h", ".txt")

def try_to_replace(fname, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
    if replace_extensions:
        return fname.lower().endswith(replace_extensions)
    return True


def file_replace(fname, pat, s_after):
    # first, see if the pattern is even in the file.
    with open(fname) as f:
        if not any(re.search(pat, line) for line in f):
            return # pattern does not occur in file so we are done.

    # pattern is in the file, so perform replace operation.
    with open(fname) as f:
        out_fname = fname + ".tmp"
        out = open(out_fname, "w")
        for line in f:
            out.write(re.sub(pat, s_after, line))
        out.close()
        os.rename(out_fname, fname)


def mass_replace(dir_name, s_before, s_after, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
    pat = re.compile(s_before)
    for dirpath, dirnames, filenames in os.walk(dir_name):
        for fname in filenames:
            if try_to_replace(fname, replace_extensions):
                fullname = os.path.join(dirpath, fname)
                file_replace(fullname, pat, s_after)

if len(sys.argv) != 4:
    u = "Usage: mass_replace <dir_name> <string_before> <string_after>\n"
    sys.stderr.write(u)
    sys.exit(1)

mass_replace(sys.argv[1], sys.argv[2], sys.argv[3])

EDIT: I have changed the above code from the original answer. There are several changes. First, mass_replace() now calls re.compile() to pre-compile the search pattern; second, to check what extension the file has, we now pass in a tuple of file extensions to .endswith() rather than calling .endswith() three times; third, it now uses the with statement available in recent versions of Python; and finally, file_replace() now checks to see if the pattern is found within the file, and doesn't rewrite the file if the pattern is not found. (The old version would rewrite every file, changing the timestamps even if the output file was identical to the input file; this was inelegant.)

EDIT: I changed this to default to replacing every file, but with one line you can edit to limit it to particular extensions. I think replacing every file is a more useful out-of-the-box default. This could be extended with a list of extensions or filenames not to touch, options to make it case insensitive, etc.

EDIT: In a comment, @asciimo pointed out a bug. I edited this to fix the bug. str.endswith() is documented to accept a tuple of strings to try, but not a list. Fixed. Also, I made a couple of the functions accept an optional argument to let you pass in a tuple of extensions; it should be pretty easy to modify this to accept a command-line argument to specify which extensions.

Solution 2

Do you really need regular expressions?

import os

def recursive_replace( root, pattern, replace )
    for dir, subdirs, names in os.walk( root ):
        for name in names:
            path = os.path.join( dir, name )
            text = open( path ).read()
            if pattern in text:
                open( path, 'w' ).write( text.replace( pattern, replace ) )

Solution 3

Of course, if you just want to get it done without coding it up, use find and xargs:

find /home/noa/Desktop/codes -type f -print0 | \
xargs -0 sed --in-place "s/dbname=noa user=noa/dbname=masi user=masi"

(And you could likely do this with find's -exec or something as well, but I prefer xargs.)

Solution 4

This is how I would find and replace strings in files using python. This is a simple little function that will recursively search a directories for a string and replace it with a string. You can also limit files with a certain file extension like the example below.

import os, fnmatch
def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)

This allows you to do something like:

findReplace("some_dir", "find this", "replace with this", "*.txt")

Solution 5

this should work:

import re, os
import fnmatch
for path, dirs, files in os.walk(os.path.abspath(directory)):
       for filename in fnmatch.filter(files, filePattern):
           filepath = os.path.join(path, filename)
           with open("namelist.wps", 'a') as out:
               with open("namelist.wps", 'r') as readf:
                   for line in readf:
                       line = re.sub(r"dbname=noa user=noa", "dbname=masi user=masi", line)
                       out.write(line)
Share:
25,551
Michal aka Miki
Author by

Michal aka Miki

Vacare.

Updated on January 20, 2022

Comments

  • Michal aka Miki
    Michal aka Miki over 2 years

    How can you replace a string match inside a file with the given replacement, recursively, inside a given directory and its subdirectories?

    Pseudo-code:

    import os
    import re
    from os.path import walk
    for root, dirs, files in os.walk("/home/noa/Desktop/codes"):
            for name in dirs:
                    re.search("dbname=noa user=noa", "dbname=masi user=masi")
                       // I am trying to replace here a given match in a file