Find and replace strings in Excel (.xlsx) using Python

23,143

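I would copy the contents of your text file into a new worksheet in the Excel file and name that sheet "Lookup". Then use Data > Text to Columns (with "-" as the delimiter) so that the search terms land in column A and the replacements in column B of this new sheet, starting in the first row. The code below also assumes that the sheet holding your data is named "Data".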
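If you would rather build the same two-column lookup in Python instead of with Text to Columns, a minimal sketch might look like the following. It assumes each line of the text file has the form `find - replace` with ` - ` (space, hyphen, space) as the separator, as in the question's sample. `parse_lookup` is a hypothetical helper name, not part of any library.

```python
def parse_lookup(lines):
    """Split lines like 'bird produk - bird product' into (find, replace) pairs.

    Assumes ' - ' separates the two halves. Surrounding spaces are stripped,
    the same clean-up TRIM(...) does in Excel.
    """
    pairs = []
    for line in lines:
        find, sep, replace = line.partition(" - ")
        if not sep:
            continue  # blank lines, '...' placeholders, or no separator
        find, replace = find.strip(), replace.strip()
        if find:
            pairs.append((find, replace))
    return pairs


sample = """bird produk - bird product
pig - pork
ayam - chicken
...
kuda - horse
"""
print(parse_lookup(sample.splitlines()))
# [('bird produk', 'bird product'), ('pig', 'pork'), ('ayam', 'chicken'), ('kuda', 'horse')]
```

For a real file, iterating over the open file works the same way, e.g. `with open("lookup.txt", encoding="utf-8") as f: pairs = parse_lookup(f)` (the file name and encoding are assumptions).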

Paste the following code into a module in the VBA editor (Alt+F11, then Insert > Module) and run it:

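Sub Replacer()
    Dim w1 As Worksheet
    Dim w2 As Worksheet
    Dim i As Long

    'The sheet with the words from the text file (search term in column A, replacement in column B):
    Set w1 = ThisWorkbook.Sheets("Lookup")
    'The sheet with all of the data:
    Set w2 = ThisWorkbook.Sheets("Data")

    'Turn off screen redrawing while replacing; this can speed up large edits
    Application.ScreenUpdating = False

    'One sheet-wide Replace per lookup row; Trim drops stray spaces left over from Text to Columns
    For i = 1 To w1.Range("A1").CurrentRegion.Rows.Count
        w2.Cells.Replace What:=Trim(w1.Cells(i, 1).Value), _
            Replacement:=Trim(w1.Cells(i, 2).Value), LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False
    Next i

    Application.ScreenUpdating = True
End Sub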
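For comparison, the same loop can be sketched in Python on rows that are already loaded into memory (for example from a CSV export). `replace_in_rows` is a hypothetical helper, not a library function. Like the VBA above, it applies the pairs one after another, matches anywhere inside a cell (`LookAt:=xlPart`) and ignores case (`MatchCase:=False`). Unlike Excel, this sketch only touches text cells and leaves numbers alone.

```python
import re


def replace_in_rows(rows, pairs):
    """Apply each (find, replace) pair to every text cell, one pair at a time,
    matching anywhere inside a cell (like LookAt:=xlPart) and ignoring case
    (like MatchCase:=False). Rows are modified in place and also returned."""
    for find, replace in pairs:
        if not find:
            continue  # an empty pattern would match between every character
        pattern = re.compile(re.escape(find), re.IGNORECASE)
        for row in rows:
            for col, value in enumerate(row):
                if isinstance(value, str):  # numbers such as IDs are skipped
                    # A function replacement inserts `replace` literally,
                    # so a backslash in it is not read as a regex escape
                    row[col] = pattern.sub(lambda m: replace, value)
    return rows


rows = [["ali", "pig", 3483], ["abu", "Kuda", 3940], ["ahmad", "bird produk", "0399"]]
pairs = [("bird produk", "bird product"), ("pig", "pork"), ("kuda", "horse")]
print(replace_in_rows(rows, pairs))
# [['ali', 'pork', 3483], ['abu', 'horse', 3940], ['ahmad', 'bird product', '0399']]
```

Because matching is partial, "pig" would also turn "piglet" into "porklet", which is the same caveat that applies to `xlPart`.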
Author: antsemot

Updated on October 11, 2020

Comments

  • antsemot
    antsemot over 3 years

    I am trying to replace a bunch of strings in an .xlsx sheet (~70k rows, 38 columns). I have a list of the strings to be searched and replaced in a file, formatted as below:

    bird produk - bird product
    pig - pork
    ayam - chicken
    ...
    kuda - horse
    

    The word to be searched is on the left, and the replacement is on the right (find 'bird produk', replace with 'bird product'). My .xlsx sheet looks something like this:

    name     type of animal     ID
    ali      pig                3483
    abu      kuda               3940
    ahmad    bird produk        0399
    ...
    ahchong  pig                2311
    

    I am looking for the fastest solution for this, since I have around 200 words in the list to be searched, and the .xlsx file is quite large. I need to use Python for this, but I am open to any other faster solutions.

    Edit: added sheet example

    Edit 2: I tried some Python code to read the cells, but it took quite a long time to read them. Any pointers?

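    from xlrd import open_workbook

    # Note: xlrd 2.0 and later only open legacy .xls files; reading .xlsx
    # this way needs an older xlrd release (or another library, e.g. openpyxl).
    wb = open_workbook('test.xlsx')

    for s in wb.sheets():
        print('Sheet:', s.name)
        for row in range(s.nrows):
            # row_values() returns the whole row as a list; printing once per
            # row means far fewer print calls than printing every cell.
            values = s.row_values(row)
            print(values)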
    

    Thank you!

    Edit 3: I finally figured it out. Both the VBA module and the Python code work. I switched to .csv instead to make things easier. Thank you! Here is my version of the Python code:

    ###### our dictionary of search term : replacement ######
    reps = {
        'JUALAN (PRODUK SHJ)' : 'SALE( PRODUCT)',
        'PAMERAN' : 'EXHIBITION',
        'PEMBIAKAN' : 'BREEDING',
        'UNGGAS' : 'POULTRY'}


    def replace_all(text, dic):
        # Apply every find -> replace pair to the whole text, in dictionary order
        for i, j in dic.items():
            text = text.replace(i, j)
        return text

    with open('test.csv', 'r') as f:
        text = f.read()
        text = replace_all(text, reps)

    with open('file2.csv', 'w') as w:
        w.write(text)
    
  • antsemot
    antsemot over 9 years
    I've added an example of the sheet in my question, @laike9m
  • laike9m
    laike9m over 9 years
    @antsemot I see. Then you just need to iterate over all cell values as my first code snippet shows.
  • antsemot
    antsemot over 9 years
    I've copied the contents into a new sheet and used Text to Columns to separate the data. I am now running the code.
  • antsemot
    antsemot over 9 years
    I just finished running the code. It seems that some of the strings are not replaced correctly, and some are not replaced at all. Please advise.
  • Mr. Mascaro
    Mr. Mascaro over 9 years
    If it's a problem with only some of them, I would guess that the text to columns was not done properly and there are extra spaces around the text in the lookup.
  • antsemot
    antsemot over 9 years
    Okay, thank you, I will look into that. By the way, if I want to match the whole cell (not just part of a string), I should change LookAt:=xlPart to LookAt:=xlWhole, correct?
  • Mr. Mascaro
    Mr. Mascaro over 9 years
    Yes, but that will make the problem with spaces worse. To get rid of the spaces, you can use the TRIM(...) function and then copy and paste values. That should be the easiest way.
  • antsemot
    antsemot over 9 years
    I've removed the unnecessary spaces. I don't understand what you mean by "copy and paste values".
  • Mr. Mascaro
    Mr. Mascaro over 9 years
    Copy -> Paste Special -> Paste Values
  • antsemot
    antsemot over 9 years
    I tested the code with 5 rows (38 columns) and it took a very long time (more than 5 minutes). Is that normal?
  • antsemot
    antsemot over 9 years
    Please refer to my post (edited). I've just tried to read the cells, but it took a while, so I'm guessing that reading, finding, and replacing the cells would take even longer.
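On the question of speed (about 200 terms over roughly 70,000 × 38 ≈ 2.66 million cells): rather than scanning every cell once per term, one common Python approach is to compile all terms into a single regular expression, so each cell is scanned only once. This is a sketch, not the approach used in the thread. `build_replacer` is a hypothetical helper, and it assumes plain ASCII-style terms, whose lower-case form matches what `re.IGNORECASE` matches.

```python
import re


def build_replacer(pairs):
    """Return a function that applies every (find, replace) pair to a string
    in one left-to-right scan, ignoring case (like MatchCase:=False)."""
    # Map lower-cased search terms to replacements; empty terms are dropped
    table = {find.lower(): replace for find, replace in pairs if find}
    if not table:
        return lambda text: text  # nothing to replace

    # Longest terms first, so 'bird produk' is preferred over a shorter
    # term such as 'bird' that matches at the same position
    alternation = "|".join(
        re.escape(term) for term in sorted(table, key=len, reverse=True)
    )
    pattern = re.compile(alternation, re.IGNORECASE)

    def replace_cell(text):
        # Each match is looked up by its lower-cased text
        return pattern.sub(lambda m: table[m.group(0).lower()], text)

    return replace_cell


replace = build_replacer([("pig", "pork"), ("bird produk", "bird product"), ("bird", "fowl")])
print(replace("Bird produk and PIG"))  # bird product and pork
```

Unlike the VBA loop, each piece of text is replaced at most once, so the output of one pair is never fed into a later pair. To apply the function to loaded rows, something like `[[replace(v) if isinstance(v, str) else v for v in row] for row in rows]` would work.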
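The Edit 3 code runs `str.replace` over the entire file's text, so a term is also changed wherever it appears in other columns or inside longer words. If only one column should change, and only when the whole cell matches (the `LookAt:=xlWhole` behaviour discussed in the comments), a per-cell sketch with the standard `csv` module could look like the following. The column name `type of animal` comes from the example sheet and is an assumption about the real file; `replace_column` is a hypothetical helper.

```python
import csv
import io


def replace_column(src, dst, column, reps):
    """Copy CSV rows from src to dst, swapping whole cell values in one column.

    Only a cell whose entire (trimmed) value is a key of `reps` is replaced,
    which is the equivalent of LookAt:=xlWhole; other columns such as names
    and IDs are never touched. The lookup is case-sensitive.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = reps.get(row[column].strip(), row[column])
        writer.writerow(row)


# Demo with in-memory text standing in for test.csv / file2.csv
src = io.StringIO(
    "name,type of animal,ID\n"
    "ali,pig,3483\n"
    "abu,kuda,3940\n"
    "ahmad,bird produk,0399\n"
)
dst = io.StringIO()
replace_column(src, dst, "type of animal",
               {"pig": "pork", "kuda": "horse", "bird produk": "bird product"})
print(dst.getvalue())
```

For real files, the `csv` documentation recommends opening them with `newline=""`, e.g. `with open("test.csv", newline="", encoding="utf-8") as f, open("file2.csv", "w", newline="", encoding="utf-8") as out: replace_column(f, out, "type of animal", reps)` (the encoding is an assumption). Because the file is handled as text, an ID such as `0399` keeps its leading zero.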