Find and replace strings in Excel (.xlsx) using Python
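I would copy the contents of your text file into a new worksheet in the Excel file and name that sheet "Lookup". Then use Text to Columns to split the data into the first two columns of this new sheet, starting in the first row: search text in column A, replacement in column B.
Paste the following code into a module in Excel and run it. It assumes the sheet holding your data is named "Data"; change that name in the code if yours differs.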
Sub Replacer()
    Dim w1 As Worksheet
    Dim w2 As Worksheet
    Dim i As Long

    'The sheet with the words from the text file:
    Set w1 = ThisWorkbook.Sheets("Lookup")
    'The sheet with all of the data:
    Set w2 = ThisWorkbook.Sheets("Data")

    'Column A holds the search text, column B the replacement.
    For i = 1 To w1.Range("A1").CurrentRegion.Rows.Count
        w2.Cells.Replace What:=w1.Cells(i, 1), Replacement:=w1.Cells(i, 2), LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False
    Next i
End Sub
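Since the question asks for Python, here is a rough sketch of the same loop in Python. It is not a tested drop-in replacement for the macro. The names `make_replacer` and `replace_in_workbook` are made up for illustration, and the workbook part assumes the third-party openpyxl package is installed and that the data sheet is called "Data", as in the macro. Matching is partial and case-insensitive, to mirror `LookAt:=xlPart` and `MatchCase:=False`.

```python
import re


def make_replacer(pairs):
    """Build a function that applies every (search, replacement) pair in order.

    Partial, case-insensitive matching, like LookAt:=xlPart / MatchCase:=False.
    """
    compiled = [
        (re.compile(re.escape(search), re.IGNORECASE), replacement)
        for search, replacement in pairs
        if search  # ignore empty search strings
    ]

    def replace(value):
        # Leave numbers, dates and empty cells untouched.
        if not isinstance(value, str):
            return value
        for pattern, replacement in compiled:
            # A function replacement is inserted literally (no backslash expansion).
            value = pattern.sub(lambda m, r=replacement: r, value)
        return value

    return replace


def replace_in_workbook(src_path, dst_path, pairs, sheet_name="Data"):
    """Hypothetical wrapper: needs openpyxl, which is imported only when called."""
    from openpyxl import load_workbook

    replace = make_replacer(pairs)
    wb = load_workbook(src_path)
    ws = wb[sheet_name]
    for row in ws.iter_rows():
        for cell in row:
            cell.value = replace(cell.value)
    wb.save(dst_path)


replace = make_replacer([("bird produk", "bird product"), ("kuda", "horse")])
print(replace("ahmad Bird Produk"))
```

Compiling the patterns once and reusing them for every cell keeps the per-cell work small, which matters with around 70k rows and 38 columns.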
antsemot
Updated on October 11, 2020
Comments
-
antsemot over 3 years
I am trying to replace a bunch of strings in an .xlsx sheet (~70k rows, 38 columns). I have a list of the strings to be searched and replaced in a file, formatted as below:-
bird produk - bird product
pig - pork
ayam - chicken
...
kuda - horse
The word to be searched is on the left, and the replacement is on the right (find 'bird produk', replace with 'bird product'). My .xlsx sheet looks something like this:-
name      type of animal   ID
ali       pig              3483
abu       kuda             3940
ahmad     bird produk      0399
...
ahchong   pig              2311
I am looking for the fastest solution for this, since I have around 200 words in the list to be searched, and the .xlsx file is quite large. I need to use Python for this, but I am open to any other faster solutions.
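For reference, the list format described above could be read into search/replacement pairs with something like the sketch below. It assumes one pair per line and that " - " (with a space on each side) is the separator; `parse_lookup` is a made-up name, and the sample lines are the ones from the question.

```python
def parse_lookup(lines):
    """Turn 'search - replacement' lines into a list of (search, replacement)."""
    pairs = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything without the assumed " - " separator.
        if " - " not in line:
            continue
        # Split on the first separator only, so the replacement may contain " - ".
        search, replacement = line.split(" - ", 1)
        pairs.append((search.strip(), replacement.strip()))
    return pairs


sample = [
    "bird produk - bird product",
    "pig - pork",
    "ayam - chicken",
    "",
    "kuda - horse",
]
pairs = parse_lookup(sample)
print(pairs)
```

With a real file, the same function should work on the open file object, since iterating over it yields lines.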
Edit:- added sheet example
Edit2:- I tried some Python code to read the cells, but it took quite a long time. Any pointers?
from xlrd import open_workbook

wb = open_workbook('test.xlsx')
for s in wb.sheets():
    print('Sheet:', s.name)
    for row in range(s.nrows):
        values = []
        for col in range(s.ncols):
            print(s.cell(row, col).value)
Thank you!
Edit3:- I finally figured it out. Both the VBA module and the Python code work. I switched to .csv to make things easier. Thank you! Here is my version of the Python code:-
import csv

# Dictionary of search: replacement pairs.
reps = {
    'JUALAN (PRODUK SHJ)': 'SALE( PRODUCT)',
    'PAMERAN': 'EXHIBITION',
    'PEMBIAKAN': 'BREEDING',
    'UNGGAS': 'POULTRY',
}


def replace_all(text, dic):
    # Apply every replacement in turn to the whole text.
    for i, j in dic.items():
        text = text.replace(i, j)
    return text


with open('test.csv', 'r') as f:
    text = f.read()
    text = replace_all(text, reps)

with open('file2.csv', 'w') as w:
    w.write(text)
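One thing to watch with replacing one word after another: a replacement can itself be matched by a later search word (for example, if both 'pig' -> 'pork' and 'pork' -> 'beef' were in the list). A possible alternative, sketched below, is to build one regular expression from all the search words and replace in a single pass. This is only a sketch, not what the code above does. `build_single_pass_replacer` is a made-up name, and the two-entry dictionary is invented purely to show the difference.

```python
import re


def build_single_pass_replacer(reps):
    """Return a function that replaces every key of `reps` in one pass."""
    if not reps:
        return lambda text: text
    # Longest keys first, so a longer phrase wins over a shorter overlapping one.
    keys = sorted(reps, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up directly; replaced text is never scanned again.
    return lambda text: pattern.sub(lambda m: reps[m.group(0)], text)


# Invented example data, only to show the chaining effect.
demo = {"pig": "pork", "pork": "beef"}

one_pass = build_single_pass_replacer(demo)
print(one_pass("pig pork"))      # pork beef

chained = "pig pork"
for old, new in demo.items():
    chained = chained.replace(old, new)
print(chained)                   # beef beef
```

The single pass also scans the text once instead of once per search word, which may help with around 200 words, though that has not been measured here.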
-
antsemot over 9 years
I've added an example of the sheet in my question, @laike9m
-
laike9m over 9 years
@antsemot I see. Then you just need to iterate over all cell values as my first code snippet shows.
-
antsemot over 9 years
I've copied the contents into a new sheet and used Text to Columns to separate the data. I am now running the code.
-
antsemot over 9 years
I just finished running the code. It seems that some of the strings are not replaced correctly, and some are not replaced at all. Please advise.
-
Mr. Mascaro over 9 years
If it's a problem with only some of them, I would guess that the Text to Columns step was not done properly and there are extra spaces around the text in the Lookup sheet.
-
antsemot over 9 years
Okay, thank you, I will look into that. By the way, if I want to match the whole cell (not just part of a string), I should change LookAt:=xlPart to LookAt:=xlWhole, correct?
-
Mr. Mascaro over 9 years
Yes, but that will make the problem with spaces worse. To get rid of the spaces you can use the TRIM(...) function and then copy and paste values. That should be the easiest way.
-
antsemot over 9 years
I've removed the unnecessary spaces. I don't understand what you mean by "copy and paste values".
-
Mr. Mascaro over 9 years
Copy -> Paste Special -> Paste Values
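For anyone doing this in Python instead, the xlWhole and TRIM(...) points above translate roughly as follows. Strip the spaces from both the lookup words and the cell text, then replace only when the whole cell matches. This is a sketch under those assumptions. `build_lookup` and `replace_whole_cell` are made-up names, and case is ignored to match MatchCase:=False in the macro.

```python
def build_lookup(pairs):
    """Map trimmed, case-folded search text to its trimmed replacement."""
    return {
        search.strip().casefold(): replacement.strip()
        for search, replacement in pairs
    }


def replace_whole_cell(value, lookup):
    """Replace a cell only if its entire trimmed text is a search word."""
    if not isinstance(value, str):
        return value  # numbers and empty cells are left alone
    return lookup.get(value.strip().casefold(), value)


# Stray spaces in the lookup list, as can happen after Text to Columns.
lookup = build_lookup([(" kuda ", "horse"), ("pig", "pork")])

print(replace_whole_cell("Kuda ", lookup))   # horse
print(replace_whole_cell("pigeon", lookup))  # pigeon (no partial match)
```

A dictionary lookup per cell also avoids looping over all the search words for every cell.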
-
antsemot over 9 years
I tested the code with 5 rows (38 columns) and it took a long time (> 5 minutes). Is that normal?
-
antsemot over 9 years
Please refer to my edited post. I've just tried to read the cells and it took a while, so I'm guessing it would take even longer to read, find and replace them.