How can I grep data in Excel?
Solution 1
I found a nifty util written by a Polish guy in 2013. Since then, the program has moved to github. The Makefile was removed (github version), and instead the current maintainer is using CMake by the looks of it.
However, we are savvy superusers, so we know how to compile stuff:
$ link=https://github.com/vobys/xls2txt/archive/refs/heads/master.tar.gz
$ wget -nv "$link" \
&& tar xf master.tar.gz \
&& cd xls2txt-master \
&& gcc -O2 -lm cp.c ieee754.c ole.c ummap.c xls2txt.c -o xls2txt
Extract the data. (Since you said grep, I assume you know how to use a shell)
$ awkf() { awk -F\\t '{ printf "%20s | %-20s\n", $1, $2 }' ; }
$ x2t() { ./xls2txt $1 C4:D5 | awkf > `basename "$1" .xls`.txt ; }
$ x2t ramesh2.xls ; x2t ramesh.xls
And the report:
$ grep MXS1268 ramesh*txt
ramesh.txt: Operator ID | MXS1268
ramesh2.txt: Operator ID | MXS1268
For identification purposes:
$ ./xls2txt ; echo; sed -n '/Copy/{ s/<.*//; p ; q; }' < xls2txt.c
usage: xls2txt [-C cs] [-n sheetnum|-A] [-f] file.xls [X:X]
xls2txt [-C cs] -l file.xls
X:X cell range (eg. A1:C5, D2:E)
-l list sheets
-n num select sheet
-A all sheets (\f separated)
-C cs output charset (utf8 asc iso1 iso2), utf8 is default
-f don't try to format numbers
-a ascii output (same as -C asc)
* Copyright (C) 2005-2011 Jan Bobrowski
Update 2022-05: It looks like original author does not want to host this anymore, because the cert expired in November 2021 and the old URL produces HTTP 403.
original (dead) link http://wizard.ae.krakow.pl/~jb/xls2txt/xls2txt-0.13.tar.gz
Lucky for us, there is a copy at github (link to master.tar.gz above).
Old instructions
$ wget -nv "$link" && \
tar xf `basename "$link"` && \
cd $(basename "$link" .tar.gz) &&\
make
2013-07-24 URL:(...)/xls2txt-0.13.tar.gz [12419/12419] -> "xls2txt-0.13.tar.gz"
cc -O2 -g -DVERSION=0.13 -c xls2txt.c -o xls2txt.o
cc -O2 -g -c -o ole.o ole.c
cc -O2 -g -c -o cp.o cp.c
cc -O2 -g -c -o ummap.o ummap.c
cc -O2 -g -c -o ieee754.o ieee754.c
cc -lm xls2txt.o ole.o cp.o ummap.o ieee754.o -o xls2txt
Solution 2
On Linux or Cygwin:
$ xlsx2csv myfile.xlsx | grep MXS1268
Solution 3
You might try using openpyxl --or any number of Python packages-- if you know basic Python. If you're dead set on using grep, output the excel to text and pipe it through grep. Or just write a pygrep script to do it all.
munish
Updated on September 18, 2022Comments
-
munish over 1 year
I have an Excel file that contains data in two columns like :
Operator ID MXS1268 Name ramesh
I need to check for duplicates in multiple Excel files which have the same value for the operator ID and display or put it in some log file.
-
Admin almost 11 yearsDepending on what you want to do, there are different ways of doing them in Excel. If you want to search, then can do so via Ctrl+F bringing up the Find dialog. If there are multiple criteria, perhaps filters will help you out. For removing duplicates there are a number of ways. You'll have to be a bit more clear in what you require. Most likely though if you search this site, you'll find what you need as your requirements seem to be simple enough.
-
Admin almost 11 yearsAre all the Excel files in the same two-column format? Are the files
.xlsx
or.csv
? Do the files have multiple sheets? Adding more information to your question will help you get answers.
-
-
Ярослав Рахматуллин over 10 yearsYou can get a copy on the input file with
wget http://sprunge.us/MbhE -q -O- | base64.exe -d > ram.xls
. -
Ken Ingram about 5 yearsDamn. That was too good to be true. I have it installed, but an error is all I got back: ===> "ValueError: invalid literal for int() with base 10: '1023 1025'"
-
SuperSafie about 5 yearsTry both with python2 and python3
-
Ken Ingram about 5 yearsI want to try out your solution but your link is "Forbidden"
-
Ken Ingram about 5 yearsEh. What do you mean? Try both what?
-
Ken Ingram about 5 yearsI have a list of 10 xlsx files and I want to grep for one piece of data.
-
Ярослав Рахматуллин about 5 yearsjust look at the file before you run it, don't be silly.
-
Ken Ingram about 5 yearsListen. Don't get personal. I get a 403 error with that link. "2019-02-08 09:38:13 ERROR 403: Forbidden."
-
Ярослав Рахматуллин about 5 yearsok. i thought you meant something else. just google the package xls2txt
-
Admin almost 2 yearsHow is this better than accepted answer?
-
Admin almost 2 yearsDoes it have to be better? I gave an answer that I know works and it looks far easier than the rest.
-
Admin almost 2 yearsFor example, you don't explain how you get the
.csv
file.