How can I grep data in Excel?

33,991

Solution 1

I found a nifty util written by a Polish guy in 2013. Since then, the program has moved to github. The Makefile was removed (github version), and instead the current maintainer is using CMake by the looks of it.

However, we are savvy superusers, so we know how to compile stuff:

$ link=https://github.com/vobys/xls2txt/archive/refs/heads/master.tar.gz 
$ wget -nv "$link" \
    && tar xf master.tar.gz \
    && cd xls2txt-master \
    && gcc -O2 -lm cp.c ieee754.c ole.c ummap.c xls2txt.c -o xls2txt

Extract the data. (Since you said grep, I assume you know how to use a shell)

$ awkf() { awk -F\\t '{ printf "%20s | %-20s\n", $1, $2 }' ; }
$ x2t() { ./xls2txt $1 C4:D5 | awkf > `basename "$1" .xls`.txt ; }
$ x2t ramesh2.xls ; x2t ramesh.xls

And the report:

$ grep MXS1268 ramesh*txt
ramesh.txt:         Operator ID | MXS1268
ramesh2.txt:         Operator ID | MXS1268

For identification purposes:

$ ./xls2txt ; echo; sed -n '/Copy/{ s/<.*//; p ; q;  }' < xls2txt.c 

usage: xls2txt [-C cs] [-n sheetnum|-A] [-f] file.xls [X:X]
       xls2txt [-C cs] -l file.xls
 X:X    cell range (eg. A1:C5, D2:E)
 -l list sheets
 -n num select sheet
 -A all sheets (\f separated)
 -C cs  output charset (utf8 asc iso1 iso2), utf8 is default
 -f don't try to format numbers
 -a ascii output (same as -C asc)

 *  Copyright (C) 2005-2011 Jan Bobrowski 

Update 2022-05: It looks like original author does not want to host this anymore, because the cert expired in November 2021 and the old URL produces HTTP 403.

original (dead) link http://wizard.ae.krakow.pl/~jb/xls2txt/xls2txt-0.13.tar.gz

Lucky for us, there is a copy at github (link to master.tar.gz above).

Old instructions

$ wget -nv "$link" && \
  tar xf `basename "$link"` && \
  cd $(basename "$link" .tar.gz) &&\
   make
2013-07-24 URL:(...)/xls2txt-0.13.tar.gz [12419/12419] -> "xls2txt-0.13.tar.gz"
cc -O2 -g -DVERSION=0.13 -c xls2txt.c -o xls2txt.o
cc -O2 -g   -c -o ole.o ole.c
cc -O2 -g   -c -o cp.o cp.c
cc -O2 -g   -c -o ummap.o ummap.c
cc -O2 -g   -c -o ieee754.o ieee754.c
cc -lm  xls2txt.o ole.o cp.o ummap.o ieee754.o   -o xls2txt

Solution 2

On Linux or Cygwin:

$ xlsx2csv myfile.xlsx | grep MXS1268 

Solution 3

You might try using openpyxl --or any number of Python packages-- if you know basic Python. If you're dead set on using grep, output the excel to text and pipe it through grep. Or just write a pygrep script to do it all.

Share:
33,991
munish
Author by

munish

Updated on September 18, 2022

Comments

  • munish
    munish over 1 year

    I have an Excel file that contains data in two columns like :

    Operator ID           MXS1268
    Name                  ramesh    
    

    I need to check for duplicates in multiple Excel files which have the same value for the operator ID and display or put it in some log file.

    • Admin
      Admin almost 11 years
      Depending on what you want to do, there are different ways of doing them in Excel. If you want to search, then can do so via Ctrl+F bringing up the Find dialog. If there are multiple criteria, perhaps filters will help you out. For removing duplicates there are a number of ways. You'll have to be a bit more clear in what you require. Most likely though if you search this site, you'll find what you need as your requirements seem to be simple enough.
    • Admin
      Admin almost 11 years
      Are all the Excel files in the same two-column format? Are the files .xlsx or .csv? Do the files have multiple sheets? Adding more information to your question will help you get answers.
  • Ярослав Рахматуллин
    Ярослав Рахматуллин over 10 years
    You can get a copy on the input file with wget http://sprunge.us/MbhE -q -O- | base64.exe -d > ram.xls.
  • Ken Ingram
    Ken Ingram about 5 years
    Damn. That was too good to be true. I have it installed, but an error is all I got back: ===> "ValueError: invalid literal for int() with base 10: '1023 1025'"
  • SuperSafie
    SuperSafie about 5 years
    Try both with python2 and python3
  • Ken Ingram
    Ken Ingram about 5 years
    I want to try out your solution but your link is "Forbidden"
  • Ken Ingram
    Ken Ingram about 5 years
    Eh. What do you mean? Try both what?
  • Ken Ingram
    Ken Ingram about 5 years
    I have a list of 10 xlsx files and I want to grep for one piece of data.
  • Ярослав Рахматуллин
    Ярослав Рахматуллин about 5 years
    just look at the file before you run it, don't be silly.
  • Ken Ingram
    Ken Ingram about 5 years
    Listen. Don't get personal. I get a 403 error with that link. "2019-02-08 09:38:13 ERROR 403: Forbidden."
  • Ярослав Рахматуллин
    Ярослав Рахматуллин about 5 years
    ok. i thought you meant something else. just google the package xls2txt
  • Admin
    Admin almost 2 years
    How is this better than accepted answer?
  • Admin
    Admin almost 2 years
    Does it have to be better? I gave an answer that I know works and it looks far easier than the rest.
  • Admin
    Admin almost 2 years
    For example, you don't explain how you get the .csv file.