Using an index to make grep faster?


Solution 1

What about cscope? Does it fit your needs?

It allows searching code for:

  • all references to a symbol
  • global definitions
  • functions called by a function
  • functions calling a function
  • text string
  • regular expression pattern
  • a file
  • files including a file

Solution 2

Full-text indexing

There are tools such as recoll, swish-e and sphinx but you'd have to check if they can support the sort of search criteria you need.

Recoll

Recoll is a personal full text search tool for Unix/Linux.

Swish-e

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files.

Sphinx

Sphinx lets you batch index and search data stored in an SQL database, NoSQL storage, or just files, quickly and easily.

grep

I'm surprised grep is as slow as you describe. Can you reduce the number of files being searched? For example, when I only need to search the source files for one executable (out of many in a project), I feed grep the names from a command that lists the source files for that program:

grep expression `sources myprogram`

sources is a program specific to my development environment but you may have (or be able to construct) something equivalent.
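
If nothing like sources exists, one possible stand-in is a cached file list: walk the tree once with find, save the result, and hand only those names to grep on each search. The sketch below is an assumption about how that could look, not the tool described above; build_list, search_list and files.list are hypothetical names, and it relies on xargs -0, which GNU and BSD xargs provide.

```shell
#!/bin/sh
# Sketch of a cached file list standing in for a `sources`-style command.
# build_list and search_list are hypothetical helper names.

# Write the list of C sources and headers under a directory to a list file.
# One path per line, so file names containing newlines are not supported.
build_list() {
    # $1 = project root, $2 = list file to create
    find "$1" -type f \( -name '*.c' -o -name '*.h' \) > "$2"
}

# Print the names of listed files that contain a fixed string.
search_list() {
    # $1 = list file, $2 = fixed string to look for
    tr '\n' '\0' < "$1" | xargs -0 grep -F -l -- "$2"
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'int needle_fn(void);\n' > "$demo/src/a.h"
printf 'int other(void) { return 0; }\n' > "$demo/src/b.c"
printf 'needle_fn mentioned in notes\n' > "$demo/src/notes.txt"

build_list "$demo" "$demo/files.list"
matches=$(search_list "$demo/files.list" needle_fn)
echo "$matches"
```

The list has to be rebuilt when files are added or removed, and it only helps when walking the tree or scanning irrelevant files accounts for a large part of the search time.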

I'm assuming you've tried obvious techniques such as

find /foo/myproject -name "*.c" -exec fgrep -l searchtext {} +

I've read a suggestion that the -P option (Perl-compatible regular expressions) of current grep can speed up searches significantly, though I haven't verified it.

Solution 3

You could copy your codebase to a RAM disk.
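
A minimal sketch of that idea, assuming /dev/shm is a tmpfs mount (typical on Linux, not guaranteed elsewhere). The throwaway tree stands in for the real codebase, and the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: copy a source tree to a memory-backed directory and search the copy.
# Assumes /dev/shm is tmpfs; falls back to ${TMPDIR:-/tmp}, which may or may
# not be RAM-backed.

ram_root=/dev/shm
if [ ! -d "$ram_root" ] || [ ! -w "$ram_root" ]; then
    ram_root=${TMPDIR:-/tmp}
fi

# Throwaway source tree standing in for the real codebase.
src=$(mktemp -d)
printf 'static int answer = 42;\n' > "$src/main.c"

# Copy the contents of the tree into the memory-backed location.
ram_copy=$(mktemp -d "$ram_root/codebase.XXXXXX")
cp -R "$src/." "$ram_copy/"

# Search the copy: -R recurses, -F treats the pattern as a fixed string,
# -l prints only the names of matching files.
hits=$(grep -R -F -l 'answer' "$ram_copy")
echo "$hits"

# Remove both temporary trees.
rm -rf "$ram_copy" "$src"
```

The copy goes stale as soon as the sources change, so it would need to be refreshed before searching. If the operating system's page cache already holds the files after a first run, the gain may be small.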

Solution 4

Not grep itself. But there are several programs that use indexes and are aimed at code bases: ctags (a version is provided with vim), etags (aimed at use with emacs), and global (more independent of the editor) are the ones I can think of now, but there are probably others.

Solution 5

If you want to use a full-text search engine, use one:


Author: Peltier

Author of autojump, the fastest way to move around your filesystem from the command line.

Updated on September 18, 2022

Comments

  • Peltier
    Peltier over 1 year

    I find myself grepping the same codebase over and over. While it works great, each command takes about 10 seconds, so I am thinking about ways to make it faster.

    So can grep use some sort of index? I understand an index probably won't help for complicated regexps, but I mostly use very simple patterns. Does an indexer exist for this case?

    EDIT: I know about ctags and the like, but I would like to do full-text search.

    • Michał Šrajer
      Michał Šrajer over 12 years
      Are you using the recursive option for grep, or some find/xargs-like approach?
    • Peltier
      Peltier over 12 years
      @Michał : yes, -R
  • Peltier
    Peltier over 12 years
    I use ctags, but isn't that limited to searching function names? I want to do full-text search.
  • Peltier
    Peltier over 12 years
    AFAIK locate is only for filenames. recoll would work, but I would prefer a command-line tool. The code base is pretty big, and since I'm looking for a string, I don't know where it is, so it's hard to limit the number of files to be searched :)
  • Peltier
    Peltier over 12 years
    That's always an option, but I was wondering if a more lightweight, quick and dirty grep speedup option would exist.
  • user5249203
    user5249203 over 12 years
    I think swish-e is command-line. I haven't tried any (grep is fast enough on my projects)
  • akira
    akira over 12 years
    'more lightweight' and 'want to have my stuff fully indexed' are a bit of two extremes :) ctags is the best match for what you want, if you just want to go quick and dirty. With everything else you end up using a real full-text search engine, e.g. 'recoll', mentioned in @RedGrittyBrick's answer, uses xapian as the backend.
  • Peltier
    Peltier over 12 years
    They're not necessarily incompatible. Imagine if ctags had a --full-text option, for instance, and grep a --tag-file option. Of course the fact that it could exist doesn't mean that it does :)
  • Peltier
    Peltier over 12 years
    That could be what I'm looking for, I'll take a look. Thanks!
  • Peltier
    Peltier over 11 years
    Ack is pretty cool. But I really doubt it's any faster than grep, since it is based on the same mechanisms.
  • neves
    neves over 6 years
    It looks like it just works well for C, maybe C++ and Java