creating a simple index on a text file in java

17,349

Solution 1

As the comments pointed out, you're going to need to do a linear search of the entire file in worst case, and half of it on average. But fortunately there are some tricks you can do.

If the file doesn't change much, then create a copy of the file in which the entries are sorted. Ideally make records in the copy the same length, so that you can go straight to the Nth entry in the sorted file.

If you don't have the disk space for that, then create an index file, which has all the keys in the original file as key and the offset into the original file as the value. Again used fixed length records. Or better, make this index file a database. Or load the original file into a database. In either case, disk storage is very cheap.

EDIT: To create the index file, open the main file using RandomAccessFile and read it sequentially. Use the 'getFilePointer()' method at the start of each entry to read the position in the file, and store that plus the key in the index file. When looking up something read the file pointer from the index file and use the 'seek(long)' method to jump to the point in the original file.

Solution 2

I'd recommend building an index file. Scan the input file and write every key and its offset into a List, then sort the list and write it to the index file. Then, whenever you want to look up a key, you read in the index file and do a binary search on the list. Once you find the key you need, open the data file as a RandomAccessFile and seek to the position of the key. Then you can read the key and the value.

Share:
17,349
vjain27
Author by

vjain27

Updated on June 05, 2022

Comments

  • vjain27
    vjain27 almost 2 years

    I need to implement a simple indexing scheme for a big text file. The text file contains key value pairs and I need to read back a specific key value pair without loading the complete file in memory. The text file is huge and contains millions of entries and the keys are not sorted. Different key-value pairs need to be read depending on user-input. So I don't want the complete file to be read every time. Please let me know the exact classes and methods in java file handling api that would help to implement this in a simple and efficient way.I want to do this without using an external library such as lucene.

  • vjain27
    vjain27 over 12 years
    Actually I wanted to ask how to create the index file that you mentioned using java file handling api and which classes/methods will be helpful in creating and reading the index.