Java: read and write a file together

10,325

Solution 1

It's far easier if you don't do two things at the same time. The best way is to run through the entire file, count the occurrences of each string in a hash map, and then write out all the results into another file. Then, if you need to, move the new file over the old one.
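A minimal sketch of that approach, assuming the set of distinct lines (not the whole file) fits in memory. The class name `LineCounter`, the `line - count` output format, the skipping of blank lines, and the temp-file naming are illustrative choices, not something taken from the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

final class LineCounter {

    /** Streams the file once and returns line -> count, in first-seen order. */
    static Map<String, Integer> countLines(Path source) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // assumption: blank lines are not entries
                }
                counts.merge(line, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Writes the counts to a temp file next to the source, then moves it over the source. */
    static void rewriteWithCounts(Path source) throws IOException {
        Map<String, Integer> counts = countLines(source);
        Path dir = source.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "counts", ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                writer.write(entry.getKey() + " - " + entry.getValue());
                writer.newLine();
            }
        }
        // The original is only replaced after the new file is completely written.
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        rewriteWithCounts(Paths.get(args[0]));
    }
}
```

Only one entry per distinct string is held in memory, so a large file with many repeats can still be handled this way. If the distinct strings themselves do not fit, this sketch does not apply.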

You never want to read and write to the same file at the same time. Every write that changes the length of a line shifts the offsets of everything after it, and the read cursor will not keep track of that.

Solution 2

I'd do it this way:

- Parse the original file and save all entries into a new file. Use fixed-length data blocks to write entries to the new file: if, say, your longest string is 10 bytes long, take 10 + x as the block length, where x is for the extra info you want to save along with each entry. The entry at zero-based index 10 then starts at byte position 10*(10+x). You'd also have to know the number of entries to create the file (the file size would be noOfEntries*blocklength; use a RandomAccessFile and setLength to set this file length).
- Now use a quicksort algorithm to sort the entries in the file. My idea is to have a sorted file in the end, which makes things far easier and faster. Hashing would theoretically work too, but then you'd have to rearrange duplicate entries to get all duplicates grouped, which is not really a choice here.
- Parse the file with the now sorted entries. Save a pointer to the first occurrence of an entry. Increment the number of duplicates until there is a new entry. Then write the first entry, together with the additional info you want to have there, into a new "final result" file. Continue this way with all remaining entries in the sorted file.
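The fixed-length block layout from the first step could look roughly like this. It is only a sketch under assumptions: keys are ASCII and at most 10 bytes, the extra info "x" is a 4-byte int count, and the class name `FixedRecordFile` and its helper methods are hypothetical. The sorting and counting passes are not shown.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FixedRecordFile {
    static final int KEY_BYTES = 10;                  // assumed longest string, in bytes
    static final int EXTRA_BYTES = 4;                 // the "x": here one int for the count
    static final int BLOCK = KEY_BYTES + EXTRA_BYTES; // block length 10 + x

    /** Byte position of the record with the given zero-based index. */
    static long offsetOf(long index) {
        return index * BLOCK;
    }

    /** Writes one record: the key zero-padded to KEY_BYTES, followed by the count. */
    static void write(RandomAccessFile file, long index, String key, int count) throws IOException {
        byte[] raw = key.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > KEY_BYTES) {
            throw new IllegalArgumentException("key longer than " + KEY_BYTES + " bytes: " + key);
        }
        file.seek(offsetOf(index));
        file.write(Arrays.copyOf(raw, KEY_BYTES)); // copyOf pads with zero bytes
        file.writeInt(count);
    }

    /** Reads the key of a record, dropping the zero padding. */
    static String readKey(RandomAccessFile file, long index) throws IOException {
        byte[] buf = new byte[KEY_BYTES];
        file.seek(offsetOf(index));
        file.readFully(buf);
        int len = 0;
        while (len < KEY_BYTES && buf[len] != 0) {
            len++;
        }
        return new String(buf, 0, len, StandardCharsets.US_ASCII);
    }

    /** Reads the count stored after the key of a record. */
    static int readCount(RandomAccessFile file, long index) throws IOException {
        file.seek(offsetOf(index) + KEY_BYTES);
        return file.readInt();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("records", ".bin");
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            file.setLength(offsetOf(2)); // noOfEntries * blocklength for two entries
            write(file, 0, "aaa", 2);
            write(file, 1, "bbb", 1);
            System.out.println(readKey(file, 1) + " - " + readCount(file, 1));
        } finally {
            f.delete();
        }
    }
}
```

Because every record has the same size, any entry can be reached with a single `seek`, which is what makes in-file sorting and swapping possible without loading the whole file.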

Conclusions: I think this should be reasonably fast and use a reasonable amount of resources. However, it depends on the data you have. If you have a very large number of duplicates, quicksort performance will degrade. Also, if your longest data entry is much longer than the average, the fixed-length blocks will waste file space.

Author by

sharath

Updated on June 07, 2022

Comments

  • sharath
    sharath over 1 year

    I am trying to read a file in Java and modify it simultaneously. This is what I need to do. My file has the following format:

    aaa
    bbb
    aaa
    ccc
    ddd
    ddd
    

    I need to read through the file, count the number of occurrences of each line, and collapse the duplicates to get the following file:

    aaa -  2
    bbb -  1
    ccc -  1
    ddd -  2
    

    I tried using the RandomAccessFile to do this, but couldn't do it. Can somebody help me out with the code for this one?

  • Rob Elsner
    Rob Elsner almost 13 years
    This is my thought as well, it just took me too long to type it out with work getting in the way!
  • sharath
    sharath almost 13 years
    Well, the problem is that the file I have is way too large. Keeping it in memory just won't work, so hashtables are a bad idea. I have no choice but to resort to file operations :( Bad idea, but I have no other option.
  • J _
    J _ almost 13 years
    How long is the longest string? You could use a trie. It would take a little less space and if you have lots of overlap in terms, it would take a lot less space. Worst case, you really should use a database, rather than essentially writing your own.
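
    A rough sketch of the trie idea from the comment above, for counting how often each string was added. The class `CountingTrie` and its methods are hypothetical names, and whether this actually uses less memory than a hash map depends on the data and on how the nodes are represented.

```java
import java.util.Map;
import java.util.TreeMap;

final class CountingTrie {

    /** One node per character; strings with a common prefix share the nodes of that prefix. */
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        int count; // number of added strings that end exactly at this node
    }

    private final Node root = new Node();

    /** Adds one occurrence of the given string. */
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.count++;
    }

    /** Returns how many times the exact string was added (0 if never). */
    int count(String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.children.get(word.charAt(i));
        }
        return node == null ? 0 : node.count;
    }

    public static void main(String[] args) {
        CountingTrie trie = new CountingTrie();
        for (String s : new String[] {"aaa", "bbb", "aaa", "ccc", "ddd", "ddd"}) {
            trie.add(s);
        }
        System.out.println("aaa - " + trie.count("aaa"));
    }
}
```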