Modifying or deleting a line from a text file the low-level way?

Solution 1

I find this an interesting question, so I made a small console app.

I used 3 methods:

  • TStringList
  • TStreamReader/TStreamWriter
  • Text file

Each method is timed over 100 iterations, once against a 10 KB text file and once against a 1 MB text file. Here is the program:

program Project16;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, StrUtils, Diagnostics, IOUtils;

procedure DeleteLine(StrList: TStringList; SearchPattern: String);

var
  Index : Integer;

begin
 for Index := 0 to StrList.Count-1 do
  begin
   if ContainsText(StrList[Index], SearchPattern) then
    begin
     StrList.Delete(Index);
     Break;
    end;
  end;
end;

procedure DeleteLineWithStringList(Filename : string; SearchPattern : String);

var StrList : TStringList;

begin
 StrList := TStringList.Create;
 try
  StrList.LoadFromFile(Filename);
  DeleteLine(StrList, SearchPattern);
  // don't overwrite our input file so we can test
  StrList.SaveToFile(TPath.ChangeExtension(Filename, '.new'));
 finally
  StrList.Free;
 end;
end;

procedure DeleteLineWithStreamReaderAndWriter(Filename : string; SearchPattern : String);

var
  Reader    : TStreamReader;
  Writer    : TStreamWriter;
  Line      : String;
  DoSearch  : Boolean;
  DoWrite   : Boolean;

begin
 Reader := TStreamReader.Create(Filename);
 Writer := TStreamWriter.Create(TPath.ChangeExtension(Filename, '.new'));
 try
  DoSearch := True;
  DoWrite := True;
  while Reader.Peek >= 0 do
   begin
    Line := Reader.ReadLine;
    if DoSearch then
     begin
      DoSearch := not ContainsText(Line, SearchPattern);
      DoWrite := DoSearch;
     end;
    if DoWrite then
     Writer.WriteLine(Line)
    else
     DoWrite := True;
   end;
 finally
  Reader.Free;
  Writer.Free;
 end;
end;

procedure DeleteLineWithTextFile(Filename : string; SearchPattern : String);

var
 InFile    : TextFile;
 OutFile   : TextFile;
 Line      : String;
 DoSearch  : Boolean;
 DoWrite   : Boolean;


begin
 AssignFile(InFile, Filename);
 AssignFile(OutFile, TPath.ChangeExtension(Filename, '.new'));
 Reset(InFile);
 Rewrite(OutFile);
 try
  DoSearch := True;
  DoWrite := True;
  while not EOF(InFile) do
   begin
    Readln(InFile, Line);
    if DoSearch then
     begin
      DoSearch := not ContainsText(Line, SearchPattern);
      DoWrite := DoSearch;
     end;
    if DoWrite then
     Writeln(OutFile, Line)
    else
     DoWrite := True;
   end;
 finally
  CloseFile(InFile);
  CloseFile(OutFile);
 end;
end;

procedure TimeDeleteLineWithStreamReaderAndWriter(Iterations : Integer);

var
  Count : Integer;
  Sw    : TStopWatch;

begin
 Writeln(Format('Delete line with stream reader/writer - file 10kb, %d iterations', [Iterations]));
 Sw := TStopwatch.StartNew;
 for Count := 1 to Iterations do
  DeleteLineWithStreamReaderAndWriter('c:\temp\text10kb.txt', 'thislinewillbedeleted=');
 Sw.Stop;
 Writeln(Format('Elapsed time : %d milliseconds', [Sw.ElapsedMilliseconds]));
 Writeln(Format('Delete line with stream reader/writer - file 1Mb, %d iterations', [Iterations]));
 Sw := TStopwatch.StartNew;
 for Count := 1 to Iterations do
  DeleteLineWithStreamReaderAndWriter('c:\temp\text1Mb.txt', 'thislinewillbedeleted=');
 Sw.Stop;
 Writeln(Format('Elapsed time : %d milliseconds', [Sw.ElapsedMilliseconds]));
end;

procedure TimeDeleteLineWithStringList(Iterations : Integer);

var
  Count : Integer;
  Sw    : TStopWatch;

begin
 Writeln(Format('Delete line with TStringlist - file 10kb, %d iterations', [Iterations]));
 Sw := TStopwatch.StartNew;
 for Count := 1 to Iterations do
  DeleteLineWithStringList('c:\temp\text10kb.txt', 'thislinewillbedeleted=');
 Sw.Stop;
 Writeln(Format('Elapsed time : %d milliseconds', [Sw.ElapsedMilliseconds]));
 Writeln(Format('Delete line with TStringlist - file 1Mb, %d iterations', [Iterations]));
 Sw := TStopwatch.StartNew;
 for Count := 1 to Iterations do
  DeleteLineWithStringList('c:\temp\text1Mb.txt', 'thislinewillbedeleted=');
 Sw.Stop;
 Writeln(Format('Elapsed time : %d milliseconds', [Sw.ElapsedMilliseconds]));
end;

procedure TimeDeleteLineWithTextFile(Iterations : Integer);

var
  Count : Integer;
  Sw    : TStopWatch;

begin
 Writeln(Format('Delete line with text file - file 10kb, %d iterations', [Iterations]));
 Sw := TStopwatch.StartNew;
 for Count := 1 to Iterations do
  DeleteLineWithTextFile('c:\temp\text10kb.txt', 'thislinewillbedeleted=');
 Sw.Stop;
 Writeln(Format('Elapsed time : %d milliseconds', [Sw.ElapsedMilliseconds]));
 Writeln(Format('Delete line with text file - file 1Mb, %d iterations', [Iterations]));
 Sw := TStopwatch.StartNew;
 for Count := 1 to Iterations do
  DeleteLineWithTextFile('c:\temp\text1Mb.txt', 'thislinewillbedeleted=');
 Sw.Stop;
 Writeln(Format('Elapsed time : %d milliseconds', [Sw.ElapsedMilliseconds]));
end;

begin
  try
    TimeDeleteLineWithStringList(100);
    TimeDeleteLineWithStreamReaderAndWriter(100);
    TimeDeleteLineWithTextFile(100);
    Writeln('Press ENTER to quit');
    Readln;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

Output:

Delete line with TStringlist - file 10kb, 100 iterations
Elapsed time : 188 milliseconds
Delete line with TStringlist - file 1Mb, 100 iterations
Elapsed time : 5137 milliseconds
Delete line with stream reader/writer - file 10kb, 100 iterations
Elapsed time : 456 milliseconds
Delete line with stream reader/writer - file 1Mb, 100 iterations
Elapsed time : 22382 milliseconds
Delete line with text file - file 10kb, 100 iterations
Elapsed time : 250 milliseconds
Delete line with text file - file 1Mb, 100 iterations
Elapsed time : 9656 milliseconds
Press ENTER to quit

As you can see, TStringList is the winner here. Since you are not able to use TStringList, TextFile is not a bad choice after all: it is roughly twice as fast as TStreamReader/TStreamWriter in both tests.

P.S.: this code omits the step where you delete the input file and rename the output file to the original filename.
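
The omitted step could look roughly like this. It is only a minimal sketch, assuming the '.new' naming used in the program above; ReplaceOriginalWithNew is a hypothetical helper name, and the delete-then-rename sequence is not atomic, so a failure between the two calls leaves only the '.new' file on disk.

```pascal
program SwapFilesSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, IOUtils;

// Hypothetical helper: replaces Filename with the '.new' file written by
// one of the DeleteLineWith... procedures. Returns False if the '.new' file
// is missing or if the delete or rename step fails.
function ReplaceOriginalWithNew(const Filename: string): Boolean;
var
  NewName: string;
begin
  NewName := TPath.ChangeExtension(Filename, '.new');
  Result := FileExists(NewName)
    and DeleteFile(Filename)            // remove the old input file
    and RenameFile(NewName, Filename);  // give the output the original name
end;

begin
  if ReplaceOriginalWithNew('c:\temp\text10kb.txt') then
    Writeln('Original replaced')
  else
    Writeln('Replace failed; check which of the two files is still present');
end.
```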

Solution 2

Without loading the entire file into a container like TStringList, your only option is to:

  • Open the file for input
  • Open a separate copy for output
  • Start a loop
  • Read the content line by line from the input file
  • Write the content out line by line to the output file until you reach the line you want to change/delete
  • Break the loop
  • Read the input line from the input file
  • Write the changed line (or skip writing the line you want to delete) to the output file
  • Start a new loop
  • Read the remainder of the input content, line by line
  • Write the rest of that input to the output file, line by line
  • Break the loop
  • Close the files

So to answer your specific questions:

if N = UpperCase(Name) then begin
  //How to re-write this line?
  Break;
end;

WriteLn the new output to the second (output) file.

if N = UpperCase(Name) then begin
  //How to delete this line?
  Break;
end;

Just skip the WriteLn that outputs the indicated line to the second (output) file.
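
Both cases can be handled by the same copy routine. The following is a minimal sketch, not the asker's actual class: CopyWithChange and the file names are hypothetical, and it folds the two loops from the list above into a single loop with a "found" flag. The output file still has to be swapped in for the original afterwards.

```pascal
program RewriteLineSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: copies InName to OutName line by line. The first line
// whose name part (the text before '=') matches Name is replaced by
// Name=NewValue, or dropped when DeleteIt is True.
// Returns True if a matching line was found.
function CopyWithChange(const InName, OutName, Name, NewValue: string;
  DeleteIt: Boolean): Boolean;
var
  InFile, OutFile: TextFile;
  S, N: string;
begin
  Result := False;
  AssignFile(InFile, InName);
  AssignFile(OutFile, OutName);
  Reset(InFile);
  try
    Rewrite(OutFile);
    try
      while not EOF(InFile) do
      begin
        ReadLn(InFile, S);
        N := Copy(S, 1, Pos('=', S) - 1);
        if (not Result) and SameText(N, Name) then
        begin
          Result := True;
          if not DeleteIt then
            WriteLn(OutFile, Name, '=', NewValue); // the re-written line
          // when deleting, nothing is written for this line
        end
        else
          WriteLn(OutFile, S); // every other line is copied unchanged
      end;
    finally
      CloseFile(OutFile);
    end;
  finally
    CloseFile(InFile);
  end;
end;

begin
  try
    // Assumed file names, for illustration only.
    if CopyWithChange('data.txt', 'data.new', 'SOMEUNIQUENAME', 'NewValue', False) then
      Writeln('Line replaced in data.new')
    else
      Writeln('Name not found; data.new is an unchanged copy');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```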

Your artificial limitation of "I don't want to use TStringList" simply complicates the task for you, when you can simply:

  • Load the original file into TStringList using LoadFromFile
  • Locate the line you want to modify, either by index, iteration, or IndexOf()
  • Modify the line by changing it directly, or deleting it from the TStringList
  • Write the entire content out to the original file using TStringList.SaveToFile
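
For the NAME=Value lines described in the question, those four steps might look like the sketch below. ChangeField and the file path are hypothetical; IndexOfName relies on the default '=' name/value separator, and with TStringList's default CaseSensitive = False the lookup should ignore case.

```pascal
program StringListSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: sets Name=Value in the file (adding the line if it is
// missing), or removes the line when DeleteIt is True.
procedure ChangeField(const Filename, Name, Value: string; DeleteIt: Boolean);
var
  Lines: TStringList;
  Index: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(Filename);
    Index := Lines.IndexOfName(Name); // -1 when no line starts with Name=
    if DeleteIt then
    begin
      if Index >= 0 then
        Lines.Delete(Index);
    end
    else if Index >= 0 then
      Lines[Index] := Name + '=' + Value
    else
      Lines.Add(Name + '=' + Value);
    Lines.SaveToFile(Filename);
  finally
    Lines.Free;
  end;
end;

begin
  try
    // Assumed path and field names, for illustration only.
    ChangeField('c:\temp\fields.txt', 'SOMEUNIQUENAME', 'NewValue', False);
    ChangeField('c:\temp\fields.txt', 'OBSOLETENAME', '', True);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```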

The only reasons I've found to not use TStringList to perform these kinds of operations have been that the file size exceeds the capacity of a TStringList (never happened) or when dealing with a file that is text but isn't really "line" oriented (for instance, EDI files that are typically one very long single line of text, or XML files that may not contain line feeds and therefore are also one very long single line of text). Even in the case of EDI or XML, though, it's quite common to load them into a TStringList, make the conversion to line-based format (inserting line breaks or whatever), and do the retrieval from the stringlist.

Solution 3

Basically, you can't do what you want to do if you treat the files as simple text files. Such files can be read (from the beginning only) or written to, either from the start (thus creating a new file) or from the end (appending to an existing file). They are not random access files.

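On the other hand, you might want to consider defining a typed file of strings: each record in the file would be a string, and you can access this file in a random fashion. (In Delphi the element type has to be a fixed-length short string such as string[255], because a typed file cannot hold the reference-counted long string type.) The problem then becomes knowing which record to access for which string.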
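
A minimal sketch of that idea, assuming a hypothetical record layout (TFieldRec, with arbitrary maximum lengths) and file name. Because every record has the same size, Seek can jump straight to record N and overwrite it in place without copying the rest of the file.

```pascal
program TypedFileSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical fixed-size record: short strings have a fixed maximum
  // length, so every record occupies the same number of bytes in the file.
  TFieldRec = packed record
    Name  : string[63];
    Value : string[191];
  end;

var
  F   : file of TFieldRec;
  Rec : TFieldRec;

begin
  AssignFile(F, 'fields.dat'); // assumed file name

  // Create the file with three records.
  Rewrite(F);
  try
    Rec.Name := 'FIRST';  Rec.Value := 'one';   Write(F, Rec);
    Rec.Name := 'SECOND'; Rec.Value := 'two';   Write(F, Rec);
    Rec.Name := 'THIRD';  Rec.Value := 'three'; Write(F, Rec);
  finally
    CloseFile(F);
  end;

  // Reopen it (read/write with the default FileMode) and change one record.
  Reset(F);
  try
    Seek(F, 1);                  // record numbers are zero-based
    Rec.Name := 'SECOND';
    Rec.Value := 'changed';
    Write(F, Rec);               // overwrites record 1 only

    Seek(F, 1);
    Read(F, Rec);                // read the same record back by position
    Writeln(Rec.Name, '=', Rec.Value);
    Writeln('Records in file: ', FileSize(F));
  finally
    CloseFile(F);
  end;
end.
```

Deleting a record is not covered here; one common approach is to mark the slot as unused and reuse it later, which is one more thing the lookup logic would have to track.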

A third possibility is using INI files, which are more structured and sound like a better bet for your purposes. Apart from the section header, they are a series of strings, key=value, and can be accessed on the basis of the key.
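
With the TIniFile class from the IniFiles unit, the get, set and delete requests from the question map onto one call each. This is only a sketch: the path and the section name 'Data' are assumptions made for illustration.

```pascal
program IniSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, IniFiles;

var
  Ini: TIniFile;

begin
  // Assumed path and section name, for illustration only.
  Ini := TIniFile.Create('c:\temp\fields.ini');
  try
    // set: creates the [Data] section and the key if they do not exist yet
    Ini.WriteString('Data', 'SOMEUNIQUENAME', 'SomeStringValue');

    // get: the last argument is the default returned when the key is missing
    Writeln(Ini.ReadString('Data', 'SOMEUNIQUENAME', ''));

    // delete: removes the key=value line from the section
    Ini.DeleteKey('Data', 'SOMEUNIQUENAME');
  finally
    Ini.Free;
  end;
end.
```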

Author: Jerry Dodge

Updated on June 04, 2022

Comments

  • Jerry Dodge
    Jerry Dodge almost 2 years

    I'm working with a Text File in Delphi, and I don't wish to use the method of loading/saving with a string list. I intend to maintain an open filestream where I read and write my data there, keeping massive amounts of data on the hard disk instead of in the memory. I have the simple concept of writing new lines to a text file and reading them, but when it comes to modifying and deleting them, I cannot find any good resources.

    Each line in this file contains a name, an equals sign, and the rest is data. For example, SOMEUNIQUENAME=SomeStringValue. I intend to keep a file open for a period of time inside of a thread. This thread performs incoming requests to either get, set, or delete certain fields of data. I use WriteLn and ReadLn in a loop, evaluating EOF. Below is an example of how I read the data:

    FFile: TextFile;
    
    ...
    
    function TFileWrapper.ReadData(const Name: String): String;
    var
      S: String; //Temporary line to be parsed
      N: String; //Temporary name of field
    begin
      Result:= '';
      Reset(FFile);
      while not EOF(FFile) do begin
        ReadLn(FFile, S);
        N:= UpperCase(Copy(S, 1, Pos('=', S)-1));
        if N = UpperCase(Name) then begin
          Delete(S, 1, Pos('=', S));
          Result:= S;
          Break;
        end;
      end;
    end;
    

    ...and then I trigger an event which informs sender of result. The requests are inside of a queue, which is sort of a message pump for these requests. The thread simply processes the next request in the queue repeatedly, similar to how typical applications work.

    I have procedures ready to be able to write and delete these fields, but I don't know what I have to do to actually perform the action on the file.

    procedure TFileWrapper.WriteData(const Name, Value: String);
    var
      S: String; //Temporary line to be parsed
      N: String; //Temporary name of field
    begin
      Reset(FFile);
      while not EOF(FFile) do begin
        ReadLn(FFile, S);
        N:= UpperCase(Copy(S, 1, Pos('=', S)-1));
        if N = UpperCase(Name) then begin
          //How to re-write this line?
          Break;
        end;
      end;
    end;
    
    procedure TFileWrapper.DeleteData(const Name: String);
    var
      S: String; //Temporary line to be parsed
      N: String; //Temporary name of field
    begin
      Reset(FFile);
      while not EOF(FFile) do begin
        ReadLn(FFile, S);
        N:= UpperCase(Copy(S, 1, Pos('=', S)-1));
        if N = UpperCase(Name) then begin
          //How to delete this line?
          Break;
        end;
      end;
    end;
    

    In the end, I need to avoid loading the entire file into the memory to be able to accomplish this.

  • Jerry Dodge
    Jerry Dodge over 11 years
    Yes, Ini files were my first direction, but I wanted to skip a bit and not require any header section, but I might as well implement it anyway.
  • Jerry Dodge
    Jerry Dodge over 11 years
    That does indeed sound like the only solution, I'm just pondering how SQL Server is able to maintain its data feed in the file stream, and any complex database for that matter. I'm just trying to avoid having to use up the memory many times over for massive amounts of data.
  • jachguate
    jachguate over 11 years
    @Jerry SQL Server or any modern database engine does not use text files to store data. The data is usually stored in pages, and each page contains none, one or many records. When you update a record, if, for example, a varchar column does not fit in the same space as the old record, it is moved to the end of the page or to a different page. There are books about how this works. Read for example Understanding SQL Server Storage Structures or search google for "(sql server/oracle/firebird/db2) internal storage"
  • jachguate
    jachguate over 11 years
    One other little thing: by definition a text file is a sequential file, because its lines have different lengths, and that is traditionally treated differently from a random access file. In a text file, without traversing the entire file, you have no method to predict at which file position any particular line is stored, and you can't really change any line size directly in the file. In a random access file, you can go to any record, page or byte position, because all the pages, records or bytes have the same length, so you can do some arithmetic to get that position.
  • Ken White
    Ken White over 11 years
    One thing to note, though: IIRC, TextFile is not Unicode-aware, so it won't work with anything but ASCII text.
  • Ken White
    Ken White over 11 years
    Jerry, as @jachguate said, SQL Server (or any other RDBMS, for that matter) doesn't use anything even remotely resembling text files. They use complex structures of binary (very often compressed) data that are maintained in pages. Comparing the two is like comparing a tricycle (the text file) to a Porsche (SQL Server) (Oracle would be the Ferrari <g>). In other words, you can't.
  • whosrdaddy
    whosrdaddy over 11 years
    @KenWhite: That is not correct. I tested some files in UTF-8 format and it works like a charm
  • Ken White
    Ken White over 11 years
    What Delphi version? This wasn't the case in D2009-XE, because it was a common problem on the EMBT forums (which is where I got my info from - I haven't used TextFile in a decade because it's simply old Pascal methodology, and there are better ways now).
  • whosrdaddy
    whosrdaddy over 11 years
    @KenWhite Delphi XE here. Me neither, I just find it surprising that TextFile performs relatively fast, even if it's old :)
  • David Heffernan
    David Heffernan over 11 years
    Oracle is more like a Delorean in fact. A huge rip off owned by an evil crook.
  • Remy Lebeau
    Remy Lebeau over 11 years
    @KenWhite: in the answer you describe, there is a little room for optimization. If the desired line is the last line in the file, you can simply truncate the source file to the appropriate offset; you don't need to copy the file. If the line is in the front or middle of the file, then the second loop should read/write the data in chunks instead of in lines, as you don't have to worry about the actual contents. Lastly, use buffered reading, or a memory mapping, for faster access to the source file.
  • Ken White
    Ken White over 11 years
    @Remy: Great. :-) The question was about using TextFile - how exactly do you do all those things (especially reading/writing in chunks, ignoring lines, and using memory mapping)? I don't see any of those things in the help file for TextFile. (Note the question didn't ask about writing anything new, but specifically about using TextFile and not a container like TStringList.)
  • Remy Lebeau
    Remy Lebeau over 11 years
    @KenWhite: I was commenting on your list of required steps, not the actual implementation of them. Obviously, you can't do some of those things with TextFile; you have to ditch it in favor of more direct I/O, like FileRead() or TFileStream.
  • Ken White
    Ken White over 11 years
    @Remy: My point is that I was referring specifically to using TextFile, which I clearly indicated in my answer; pointing out ways to optimize it via other methods wasn't quite fair. :-) I don't comment on your answers about how to do things in Indy by pointing out easier ways to accomplish them in ICS or Synapse. <g>