Buffered files (for faster disk access)


Solution 1

For everybody's interest: Embarcadero added TBufferedFileStream (see the documentation) in the latest release of Delphi, 10.1 Berlin.

Unfortunately, I can't say how it compares with the solutions given here, as I haven't bought the update yet. I am also aware that the question was asked about Delphi 7, but I am sure the reference to Delphi's own implementation can be useful in the future.
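A minimal sketch of how TBufferedFileStream might be used for many small sequential reads. It assumes Delphi 10.1 Berlin or later, a hypothetical file named data.bin holding packed 32-bit integers, and that the constructor accepts an optional buffer size as its third argument; check the declaration in System.Classes for your version.

```pascal
program BufferedReadDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  Stream: TBufferedFileStream;
  Value: Integer;
  Count: Int64;
begin
  // 'data.bin' is a hypothetical file name used only for illustration.
  // The 64 KB buffer size is an arbitrary choice, not a recommendation.
  Stream := TBufferedFileStream.Create('data.bin',
    fmOpenRead or fmShareDenyWrite, 64 * 1024);
  try
    Count := 0;
    // Each 4-byte Read is normally served from the stream's own buffer
    // rather than by a separate call into the operating system.
    while Stream.Read(Value, SizeOf(Value)) = SizeOf(Value) do
      Inc(Count);
    Writeln('Integers read: ', Count);
  finally
    Stream.Free;
  end;
end.
```

Because TBufferedFileStream descends from TFileStream, it can usually be swapped in wherever a TFileStream is created, without changing the reading code.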

Solution 2

Windows file caching is very effective, especially if you are using Vista or later. TFileStream is a loose wrapper around the Windows ReadFile() and WriteFile() API functions and for many use cases the only thing faster is a memory mapped file.

However, there is one common scenario where TFileStream becomes a performance bottleneck: reading or writing small amounts of data with each call to the stream's read or write functions. For example, if you read an array of integers one item at a time, you incur a significant overhead by reading 4 bytes at a time in the calls to ReadFile().

Again, memory mapped files are an excellent way to solve this bottleneck, but the other commonly used approach is to read a much larger buffer, many kilobytes say, and then resolve future reads of the stream from this in-memory cache rather than with further calls to ReadFile(). This approach only really works for sequential access.
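The idea can be sketched with a plain TFileStream by reading a large block at once and then consuming the items from memory. This is only an illustration of the technique, not the unit given in this answer; the file name data.bin and the 64 KB chunk size are assumptions.

```pascal
program BlockReadDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  ChunkItems = 16 * 1024; // integers per chunk (64 KB), an arbitrary choice

var
  Stream: TFileStream;
  Chunk: array[0..ChunkItems - 1] of Integer;
  BytesRead, ItemsRead, i: Integer;
  Sum: Int64;
begin
  // 'data.bin' is a hypothetical file of packed 32-bit integers.
  Stream := TFileStream.Create('data.bin', fmOpenRead or fmShareDenyWrite);
  try
    Sum := 0;
    repeat
      // One call fetches up to ChunkItems integers from the file.
      BytesRead := Stream.Read(Chunk, SizeOf(Chunk));
      ItemsRead := BytesRead div SizeOf(Integer);
      // The individual items are then taken from memory.
      for i := 0 to ItemsRead - 1 do
        Inc(Sum, Chunk[i]);
    until BytesRead < SizeOf(Chunk); // a short read means end of file
    Writeln('Sum: ', Sum);
  finally
    Stream.Free;
  end;
end.
```

A buffered stream class hides this bookkeeping behind the normal Read and Write calls, so the calling code can keep reading one item at a time.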


From the use pattern described in your updated question, I think you may find the following classes would improve performance for you:

unit BufferedFileStream;

interface

uses
  SysUtils, Math, Classes, Windows;

type
  TBaseCachedFileStream = class(TStream)
  private
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  protected
    FHandle: THandle;
    FOwnsHandle: Boolean;
    FCache: PByte;
    FCacheSize: Integer;
    FPosition: Int64;//the current position in the file (relative to the beginning of the file)
    FCacheStart: Int64;//the position in the file of the start of the cache (relative to the beginning of the file)
    FCacheEnd: Int64;//the position in the file of the end of the cache (relative to the beginning of the file)
    FFileName: string;
    FLastError: DWORD;
    procedure HandleError(const Msg: string);
    procedure RaiseSystemError(const Msg: string; LastError: DWORD); overload;
    procedure RaiseSystemError(const Msg: string); overload;
    procedure RaiseSystemErrorFmt(const Msg: string; const Args: array of const);
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; virtual; abstract;
    function GetFileSize: Int64; virtual;
    procedure SetSize(NewSize: Longint); override;
    procedure SetSize(const NewSize: Int64); override;
    function FileRead(var Buffer; Count: Longword): Integer;
    function FileWrite(const Buffer; Count: Longword): Integer;
    function FileSeek(const Offset: Int64; Origin: TSeekOrigin): Int64;
  public
    constructor Create(const FileName: string); overload;
    constructor Create(const FileName: string; CacheSize: Integer); overload;
    constructor Create(const FileName: string; CacheSize: Integer; Handle: THandle); overload; virtual;
    destructor Destroy; override;
    property CacheSize: Integer read FCacheSize;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
    function Seek(const Offset: Int64; Origin: TSeekOrigin): Int64; override;
  end;
  TBaseCachedFileStreamClass = class of TBaseCachedFileStream;

  IDisableStreamReadCache = interface
    ['{0B6D0004-88D1-42D5-BC0F-447911C0FC21}']
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
  end;

  TReadOnlyCachedFileStream = class(TBaseCachedFileStream, IDisableStreamReadCache)
  (* This class works by filling the cache each time a call to Read is made and
     FPosition is outside the existing cache.  By filling the cache we mean
     reading from the file into the temporary cache.  Calls to Read when
     FPosition is in the existing cache are then dealt with by filling the
     buffer with bytes from the cache.
  *)
  private
    FUseAlignedCache: Boolean;
    FViewStart: Int64;
    FViewLength: Int64;
    FDisableStreamReadCacheRefCount: Integer;
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
    procedure FlushCache;
  protected
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; override;
    function GetFileSize: Int64; override;
  public
    constructor Create(const FileName: string; CacheSize: Integer; Handle: THandle); overload; override;
    property UseAlignedCache: Boolean read FUseAlignedCache write FUseAlignedCache;
    function Read(var Buffer; Count: Longint): Longint; override;
    procedure SetViewWindow(const ViewStart, ViewLength: Int64);
  end;

  TWriteCachedFileStream = class(TBaseCachedFileStream, IDisableStreamReadCache)
  (* This class works by caching calls to Write.  By this we mean temporarily
     storing the bytes to be written in the cache.  As each call to Write is
     processed the cache grows.  The cache is written to file when:
       1.  A call to Write is made when the cache is full.
       2.  A call to Write is made and FPosition is outside the cache (this
           must be as a result of a call to Seek).
       3.  The class is destroyed.

     Note that data can be read from these streams but the reading is not
     cached and in fact a read operation will flush the cache before
     attempting to read the data.
  *)
  private
    FFileSize: Int64;
    FReadStream: TReadOnlyCachedFileStream;
    FReadStreamCacheSize: Integer;
    FReadStreamUseAlignedCache: Boolean;
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
    procedure CreateReadStream;
    procedure FlushCache;
  protected
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; override;
    function GetFileSize: Int64; override;
  public
    constructor Create(const FileName: string; CacheSize, ReadStreamCacheSize: Integer; ReadStreamUseAlignedCache: Boolean); overload;
    destructor Destroy; override;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
  end;

implementation

function GetFileSizeEx(hFile: THandle; var FileSize: Int64): BOOL; stdcall; external kernel32;
function SetFilePointerEx(hFile: THandle; DistanceToMove: Int64; lpNewFilePointer: PInt64; dwMoveMethod: DWORD): BOOL; stdcall; external kernel32;

{ TBaseCachedFileStream }

constructor TBaseCachedFileStream.Create(const FileName: string);
begin
  Create(FileName, 0);
end;

constructor TBaseCachedFileStream.Create(const FileName: string; CacheSize: Integer);
begin
  Create(FileName, CacheSize, 0);
end;

constructor TBaseCachedFileStream.Create(const FileName: string; CacheSize: Integer; Handle: THandle);
const
  DefaultCacheSize = 16*1024;
  //16kb - this was chosen empirically - don't make it too large otherwise the progress report is 'jerky'
begin
  inherited Create;
  FFileName := FileName;
  FOwnsHandle := Handle=0;
  if FOwnsHandle then begin
    FHandle := CreateHandle(FILE_ATTRIBUTE_NORMAL);
  end else begin
    FHandle := Handle;
  end;
  FCacheSize := CacheSize;
  if FCacheSize<=0 then begin
    FCacheSize := DefaultCacheSize;
  end;
  GetMem(FCache, FCacheSize);
end;

destructor TBaseCachedFileStream.Destroy;
begin
  FreeMem(FCache);
  if FOwnsHandle and (FHandle<>0) then begin
    CloseHandle(FHandle);
  end;
  inherited;
end;

function TBaseCachedFileStream.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  if GetInterface(IID, Obj) then begin
    Result := S_OK;
  end else begin
    Result := E_NOINTERFACE;
  end;
end;

function TBaseCachedFileStream._AddRef: Integer;
begin
  Result := -1;
end;

function TBaseCachedFileStream._Release: Integer;
begin
  Result := -1;
end;

procedure TBaseCachedFileStream.HandleError(const Msg: string);
begin
  if FLastError<>0 then begin
    RaiseSystemError(Msg, FLastError);
  end;
end;

procedure TBaseCachedFileStream.RaiseSystemError(const Msg: string; LastError: DWORD);
begin
  raise EStreamError.Create(Trim(Msg+' '+SysErrorMessage(LastError)));
end;

procedure TBaseCachedFileStream.RaiseSystemError(const Msg: string);
begin
  RaiseSystemError(Msg, GetLastError);
end;

procedure TBaseCachedFileStream.RaiseSystemErrorFmt(const Msg: string; const Args: array of const);
var
  LastError: DWORD;
begin
  LastError := GetLastError; // must call GetLastError before Format
  RaiseSystemError(Format(Msg, Args), LastError);
end;

function TBaseCachedFileStream.GetFileSize: Int64;
begin
  if not GetFileSizeEx(FHandle, Result) then begin
    RaiseSystemErrorFmt('GetFileSizeEx failed for %s.', [FFileName]);
  end;
end;

procedure TBaseCachedFileStream.SetSize(NewSize: Longint);
begin
  SetSize(Int64(NewSize));
end;

procedure TBaseCachedFileStream.SetSize(const NewSize: Int64);
begin
  Seek(NewSize, soBeginning);
  if not Windows.SetEndOfFile(FHandle) then begin
    RaiseSystemErrorFmt('SetEndOfFile for %s.', [FFileName]);
  end;
end;

function TBaseCachedFileStream.FileRead(var Buffer; Count: Longword): Integer;
begin
  if Windows.ReadFile(FHandle, Buffer, Count, LongWord(Result), nil) then begin
    FLastError := 0;
  end else begin
    FLastError := GetLastError;
    Result := -1;
  end;
end;

function TBaseCachedFileStream.FileWrite(const Buffer; Count: Longword): Integer;
begin
  if Windows.WriteFile(FHandle, Buffer, Count, LongWord(Result), nil) then begin
    FLastError := 0;
  end else begin
    FLastError := GetLastError;
    Result := -1;
  end;
end;

function TBaseCachedFileStream.FileSeek(const Offset: Int64; Origin: TSeekOrigin): Int64;
begin
  if not SetFilePointerEx(FHandle, Offset, @Result, ord(Origin)) then begin
    RaiseSystemErrorFmt('SetFilePointerEx failed for %s.', [FFileName]);
  end;
end;

function TBaseCachedFileStream.Read(var Buffer; Count: Integer): Longint;
begin
  raise EAssertionFailed.Create('Cannot read from this stream');
end;

function TBaseCachedFileStream.Write(const Buffer; Count: Integer): Longint;
begin
  raise EAssertionFailed.Create('Cannot write to this stream');
end;

function TBaseCachedFileStream.Seek(const Offset: Int64; Origin: TSeekOrigin): Int64;
//Set FPosition to the value specified - if this has implications for the
//cache then overridden Write and Read methods must deal with those.
begin
  case Origin of
  soBeginning:
    FPosition := Offset;
  soEnd:
    FPosition := GetFileSize+Offset;
  soCurrent:
    inc(FPosition, Offset);
  end;
  Result := FPosition;
end;

{ TReadOnlyCachedFileStream }

constructor TReadOnlyCachedFileStream.Create(const FileName: string; CacheSize: Integer; Handle: THandle);
begin
  inherited;
  SetViewWindow(0, inherited GetFileSize);
end;

function TReadOnlyCachedFileStream.CreateHandle(FlagsAndAttributes: DWORD): THandle;
begin
  Result := Windows.CreateFile(
    PChar(FFileName),
    GENERIC_READ,
    FILE_SHARE_READ,
    nil,
    OPEN_EXISTING,
    FlagsAndAttributes,
    0
  );
  if Result=INVALID_HANDLE_VALUE then begin
    RaiseSystemErrorFmt('Cannot open %s.', [FFileName]);
  end;
end;

procedure TReadOnlyCachedFileStream.DisableStreamReadCache;
begin
  inc(FDisableStreamReadCacheRefCount);
end;

procedure TReadOnlyCachedFileStream.EnableStreamReadCache;
begin
  dec(FDisableStreamReadCacheRefCount);
end;

procedure TReadOnlyCachedFileStream.FlushCache;
begin
  FCacheStart := 0;
  FCacheEnd := 0;
end;

function TReadOnlyCachedFileStream.GetFileSize: Int64;
begin
  Result := FViewLength;
end;

procedure TReadOnlyCachedFileStream.SetViewWindow(const ViewStart, ViewLength: Int64);
begin
  if ViewStart<0 then begin
    raise EAssertionFailed.Create('Invalid view window');
  end;
  if (ViewStart+ViewLength)>inherited GetFileSize then begin
    raise EAssertionFailed.Create('Invalid view window');
  end;
  FViewStart := ViewStart;
  FViewLength := ViewLength;
  FPosition := 0;
  FCacheStart := 0;
  FCacheEnd := 0;
end;

function TReadOnlyCachedFileStream.Read(var Buffer; Count: Longint): Longint;
var
  NumOfBytesToCopy, NumOfBytesLeft, NumOfBytesRead: Longint;
  CachePtr, BufferPtr: PByte;
begin
  if FDisableStreamReadCacheRefCount>0 then begin
    FileSeek(FPosition+FViewStart, soBeginning);
    Result := FileRead(Buffer, Count);
    if Result=-1 then begin
      Result := 0;//contract is to return number of bytes that were read
    end;
    inc(FPosition, Result);
  end else begin
    Result := 0;
    NumOfBytesLeft := Count;
    BufferPtr := @Buffer;
    while NumOfBytesLeft>0 do begin
      if (FPosition<FCacheStart) or (FPosition>=FCacheEnd) then begin
        //the current position is not available in the cache so we need to re-fill the cache
        FCacheStart := FPosition;
        if UseAlignedCache then begin
          FCacheStart := FCacheStart - (FCacheStart mod CacheSize);
        end;
        FileSeek(FCacheStart+FViewStart, soBeginning);
        NumOfBytesRead := FileRead(FCache^, CacheSize);
        if NumOfBytesRead=-1 then begin
          exit;
        end;
        Assert(NumOfBytesRead>=0);
        FCacheEnd := FCacheStart+NumOfBytesRead;
        if NumOfBytesRead=0 then begin
          FLastError := ERROR_HANDLE_EOF;//must be at the end of the file
          break;
        end;
      end;

      //read from cache to Buffer
      NumOfBytesToCopy := Min(FCacheEnd-FPosition, NumOfBytesLeft);
      CachePtr := FCache;
      inc(CachePtr, FPosition-FCacheStart);
      Move(CachePtr^, BufferPtr^, NumOfBytesToCopy);
      inc(Result, NumOfBytesToCopy);
      inc(FPosition, NumOfBytesToCopy);
      inc(BufferPtr, NumOfBytesToCopy);
      dec(NumOfBytesLeft, NumOfBytesToCopy);
    end;
  end;
end;

{ TWriteCachedFileStream }

constructor TWriteCachedFileStream.Create(const FileName: string; CacheSize, ReadStreamCacheSize: Integer; ReadStreamUseAlignedCache: Boolean);
begin
  inherited Create(FileName, CacheSize);
  FReadStreamCacheSize := ReadStreamCacheSize;
  FReadStreamUseAlignedCache := ReadStreamUseAlignedCache;
end;

destructor TWriteCachedFileStream.Destroy;
begin
  FlushCache;//make sure that the final calls to Write get recorded in the file
  FreeAndNil(FReadStream);
  inherited;
end;

function TWriteCachedFileStream.CreateHandle(FlagsAndAttributes: DWORD): THandle;
begin
  Result := Windows.CreateFile(
    PChar(FFileName),
    GENERIC_READ or GENERIC_WRITE,
    0,
    nil,
    CREATE_ALWAYS,
    FlagsAndAttributes,
    0
  );
  if Result=INVALID_HANDLE_VALUE then begin
    RaiseSystemErrorFmt('Cannot create %s.', [FFileName]);
  end;
end;

procedure TWriteCachedFileStream.DisableStreamReadCache;
begin
  CreateReadStream;
  FReadStream.DisableStreamReadCache;
end;

procedure TWriteCachedFileStream.EnableStreamReadCache;
begin
  Assert(Assigned(FReadStream));
  FReadStream.EnableStreamReadCache;
end;

function TWriteCachedFileStream.GetFileSize: Int64;
begin
  Result := FFileSize;
end;

procedure TWriteCachedFileStream.CreateReadStream;
begin
  if not Assigned(FReadStream) then begin
    FReadStream := TReadOnlyCachedFileStream.Create(FFileName, FReadStreamCacheSize, FHandle);
    FReadStream.UseAlignedCache := FReadStreamUseAlignedCache;
  end;
end;

procedure TWriteCachedFileStream.FlushCache;
var
  NumOfBytesToWrite: Longint;
begin
  if Assigned(FCache) then begin
    NumOfBytesToWrite := FCacheEnd-FCacheStart;
    if NumOfBytesToWrite>0 then begin
      FileSeek(FCacheStart, soBeginning);
      if FileWrite(FCache^, NumOfBytesToWrite)<>NumOfBytesToWrite then begin
        RaiseSystemErrorFmt('FileWrite failed for %s.', [FFileName]);
      end;
      if Assigned(FReadStream) then begin
        FReadStream.FlushCache;
      end;
    end;
    FCacheStart := FPosition;
    FCacheEnd := FPosition;
  end;
end;

function TWriteCachedFileStream.Read(var Buffer; Count: Integer): Longint;
begin
  FlushCache;
  CreateReadStream;
  Assert(FReadStream.FViewStart=0);
  if FReadStream.FViewLength<>FFileSize then begin
    FReadStream.SetViewWindow(0, FFileSize);
  end;
  FReadStream.Position := FPosition;
  Result := FReadStream.Read(Buffer, Count);
  inc(FPosition, Result);
end;

function TWriteCachedFileStream.Write(const Buffer; Count: Longint): Longint;
var
  NumOfBytesToCopy, NumOfBytesLeft: Longint;
  CachePtr, BufferPtr: PByte;
begin
  Result := 0;
  NumOfBytesLeft := Count;
  BufferPtr := @Buffer;
  while NumOfBytesLeft>0 do begin
    if ((FPosition<FCacheStart) or (FPosition>FCacheEnd))//the current position is outside the cache
    or (FPosition-FCacheStart=FCacheSize)//the cache is full
    then begin
      FlushCache;
      Assert(FCacheStart=FPosition);
    end;

    //write from Buffer to the cache
    NumOfBytesToCopy := Min(FCacheSize-(FPosition-FCacheStart), NumOfBytesLeft);
    CachePtr := FCache;
    inc(CachePtr, FPosition-FCacheStart);
    Move(BufferPtr^, CachePtr^, NumOfBytesToCopy);
    inc(Result, NumOfBytesToCopy);
    inc(FPosition, NumOfBytesToCopy);
    FCacheEnd := Max(FCacheEnd, FPosition);
    inc(BufferPtr, NumOfBytesToCopy);
    dec(NumOfBytesLeft, NumOfBytesToCopy);
  end;
  FFileSize := Max(FFileSize, FPosition);
end;

end.
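A short usage sketch for the unit above, assuming it is saved as BufferedFileStream.pas. The file names and the 32 KB cache sizes are hypothetical. Note that TWriteCachedFileStream opens its file with CREATE_ALWAYS, so an existing output file is truncated.

```pascal
program CopyRecordsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, BufferedFileStream;

var
  Source: TReadOnlyCachedFileStream;
  Dest: TWriteCachedFileStream;
  Value: Integer;
begin
  // 'input.bin' and 'output.bin' are hypothetical file names.
  Source := TReadOnlyCachedFileStream.Create('input.bin', 32 * 1024);
  try
    // Arguments: file name, write cache size, read cache size,
    // and whether the internal read stream uses an aligned cache.
    Dest := TWriteCachedFileStream.Create('output.bin', 32 * 1024, 32 * 1024, False);
    try
      // Small sequential reads and writes are served from the caches.
      while Source.Read(Value, SizeOf(Value)) = SizeOf(Value) do
        Dest.Write(Value, SizeOf(Value));
    finally
      Dest.Free; // the destructor flushes any bytes still in the cache
    end;
  finally
    Source.Free;
  end;
end.
```

TBaseCachedFileStream itself is abstract and should not be instantiated; use the read-only class for reading and the write class for writing.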

Solution 3

The TFileStream class internally uses the CreateFile function, which always uses a buffer to manage the file unless you specify the FILE_FLAG_NO_BUFFERING flag (be aware that you can't specify this flag directly using TFileStream).

You can also try the TGpHugeFileStream, which is part of the GpHugeFile unit from Primoz Gabrijelcic.

Solution 4

If you have this kind of code a lot:

while Stream.Position < Stream.Size do

You can optimize it by caching Stream.Size in a variable, which speeds up the loop. Stream.Size uses three virtual function calls to find out the actual size.
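A minimal sketch of that optimization, assuming a hypothetical file data.bin whose size does not change while the loop runs. The position is also tracked in a local variable so that neither Size nor Position is queried on every iteration.

```pascal
program CachedSizeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Stream: TFileStream;
  StreamSize, Position: Int64;
  Value: Byte;
begin
  // 'data.bin' is a hypothetical file name used only for illustration.
  Stream := TFileStream.Create('data.bin', fmOpenRead or fmShareDenyWrite);
  try
    StreamSize := Stream.Size; // queried once, outside the loop
    Position := 0;             // tracked locally instead of reading Stream.Position
    while Position < StreamSize do
    begin
      Stream.ReadBuffer(Value, SizeOf(Value));
      Inc(Position, SizeOf(Value));
    end;
    Writeln('Bytes read: ', Position);
  finally
    Stream.Free;
  end;
end.
```

This removes the repeated size and position queries from the loop condition, but each ReadBuffer call still goes to the file API, so it complements buffering rather than replacing it.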

Author by

Server Overflow


Updated on March 29, 2022

Comments

  • Server Overflow
    Server Overflow over 1 year

    I am working with large files and writing directly to disk is slow. Because the file is large I cannot load it in a TMemoryStream.

    TFileStream is not buffered so I want to know if there is a custom library that can offer buffered streams or should I rely only on the buffering offered by OS. Is the OS buffering reliable? I mean if the cache is full an old file (mine) might be flushed from cache in order to make room for a new file.

    My file is in the GB range. It contains millions of records. Unfortunately, the records are not of fixed size. So, I have to do millions of reads (between 4 and 500 bytes each). The reading (and the writing) is sequential. I don't jump back and forth in the file (which I think is ideal for buffering).

    In the end, I have to write such a file back to disk (again millions of small writes).


    David provided his personal library that provides buffered disk access.

       Speed tests:
         Input file: 317MB.SFF
         Delphi stream: 9.84sec
         David's stream: 2.05sec
         ______________________________________
    
       More tests:
         Input file: input2_700MB.txt
         Lines: 19 millions
         Compiler optimization: ON
         I/O check: On
         FastMM: release mode
         **HDD**   
    
         Reading: **linear** (ReadLine) (PS: multiply time with 10)      
          We see clear performance drop at 8KB. Recommended 16 or 32KB
            Time: 618 ms  Cache size: 64KB.
            Time: 622 ms  Cache size: 128KB.
            Time: 622 ms  Cache size: 24KB.
            Time: 622 ms  Cache size: 32KB.
            Time: 622 ms  Cache size: 64KB.
            Time: 624 ms  Cache size: 256KB.
            Time: 625 ms  Cache size: 18KB.
            Time: 626 ms  Cache size: 26KB.
            Time: 626 ms  Cache size: 1024KB.
            Time: 626 ms  Cache size: 16KB.
            Time: 628 ms  Cache size: 42KB.
            Time: 644 ms  Cache size: 8KB.      <--- no difference until 8K
            Time: 664 ms  Cache size: 4KB.
            Time: 705 ms  Cache size: 2KB.
            Time: 791 ms  Cache size: 1KB.
            Time: 795 ms  Cache size: 1KB.
    
          **SSD**
          We see a small improvement as we go towards higher buffers. Recommended 16 or 32KB
            Time: 610 ms  Cache size: 128KB.
            Time: 611 ms  Cache size: 256KB.
            Time: 614 ms  Cache size: 32KB.
            Time: 623 ms  Cache size: 16KB.
            Time: 625 ms  Cache size: 66KB.
            Time: 639 ms  Cache size: 8KB.       <--- definitively not good with 8K
            Time: 660 ms  Cache size: 4KB.
         ______
    
         Reading: **Random** (ReadInteger) (100000 reads)
         SSD
           Time: 064 ms. Cache size: 1KB.   Count: 100000.  RAM: 13.27 MB         <-- probably the best buffer size for ReadInteger is 4bytes!
           Time: 067 ms. Cache size: 2KB.   Count: 100000.  RAM: 13.27 MB
           Time: 080 ms. Cache size: 4KB.   Count: 100000.  RAM: 13.27 MB
           Time: 098 ms. Cache size: 8KB.   Count: 100000.  RAM: 13.27 MB
           Time: 140 ms. Cache size: 16KB.  Count: 100000.  RAM: 13.27 MB
           Time: 213 ms. Cache size: 32KB.  Count: 100000.  RAM: 13.27 MB
           Time: 360 ms. Cache size: 64KB.  Count: 100000.  RAM: 13.27 MB
           Conclusion: don't use it for "random" reading   
    

    Update 2020:
    When reading sequentially, the new System.Classes.TBufferedFileStream seems to be 70% faster than the library presented above.

    • Andreas Rejbrand
      Andreas Rejbrand over 12 years
      Memory-mapped files?
    • Najem
      Najem over 12 years
      If the file is used only by your application, you could consider storing your records in a database.
    • David Heffernan
      David Heffernan over 1 year
      I don't understand how any buffered stream implementation would differ by that much in performance. It should be limited by raw IO speeds. I suspect your benchmark is wrong.
    • Server Overflow
      Server Overflow over 1 year
      Hi David. I will test again and put the code online.
    • Server Overflow
      Server Overflow over 1 year
      @DavidHeffernan - I ran the test again. I put two identical files (45mb) on a USB stick. Disconnected the stick. Connected back. So Win does not have the files in cache. Each library uses its own file, also to make sure that when the second library reads, Windows will give data from its RAM cache. Conclusions: When reading the files first time, each library shows the same time (3.52 seconds (your) vs 3.51 (VCL)). However, on the second run (now data comes from Win cache not directly from disk), your lib needs 1.22 sec, while Delphi's library needs only 690ms.
    • Server Overflow
      Server Overflow over 1 year
      The code is: WHILE Stream.Read(xAnsiChar, 1) > 0 DO if xAnsiChar = #32 then Inc(Count);
    • David Heffernan
      David Heffernan over 1 year
      @server that's interesting. I guess there must be some inefficiency that shows itself over very small reads in a tight loop
  • Server Overflow
    Server Overflow over 12 years
    Hi RRuz. So you say that using a custom (buffered) stream will not improve performance since TFileStream is buffered anyway.
  • David Heffernan
    David Heffernan over 12 years
    That's what RRUZ is implying, but in many cases it is simply not true. There is an overhead to calling ReadFile that becomes significant if you read small pieces at a time.
  • RRUZ
    RRUZ over 12 years
    @Altar, No, I am not saying that. I am saying that TFileStream uses a buffer to hold the data, which works OK in most cases. Now, if you want to improve the performance, you can write an object (class) from scratch to read and write the file using a bigger buffer, or use a class like TGpHugeFileStream.
  • David Heffernan
    David Heffernan over 12 years
    TFileStream doesn't use a buffer. It's just a lightweight wrapper around ReadFile/WriteFile. Windows has file caches which these API routines benefit from.
  • RRUZ
    RRUZ over 12 years
    @David the Write and Read functions of the TFileStream call the WriteFile and ReadFile functions which uses the buffer which you pass as parameter.
  • David Heffernan
    David Heffernan over 12 years
    @RRUZ I don't think that counts as a buffer in this discussion!
  • Server Overflow
    Server Overflow over 12 years
    Yes. I read millions of small chunks of data from that large file.
  • David Heffernan
    David Heffernan over 12 years
    @Altar in a now deleted comment you state that you need read/write access. Is that correct?
  • Server Overflow
    Server Overflow over 12 years
    @David. Yes. (I updated my question and the comment was moved there).
  • Server Overflow
    Server Overflow over 12 years
    I do millions of readings in sequential manner. So, it looks like a buffered file will indeed help me.
  • David Heffernan
    David Heffernan over 12 years
    @Remy thanks for the edit, I always get the names of those functions wrong!
  • David Heffernan
    David Heffernan over 12 years
    @Altar you can replace that with raise EAssertionFailed.Create; or perhaps Assert(False);
  • David Heffernan
    David Heffernan over 12 years
    Don't create TBaseCachedFileStream, it's an abstract class. Instantiate TReadOnlyCachedFileStream when you are reading and TWriteCachedFileStream when you are writing.
  • Server Overflow
    Server Overflow over 12 years
    Lord sweet Jesus. Delphi class: 11.2 seconds. Your class: 1.6 seconds.
  • David Heffernan
    David Heffernan over 12 years
    @Altar It should be a reasonable drop in replacement for a TFileStream, but there may be missing functionality. It works in my setting. You'd do well to read the code at some point!!
  • David Heffernan
    David Heffernan over 12 years
    @Altar Hmm, that sounds like quite a decent result!
  • Conrad Hildebrand
    Conrad Hildebrand over 12 years
    @David: just found this Trim(Msg+' ') in your code. What's that supposed to do? Interesting code btw!
  • David Heffernan
    David Heffernan over 12 years
    @Smasher Good catch. Bracket in the wrong place. I've just checked in this: Trim(Msg+' '+GetSystemErrorString(LastError)). Glad you like the code!!
  • Conrad Hildebrand
    Conrad Hildebrand over 12 years
    @David: thanks for the quick response. Can I ask you one more question? What's the DisableStreamReadCache interface used for?
  • David Heffernan
    David Heffernan over 12 years
    @Smasher Mostly I use the read stream to read relatively large chunks of the file at once. However, there is one use case where I read very small chunks and then seek some distance (larger than the buffer size) before I read the next small chunk. For this use case I don't want to pay the price of reading the whole buffer just for a couple of bytes.
  • LU RD
    LU RD over 10 years
    @David, GetSystemErrorString(LastError) ?? You mean SysErrorMessage(LastError) probably.
  • Andriy M
    Andriy M over 10 years
    @LURD: Apparently you are not first to complain about it. :)
  • LU RD
    LU RD over 10 years
    @AndriyM, there is a clean compilable version at Embarcadero attachments, forums.embarcadero.com/thread.jspa?threadID=87501&tstart=0. When I have the time I will compare this routine with my own buffered file access written 25 years ago in TP. It is still operating both in TP and Delphi with some minor adjustments.
  • Andriy M
    Andriy M over 10 years
    @LURD: Thank you for the link! Out of curiosity, are you expecting the results to be comparable or is it merely assessment of the inevitable difference that you are after? I mean, what with the amount of time passed, I'd expect even most advanced old-school methods to be out of league today.
  • LU RD
    LU RD over 10 years
    @AndriyM, BlockRead calls Windows.ReadFile more or less the same way as here, so unless there is a hidden trick, I don't expect much difference.
  • David Heffernan
    David Heffernan over 10 years
    @AndriyM There's no rocket science here at all. File buffering optimisations existed long before I was born and looked pretty much like this. In fact I suspect this code is about as simple as you can get.
  • David Heffernan
    David Heffernan over 10 years
    @All Thanks for prompting me to fix this code so that it compiles as a standalone unit.
  • Arnaud Bouchez
    Arnaud Bouchez over 10 years
    What is slow is calling the Windows file API, not using virtual methods. Implementing a cache is the solution.
  • mistertodd
    mistertodd about 10 years
    Thanks for this; especially including an entire functional unit. Having the already complete drop-in replacement is quite handy. It even works in Delphi 7!
  • David Heffernan
    David Heffernan about 10 years
    @Ian Thanks. It was culled from Delphi 6 code. May even work in your beloved D5!
  • mistertodd
    mistertodd about 10 years
    Few tweaks to make it work in D5! StreamOrigin enumeration vs older constants, and the D6 overloads taking Int64 sizes. I will be adapting your excellent code into a generic TBufferedStream, so i can buffer more things besides just files (or have control to specify SHARE_DENY_NONE when opening a file).
  • David Heffernan
    David Heffernan about 10 years
    @Ian Nice to hear that. Just be glad you are still on D5!! I have to install some hooks to fix the performance of XE3 streams. In WriteBuffer and ReadBuffer the code now copies the buffer into a dynamically allocated one. I despair of those Emba devs sometimes.
  • Alex
    Alex over 8 years
    @ArnaudBouchez, Size is not a simple virtual method. It calls kernel32.FileSeek 3 times! Add Position (which also calls FileSeek) and you get 4 kernel calls in just one line of code. That's four times more than the kernel calls inside Read/Write.
  • Arnaud Bouchez
    Arnaud Bouchez over 8 years
    @Alex Indeed. This was exactly my point - please read my comment again. And this is why I proposed caching the size, and even computing the current position within the loop when writing the content. Avoiding as many API calls as possible is always a good idea!
  • Marco van de Voort
    Marco van de Voort almost 8 years
    It works and it isn't exactly subtle. From 3 seconds down to .6, factor 5, thank you very much :-)
  • Marco van de Voort
    Marco van de Voort almost 8 years
    FYI Delphi's TMemoryStream isn't optimal either, because it always grows in 8k increments. The FPC one grows exponentially until the increment reaches a certain size. Overriding Delphi's TMemoryStream.Realloc with an adapted FPC one can speed up large writes due to fewer reallocations.
  • crazy_in_love
    crazy_in_love over 7 years
    Incorporated this to my code and I'm so glad I did it. Any chance to put in Github so it's properly taken care of?
  • Server Overflow
    Server Overflow over 7 years
    David's BufferedFileStream already implements buffered disk access.
  • dummzeuch
    dummzeuch over 6 years
    Am I the only one who ever tried to use the TWriteCachedFileStream with random access and noticed that it doesn't work correctly? I get chunks with zeroes where there shouldn't be any.
  • David Heffernan
    David Heffernan over 6 years
    Make a MCVE and I'll take a look
  • dummzeuch
    dummzeuch over 6 years
    @DavidHeffernan My fault: I didn't realise that it opens existing files with CREATE_ALWAYS, so the existing content gets lost. I'll do another test where I pass it an existing handle and see what happens. EDIT: Just did this test and the problem is gone.
  • Toby
    Toby over 6 years
    @David, you mentioned XE3 streams in a comment. Is that still true in XE8 and above? If so, what needs to be changed in the above code unit? And TYVM for this code!
  • Toby
    Toby over 6 years
    @David Thanks! I assume you actually mean NO. :-) Another question: If I have a large file but I only need to read and modify 100 characters at the start and end of the file - does it make any sense to use this buffered stream or should I just use a "normal" stream?
  • David Heffernan
    David Heffernan over 6 years
    @Toby That's a pretty light usage. Opening the file is the main cost there. Plain file stream with seeking should be fine.
  • Marco van de Voort
    Marco van de Voort over 6 years
    I've now used it for a year and found a regression. The pattern was a combination of a few very large (several MB) writes and a (possibly large, say 20000) number of small writes. If for some reason the number of small writes was low, then the unnecessary buffering slowed things down. It can probably be fixed with a flush followed by a direct write for incoming writes > 1MB or so.
  • Rudi
    Rudi over 4 years
    @David It looks like a miracle. Changing my code from TFileStream to TReadOnlyCachedFileStream reduces the time to read a particular file from 7,600 ms to 50 ms. Unreal!
  • David Heffernan
    David Heffernan over 4 years
    @rigel No, classes are default initialised
  • user382591
    user382591 over 3 years
    Is it possible to use a TReadOnlyCachedFileStream in a thread ? Should I use coinitializeEx(....) because of the interface IDisableStreamReadCache ? If yes with which parameters?
  • David Heffernan
    David Heffernan over 3 years
    @user382591 use in a thread is no problem. No need to initialize COM because we aren't using COM.