How to read file content into istringstream?


Solution 1

std::ifstream has a member function rdbuf() that returns a pointer to its filebuf. You can then insert this filebuf into your stringstream:

#include <fstream>
#include <sstream>

int main()
{
    std::ifstream file( "myFile" );

    if ( file )
    {
        std::stringstream buffer;

        buffer << file.rdbuf();

        file.close();

        // operations on the buffer...
    }
}

EDIT: As Martin York remarks in the comments, this might not be the fastest solution, since the stringstream's operator<< will read the filebuf character by character. You might want to check his answer, where he uses the ifstream's read method as you did, and then sets the stringstream buffer to point to the previously allocated memory.
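
If the main concern is holding two copies of the data for long, one possible sketch (not taken from either answer) is to read the file straight into a std::string with a single read() and pass that string to the istringstream as a temporary. The helper names read_whole_file and open_as_stream are made up for illustration. From C++20 onward the constructor has an rvalue-string overload that can take over the string's storage; with earlier standards the contents are still copied once, but the temporary is released right afterwards.

```cpp
#include <fstream>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file into a std::string
// with one read() call. Returns an empty string on failure.
std::string read_whole_file(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return std::string();

    // Determine the file size by seeking to the end.
    file.seekg(0, std::ios::end);
    const std::streamoff length = file.tellg();
    file.seekg(0, std::ios::beg);
    if (length <= 0)
        return std::string();

    std::string contents(static_cast<std::string::size_type>(length), '\0');
    file.read(&contents[0], static_cast<std::streamsize>(length));

    // Keep only what was actually read.
    contents.resize(static_cast<std::string::size_type>(file.gcount()));
    return contents;
}

// Hypothetical helper: wraps the file contents in an istringstream.
// The string is passed as a temporary, so a C++20 library may move it.
std::istringstream open_as_stream(const std::string& path)
{
    return std::istringstream(read_whole_file(path));
}
```

A call such as `std::istringstream iss = open_as_stream("myFile");` then gives a stream that owns its data, so there is no lifetime issue with an external buffer.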

Solution 2

OK. I am not saying this will be quicker than reading from the file.

But this is a method where you create the buffer once and, after the data has been read into it, use it directly as the source for the stringstream.

N.B. It is worth mentioning that std::ifstream is buffered. It reads data from the file in (relatively large) chunks. Stream operations are performed against the buffer, only returning to the file for another read when more data is needed. So before sucking all the data into memory, please verify that this is a bottleneck.

#include <fstream>
#include <sstream>
#include <vector>

int main()
{
    // Binary mode, so the size reported by tellg() matches the bytes read.
    std::ifstream       file("Plop", std::ios::binary);
    if (file)
    {
        /*
         * Get the size of the file
         */
        file.seekg(0,std::ios::end);
        std::streampos          length = file.tellg();
        file.seekg(0,std::ios::beg);

        /*
         * Use a vector as the buffer.
         * It is exception safe and will be tidied up correctly.
         * This constructor creates a buffer of the correct length.
         *
         * Then read the whole file into the buffer.
         */
        std::vector<char>       buffer(length);
        file.read(&buffer[0],length);

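        /*
         * Create your string stream.
         * Get the string buffer from the stream and set the vector as its source.
         * Note: the effect of setbuf() on a stringbuf is implementation-defined,
         * so check that your standard library actually uses the supplied buffer.
         */
        std::stringstream       localStream;
        localStream.rdbuf()->pubsetbuf(&buffer[0],length);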

        /*
         * Note the buffer is NOT copied, if it goes out of scope
         * the stream will be reading from released memory.
         */
    }
}
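
Because the effect of pubsetbuf on a stringbuf is implementation-defined (see the comments), a portable alternative is a small streambuf-derived class that points its get area at the existing buffer, as Ben Voigt suggests. The following is a minimal sketch of that idea, not code from the answer; MemoryBuf and count_words are hypothetical names. It supports sequential reading only: seekg and tellg would need seekoff and seekpos to be overridden.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical read-only stream buffer over memory owned by the caller.
class MemoryBuf : public std::streambuf
{
public:
    MemoryBuf(char* data, std::size_t size)
    {
        // Point the get area at [data, data + size). Nothing is copied.
        setg(data, data, data + size);
    }
};

// Example use: count the whitespace-separated words stored in a vector<char>.
// The vector must outlive the stream that reads from it.
std::size_t count_words(std::vector<char>& storage)
{
    MemoryBuf buf(storage.data(), storage.size());
    std::istream in(&buf);

    std::size_t count = 0;
    std::string word;
    while (in >> word)
        ++count;
    return count;
}
```

As with the pubsetbuf version, the buffer is not copied, so the same lifetime caveat applies.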

Solution 3

This seems like premature optimization to me. How much work is being done in the processing? Assuming a modernish desktop/server, and not an embedded system, copying a few MB of data during initialization is fairly cheap, especially compared to reading the file off disk in the first place. I would stick with what you have, measure the system when it is complete, and then decide whether the potential performance gains would be worth it. Of course, if memory is tight, or this is in an inner loop or in a program that gets called often (like once a second), that changes the balance.
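
To make "measure" concrete, here is a rough timing sketch using std::chrono. It is only an illustration: elapsed_ms and load_with_rdbuf are hypothetical names, and a single wall-clock measurement like this is noisy, so treat the numbers as a first indication rather than a benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: runs f() once and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Loads a file with the simple rdbuf() approach and returns
// the number of bytes that ended up in the stringstream.
std::size_t load_with_rdbuf(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    std::stringstream buffer;
    buffer << file.rdbuf();
    return buffer.str().size();
}
```

For example, `double ms = elapsed_ms([] { load_with_rdbuf("myFile"); });` times one load. Timing the processing step the same way shows whether the load is a meaningful share of the total.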

Author by

Marcos Bento

A Worker, a student, a thinker...

Updated on July 09, 2022

Comments

  • Marcos Bento
    Marcos Bento almost 2 years

    In order to improve performance when reading from a file, I'm trying to read the entire content of a big (several MB) file into memory and then use an istringstream to access the information.

    My question is, which is the best way to read this information and "import it" into the string stream? A problem with this approach (see below) is that when creating the string stream the buffer gets copied, and memory usage doubles.

    #include <fstream>
    #include <sstream>
    #include <string>
    
    using namespace std;
    
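    int main() {
      // placeholder file name (an assumption; the original left sFilename undeclared)
      const string sFilename = "myFile";

      ifstream is;
      is.open (sFilename.c_str(), ios::binary );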
    
      // get length of file:
      is.seekg (0, std::ios::end);
      long length = is.tellg();
      is.seekg (0, std::ios::beg);
    
      // allocate memory:
      char *buffer = new char [length];
    
      // read data as a block:
      is.read (buffer,length);
    
      // create string stream of memory contents
      // NOTE: this ends up copying the buffer!!!
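      // (pass the length: the buffer is not null-terminated, and the
      //  two-argument form cannot be parsed as a function declaration)
      istringstream iss( string( buffer, length ) );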
    
      // delete temporary buffer
      delete [] buffer;
    
      // close filestream
      is.close();
    
      /* ==================================
       * Use iss to access data
       */
    
    }
    
  • Marcos Bento
    Marcos Bento over 15 years
    Hi Luc, I agree with your suggestion... the manipulation of the rdbuf is the way to go! But doesn't your solution have the same problem? Don't you create 2 copies of the same buffer, at least momentarily?
  • Martin York
    Martin York over 15 years
    Because by the time operator<<() sees the result of rdbuf() it is just a stream buffer, with no concept of a file buffer at this point, so it cannot look up its length and thus must use a loop to read 1 char at a time. Also, the stringstream's internal buffer (std::string) must be resized as data is inserted.
  • Ramadheer Singh
    Ramadheer Singh almost 14 years
    @Martin York, how do you learn these details? Do you read up on them, or do you research when you encounter a problem and learn them that way? Thanks so much, btw.
  • Martin York
    Martin York almost 14 years
    @Gollum: No, these are just details gained from two areas. 1) Using the stream classes all the time. 2) Having implemented my own stream classes. Number (2) makes you do a lot of reading about how the stream is supposed to work, because you want it to work the same way for your stream as it works for the standard streams (so that you can re-use the STL library functions for standard streams). The only non-intuitive bit of the above is modifying how the stream buffer works.
  • Ramadheer Singh
    Ramadheer Singh almost 14 years
    Can you suggest a book or some resources? I want to understand the Standard Template Library in depth (not just how to use it, but how it actually works inside).
  • GManNickG
    GManNickG almost 14 years
    I don't think the bit about "Because char is a POD data type it is not initialized." is correct. The constructor actually has two arguments, the second being which value to initialize the elements with. It defaults to T() or char() in our case, meaning 0. So all the elements should be 0.
  • Yakov Galka
    Yakov Galka over 13 years
    -1, This method (basic_stringbuf::setbuf) is implementation-defined.
  • Martin York
    Martin York over 13 years
    @ybungalobill: Yes so. Implementation defined is not 'Undefined'
  • Yakov Galka
    Yakov Galka over 13 years
    @Martin: You're right that it's not "undefined behavior", but it is not portable, so I can't call it a "standard C++ solution".
  • Martin York
    Martin York over 13 years
    @ybungalobill: I don't understand how you reach that conclusion. From my reading of the documentation it will work as expected. See 27.5.2.4.2, which leads us to 27.7.1.3 to define how underflow() works (thus explaining that the buffer will be used as a source), and also 27.8.1.4 to show that showmanyc() will be used to tell if the buffer can be refilled from a stream source (or not).
  • Yakov Galka
    Yakov Galka over 13 years
    @Martin: I think that your mistake is that you think that setbuf should set the buffer pointers (eback, gptr, etc..). But the standard doesn't say so, so underflow is not connected. Let's discuss it here: stackoverflow.com/questions/4349778/…
  • Ben Voigt
    Ben Voigt over 13 years
    However, it would be relatively trivial to write a streambuf-derived class that does use the buffer provided.
  • Martin York
    Martin York over 13 years
    @Ben Voigt: We pulled this out into a separate question: the-effect-of-basic-streambufsetbuf
  • Michele
    Michele about 7 years
    It seems to be removing new line characters, which I need.
  • artm
    artm over 3 years
    is file.close(); necessary ?
  • Geekoder
    Geekoder over 3 years
    @artm It is not mandatory, but it is best to close file handles as soon as you are done using them. Without the explicit call to close, the file would be closed when file is destroyed (at the end of its scope, so at the end of main). The best approach would probably be to limit the scope of file.
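
One way to limit the scope of file, as the last comment suggests, is to move the loading into a small function so that the ifstream is destroyed (and the file closed) as soon as the data has been copied. This is only a sketch; load_into is a hypothetical name and not part of any answer above.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: fills 'buffer' from the file at 'path'.
// Returns false if the file could not be opened.
bool load_into(std::stringstream& buffer, const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return false;

    buffer << file.rdbuf();
    return true;
}   // 'file' goes out of scope here, which closes the file
```

The caller keeps only the stringstream, and no explicit close() call is needed.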