C++ file I/O & splitting by separator

12,750

Solution 1

There is really nothing wrong with fscanf, which is probably the fastest solution in this case. And it's as short and readable as the Python code:

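// Needs <cstdio> and <vector>.
std::vector<int> vx, vy, vz;

FILE *fp = fopen("file.dat", "r");
if (fp != NULL) {  // fopen returns NULL if the file cannot be opened
  int x, y, z;
  // Each %d skips leading whitespace, so uneven spacing is fine.
  while (fscanf(fp, "%d, %d, %d", &x, &y, &z) == 3) {
    vx.push_back(x);
    vy.push_back(y);
    vz.push_back(z);
  }
  fclose(fp);
}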
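
To see how that format string copes with the uneven spacing in the data, here is a minimal sketch that applies the same conversion to a single line held in memory, using sscanf instead of fscanf. The helper name parse_triple is hypothetical and only for illustration.

```cpp
#include <cstdio>

// Hypothetical helper: parse one "a, b, c" record from a C string.
// Returns true only if all three integers were converted.
bool parse_triple(const char* line, int& x, int& y, int& z) {
    // %d skips leading whitespace, the literal ',' must match exactly,
    // and a space in the format matches any run of whitespace (or none).
    return std::sscanf(line, "%d, %d, %d", &x, &y, &z) == 3;
}
```

Comparing the return value against 3 is what rejects a line whose separator is not a comma: the conversion stops at the first mismatch and reports fewer fields.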

Solution 2

There's no real need to use boost in this example as streams will do the trick nicely:

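#include <fstream>
#include <vector>
using namespace std;

int main(int argc, char* argv[])
{
    if (argc < 2)
        return 1;  // expects the input file name as the first argument

    ifstream file(argv[1]);

    const unsigned maxIgnore = 10;
    const int delim = ',';
    int x,y,z;

    vector<int> vecx, vecy, vecz;

    // Test every extraction, so a failed read at end of file
    // does not push a stale record.
    while (file >> x && file.ignore(maxIgnore, delim) &&
           file >> y && file.ignore(maxIgnore, delim) &&
           file >> z)
    {
        vecx.push_back(x);
        vecy.push_back(y);
        vecz.push_back(z);
    }
}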

Though if I were going to use boost I'd prefer the simplicity of tokenizer to regex... :)

Solution 3

Something like:

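char buf[256];  // assumes no line is longer than 255 characters
vector<int> inint;
vector<int> inbase;
vector<int> outbase;
while (fgets(buf, sizeof buf, fh)) {  // fh: a FILE* already opened for reading
   char *tok1 = strtok(buf, ", \n");
   char *tok2 = strtok(NULL, ", \n");
   char *tok3 = strtok(NULL, ", \n");
   if (!tok1 || !tok2 || !tok3)
      continue;  // strtok returns NULL when a field is missing
   inint.push_back(atoi(tok1));
   inbase.push_back(atoi(tok2));
   outbase.push_back(atoi(tok3));
}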

Except with error checking.
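
One way to add that error checking is to drop atoi, which cannot report a failed conversion, in favour of strtol. The sketch below also avoids strtok by walking the line with a pointer. All three helper names are hypothetical, and the comma is hard-coded as the separator.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse one integer field starting at p.
// On success stores the value and advances p past the digits.
bool parse_int_field(const char*& p, int& out) {
    char* end = nullptr;
    errno = 0;
    long v = std::strtol(p, &end, 10);  // skips leading whitespace itself
    if (end == p) return false;         // no digits were found
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return false;
    out = static_cast<int>(v);
    p = end;
    return true;
}

// Hypothetical helper: skip blanks, then require and consume a comma.
bool skip_comma(const char*& p) {
    while (*p == ' ' || *p == '\t') ++p;
    if (*p != ',') return false;
    ++p;
    return true;
}

// Hypothetical helper: parse "a, b, c" from one line of text.
bool parse_line(const char* line, int& a, int& b, int& c) {
    const char* p = line;
    return parse_int_field(p, a) && skip_comma(p) &&
           parse_int_field(p, b) && skip_comma(p) &&
           parse_int_field(p, c);
}
```

Inside the fgets loop, a call such as parse_line(buf, a, b, c) could replace the three strtok/atoi pairs, with the push_back calls made only when it returns true.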

Solution 4

Why not the same code as in Python? :)

std::ifstream file("input_hard.dat");
std::vector<int> inint, inbase, outbase;

int val1, val2, val3;
char delim;
// Loop on the extraction itself: testing file.good() before reading
// can push one bogus record when the file ends with a newline.
while (file >> val1 >> delim >> val2 >> delim >> val3) {
    inint.push_back(val1);
    inbase.push_back(val2);
    outbase.push_back(val3);
}
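
A version that follows the Python more literally (read a line, split it on the separator, convert each piece) can be sketched with std::getline and std::stoi, assuming C++11. The names split_ints and load_by_line are hypothetical. Note that std::stoi throws std::invalid_argument for a field with no digits, so real input may need a try/catch around it.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on `sep` and convert each piece.
// std::stoi skips leading whitespace and stops at the first non-digit,
// so padding around the numbers is tolerated.
std::vector<int> split_ints(const std::string& line, char sep = ',') {
    std::vector<int> out;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, sep)) {
        out.push_back(std::stoi(field));
    }
    return out;
}

// Hypothetical helper: fill three vectors, one record per line.
// Lines that do not yield exactly three numbers are skipped.
void load_by_line(std::istream& in,
                  std::vector<int>& inint,
                  std::vector<int>& inbase,
                  std::vector<int>& outbase) {
    std::string line;
    while (std::getline(in, line)) {
        std::vector<int> bits = split_ints(line);
        if (bits.size() != 3) continue;
        inint.push_back(bits[0]);
        inbase.push_back(bits[1]);
        outbase.push_back(bits[2]);
    }
}
```

With a space as the separator, a run of several spaces produces empty fields, which std::stoi rejects by throwing, so this sketch suits single-character separators such as ',' or ';' best.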

Solution 5

If you don't mind using the Boost libraries...

#include <istream>
#include <string>
#include <vector>
#include <boost/lexical_cast.hpp>
#include <boost/regex.hpp>

std::vector<int> ParseFile(std::istream& in) {
    // Optional leading spaces, one run of digits (captured), optional comma.
    const boost::regex cItemPattern(" *([0-9]+),?");
    std::vector<int> return_value;

    std::string line;
    while (std::getline(in, line)) {
        std::string::const_iterator b=line.begin(), e=line.end();
        boost::smatch match;
        while (b!=e && boost::regex_search(b, e, match, cItemPattern)) {
            return_value.push_back(boost::lexical_cast<int>(match[1].str()));
            b=match[0].second;  // resume searching after this match
        }
    }

    return return_value;
}

That pulls the lines from the stream, then uses the Boost.Regex library (with a capture group) to extract each number from the lines. It automatically ignores anything that isn't a valid number, though that can be changed if you wish.

It's still about twenty lines with the #includes, but you can use it to extract essentially anything from the file's lines. This is a trivial example; I'm using nearly identical code to extract tags and optional values from a database field, and the only major difference is the regular expression.

EDIT: Oops, you wanted three separate vectors. Try this slight modification instead:

// Three captured integers per match, separated by commas and optional spaces.
const boost::regex cItemPattern(" *([0-9]+), *([0-9]+), *([0-9]+)");
std::vector<int> vector1, vector2, vector3;

std::string line;
while (std::getline(in, line)) {
    std::string::const_iterator b=line.begin(), e=line.end();
    boost::smatch match;
    while (b!=e && boost::regex_search(b, e, match, cItemPattern)) {
        vector1.push_back(boost::lexical_cast<int>(match[1].str()));
        vector2.push_back(boost::lexical_cast<int>(match[2].str()));
        vector3.push_back(boost::lexical_cast<int>(match[3].str()));
        b=match[0].second;  // resume searching after this match
    }
}
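
If Boost is not available, the same three-group pattern can be tried with the standard library's std::regex (C++11), whose interface closely mirrors Boost.Regex. This is a sketch rather than a drop-in replacement: parse_with_std_regex is a hypothetical name, std::stoi stands in for boost::lexical_cast, and only the first match on each line is used.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: fill three vectors from lines shaped like "a, b, c".
// Lines that do not contain three comma-separated integers are skipped.
void parse_with_std_regex(std::istream& in,
                          std::vector<int>& v1,
                          std::vector<int>& v2,
                          std::vector<int>& v3) {
    static const std::regex pattern(R"( *([0-9]+), *([0-9]+), *([0-9]+))");
    std::string line;
    std::smatch match;
    while (std::getline(in, line)) {
        if (std::regex_search(line, match, pattern)) {
            // std::stoi throws std::out_of_range if the digits overflow int.
            v1.push_back(std::stoi(match[1].str()));
            v2.push_back(std::stoi(match[2].str()));
            v3.push_back(std::stoi(match[3].str()));
        }
    }
}
```

Changing the separator then means editing only the pattern string.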
Author by

darudude

Updated on July 29, 2022

Comments

  • darudude
    darudude almost 2 years

    I have a file with data listed as follows:

    0,       2,    10
    10,       8,    10
    10,       10,   10
    10,       16,   10
    15,       10,   16
    17,       10,   16
    

    I want to be able to read the file and split it into three arrays, in the process trimming all excess spaces and converting each element to an integer.

    For some reason I can't find an easy way to do this in C++. The only success I've had is by reading each line into an array, then regexing out all the spaces and splitting it up. This entire process took me a good 20-30 lines of code, and it's a pain to modify for, say, another separator (e.g. a space).

    This is the python equivalent of what I would like to have in C++:

    f = open('input_hard.dat')
    lines =  f.readlines()
    f.close()
    
    #declarations
    inint, inbase, outbase = [], [], []
    
    #input parsing
    for line in lines:
        bits = line.split(',')
        inint.append(int(bits[0].strip()))
        inbase.append(int(bits[1].strip()))
        outbase.append(int(bits[2].strip()))
    

    The ease of doing this in Python is one of the reasons I moved to it in the first place. However, I now need to do this in C++, and I would hate to have to use my ugly 20-30 line code.

    Any help would be appreciated, thanks!

  • MattyT
    MattyT over 15 years
    I would avoid such a "C-ish" solution for, well, aesthetics... but more importantly in this case because strtok has some serious thread-safety issues. Correct code though!
  • jbruni
    jbruni over 11 years
    Nice job. Folks forget about scanf's pattern matching. The simplest solution is the best.