How to process a large CSV file or read a large CSV file in chunks


Solution 1

The enhanced for loop (for (MyObject myObject : myObjects)) is implemented using an Iterator: it requires that the instance returned by csv.parse(strat, getReader("file.txt")) implement the Iterable interface, whose iterator() method returns an Iterator. So there is no performance difference between the two code snippets.

P.S.

In the second snippet, don't use the raw Iterator type; use Iterator<MyObject>:

Iterator<MyObject> myObjects = csv.parse(strat, getReader("file.txt")).iterator();

while (myObjects.hasNext()) {
    MyObject myObject = myObjects.next();
    System.out.println(myObject);
}
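To see the equivalence the answer describes, here is a minimal, self-contained sketch (using a plain List of Strings rather than the question's parsed beans) showing that the enhanced for loop and an explicit Iterator loop visit the same elements:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ForEachDesugar {

    // Enhanced for loop version
    static List<String> withForEach(List<String> myObjects) {
        List<String> seen = new ArrayList<>();
        for (String s : myObjects) {
            seen.add(s);
        }
        return seen;
    }

    // Roughly what the compiler generates: an explicit Iterator loop
    static List<String> withIterator(List<String> myObjects) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = myObjects.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row1", "row2", "row3");
        // Both loops produce the same sequence
        System.out.println(withForEach(rows).equals(withIterator(rows))); // true
    }
}
```

Since both forms compile down to the same Iterator calls, neither is faster; the difference that matters for memory is where the Iterable comes from, not which loop syntax you use.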

Solution 2

Reading a large CSV file all at once is not a good solution; it is better to read the file in chunks. You can have multiple threads: one to read the data from the file, and a few others to perform the business logic. More details on reading CSV data in chunks are here: How to parse chunk by chunk a large CSV file and bulk insert to a database, and a multi-threaded solution is here.
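A minimal sketch of the producer/consumer split this answer suggests, using only the standard library (a BlockingQueue carries fixed-size chunks of lines from a reader thread to the processing thread; the class name, chunk size, and the StringReader input are illustrative assumptions, not part of the original answer):

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ChunkedCsvReader {

    // Poison pill that tells the consumer the producer is finished
    private static final List<String> DONE = new ArrayList<>();

    // Reads lines in chunks on one thread and counts rows on the caller's thread.
    static int process(Reader source, int chunkSize) throws InterruptedException {
        // Bounded queue: the producer blocks if the consumer falls behind,
        // so memory use stays proportional to chunkSize, not file size.
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>(4);

        // Producer thread: reads the input chunk by chunk
        Thread producer = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(source)) {
                List<String> chunk = new ArrayList<>(chunkSize);
                String line;
                while ((line = reader.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() == chunkSize) {
                        queue.put(chunk);
                        chunk = new ArrayList<>(chunkSize);
                    }
                }
                if (!chunk.isEmpty()) {
                    queue.put(chunk);
                }
                queue.put(DONE);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        // Consumer (this thread): business logic sees one chunk at a time
        int rows = 0;
        List<String> chunk;
        while ((chunk = queue.take()) != DONE) {
            rows += chunk.size();
        }
        producer.join();
        return rows;
    }

    public static void main(String[] args) throws InterruptedException {
        // In real code this would be a FileReader over the large CSV file
        String csv = "a,1\nb,2\nc,3\nd,4\ne,5";
        System.out.println(process(new StringReader(csv), 2)); // 5
    }
}
```

The bounded queue is the key design choice: it applies backpressure, so the reader can never race ahead and buffer the whole file in memory while the business logic is still working.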

Solution 3

"what is the difference between Iterator and list?"

A List is a data structure that gives the user functionality like get(), toArray(), etc.

An Iterator only allows the user to navigate through a data structure, provided the data structure implements the Iterable interface (which all the standard collections do).

So List<MyObject> myObjects = csv.parse(strat, getReader("file.txt")); physically stores all the data in myObjects,

while Iterator myObjects = csv.parse(strat, getReader("file.txt")).iterator(); just uses the iterator functionality of the object returned by csv.parse.
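The distinction matters for memory only when the Iterator reads on demand instead of walking an already-built List. A minimal sketch of such a lazy Iterator (this class is a hypothetical illustration, not opencsv's implementation; it pulls one line per next() call, so memory use stays constant regardless of input size):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Reads one line ahead so hasNext() can answer without consuming next()'s line
public class LazyLineIterator implements Iterator<String> {

    private final BufferedReader reader;
    private String nextLine;

    public LazyLineIterator(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    private void advance() {
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        advance(); // fetch the following line only now, on demand
        return current;
    }

    public static void main(String[] args) {
        // In real code the BufferedReader would wrap a FileReader
        Iterator<String> it = new LazyLineIterator(
                new BufferedReader(new StringReader("row1\nrow2\nrow3")));
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        System.out.println(count); // 3
    }
}
```

By contrast, an iterator() obtained from a fully parsed List, as in the question's second snippet, saves nothing: the List already holds every row before iteration begins.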

Author: Code Junkie

Updated on June 14, 2022

Comments

  • Code Junkie
    Code Junkie about 2 years

I have very large CSV files that I'm trying to iterate through. I'm using opencsv and I'd like to use CsvToBean so that I can dynamically set the column mappings from a database. The question I have is how to do this without grabbing the entire file and throwing it into a list. I'm trying to prevent memory errors.

    I'm currently passing the entire result set into a list like so.

    List<MyObject> myObjects = csv.parse(strat, getReader("file.txt"));
    
    for (MyObject myObject : myObjects) {
        System.out.println(myObject);
    }
    

    But I found this iterator method, and I'm wondering whether it will iterate one row at a time rather than loading the entire file at once.

    Iterator myObjects = csv.parse(strat, getReader("file.txt")).iterator();
    
    while (myObjects.hasNext()) {
        MyObject myObject = (MyObject) myObjects.next();
        System.out.println(myObject);
    }
    

    So my question is what is the difference between Iterator and list?

  • Code Junkie
    Code Junkie almost 9 years
    So by the sounds of it, I'd need to use their iterator method and implement my own CsvToBean.
  • Code Junkie
    Code Junkie almost 9 years
    Thanks for the tip, but it doesn't look like using the iterator is going to resolve my memory issues :/
  • Eran
    Eran almost 9 years
    @CodeJunkie The question is whether the csv instance you are using can supply an Iterator that doesn't require creating a List first (since creating a List requires reading all the data in advance). Such an Iterator (if it exists) may read data from the file on demand (when you call the hasNext() or next() method).