How to process a large CSV file, or read a large CSV file in chunks
Solution 1
The enhanced for loop (for (MyObject myObject : myObjects)) is implemented using an Iterator. It requires that the instance returned by csv.parse(strat, getReader("file.txt")) implements the Iterable interface, which declares an iterator() method that returns an Iterator. There is therefore no performance difference between the two code snippets.
P.S.
In the second snippet, don't use the raw Iterator type; use Iterator<MyObject>:
Iterator<MyObject> myObjects = csv.parse(strat, getReader("file.txt")).iterator();
while (myObjects.hasNext()) {
    MyObject myObject = myObjects.next();
    System.out.println(myObject);
}
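To make the equivalence concrete, here is a minimal sketch. The names ForEachDemo and CountingIterable are hypothetical and not part of opencsv; the class simply counts how often iterator() is requested, to show that the enhanced for loop goes through the same method as the explicit while loop.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ForEachDemo {
    /** Hypothetical Iterable that counts how often iterator() is requested. */
    static final class CountingIterable<T> implements Iterable<T> {
        private final List<T> items;
        int iteratorCalls = 0;

        CountingIterable(List<T> items) {
            this.items = items;
        }

        @Override
        public Iterator<T> iterator() {
            iteratorCalls++;
            return items.iterator();
        }
    }

    /** Walks the rows with the enhanced for loop. */
    static String joinWithForEach(Iterable<String> rows) {
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            sb.append(row).append(';');
        }
        return sb.toString();
    }

    /** Walks the rows with an explicit, typed Iterator. */
    static String joinWithIterator(Iterable<String> rows) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = rows.iterator();
        while (it.hasNext()) {
            sb.append(it.next()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CountingIterable<String> rows =
                new CountingIterable<>(Arrays.asList("a,1", "b,2"));
        System.out.println(joinWithForEach(rows));
        System.out.println(joinWithIterator(rows));
        // Each loop asked for exactly one iterator, so this prints 2.
        System.out.println(rows.iteratorCalls);
    }
}
```

Both loops visit the same elements in the same order. Neither one changes how much data the underlying object holds in memory.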
Solution 2
Reading a large CSV file all at once is not a good solution. A better approach is to read the file in chunks. You can use multiple threads: one to read the data from the file and a few others to perform the business logic. For more detail on reading CSV data in chunks, see "How to parse chunk by chunk a large CSV file and bulk insert to a database"; a multi-threaded solution is described elsewhere.
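The chunked, multi-threaded approach could be sketched roughly as follows, using only the JDK. This is an illustration under stated assumptions, not the linked answers' code: ChunkedCsvReader and processInChunks are hypothetical names, rows are treated as plain lines (quoted fields containing line breaks are not handled), and the handler stands in for the business logic. The bounded queue with CallerRunsPolicy makes the reading thread process a chunk itself when the workers fall behind, so only a limited number of chunks is held in memory at once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class ChunkedCsvReader {
    /** Reads lines in chunks of chunkSize and hands each chunk to a worker; returns the line count. */
    static long processInChunks(Reader source, int chunkSize, int workers,
                                Consumer<List<String>> handler) {
        // Bounded queue: when it is full, the reading thread runs the task itself.
        ExecutorService pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(workers * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());
        long total = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                total++;
                if (chunk.size() == chunkSize) {
                    final List<String> full = chunk;
                    pool.execute(() -> handler.accept(full));
                    chunk = new ArrayList<>(chunkSize); // start a fresh chunk
                }
            }
            if (!chunk.isEmpty()) {
                final List<String> last = chunk; // trailing partial chunk
                pool.execute(() -> handler.accept(last));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total;
    }

    public static void main(String[] args) {
        AtomicLong rowsHandled = new AtomicLong();
        long total = processInChunks(
                new StringReader("1,a\n2,b\n3,c\n4,d\n5,e\n"), 2, 2,
                chunk -> rowsHandled.addAndGet(chunk.size()));
        System.out.println(total + " lines read, " + rowsHandled.get() + " handled");
    }
}
```

In a real job the handler might parse each line into a bean and perform a batch insert; the chunk size then trades memory use against the number of round trips.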
Solution 3
"what is the difference between Iterator and list?"
A List is a data structure that offers operations such as get() and toArray().
An Iterator only lets the user step through a data structure, provided that structure implements the Iterable interface (which all Java collections do).
So List<MyObject> myObjects = csv.parse(strat, getReader("file.txt"));
physically stores the data in myObjects,
while Iterator<MyObject> myObjects = csv.parse(strat, getReader("file.txt")).iterator();
just uses the iterator of whatever csv.parse returns. If csv.parse returns a List, the whole file has already been read into memory before iterator() is called.
Code Junkie
Updated on June 14, 2022
Comments
-
Code Junkie about 2 years
I have very large csv files that I'm trying to iterate through. I'm using opencsv and I'd like to use CsvToBean so that I can dynamically set the column mappings from a database. The question I have is how to do this without grabbing the entire file and throwing it into a list. I'm trying to prevent memory errors.
I'm currently passing the entire result set into a list like so:
List<MyObject> myObjects = csv.parse(strat, getReader("file.txt"));
for (MyObject myObject : myObjects) {
    System.out.println(myObject);
}
But I found this iterator method and I'm wondering if this will just iterate each row rather than the entire file at once?
Iterator myObjects = csv.parse(strat, getReader("file.txt")).iterator();
while (myObjects.hasNext()) {
    MyObject myObject = (MyObject) myObjects.next();
    System.out.println(myObject);
}
So my question is what is the difference between Iterator and list?
-
Code Junkie almost 9 years
So by the sounds of it, I'd need to use their iterator method and implement my own CSVToBean.
-
Code Junkie almost 9 years
Thanks for the tip, but it doesn't look like using iterator is going to resolve my memory issues :/
-
Eran almost 9 years
@CodeJunkie The question is whether the csv instance you are using can supply an Iterator that doesn't require creating a List first (since creating a List requires reading all the data in advance). Such an Iterator (if it exists) may read data from the file on demand (when you call the hasNext() or next() method).
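As an illustration of what such an on-demand Iterator might look like, here is a minimal sketch built on the JDK alone. LazyRowIterator is a hypothetical class, not an opencsv API, and the naive split on commas is an assumption that ignores quoting; a real implementation would delegate to a proper CSV parser. What it demonstrates is that a line is read from the file only when hasNext() or next() is called, so no List of all rows is ever built.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyRowIterator implements Iterator<String[]> {
    private final BufferedReader reader;
    private String nextLine;         // line fetched ahead by hasNext()
    private boolean fetched = false; // true while nextLine holds an unconsumed read

    LazyRowIterator(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public boolean hasNext() {
        if (!fetched) {
            try {
                nextLine = reader.readLine(); // the only place the file is read
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            fetched = true;
        }
        return nextLine != null;
    }

    @Override
    public String[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        fetched = false; // consume the buffered line
        return nextLine.split(",", -1); // naive split: no quoted-field support
    }

    public static void main(String[] args) {
        Iterator<String[]> rows = new LazyRowIterator(new StringReader("1,a\n2,b\n"));
        while (rows.hasNext()) {
            String[] row = rows.next();
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

With this shape, memory use stays roughly constant regardless of file size, because only one line is held at a time.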