How to sort data in a CSV file using a particular field in Java?
Solution 1
I would use an ArrayList of ArrayList of String:
ArrayList<ArrayList<String>>
Each entry is one line, which is a list of strings. You initialize the list by:
List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();
To get the nth line:
List<String> line = csvLines.get(n);
To sort, write a custom Comparator. In the constructor of that comparator you can pass the field position used to sort.
The compare method then gets the String value at the stored position and converts it to a primitive Java type depending on the position. E.g. if you know that position 2 in the CSV holds an integer, convert the String to an int. This is necessary for sorting correctly. You may also pass an ArrayList of Class to the constructor so that it knows which field has which type.
Then use String.compareTo() or Integer.compare(), depending on column position etc.
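The comparator described above could be sketched roughly as follows. The class name CsvFieldComparator, the boolean numeric flag (a simplification of the "ArrayList of Class" idea) and the four-field sample rows are assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: compares CSV rows on a single field, numerically or as text.
class CsvFieldComparator implements Comparator<List<String>> {
    private final int fieldIndex;   // zero-based position of the field to sort on
    private final boolean numeric;  // true: parse the field as an int before comparing

    CsvFieldComparator(int fieldIndex, boolean numeric) {
        this.fieldIndex = fieldIndex;
        this.numeric = numeric;
    }

    @Override
    public int compare(List<String> row1, List<String> row2) {
        String a = row1.get(fieldIndex).trim();
        String b = row2.get(fieldIndex).trim();
        if (numeric) {
            // Integer.parseInt throws NumberFormatException if the field is not a number.
            return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
        }
        return a.compareTo(b);
    }
}

class CsvFieldComparatorDemo {
    public static void main(String[] args) {
        // Sample rows modeled on the question, shortened to four fields (assumption).
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("ABC", "DEF", "11", "GHI"));
        rows.add(Arrays.asList("JKL", "MNO", "10", "PQR"));
        rows.add(Arrays.asList("STU", "VWX", "12", "XYZ"));

        // Sort on the third field (index 2), treated as a number.
        rows.sort(new CsvFieldComparator(2, true));
        rows.forEach(row -> System.out.println(String.join(",", row)));
    }
}
```

Because the comparator is typed on List<String>, it can also be passed to Collections.sort for a List<ArrayList<String>>.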
Edit: example of working code:
List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();
Comparator<ArrayList<String>> comp = new Comparator<ArrayList<String>>() {
    public int compare(ArrayList<String> csvLine1, ArrayList<String> csvLine2) {
        // TODO here convert to Integer depending on field.
        // example is for numeric field 2
        return Integer.valueOf(csvLine1.get(2)).compareTo(Integer.valueOf(csvLine2.get(2)));
    }
};
Collections.sort(csvLines, comp);
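On Java 8 or later, the same numeric comparison can probably be written more compactly with Comparator.comparingInt. This is a sketch of an alternative, not the answer's own code; the class and method names are hypothetical, and it assumes every row has a parsable integer in the chosen column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparingIntSketch {
    // Sorts the rows in place by the integer value found in the given zero-based column.
    static void sortByIntColumn(List<List<String>> rows, int column) {
        rows.sort(Comparator.comparingInt(
                (List<String> row) -> Integer.parseInt(row.get(column).trim())));
    }

    public static void main(String[] args) {
        // Sample rows modeled on the question, shortened to four fields (assumption).
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("ABC", "DEF", "11", "GHI"));
        rows.add(Arrays.asList("JKL", "MNO", "10", "PQR"));
        rows.add(Arrays.asList("STU", "VWX", "12", "XYZ"));

        sortByIntColumn(rows, 2);
        rows.forEach(row -> System.out.println(String.join(",", row)));
    }
}
```

List.sort is stable, so rows with equal values in the sort column keep their original relative order.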
Solution 2
In Java 8 you can do
SortedMap<Integer, List<String>> collect = Files.lines(Paths.get(filename))
.collect(Collectors.groupingBy(
l -> Integer.valueOf(l.split(",", 4)[2]),
TreeMap::new, Collectors.toList()));
Note: comparing numbers as Strings is a bad idea, as "100" < "2" might not be what you expect.
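A small sketch of that pitfall, using hypothetical helper names: String.compareTo compares character by character, so "100" sorts before "2" because '1' precedes '2', while parsing to int first gives the numeric order.

```java
class StringVsNumericCompare {
    // True if a sorts before b when the two are compared as plain text.
    static boolean sortsBeforeAsText(String a, String b) {
        return a.compareTo(b) < 0;
    }

    // True if a sorts before b when both are parsed as integers first.
    static boolean sortsBeforeAsNumber(String a, String b) {
        return Integer.compare(Integer.parseInt(a), Integer.parseInt(b)) < 0;
    }

    public static void main(String[] args) {
        System.out.println("As text, \"100\" before \"2\": " + sortsBeforeAsText("100", "2"));
        System.out.println("As numbers, 100 before 2: " + sortsBeforeAsNumber("100", "2"));
    }
}
```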
I would use a sorted multi-map. If you don't have one handy you can do this.
SortedMap<Integer, List<String>> linesByKey = new TreeMap<>();
public void addLine(String line) {
    // The sort key is the third field (index 2) of the comma-separated line.
    Integer key = Integer.valueOf(line.split(",", 4)[2]);
    List<String> lines = linesByKey.get(key);
    if (lines == null)
        linesByKey.put(key, lines = new ArrayList<>());
    lines.add(line);
}
This will produce a collection of lines sorted by the number, where lines with duplicate numbers keep their original relative order. E.g. if all the lines have the same number, the order is unchanged.
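To go from the sorted map back to a flat list of lines, something like the following sketch may help. The class and method names are hypothetical, and it assumes every line has at least three comma-separated fields with an integer in the third one. The file is read inside a try-with-resources block so that the stream returned by Files.lines is closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Stream;

class SortedCsvLines {
    // Groups lines by the integer in the third field; TreeMap keeps the keys in ascending order.
    static SortedMap<Integer, List<String>> groupByThirdField(Stream<String> lines) {
        SortedMap<Integer, List<String>> linesByKey = new TreeMap<>();
        // forEachOrdered keeps the encounter order, so duplicates stay in file order.
        lines.forEachOrdered(line -> {
            Integer key = Integer.valueOf(line.split(",", 4)[2].trim());
            linesByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        });
        return linesByKey;
    }

    // Concatenates the groups in key order into one sorted list of lines.
    static List<String> flatten(SortedMap<Integer, List<String>> linesByKey) {
        List<String> sorted = new ArrayList<>();
        for (List<String> group : linesByKey.values()) {
            sorted.addAll(group);
        }
        return sorted;
    }

    // Reads the file, closing the underlying stream when done.
    static List<String> sortFile(Path csvFile) throws IOException {
        try (Stream<String> lines = Files.lines(csvFile)) {
            return flatten(groupByThirdField(lines));
        }
    }
}
```

A call such as SortedCsvLines.sortFile(Paths.get(filename)) would then return the lines ordered by the third column.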
Srikanth Kandalam
Updated on January 02, 2022
Comments
-
Srikanth Kandalam over 2 years
I want to read a CSV file in Java and sort it using a particular column. My CSV file looks like this:
ABC,DEF,11,GHI....
JKL,MNO,10,PQR....
STU,VWX,12,XYZ....
Considering I want to sort it using the third column, my output should look like:
JKL,MNO,10,PQR....
ABC,DEF,11,GHI....
STU,VWX,12,XYZ....
After some research on which data structure to use to hold the CSV data, I found that people here suggested using a Map with Integer and List as key and value pairs in this question:
Map<Integer, List<String>> where the value, List<String> = {[ABC,DEF,11,GHI....], [JKL,MNO,10,PQR....],[STU,VWX,12,XYZ....]...} And the key will be an auto-incremented integer starting from 0.
So could anyone please suggest a way to sort this Map using an element in the 'List' in Java? Also if you think this choice of data structure is bad, please feel free to suggest an easier data structure to do this.
Thank you.
-
Vishy almost 10 years
Catching an NPE is not generally a good idea, esp. as there is no way this can happen when reading Strings from a file.
-
ltalhouarne almost 10 years
The CSV file could be missing data for whatever reason, thus possibly creating a list of only 2 elements.
-
Vishy almost 10 years
So you could get an IndexOutOfBoundsException?
-
Bart Kiers almost 10 years
Sorting numerical values as strings will result in "10" being placed before "2". EDIT: ah, I see Peter already mentioned this as a comment to the question, but I think it's worth mentioning here as well...
-
AlexWien almost 10 years
Edited and changed the compare to work with numbers.
-
Srikanth Kandalam almost 10 years
I am really new to this, so when I tried your code, Eclipse flagged it as a type mismatch error saying 'change type of csvLines to List<ArrayList<String>>'. Can you please let me know if I am doing it right?
-
AlexWien almost 10 years
Ah yes, use List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>(); this is a weakness of Java.
-
Venkata Raju almost 10 years
Files.lines() has to be declared in a try-with-resources statement: try (Stream<String> lines = Files.lines(Paths.get(filePath)))
-
Srikanth Kandalam almost 10 years
@lolkidoki 'Collections.sort()' cannot accept 'Llp' as an argument, as 'Llp' is a List of Lists while the valid argument is just a List<T>. Could you please correct me if I am wrong?
-
Srikanth Kandalam almost 10 years
I was trying to pass 'csvLines', which is of type List<ArrayList<String>>, to 'Collections.sort()', but sort seems to accept only List and not List<ArrayList<String>> as one of its arguments. Could you please help me with how to proceed?
-
AlexWien almost 10 years
sort accepts the list to be sorted and the comparator, so: Collections.sort(csvLines, MyCustomComparator);
-
AlexWien almost 10 years
List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();
Comparator<ArrayList<String>> comp = new Comparator<ArrayList<String>>() {
    public int compare(ArrayList<String> o1, ArrayList<String> o2) {
        // TODO Auto-generated method stub
        return 0;
    }
};
Collections.sort(csvLines, comp);
-
AlexWien almost 10 years
The above comment shows an inlined Comparator; however, I would recommend creating a MyCsvComparator class.
-
Srikanth Kandalam almost 10 years
Thank you for the explanation. I will try it.