How to sort data in a CSV file using a particular field in Java?


Solution 1

I would use an ArrayList of ArrayLists of Strings:

ArrayList<ArrayList<String>>

Each entry is one line of the file, stored as a list of its field values. You initialize the list with:

List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();

To get the nth line:

List<String> line = csvLines.get(n);
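One way to fill csvLines from a file is to read it line by line and split each line on commas. This is a minimal sketch, assuming a plain CSV with no quoted fields or embedded commas (those need a real CSV parser); `CsvReaderSketch` and `readCsv` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvReaderSketch {
    // Hypothetical helper: reads every non-empty line and splits it on commas.
    // Assumes plain CSV without quoting or escaped commas.
    static List<ArrayList<String>> readCsv(Reader source) throws IOException {
        List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // limit -1 keeps trailing empty fields instead of dropping them
                csvLines.add(new ArrayList<String>(Arrays.asList(line.split(",", -1))));
            }
        }
        return csvLines;
    }
}
```

With a file on disk you could pass `Files.newBufferedReader(Paths.get("data.csv"))` as the reader, where `data.csv` is a placeholder path.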

To sort, you write a custom Comparator. In the constructor of that comparator you can pass the position of the field to sort on.

The compare method then gets the String value at the stored position and converts it to the appropriate Java type for that position. E.g. if you know that position 2 in the CSV holds an integer, convert the String to an int. This is necessary for sorting correctly: compared as Strings, "10" would sort before "2". You may also pass an ArrayList of Class to the constructor so that it knows the type of each field.
Then use String.compareTo() or Integer.compare(), depending on the column type.
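A simplified version of such a comparator, taking one column position and one type instead of a list of types, might look like this. `CsvColumnComparator` is a hypothetical name; only Integer and String columns are handled, and every row is assumed to have the requested column.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator: sorts CSV rows by one column, numerically or as text.
class CsvColumnComparator implements Comparator<List<String>> {
    private final int column;    // zero-based index of the field to sort on
    private final Class<?> type; // Integer.class or String.class

    CsvColumnComparator(int column, Class<?> type) {
        if (type != Integer.class && type != String.class) {
            throw new IllegalArgumentException("Unsupported column type: " + type);
        }
        this.column = column;
        this.type = type;
    }

    @Override
    public int compare(List<String> row1, List<String> row2) {
        String a = row1.get(column);
        String b = row2.get(column);
        if (type == Integer.class) {
            // Parse first so the values are compared as numbers, not as text.
            return Integer.compare(Integer.parseInt(a.trim()), Integer.parseInt(b.trim()));
        }
        return a.compareTo(b); // lexicographic comparison for text columns
    }
}
```

Because it implements `Comparator<List<String>>`, it can sort a `List<ArrayList<String>>` directly: `Collections.sort(csvLines, new CsvColumnComparator(2, Integer.class));` sorts by the third column numerically.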

Edit: example of working code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();
Comparator<ArrayList<String>> comp = new Comparator<ArrayList<String>>() {
    public int compare(ArrayList<String> csvLine1, ArrayList<String> csvLine2) {
        // Convert the field to Integer so it is compared numerically, not as text.
        // This example sorts on index 2, i.e. the third (numeric) column.
        return Integer.valueOf(csvLine1.get(2)).compareTo(Integer.valueOf(csvLine2.get(2)));
    }
};
Collections.sort(csvLines, comp);
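On Java 8 or later, the anonymous class can be replaced with `Comparator.comparingInt`, and `List.sort` can be called directly. A sketch assuming column index 2 holds an integer in every row; `LambdaSortSketch` and `sortByThirdColumn` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LambdaSortSketch {
    // Sorts the rows in place by the integer value in column index 2 (third column).
    static void sortByThirdColumn(List<ArrayList<String>> csvLines) {
        csvLines.sort(Comparator.comparingInt(row -> Integer.parseInt(row.get(2))));
    }
}
```

`List.sort` is stable, so rows with equal values in that column keep their original relative order.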

Solution 2

In Java 8 you can do

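SortedMap<Integer, List<String>> collect;
// Files.lines keeps the file open, so close it with try-with-resources
try (Stream<String> lines = Files.lines(Paths.get(filename))) {
    collect = lines.collect(Collectors.groupingBy(
            l -> Integer.valueOf(l.split(",", 4)[2]), // key: third field as an int
            TreeMap::new, Collectors.toList()));
}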
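To try the grouping without a file, the same collector can run on an in-memory stream, and the map's values can then be flattened back into one sorted list of lines. A sketch assuming every line has at least three comma-separated fields with an integer in the third; `GroupingSortSketch` and `sortLinesByThirdField` are hypothetical names.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingSortSketch {
    static List<String> sortLinesByThirdField(Stream<String> lines) {
        // Group lines by the integer in field index 2; TreeMap keeps keys ascending.
        SortedMap<Integer, List<String>> byKey = lines.collect(Collectors.groupingBy(
                l -> Integer.valueOf(l.split(",", 4)[2]),
                TreeMap::new, Collectors.toList()));
        // Flatten the groups back into a single list, in key order.
        return byKey.values().stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }
}
```

With a real file, the stream from `Files.lines` can be passed in from inside the try-with-resources block.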

Note: comparing numbers as Strings is a bad idea, because Strings are compared character by character: "100" < "2", which is probably not what you expect.
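The pitfall is easy to check: `String.compareTo` compares character by character, so "100" sorts before "2" because '1' < '2', while comparing the parsed int values gives numeric order. `CompareDemo` and its methods are illustrative names.

```java
class CompareDemo {
    // A negative result means the first argument sorts first.
    static int asText(String a, String b) {
        return a.compareTo(b);
    }

    static int asNumbers(String a, String b) {
        return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
    }
}
```

`CompareDemo.asText("100", "2")` is negative, while `CompareDemo.asNumbers("100", "2")` is positive.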

I would use a sorted multi-map. If you don't have one handy you can do this.

SortedMap<Integer, List<String>> linesByKey = new TreeMap<>();

public void addLine(String line) {
    // the sort key is the integer in the third field (index 2)
    Integer key = Integer.valueOf(line.split(",", 4)[2]);
    List<String> lines = linesByKey.get(key);
    if (lines == null)
        linesByKey.put(key, lines = new ArrayList<>());
    lines.add(line);
}

This produces a collection of lines sorted by the number, where lines with the same number keep their original order; e.g. if all the lines have the same number, the order is unchanged.
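Wrapped in a small class, the multi-map can be filled line by line and read back in key order. A sketch assuming the key is the integer in the third comma-separated field; `LinesByKey` and `sortedLines` are hypothetical names, and `computeIfAbsent` (Java 8+) stands in for the get/put pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class LinesByKey {
    private final SortedMap<Integer, List<String>> linesByKey = new TreeMap<>();

    // Adds a line under the integer found in its third field (index 2).
    void addLine(String line) {
        Integer key = Integer.valueOf(line.split(",", 4)[2]);
        linesByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
    }

    // Returns all lines ordered by key; lines sharing a key keep insertion order.
    List<String> sortedLines() {
        List<String> result = new ArrayList<>();
        for (List<String> group : linesByKey.values()) {
            result.addAll(group);
        }
        return result;
    }
}
```

Adding "ABC,DEF,11,GHI", "JKL,MNO,10,PQR" and "STU,VWX,11,XYZ" in that order yields the JKL line first, followed by the two lines keyed 11 in their original order.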

Author: Srikanth Kandalam

Updated on January 02, 2022

Comments

  • Srikanth Kandalam
    Srikanth Kandalam over 2 years

    I want to read a CSV file in Java and sort it using a particular column. My CSV file looks like this:

     ABC,DEF,11,GHI....
     JKL,MNO,10,PQR....
     STU,VWX,12,XYZ....
    

    If I sort it by the third column, my output should look like:

     JKL,MNO,10,PQR....
     ABC,DEF,11,GHI....
     STU,VWX,12,XYZ....
    

    After some research on which data structure to use to hold the CSV data, people here suggested using a Map with Integer keys and List values in this question:

     Map<Integer, List<String>>
     where the value, List<String> = {[ABC,DEF,11,GHI....], [JKL,MNO,10,PQR....],[STU,VWX,12,XYZ....]...}
     And the key will be an auto-incremented integer starting from 0.
    

    So could anyone please suggest a way to sort this Map using an element in the 'List' in Java? Also if you think this choice of data structure is bad, please feel free to suggest an easier data structure to do this.

    Thank you.

  • Vishy
    Vishy almost 10 years
    Catching an NPE is not generally a good idea, especially as there is no way this can happen when reading Strings from a file.
  • ltalhouarne
    ltalhouarne almost 10 years
    The CSV file could be missing data for whatever reason, thus possibly creating a list of only 2 elements.
  • Vishy
    Vishy almost 10 years
    So you could get an IndexOutOfBoundsException?
  • Bart Kiers
    Bart Kiers almost 10 years
    Sorting numerical values as strings will place "10" before "2". EDIT: ah, I see Peter already mentioned this in a comment on the question, but I think it's worth mentioning here as well...
  • AlexWien
    AlexWien almost 10 years
    Edited: changed the comparison to work with numbers.
  • Srikanth Kandalam
    Srikanth Kandalam almost 10 years
    I am really new to this, so when I tried your code, Eclipse flagged a type mismatch error suggesting 'change type of csvLines to List<ArrayList<String>>'. Can you please let me know if I am doing it right?
  • AlexWien
    AlexWien almost 10 years
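    Ah yes, use List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>(); this is a weakness of Java (generic types are invariant, so an ArrayList<ArrayList<String>> is not a List<List<String>>).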
  • Venkata Raju
    Venkata Raju almost 10 years
    Files.lines() should be opened in a try-with-resources statement so the underlying file gets closed: try (Stream<String> lines = Files.lines(Paths.get(filePath)))
  • Srikanth Kandalam
    Srikanth Kandalam almost 10 years
    @lolkidoki 'Collections.sort()' cannot accept 'Llp' as an argument as 'Llp' is a List of Lists while the valid argument is just a List<T>. Could you please correct me if I am wrong?
  • Srikanth Kandalam
    Srikanth Kandalam almost 10 years
    I was trying to pass 'csvLines', which is of type List<ArrayList<String>>, to 'Collections.sort()', but it seems sort accepts only a List and not a List<ArrayList<String>> as one of its arguments. Could you please help me with how to proceed?
  • AlexWien
    AlexWien almost 10 years
    sort accepts the list to be sorted and the comparator, so: Collections.sort(csvLines, new MyCustomComparator());
  • AlexWien
    AlexWien almost 10 years
    List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();
    Comparator<ArrayList<String>> comp = new Comparator<ArrayList<String>>() {
        public int compare(ArrayList<String> o1, ArrayList<String> o2) {
            // TODO Auto-generated method stub
            return 0;
        }
    };
    Collections.sort(csvLines, comp);
  • AlexWien
    AlexWien almost 10 years
    The above comment shows an inline Comparator; however, I would recommend creating a MyCsvComparator class.
  • Srikanth Kandalam
    Srikanth Kandalam almost 10 years
    Thank you for the explanation. I will try it.