Parse CSV to multiple/nested bean types with OpenCSV?
You can certainly achieve this with Super CSV.
You can use:

- CsvBeanReader - which doesn't support indexed mapping, so you'll need to create a helper method in your bean in order to use it
- CsvDozerBeanReader - supports indexed mapping out of the box, so will do exactly what you want (requires the recently released Super CSV 2.1.0)
Using CsvBeanReader
If you don't want to use Dozer and are able to modify your bean class, the easiest option is to add a dummy setter on your bean, which CsvBeanReader will use to populate the attributes. I'm assuming that your Person and PersonAttribute beans have a public no-args constructor and getters/setters defined for each field (that's required).
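For reference, here's a minimal sketch of what those beans might look like. The field names (firstname, lastname, attribs, key, value) come from your question; the getters/setters and toString() methods are my additions, and attribs is deliberately left uninitialized so the lazy null check in the setter below has something to do:

```java
import java.util.List;

// Sketch only: Super CSV needs a public no-args constructor plus a
// getter/setter for each mapped field.
class PersonAttribute {
    private String key;   // header name, e.g. 'car_type'
    private String value; // column value, e.g. 'ford'

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }

    @Override
    public String toString() {
        return "PersonAttribute [key=" + key + ", value=" + value + "]";
    }
}

class Person {
    private String firstname;
    private String lastname;
    private List<PersonAttribute> attribs; // lazily created by the dummy setter

    public String getFirstname() { return firstname; }
    public void setFirstname(String firstname) { this.firstname = firstname; }
    public String getLastname() { return lastname; }
    public void setLastname(String lastname) { this.lastname = lastname; }
    public List<PersonAttribute> getAttribs() { return attribs; }
    public void setAttribs(List<PersonAttribute> attribs) { this.attribs = attribs; }

    @Override
    public String toString() {
        return "Person [firstname=" + firstname + ", lastname=" + lastname
            + ", attribs=" + attribs + "]";
    }
}
```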
Add the following dummy setter to your Person bean:
public void setAddAttribute(PersonAttribute attribute) {
    if (attribs == null) {
        attribs = new ArrayList<PersonAttribute>();
    }
    attribs.add(attribute);
}
Create a custom cell processor which will populate a PersonAttribute with the appropriate key from the CSV header, and the value from the CSV column.
package org.supercsv.example;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.util.CsvContext;

/**
 * Creates a PersonAttribute using the corresponding header as the key.
 */
public class ParsePersonAttribute extends CellProcessorAdaptor {

    private final String[] header;

    public ParsePersonAttribute(final String[] header) {
        this.header = header;
    }

    public Object execute(Object value, CsvContext context) {
        if (value == null) {
            return null;
        }
        PersonAttribute attribute = new PersonAttribute();
        // columns start at 1
        attribute.setKey(header[context.getColumnNumber() - 1]);
        attribute.setValue((String) value);
        return attribute;
    }
}
I think the following example speaks mostly for itself, but here are a few things I should point out:

- I had to use custom preferences, because your CSV has spaces that aren't part of the data
- I had to assemble the field mapping and cell processor arrays dynamically, as your data has an unknown number of attributes (this setup isn't usually as complicated)
- All of the field mappings for the attributes use addAttribute, which corresponds to the setAddAttribute() method we added to your bean
- I've used our custom cell processor to create a PersonAttribute bean for each attribute column
Here's the code:
package org.supercsv.example;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadWithCsvBeanReader {

    private static final String CSV =
        "firstname, lastname, dog_name, fav_hat, fav_color\n"
        + "bill,smith,fido,porkpie,blue\n"
        + "james,smith,rover,bowler,purple";

    private static final String CSV2 =
        "firstname, lastname, car_type, floor_number\n"
        + "tom, collins, ford, 14\n"
        + "jim, jones, toyota, 120";

    // attributes start at element 2 of the header array
    private static final int ATT_START_INDEX = 2;

    // custom preferences required because CSV contains spaces that aren't part of the data
    private static final CsvPreference PREFS =
        new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)
            .surroundingSpacesNeedQuotes(true).build();

    public static void main(String[] args) throws IOException {
        System.out.println("CsvBeanReader with first CSV input:");
        readWithCsvBeanReader(new StringReader(CSV));
        System.out.println("CsvBeanReader with second CSV input:");
        readWithCsvBeanReader(new StringReader(CSV2));
    }

    private static void readWithCsvBeanReader(final Reader reader)
            throws IOException {
        ICsvBeanReader beanReader = null;
        try {
            beanReader = new CsvBeanReader(reader, PREFS);
            final String[] header = beanReader.getHeader(true);

            // set up the field mapping and processors dynamically
            final String[] fieldMapping = new String[header.length];
            final CellProcessor[] processors = new CellProcessor[header.length];
            for (int i = 0; i < header.length; i++) {
                if (i < ATT_START_INDEX) {
                    // normal mappings
                    fieldMapping[i] = header[i];
                    processors[i] = new NotNull();
                } else {
                    // attribute mappings
                    fieldMapping[i] = "addAttribute";
                    processors[i] = new Optional(new ParsePersonAttribute(header));
                }
            }

            Person person;
            while ((person = beanReader.read(Person.class, fieldMapping,
                    processors)) != null) {
                System.out.println(String.format(
                    "lineNo=%s, rowNo=%s, person=%s",
                    beanReader.getLineNumber(), beanReader.getRowNumber(),
                    person));
            }
        } finally {
            if (beanReader != null) {
                beanReader.close();
            }
        }
    }
}
Output (I added toString() methods to your beans):
CsvBeanReader with first CSV input:
lineNo=2, rowNo=2, person=Person [firstname=bill, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=fido], PersonAttribute [key=fav_hat, value=porkpie], PersonAttribute [key=fav_color, value=blue]]]
lineNo=3, rowNo=3, person=Person [firstname=james, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=rover], PersonAttribute [key=fav_hat, value=bowler], PersonAttribute [key=fav_color, value=purple]]]
CsvBeanReader with second CSV input:
lineNo=2, rowNo=2, person=Person [firstname=tom, lastname=collins, attribs=[PersonAttribute [key=car_type, value=ford], PersonAttribute [key=floor_number, value=14]]]
lineNo=3, rowNo=3, person=Person [firstname=jim, lastname=jones, attribs=[PersonAttribute [key=car_type, value=toyota], PersonAttribute [key=floor_number, value=120]]]
Using CsvDozerBeanReader
If you can't, or don't want to, modify your bean, then I'd recommend using CsvDozerBeanReader in the Super CSV Dozer Extension project, as it supports nested and indexed field mappings. Check out some examples of it being used here.
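Note that CsvDozerBeanReader ships in a separate artifact from the core library. Assuming you use Maven, the extra dependency would look something like the following (coordinates as published by the Super CSV project; double-check the version against the project site):

```xml
<dependency>
    <groupId>net.sf.supercsv</groupId>
    <artifactId>super-csv-dozer</artifactId>
    <version>2.1.0</version>
</dependency>
```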
Below is an example using CsvDozerBeanReader. You'll notice it's virtually identical to the CsvBeanReader example, but:

- it uses a different reader (duh!)
- it uses indexed mapping, e.g. attribs[0]
- it sets up the mapping by calling configureBeanMapping() (instead of accepting an array of Strings on the read() method, like CsvBeanReader does)
- it also sets up some hints (more on this below)
Code:
package org.supercsv.example;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.dozer.CsvDozerBeanReader;
import org.supercsv.io.dozer.ICsvDozerBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadWithCsvDozerBeanReader {

    private static final String CSV =
        "firstname, lastname, dog_name, fav_hat, fav_color\n"
        + "bill,smith,fido,porkpie,blue\n"
        + "james,smith,rover,bowler,purple";

    private static final String CSV2 =
        "firstname, lastname, car_type, floor_number\n"
        + "tom, collins, ford, 14\n"
        + "jim, jones, toyota, 120";

    // attributes start at element 2 of the header array
    private static final int ATT_START_INDEX = 2;

    // custom preferences required because CSV contains spaces that aren't part of the data
    private static final CsvPreference PREFS =
        new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)
            .surroundingSpacesNeedQuotes(true).build();

    public static void main(String[] args) throws IOException {
        System.out.println("CsvDozerBeanReader with first CSV input:");
        readWithCsvDozerBeanReader(new StringReader(CSV));
        System.out.println("CsvDozerBeanReader with second CSV input:");
        readWithCsvDozerBeanReader(new StringReader(CSV2));
    }

    private static void readWithCsvDozerBeanReader(final Reader reader) throws IOException {
        ICsvDozerBeanReader beanReader = null;
        try {
            beanReader = new CsvDozerBeanReader(reader, PREFS);
            final String[] header = beanReader.getHeader(true);

            // set up the field mapping, processors and hints dynamically
            final String[] fieldMapping = new String[header.length];
            final CellProcessor[] processors = new CellProcessor[header.length];
            final Class<?>[] hintTypes = new Class<?>[header.length];
            for (int i = 0; i < header.length; i++) {
                if (i < ATT_START_INDEX) {
                    // normal mappings
                    fieldMapping[i] = header[i];
                    processors[i] = new NotNull();
                } else {
                    // attribute mappings
                    fieldMapping[i] = String.format("attribs[%d]", i - ATT_START_INDEX);
                    processors[i] = new Optional(new ParsePersonAttribute(header));
                    hintTypes[i] = PersonAttribute.class;
                }
            }
            beanReader.configureBeanMapping(Person.class, fieldMapping, hintTypes);

            Person person;
            while ((person = beanReader.read(Person.class, processors)) != null) {
                System.out.println(String.format("lineNo=%s, rowNo=%s, person=%s",
                    beanReader.getLineNumber(),
                    beanReader.getRowNumber(), person));
            }
        } finally {
            if (beanReader != null) {
                beanReader.close();
            }
        }
    }
}
Output:
CsvDozerBeanReader with first CSV input:
lineNo=2, rowNo=2, person=Person [firstname=bill, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=fido], PersonAttribute [key=fav_hat, value=porkpie], PersonAttribute [key=fav_color, value=blue]]]
lineNo=3, rowNo=3, person=Person [firstname=james, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=rover], PersonAttribute [key=fav_hat, value=bowler], PersonAttribute [key=fav_color, value=purple]]]
CsvDozerBeanReader with second CSV input:
lineNo=2, rowNo=2, person=Person [firstname=tom, lastname=collins, attribs=[PersonAttribute [key=car_type, value=ford], PersonAttribute [key=floor_number, value=14]]]
lineNo=3, rowNo=3, person=Person [firstname=jim, lastname=jones, attribs=[PersonAttribute [key=car_type, value=toyota], PersonAttribute [key=floor_number, value=120]]]
In putting together this example, I discovered a bug with CsvDozerBeanReader in Super CSV 2.0.1 when you combine a cell processor (such as the one I created in the example above to parse each person attribute key/value) with indexed mapping such as:

"firstname","lastname","attribs[0]","attribs[1]"

I just released Super CSV 2.1.0, which fixes this. It turns out Dozer needs a hint configured for the indexed mapping to work properly. I'm not 100% sure why, as it's quite capable of creating each PersonAttribute and adding it to the correct index when you get rid of the custom cell processor and use the following (deep) mapping:

"firstname","lastname","attribs[0].value","attribs[1].value"
I hope this helps :)
Updated on June 13, 2022

Comments
-
xref about 2 years
I have various CSVs that contain some standard columns and some completely random fields:
firstname, lastname, dog_name, fav_hat, fav_color
bill,smith,fido,porkpie,blue
james,smith,rover,bowler,purple

firstname, lastname, car_type, floor_number
tom, collins, ford, 14
jim, jones, toyota, 120
So I'm trying to parse those into Person.class beans, which holds firstname & lastname, then I have a second class called PersonAttribute.class to hold...whatever else.
The basic outline of the two classes:
class Person {
    public String firstname;
    public String lastname;
    public List<PersonAttribute> attribs;
}

class PersonAttribute {
    public Person p;
    public String key;   // header name, ex. 'car_type'
    public String value; // column value, ex. 'ford'
}
I've been using the CsvToBean functions in opencsv:
public static List<Person> parseToBeans(File csvFile,
        HashMap<String, String> mapStrategy, Class beanClass) throws IOException {
    CSVReader reader = null;
    try {
        reader = new CSVReader(new BufferedReader(new FileReader(csvFile)));
        HeaderColumnNameTranslateMappingStrategy<Person> strategy =
            new HeaderColumnNameTranslateMappingStrategy<>();
        strategy.setType(beanClass);
        strategy.setColumnMapping(mapStrategy);
        final CsvToBean<Person> csv = new CsvToBean<Person>() {
            @Override
            protected Object convertValue(String value, PropertyDescriptor prop)
                    throws InstantiationException, IllegalAccessException {
                value = value.trim().replaceAll(" +", " ");
                return super.convertValue(value, prop);
            }
        };
        return csv.parse(strategy, reader);
    }
    ...
However I'm not sure how to handle creating PersonAttribute.class beans while I'm parsing the csv for Person.class beans. I came across this post and am wondering if I need to switch to supercsv to easily handle what I'm trying to do?
-
xref about 11 years
Wow thanks for the detailed response, I really appreciate it. How much more difficult would it be if the 'standard columns' could appear anywhere in the csv, not always in the first two columns?
-
James Bassett about 11 years
Hmmm that depends. Can you tell from the header? It looks like 'attribute columns' have underscores, but that might be coincidence. It's only really possible if you a) know the format beforehand, or b) can tell from the header
-
xref about 11 years
Yes, I'll always know by the header name. The user can define a column named 'banana' to be 'first_name' or 'title' etc, so I'll always know where to map it when parsing to beans (if a column isn't required, it is automatically made an attribute). I am also able to edit my Person bean as you asked; would you still recommend waiting for the Dozer fix and going that route instead of vanilla CsvBeanReader?
-
James Bassett about 11 years
As long as you have some way to distinguish between standard fields and attributes, you can simply replace the condition in my code example (currently if (i < ATT_START_INDEX)) with the appropriate condition (e.g. if ("firstname".equals(header[i]) || "lastname".equals(header[i])) or something more dynamic). The CsvDozerBeanReader solution is neat, and probably the better solution if you can't possibly modify your bean. Otherwise the CsvBeanReader solution will always be faster and doesn't need any extra dependencies. You can always try it out when it's released and decide then!
-
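A sketch of the "something more dynamic" option mentioned in that comment, assuming the set of standard column names is known up front (the class and method names here are mine, not from the answer):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Keep the known standard column names in a Set and treat every other
// header as an attribute column. Column names are from the question.
class HeaderClassifier {

    static final Set<String> STANDARD_COLUMNS =
        new HashSet<String>(Arrays.asList("firstname", "lastname"));

    static boolean isStandardColumn(String headerName) {
        return STANDARD_COLUMNS.contains(headerName.trim().toLowerCase());
    }

    public static void main(String[] args) {
        String[] header = {"firstname", "lastname", "car_type", "floor_number"};
        for (String column : header) {
            System.out.println(column + " -> "
                + (isStandardColumn(column) ? "standard" : "attribute"));
        }
    }
}
```

Inside the for loop of either example, the condition `i < ATT_START_INDEX` would then become `isStandardColumn(header[i])`.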
James Bassett about 11 years
ok @xref I've updated the answer with the details for using CsvDozerBeanReader and released the fixed version of Super CSV. You can go to the project website to find out more.