Parse CSV to multiple/nested bean types with OpenCSV?
You can certainly achieve this with Super CSV.
You can use:

- CsvBeanReader - which doesn't support indexed mapping, so you'll need to create a helper method in your bean in order to use it
- CsvDozerBeanReader - supports indexed mapping out of the box, so will do exactly what you want (requires the recently released Super CSV 2.1.0)
Using CsvBeanReader
If you don't want to use Dozer and are able to modify your bean class, the easiest option is to add a dummy setter on your bean, which CsvBeanReader will use to populate the attributes. I'm assuming that your Person and PersonAttribute beans have a public no-args constructor and getters/setters defined for each field (that's required).
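For reference, here's a minimal sketch of what those beans might look like. The field names (firstname, lastname, attribs, key, value) come from your question; the getters/setters and toString() methods are my additions, and attribs is deliberately left uninitialized so the lazy null check in the setter below has something to do:

```java
import java.util.List;

// Sketch only: Super CSV needs a public no-args constructor plus a
// getter/setter for each mapped field.
class PersonAttribute {
    private String key;   // header name, e.g. 'car_type'
    private String value; // column value, e.g. 'ford'

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }

    @Override
    public String toString() {
        return "PersonAttribute [key=" + key + ", value=" + value + "]";
    }
}

class Person {
    private String firstname;
    private String lastname;
    private List<PersonAttribute> attribs; // lazily created by the dummy setter

    public String getFirstname() { return firstname; }
    public void setFirstname(String firstname) { this.firstname = firstname; }
    public String getLastname() { return lastname; }
    public void setLastname(String lastname) { this.lastname = lastname; }
    public List<PersonAttribute> getAttribs() { return attribs; }
    public void setAttribs(List<PersonAttribute> attribs) { this.attribs = attribs; }

    @Override
    public String toString() {
        return "Person [firstname=" + firstname + ", lastname=" + lastname
            + ", attribs=" + attribs + "]";
    }
}
```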
Add the following dummy setter to your Person bean:
public void setAddAttribute(PersonAttribute attribute) {
    if (attribs == null) {
        attribs = new ArrayList<PersonAttribute>();
    }
    attribs.add(attribute);
}
Create a custom cell processor which will populate a PersonAttribute with the appropriate key from the CSV header, and the value from the CSV column.
package org.supercsv.example;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.util.CsvContext;

/**
 * Creates a PersonAttribute using the corresponding header as the key.
 */
public class ParsePersonAttribute extends CellProcessorAdaptor {

    private final String[] header;

    public ParsePersonAttribute(final String[] header) {
        this.header = header;
    }

    public Object execute(Object value, CsvContext context) {
        if (value == null) {
            return null;
        }
        PersonAttribute attribute = new PersonAttribute();
        // columns start at 1
        attribute.setKey(header[context.getColumnNumber() - 1]);
        attribute.setValue((String) value);
        return attribute;
    }
}
I think the following example speaks mostly for itself, but here are a few things I should point out:

- I had to use custom preferences, because your CSV has spaces that aren't part of the data
- I had to assemble the field mapping and cell processor arrays dynamically, as your data has an unknown number of attributes (this setup isn't usually as complicated)
- All of the field mappings for the attributes use addAttribute, which corresponds to the setAddAttribute() method we added to your bean
- I've used our custom cell processor to create a PersonAttribute bean for each attribute column
Here's the code:
package org.supercsv.example;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadWithCsvBeanReader {

    private static final String CSV =
        "firstname, lastname, dog_name, fav_hat, fav_color\n"
        + "bill,smith,fido,porkpie,blue\n"
        + "james,smith,rover,bowler,purple";

    private static final String CSV2 =
        "firstname, lastname, car_type, floor_number\n"
        + "tom, collins, ford, 14\n"
        + "jim, jones, toyota, 120";

    // attributes start at element 2 of the header array
    private static final int ATT_START_INDEX = 2;

    // custom preferences required because CSV contains spaces that aren't part of the data
    private static final CsvPreference PREFS =
        new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)
            .surroundingSpacesNeedQuotes(true).build();

    public static void main(String[] args) throws IOException {
        System.out.println("CsvBeanReader with first CSV input:");
        readWithCsvBeanReader(new StringReader(CSV));
        System.out.println("CsvBeanReader with second CSV input:");
        readWithCsvBeanReader(new StringReader(CSV2));
    }

    private static void readWithCsvBeanReader(final Reader reader)
            throws IOException {
        ICsvBeanReader beanReader = null;
        try {
            beanReader = new CsvBeanReader(reader, PREFS);
            final String[] header = beanReader.getHeader(true);

            // set up the field mapping and processors dynamically
            final String[] fieldMapping = new String[header.length];
            final CellProcessor[] processors = new CellProcessor[header.length];
            for (int i = 0; i < header.length; i++) {
                if (i < ATT_START_INDEX) {
                    // normal mappings
                    fieldMapping[i] = header[i];
                    processors[i] = new NotNull();
                } else {
                    // attribute mappings
                    fieldMapping[i] = "addAttribute";
                    processors[i] = new Optional(new ParsePersonAttribute(header));
                }
            }

            Person person;
            while ((person = beanReader.read(Person.class, fieldMapping,
                    processors)) != null) {
                System.out.println(String.format(
                    "lineNo=%s, rowNo=%s, person=%s",
                    beanReader.getLineNumber(), beanReader.getRowNumber(),
                    person));
            }
        } finally {
            if (beanReader != null) {
                beanReader.close();
            }
        }
    }
}
Output (I added toString() methods to your beans):
CsvBeanReader with first CSV input:
lineNo=2, rowNo=2, person=Person [firstname=bill, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=fido], PersonAttribute [key=fav_hat, value=porkpie], PersonAttribute [key=fav_color, value=blue]]]
lineNo=3, rowNo=3, person=Person [firstname=james, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=rover], PersonAttribute [key=fav_hat, value=bowler], PersonAttribute [key=fav_color, value=purple]]]
CsvBeanReader with second CSV input:
lineNo=2, rowNo=2, person=Person [firstname=tom, lastname=collins, attribs=[PersonAttribute [key=car_type, value=ford], PersonAttribute [key=floor_number, value=14]]]
lineNo=3, rowNo=3, person=Person [firstname=jim, lastname=jones, attribs=[PersonAttribute [key=car_type, value=toyota], PersonAttribute [key=floor_number, value=120]]]
Using CsvDozerBeanReader
If you can't, or don't want to, modify your bean, then I'd recommend using CsvDozerBeanReader in the Super CSV Dozer Extension project, as it supports nested and indexed field mappings. Check out some examples of it being used here.
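Note that CsvDozerBeanReader ships in a separate artifact from the core library. Assuming you use Maven, the extra dependency would look something like the following (coordinates as published by the Super CSV project; double-check the version against the project site):

```xml
<dependency>
    <groupId>net.sf.supercsv</groupId>
    <artifactId>super-csv-dozer</artifactId>
    <version>2.1.0</version>
</dependency>
```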
Below is an example using CsvDozerBeanReader. You'll notice it's virtually identical to the CsvBeanReader example, but:

- it uses a different reader (duh!)
- it uses indexed mapping, e.g. attribs[0]
- it sets up the mapping by calling configureBeanMapping() (instead of accepting an array of Strings on the read() method, like CsvBeanReader does)
- it also sets up some hints (more on this below)
Code:
package org.supercsv.example;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.dozer.CsvDozerBeanReader;
import org.supercsv.io.dozer.ICsvDozerBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadWithCsvDozerBeanReader {

    private static final String CSV =
        "firstname, lastname, dog_name, fav_hat, fav_color\n"
        + "bill,smith,fido,porkpie,blue\n"
        + "james,smith,rover,bowler,purple";

    private static final String CSV2 =
        "firstname, lastname, car_type, floor_number\n"
        + "tom, collins, ford, 14\n"
        + "jim, jones, toyota, 120";

    // attributes start at element 2 of the header array
    private static final int ATT_START_INDEX = 2;

    // custom preferences required because CSV contains spaces that aren't part of the data
    private static final CsvPreference PREFS =
        new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)
            .surroundingSpacesNeedQuotes(true).build();

    public static void main(String[] args) throws IOException {
        System.out.println("CsvDozerBeanReader with first CSV input:");
        readWithCsvDozerBeanReader(new StringReader(CSV));
        System.out.println("CsvDozerBeanReader with second CSV input:");
        readWithCsvDozerBeanReader(new StringReader(CSV2));
    }

    private static void readWithCsvDozerBeanReader(final Reader reader) throws IOException {
        ICsvDozerBeanReader beanReader = null;
        try {
            beanReader = new CsvDozerBeanReader(reader, PREFS);
            final String[] header = beanReader.getHeader(true);

            // set up the field mapping, processors and hints dynamically
            final String[] fieldMapping = new String[header.length];
            final CellProcessor[] processors = new CellProcessor[header.length];
            final Class<?>[] hintTypes = new Class<?>[header.length];
            for (int i = 0; i < header.length; i++) {
                if (i < ATT_START_INDEX) {
                    // normal mappings
                    fieldMapping[i] = header[i];
                    processors[i] = new NotNull();
                } else {
                    // attribute mappings
                    fieldMapping[i] = String.format("attribs[%d]", i - ATT_START_INDEX);
                    processors[i] = new Optional(new ParsePersonAttribute(header));
                    hintTypes[i] = PersonAttribute.class;
                }
            }
            beanReader.configureBeanMapping(Person.class, fieldMapping, hintTypes);

            Person person;
            while ((person = beanReader.read(Person.class, processors)) != null) {
                System.out.println(String.format("lineNo=%s, rowNo=%s, person=%s",
                    beanReader.getLineNumber(),
                    beanReader.getRowNumber(), person));
            }
        } finally {
            if (beanReader != null) {
                beanReader.close();
            }
        }
    }
}
Output:
CsvDozerBeanReader with first CSV input:
lineNo=2, rowNo=2, person=Person [firstname=bill, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=fido], PersonAttribute [key=fav_hat, value=porkpie], PersonAttribute [key=fav_color, value=blue]]]
lineNo=3, rowNo=3, person=Person [firstname=james, lastname=smith, attribs=[PersonAttribute [key=dog_name, value=rover], PersonAttribute [key=fav_hat, value=bowler], PersonAttribute [key=fav_color, value=purple]]]
CsvDozerBeanReader with second CSV input:
lineNo=2, rowNo=2, person=Person [firstname=tom, lastname=collins, attribs=[PersonAttribute [key=car_type, value=ford], PersonAttribute [key=floor_number, value=14]]]
lineNo=3, rowNo=3, person=Person [firstname=jim, lastname=jones, attribs=[PersonAttribute [key=car_type, value=toyota], PersonAttribute [key=floor_number, value=120]]]
In putting together this example, I discovered a bug with CsvDozerBeanReader in Super CSV 2.0.1 when you combine a cell processor (such as the one I created in the example above to parse each person attribute key/value) with indexed mapping such as:

"firstname","lastname","attribs[0]","attribs[1]"

I just released Super CSV 2.1.0, which fixes this. It turns out Dozer needs a hint configured for the indexed mapping to work properly. I'm not 100% sure why, as it's quite capable of creating each PersonAttribute and adding it to the correct index when you get rid of the custom cell processor and use the following (deep) mapping:

"firstname","lastname","attribs[0].value","attribs[1].value"
I hope this helps :)
Updated on June 13, 2022

Comments
-
xref about 2 years
I have various CSVs that contain some standard columns and some completely random fields:
firstname, lastname, dog_name, fav_hat, fav_color
bill,smith,fido,porkpie,blue
james,smith,rover,bowler,purple

firstname, lastname, car_type, floor_number
tom, collins, ford, 14
jim, jones, toyota, 120
So I'm trying to parse those into Person.class beans, which holds firstname & lastname, then I have a second class called PersonAttribute.class to hold...whatever else.
The basic outline of the two classes:
class Person {
    public String firstname;
    public String lastname;
    public List<PersonAttribute> attribs;
}

class PersonAttribute {
    public Person p;
    public String key;   // header name, ex. 'car_type'
    public String value; // column value, ex. 'ford'
}
I've been using the CsvToBean functions in opencsv:
public static List<Person> parseToBeans(File csvFile,
        HashMap<String, String> mapStrategy, Class beanClass) throws IOException {
    CSVReader reader = null;
    try {
        reader = new CSVReader(new BufferedReader(new FileReader(csvFile)));
        HeaderColumnNameTranslateMappingStrategy<Person> strategy =
            new HeaderColumnNameTranslateMappingStrategy<>();
        strategy.setType(beanClass);
        strategy.setColumnMapping(mapStrategy);
        final CsvToBean<Person> csv = new CsvToBean<Person>() {
            @Override
            protected Object convertValue(String value, PropertyDescriptor prop)
                    throws InstantiationException, IllegalAccessException {
                value = value.trim().replaceAll(" +", " ");
                return super.convertValue(value, prop);
            }
        };
        return csv.parse(strategy, reader);
    }
    ...
However I'm not sure how to handle creating PersonAttribute.class beans while I'm parsing the csv for Person.class beans. I came across this post and am wondering if I need to switch to supercsv to easily handle what I'm trying to do?
-
xref about 11 years
Wow thanks for the detailed response, I really appreciate it. How much more difficult would it be if the 'standard columns' could appear anywhere in the csv, not always in the first two columns?
-
James Bassett about 11 years
Hmmm that depends. Can you tell from the header? It looks like 'attribute columns' have underscores, but that might be coincidence. It's only really possible if you a) know the format beforehand, or b) can tell from the header
-
xref about 11 years
Yes, I'll always know by the header name. The user can define a column named 'banana' to be 'first_name' or 'title' etc, so I'll always know where to map it when parsing to beans (if a column isn't required, it is automatically made an attribute). I am also able to edit my Person bean as you asked; would you still recommend waiting for the Dozer fix and going that route instead of vanilla CsvBeanReader?
-
James Bassett about 11 years
As long as you have some way to distinguish between standard fields and attributes, you can simply replace the condition in my code example (currently if (i < ATT_START_INDEX)) with the appropriate condition (e.g. if ("firstname".equals(header[i]) || "lastname".equals(header[i])) or something more dynamic). The CsvDozerBeanReader solution is neat, and probably the better solution if you can't possibly modify your bean. Otherwise the CsvBeanReader solution will always be faster and doesn't need any extra dependencies. You can always try it out when it's released and decide then!
-
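A sketch of the "something more dynamic" option mentioned in that comment, assuming the set of standard column names is known up front (the class and method names here are mine, not from the answer):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Keep the known standard column names in a Set and treat every other
// header as an attribute column. Column names are from the question.
class HeaderClassifier {

    static final Set<String> STANDARD_COLUMNS =
        new HashSet<String>(Arrays.asList("firstname", "lastname"));

    static boolean isStandardColumn(String headerName) {
        return STANDARD_COLUMNS.contains(headerName.trim().toLowerCase());
    }

    public static void main(String[] args) {
        String[] header = {"firstname", "lastname", "car_type", "floor_number"};
        for (String column : header) {
            System.out.println(column + " -> "
                + (isStandardColumn(column) ? "standard" : "attribute"));
        }
    }
}
```

Inside the for loop of either example, the condition `i < ATT_START_INDEX` would then become `isStandardColumn(header[i])`.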
James Bassett about 11 years
ok @xref I've updated the answer with the details for using CsvDozerBeanReader and released the fixed version of Super CSV. You can go to the project website to find out more.