How to create a Row from a List or Array in Spark using Java
Solution 1
We often need to create Datasets or DataFrames in real-world applications. Here is an example of how to create Rows and a Dataset in a Java application:
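import java.util.List;
import com.google.common.collect.ImmutableList; // Guava
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;
import static org.apache.spark.sql.types.DataTypes.*;

// initialize the SQLContext first
SQLContext sqlContext = ...
// schema: two non-nullable string columns and one non-nullable integer column
StructType schemata = DataTypes.createStructType(
    new StructField[]{
        createStructField("NAME", StringType, false),
        createStructField("STRING_VALUE", StringType, false),
        createStructField("NUM_VALUE", IntegerType, false),
    });
// each Row lists its values in schema order
Row r1 = RowFactory.create("name1", "value1", 1);
Row r2 = RowFactory.create("name2", "value2", 2);
List<Row> rowList = ImmutableList.of(r1, r2);
Dataset<Row> data = sqlContext.createDataFrame(rowList, schemata);
data.show(); // prints the table below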
+-----+------------+---------+
| NAME|STRING_VALUE|NUM_VALUE|
+-----+------------+---------+
|name1|      value1|        1|
|name2|      value2|        2|
+-----+------------+---------+
Solution 2
I am not sure I understood your question correctly, but you can use RowFactory to create a Row from an ArrayList in Java by passing the list's contents as an array:
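// d1 and d2 are existing MyData instances
List<MyData> mlist = new ArrayList<MyData>();
mlist.add(d1);
mlist.add(d2);
// toArray() makes each list element a separate field of the Row
Row row = RowFactory.create(mlist.toArray());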
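The reason toArray() matters here is Java's varargs handling: RowFactory.create is declared as create(Object... values), so a List argument is wrapped as a single element, while an Object[] is used as the argument array itself. Below is a minimal plain-Java sketch of that behavior. The class VarargsRowDemo and its create method are hypothetical stand-ins that only mimic the parameter shape of RowFactory.create; no Spark classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VarargsRowDemo {

    // Hypothetical stand-in with the same parameter shape as
    // RowFactory.create(Object... values); it just returns what it received.
    static Object[] create(Object... values) {
        return values;
    }

    public static void main(String[] args) {
        // A record whose length is not known in advance is collected into a list.
        List<Object> fields = new ArrayList<>();
        fields.add(1L);
        fields.add(2);
        fields.add("three");

        // A List is one Object, so it becomes the only varargs element.
        Object[] wrapped = create(fields);
        System.out.println("passing the List:  " + wrapped.length + " field(s)");

        // An Object[] is passed through as the varargs array itself.
        Object[] spread = create(fields.toArray());
        System.out.println("passing toArray(): " + spread.length + " field(s)");
        System.out.println(Arrays.toString(spread));
    }
}
```

Running the sketch reports 1 field for the List call and 3 fields for the toArray() call, which matches the row.length() difference discussed in the comments.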
Author by
user2736706
Updated on July 09, 2022
Comments
-
user2736706 over 1 year
In Java, I use RowFactory.create() to create a Row:
Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3));
where "record" is a record from a database, but I cannot know the length of "record" in advance, so I want to use a List or an Array to create the "row". In Scala, I can use Row.fromSeq() to create a Row from a List or an Array, but how can I achieve that in Java?
-
user2736706 over 7 years
Hi, when I use your method, I found that Spark treats mlist as a single object:
Row row = RowFactory.create(mlist);
System.out.println("row number:" + row.length());
System.out.println("mlist number:" + mlist.size());
I got: row number:1 mlist number:2
abaghel over 7 years
Yes, but the Row will have both records. You can try printing it:
System.out.println("row number:" + row.toSeq());
-
user2736706 over 7 years
Hi, thanks so much! You can also try this:
Object[] rowArray = {obj1, obj2, ....};
Row row = RowFactory.create(rowArray);
System.out.println("row number:" + row.length());
You will get: row number:6
-
abaghel over 7 years
Thanks. I updated my answer. I checked the source code of the RowFactory and GenericRow classes: "An internal row implementation that uses an array of objects as the underlying storage."
-
BdEngineer almost 5 years
Thank you. In Scala we can simply write sc.parallelize(List((x,y),(a,b))).toDF("col1","col2"). Why do we need Row, JavaRDD, and so on in Java? Is there a similarly simple way?
-
Borja over 4 years
You say you need to create Datasets in real-world applications, yet you hard-code all the values. That does not make sense. In the real world everything has to be parameterizable, and you do not know the values beforehand.