How to concatenate a string and a column in a dataframe in Spark?


Solution 1

**Please refer to the Scala code below for string concatenation in both prefix and postfix form.**


import org.apache.spark.sql.functions._

val empDF = MongoSpark.load(spark, readConfig) // dataframe empDF is loaded from MongoDB using MongoSpark

val prefixVal = "PrefixArkay "  // string to prepend
val postfixVal = " PostfixArkay"  // string to append

//Prefix
val finalPreDF = empDF.withColumn("EMP", concat(lit(prefixVal), empDF.col("EMP")))
finalPreDF.show()

//Output will be as below
+-------------------+
|                EMP|
+-------------------+
|PrefixArkay DineshS|
+-------------------+


//Postfix
val finalPostDF = empDF.withColumn("EMP", concat(empDF.col("EMP"), lit(postfixVal)))
finalPostDF.show()

//Output will be as below
+--------------------+
|                 EMP|
+--------------------+
|DineshS PostfixArkay|
+--------------------+
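
If you want to reproduce this without the MongoDB-backed dataframe, here is a minimal self-contained sketch of the same prefix/postfix concatenation. The local SparkSession and the single-row empDF built from a Seq are assumptions for illustration, not part of the original answer.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat, lit}

val spark = SparkSession.builder().master("local[*]").appName("concat-demo").getOrCreate()
import spark.implicits._

val empDF = Seq("DineshS").toDF("EMP") // stand-in for the MongoDB-loaded dataframe

val prefixVal = "PrefixArkay "
val postfixVal = " PostfixArkay"

// prepend the literal, then append it, exactly as above
empDF.withColumn("EMP", concat(lit(prefixVal), empDF("EMP"))).show(false)
empDF.withColumn("EMP", concat(empDF("EMP"), lit(postfixVal))).show(false)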

Solution 2

You can do it like below.

import java.time.LocalDate
import org.apache.spark.sql.functions.{concat, lit}

val df = Seq("05:00", "22:30").toDF("time")
df.show()

val todaydate = LocalDate.now().toString
val df2 = df.select(concat(lit(todaydate + " "), df.col("time"))).toDF("datetime")
df2.show()

This will give you

+----------------+
|        datetime|
+----------------+
|2016-12-10 05:00|
|2016-12-10 22:30|
+----------------+
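
As a variation on the same idea, the current date can be taken from Spark's built-in functions instead of java.time. This is only a sketch assuming the same time column and a SparkSession with its implicits in scope; current_date and date_format are standard functions in org.apache.spark.sql.functions.

import org.apache.spark.sql.functions.{concat, lit, current_date, date_format}

// assumes `import spark.implicits._` is already in scope for toDF
val times = Seq("05:00", "22:30").toDF("time")

val withDate = times.select(
  concat(date_format(current_date(), "yyyy-MM-dd"), lit(" "), times("time")).as("datetime")
)
withDate.show(false)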
Author: Dasarathy D R
Updated on August 21, 2022

Comments

  • Dasarathy D R over 1 year

    I have today's date as a string. I need to concatenate it with a time value that is present as a column in a dataframe.

    When I try this, I get a String index out of bounds exception.

    My code:

    val todaydate = LocalDate.now().toString()
    println(todaydate)  // o/p: 2016-12-10
    
    val todayrec_cutoff = todaydate + (" ") + df.col("colname")
    

    Expected Output:

    2016-12-10 05:00 
    2016-12-10 22:30
    
  • earl over 4 years
    Is lit not available in Python 2.7?