Python spark extract characters from dataframe
Use something like this:
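from pyspark.sql.functions import concat, lit
# substr(startPos, length) is 1-based: characters 1-5, an underscore, then the 8th character
new_df = df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                            lit('_'),
                                            df.Column.substr(8, 1)))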
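This uses the substr and concat functions. substr takes a 1-based start position and a length, so substr(1, 5) returns the first five characters and substr(8, 1) returns the eighth; concat joins the pieces with the literal underscore.
Together, those functions should solve your problem.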
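To sanity-check the positions before running this on a real DataFrame, the same 1-based arithmetic can be mirrored in plain Python. This is only a sketch: the helper names substr_1based and build_new_value are made up for illustration and are not part of PySpark, and the sample strings are the ones from the question.

```python
def substr_1based(value: str, start_pos: int, length: int) -> str:
    """Mimic a 1-based substr(start_pos, length) for start_pos >= 1."""
    begin = start_pos - 1  # convert the 1-based position to Python's 0-based index
    return value[begin:begin + length]


def build_new_value(value: str) -> str:
    """First five characters, an underscore, then the 8th character."""
    return substr_1based(value, 1, 5) + "_" + substr_1based(value, 8, 1)


# Sample values taken from the question's table
samples = ["STRINGOFLETTERS", "SOMEOTHERCHARACTERS", "ANOTHERSTRING", "EXAMPLEEXAMPLE"]
for s in samples:
    print(s, "->", build_new_value(s))
```

The printed values match the New Column table in the question (STRIN_F, SOMEO_E, ANOTH_S, EXAMP_E), which suggests that the positions 1, 5 and 8 passed to substr are the right ones.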
Author: Amanda C
Updated on July 09, 2022
Comments
- Amanda C, almost 2 years ago:
I have a dataframe in spark, something like this:
ID | Column
------ | ----
1 | STRINGOFLETTERS
2 | SOMEOTHERCHARACTERS
3 | ANOTHERSTRING
4 | EXAMPLEEXAMPLE
What I would like to do is extract the first 5 characters from the column plus the 8th character and create a new column, something like this:
ID | New Column
------ | ------
1 | STRIN_F
2 | SOMEO_E
3 | ANOTH_S
4 | EXAMP_E
I can't use the following code, because the values in the column differ, and I don't want to split on a specific character but at the 6th character:
import pyspark
split_col = pyspark.sql.functions.split(DF['column'], ' ')
newDF = DF.withColumn('new_column', split_col.getItem(0))
Thanks all!