Python spark extract characters from dataframe


Use something like this:

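from pyspark.sql.functions import concat, lit

# substr(start, length) is 1-based: characters 1-5, then the 8th character
df = df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                        lit('_'),
                                        df.Column.substr(8, 1)))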

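This uses the Column method substr, which takes a 1-based start position and a length, and the function concat from pyspark.sql.functions, with lit supplying the literal underscore. Because the pieces are picked by position, it works even though the values in the column differ.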
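To double-check the positions without starting a Spark session, the same slicing can be sketched in plain Python. This is only an illustration: substr_1based and extract_key are hypothetical helper names, not part of PySpark, and the sketch assumes a positive start position.

```python
def substr_1based(value: str, start: int, length: int) -> str:
    """Mimic a 1-based (start, length) substring for start >= 1.

    Hypothetical helper for illustration only; not a PySpark function.
    """
    begin = start - 1  # convert the 1-based start to Python's 0-based index
    return value[begin:begin + length]


def extract_key(value: str) -> str:
    """First 5 characters, an underscore, then the 8th character."""
    return substr_1based(value, 1, 5) + "_" + substr_1based(value, 8, 1)


# Sample values taken from the question's table.
samples = ["STRINGOFLETTERS", "SOMEOTHERCHARACTERS",
           "ANOTHERSTRING", "EXAMPLEEXAMPLE"]
results = [extract_key(s) for s in samples]
print(results)
```

For the four sample rows this yields STRIN_F, SOMEO_E, ANOTH_S and EXAMP_E, matching the desired output in the question.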

Author: Amanda C

Updated on July 09, 2022

Comments

  • Amanda C
    Amanda C almost 2 years

    I have a dataframe in spark, something like this:

    ID     | Column
    ------ | ----
    1      | STRINGOFLETTERS
    2      | SOMEOTHERCHARACTERS
    3      | ANOTHERSTRING
    4      | EXAMPLEEXAMPLE
    

    What I would like to do is extract the first 5 characters from the column plus the 8th character and create a new column, something like this:

    ID     | New Column
    ------ | ------
    1      | STRIN_F
    2      | SOMEO_E
    3      | ANOTH_S
    4      | EXAMP_E
    

    I can't use the following code, because the values in the column differ, and I don't want to split on a specific character, but at the 6th character:

    import pyspark
    split_col = pyspark.sql.functions.split(DF['column'], ' ')
    newDF = DF.withColumn('new_column', split_col.getItem(0))
    

    Thanks all!