Write spark dataframe to file using python and '|' delimiter


Solution 1

You can write to CSV, choosing '|' as the delimiter:

df.write.option("sep","|").option("header","true").csv(filename)

This will not be exactly one plain text file (Spark writes a directory of part files), but the content will be close to what you want.
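For reference, the content each part file ends up with is plain pipe-delimited rows, not the boxed layout that df.show() prints. A small stand-alone sketch of that format using Python's csv module (no Spark required, sample values taken from the question):

```python
import csv
import io

# Write the header and one data row pipe-delimited, as the CSV writer would.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|")
writer.writerow(["Summary", "col1", "col2", "col3"])
writer.writerow(["row1", 1, 14, 17])

print(buf.getvalue())
```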

Alternatively, you can collect the data to the driver and write it yourself, e.g.:

myprint(df.collect())

or

myprint(df.take(100))

df.collect() and df.take(n) both return a list of Row objects.
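myprint above is a placeholder, not a built-in. A minimal sketch of such a helper (hypothetical name, assuming each Row behaves like a sequence of field values) might be:

```python
def rows_to_pipe_text(rows):
    """Join each row's fields with '|', one line per row."""
    return "\n".join("|".join(str(field) for field in row) for row in rows)

# With Spark this would be called as:
#   text = rows_to_pipe_text(df.collect())
# Demonstrated here on plain tuples standing in for Row objects:
print(rows_to_pipe_text([("row1", 1, 14, 17), ("row2", 3, 12, 2343)]))
```

Note that collect() pulls the whole dataframe into driver memory, so this approach only suits small results.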

Lastly, you can collect to the driver using toPandas() and then use pandas tools to write the file.
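A sketch of that route: df.toPandas() returns a pandas DataFrame, whose to_csv accepts any single-character separator. A pandas DataFrame is constructed directly here (with sample values from the question) to stand in for the toPandas() result:

```python
import io

import pandas as pd

# Stand-in for pdf = df.toPandas()
pdf = pd.DataFrame({"Summary": ["row1", "row2"], "col1": [1, 3]})

# Write pipe-delimited text; pass a file path instead of a buffer to write to disk.
buf = io.StringIO()
pdf.to_csv(buf, sep="|", index=False)
print(buf.getvalue())
```

Like collect(), toPandas() brings the full dataset to the driver, so reserve it for results that fit in memory.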

Solution 2

In Spark 2.0+, you can use the built-in CSV writer. The delimiter is ',' by default, and you can set it to '|':

df.write \
    .format('csv') \
    .options(delimiter='|') \
    .save('target/location')
Author: Brian Waters

Updated on January 27, 2020

Comments

  • Brian Waters (over 4 years ago):

    I have constructed a Spark dataframe from a query. What I wish to do is print the dataframe to a text file with all information delimited by '|', like the following:

    +-------+----+----+----+
    |Summary|col1|col2|col3|
    +-------+----+----+----+
    |row1   |1   |14  |17  |
    |row2   |3   |12  |2343|
    +-------+----+----+----+
    

    How can I do this?