Write spark dataframe to file using python and '|' delimiter


Solution 1

You can write to CSV, choosing '|' as the delimiter:

df.write.option("sep","|").option("header","true").csv(filename)

This will not be exactly one plain text file (Spark writes a directory of part files), but the content will be close to what you want.
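For reference, the content each part file ends up with is plain pipe-delimited rows, not the boxed layout that df.show() prints. A small stand-alone sketch of that format using Python's csv module (no Spark required, sample values taken from the question):

```python
import csv
import io

# Write the header and one data row pipe-delimited, as the CSV writer would.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|")
writer.writerow(["Summary", "col1", "col2", "col3"])
writer.writerow(["row1", 1, 14, 17])

print(buf.getvalue())
```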

Alternatively, you can collect the data to the driver and write it yourself, e.g.:

myprint(df.collect())

or

myprint(df.take(100))

df.collect() and df.take(n) both return a list of Row objects.
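myprint above is a placeholder, not a built-in. A minimal sketch of such a helper (hypothetical name, assuming each Row behaves like a sequence of field values) might be:

```python
def rows_to_pipe_text(rows):
    """Join each row's fields with '|', one line per row."""
    return "\n".join("|".join(str(field) for field in row) for row in rows)

# With Spark this would be called as:
#   text = rows_to_pipe_text(df.collect())
# Demonstrated here on plain tuples standing in for Row objects:
print(rows_to_pipe_text([("row1", 1, 14, 17), ("row2", 3, 12, 2343)]))
```

Note that collect() pulls the whole dataframe into driver memory, so this approach only suits small results.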

Lastly, you can collect to the driver using toPandas() and then use pandas tools to write the file.
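A sketch of that route: df.toPandas() returns a pandas DataFrame, whose to_csv accepts any single-character separator. A pandas DataFrame is constructed directly here (with sample values from the question) to stand in for the toPandas() result:

```python
import io

import pandas as pd

# Stand-in for pdf = df.toPandas()
pdf = pd.DataFrame({"Summary": ["row1", "row2"], "col1": [1, 3]})

# Write pipe-delimited text; pass a file path instead of a buffer to write to disk.
buf = io.StringIO()
pdf.to_csv(buf, sep="|", index=False)
print(buf.getvalue())
```

Like collect(), toPandas() brings the full dataset to the driver, so reserve it for results that fit in memory.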

Solution 2

In Spark 2.0+, you can use the built-in CSV writer. The delimiter is ',' by default, and you can set it to '|':

df.write \
    .format('csv') \
    .options(delimiter='|') \
    .save('target/location')
Author: Brian Waters

Updated on January 27, 2020

Comments

  • Brian Waters (over 4 years ago):

    I have constructed a Spark dataframe from a query. What I wish to do is print the dataframe to a text file with all information delimited by '|', like the following:

    +-------+----+----+----+
    |Summary|col1|col2|col3|
    +-------+----+----+----+
    |row1   |1   |14  |17  |
    |row2   |3   |12  |2343|
    +-------+----+----+----+
    

    How can I do this?