Save a spark RDD to the local file system using Java
saveAsTextFile is able to take local file system paths (e.g. file:///tmp/magic/...). However, if you're running on a distributed cluster, you most likely want to collect() the data back to the driver and then save it with standard file operations.
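A minimal sketch of the "standard file operations" step in Java. It assumes the RDD already holds formatted CSV lines (a JavaRDD<String>) and that the collected result fits in driver memory. The class name CollectedCsvWriter and the output path are hypothetical. On the Spark side you would pass it the java.util.List returned by rdd.collect().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: writes lines that have already been collected to the
// driver into a single file on the driver's local file system.
final class CollectedCsvWriter {
    private CollectedCsvWriter() {}

    // Hypothetical Spark usage:
    //   CollectedCsvWriter.write(rdd.collect(), Paths.get("/tmp/out.csv"));
    // collect() pulls every element into driver memory, so this only suits
    // results small enough to fit there.
    static void write(List<String> lines, Path target) {
        try {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent); // make sure the target directory exists
            }
            // Files.write creates or truncates the file and writes each element
            // followed by the platform line separator.
            Files.write(target, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the file is truncated on every call, rerunning the job replaces the previous output instead of appending to it.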
Author: Kanav Sharma
Updated on June 19, 2022
Comments
-
Kanav Sharma over 1 year
I have an RDD that is generated using Spark. To write this RDD to a CSV file, Spark provides methods such as saveAsTextFile(), which write the output to HDFS.
I want to write the file to my local file system so that my SSIS process can pick up the files from there and load them into the database.
I am currently unable to use Sqoop.
Is it possible to do this in Java, other than by writing shell scripts?
If any clarification is needed, please let me know.
-
Kanav Sharma over 8 years: Okay. Passing the path with "file:///" returns successfully and writes a _SUCCESS file, but no output files can be seen. I am running it on a distributed cluster, and my data is so large that calling collect() exceeds the JVM's memory.
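A likely explanation, hedged: as I understand Hadoop's FileOutputCommitter, with a file:/// path each executor writes its partitions to the local disk of the worker it runs on, while the job-level _SUCCESS marker is written by the driver. The part files therefore end up on the worker nodes rather than on the machine you are inspecting. The sketch below (hypothetical class LocalOutputCheck) lists the part-* files actually present in a local output directory, so a lone _SUCCESS marker is not mistaken for a complete export.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical check: inspects a Spark output directory on this machine's
// local disk. An empty part list next to a _SUCCESS marker suggests the
// executors wrote their partitions to other nodes.
final class LocalOutputCheck {
    private LocalOutputCheck() {}

    static List<Path> partFiles(Path outputDir) {
        List<Path> parts = new ArrayList<>();
        // The glob "part-*" matches saveAsTextFile's default part-00000, part-00001, ... names.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(parts); // name order matches partition order
        return parts;
    }

    static boolean hasSuccessMarker(Path outputDir) {
        return Files.exists(outputDir.resolve("_SUCCESS"));
    }
}
```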
-
abalcerek over 8 years: If your file is too big for one machine, it does not really make sense to save it locally instead of to HDFS or another distributed file system.
-
Kanav Sharma over 8 years: It is not the file size but the number of files that is large. My process is designed to handle around 400 GB of data per hour. @holden For now, I have managed to do this using FileSystem.copyToLocalFile(). I need to test it for a day for reliability, and then I will have more information.
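Hadoop's FileSystem.copyToLocalFile(Path src, Path dst) copies an HDFS output directory, part files included, to local disk. If the SSIS package would rather pick up one file than many, the copied part files can be concatenated locally. This is a sketch under that assumption, not part of the approach described above. The class name PartFileMerger is hypothetical, and the code uses only java.nio so it runs without Hadoop on the classpath.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: concatenates the part-* files of a locally copied
// Spark output directory into one target file and returns how many it merged.
final class PartFileMerger {
    private PartFileMerger() {}

    static int merge(Path partsDir, Path target) {
        List<Path> parts = new ArrayList<>();
        // Hidden checksum files such as .part-00000.crc start with "." and do not match.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partsDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
            Collections.sort(parts); // part-00000, part-00001, ... keeps partition order
            try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
                for (Path p : parts) {
                    Files.copy(p, out); // byte-for-byte, so each part's line endings are kept
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts.size();
    }
}
```

Merging is a design trade-off: one file is simpler for a downstream loader, but it serializes the write and loses the ability to load partitions in parallel.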
-
Kanav Sharma over 8 years: @holden Let me know if the approach I am taking needs modification.
-
Holden over 8 years: If your data is too big for the driver, you will need to either store it on HDFS (or a similar distributed file system) or, if you still really want to store it on the driver, use toLocalIterator (remembering to cache the RDD beforehand), which only needs as much memory as the largest partition.
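A sketch of the toLocalIterator route. JavaRDD.toLocalIterator() returns a java.util.Iterator, so the driver can stream records to a local file one line at a time instead of holding a collected List. As I understand it, toLocalIterator runs a separate job for each partition, which is why caching first avoids recomputing the RDD's lineage for every one. The class name StreamingCsvWriter and the path are hypothetical. The code itself is plain java.nio and accepts any Iterator<String>.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

// Hypothetical helper: streams lines from an iterator into a local file and
// returns the number of lines written. It never buffers the whole data set,
// so memory use is bounded by what the iterator itself holds (for
// toLocalIterator, roughly one partition at a time).
final class StreamingCsvWriter {
    private StreamingCsvWriter() {}

    // Hypothetical Spark usage:
    //   rdd.cache();
    //   StreamingCsvWriter.write(rdd.toLocalIterator(), Paths.get("/tmp/out.csv"));
    static long write(Iterator<String> lines, Path target) {
        long count = 0;
        try (BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            while (lines.hasNext()) {
                w.write(lines.next());
                w.newLine(); // platform line separator after every record
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```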
-
user239558 about 8 years: This answer is missing the code to save the data using standard file operations.