(null) entry in command string exception in saveAsTextFile() on PySpark
Solution 1
You are missing winutils.exe, a Hadoop binary. Download the winutils.exe that matches your system (64-bit or 32-bit) and set your Hadoop home to the folder whose bin subfolder contains it.
1st way:
- Download the file.
- Create a hadoop folder on your system, e.g. on C:
- Create a bin folder inside the hadoop directory, e.g. C:\hadoop\bin
- Paste winutils.exe into bin, e.g. C:\hadoop\bin\winutils.exe
- In System Properties -> Advanced System Settings, under User Variables, create a new variable:
Name: HADOOP_HOME
Value: C:\hadoop\
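To confirm the layout from the steps above before starting Spark, a small check like the following can help. This is only a sketch: the function name find_winutils_problems is made up for illustration, and it only checks that the folder and file exist, not that the binary matches your system.

```python
import os


def find_winutils_problems(hadoop_home):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
        return problems
    bin_dir = os.path.join(hadoop_home, "bin")
    winutils = os.path.join(bin_dir, "winutils.exe")
    if not os.path.isdir(bin_dir):
        problems.append("missing directory: " + bin_dir)
    elif not os.path.isfile(winutils):
        problems.append("missing file: " + winutils)
    return problems


if __name__ == "__main__":
    # Reads the variable created in the steps above.
    for problem in find_winutils_problems(os.environ.get("HADOOP_HOME")):
        print(problem)
```

If nothing is printed, HADOOP_HOME is set and bin\winutils.exe was found under it. Note that a notebook or console opened before the variable was created usually needs to be restarted to see it.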
2nd way:
You can also set the Hadoop home directly in your Java program, like this:
System.setProperty("hadoop.home.dir", "C:\\hadoop");
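Since the question is about PySpark, a rough Python counterpart is to set the HADOOP_HOME environment variable in the notebook before the SparkContext (and therefore the JVM) is created. This is a sketch under assumptions: use_hadoop_home is a hypothetical helper, C:\hadoop is the example folder from the first way, and it has no effect on a JVM that is already running.

```python
import os


def use_hadoop_home(hadoop_home):
    """Point this process, and any JVM it starts later, at a local Hadoop folder."""
    os.environ["HADOOP_HOME"] = hadoop_home
    bin_dir = os.path.join(hadoop_home, "bin")
    # Prepend the bin folder to PATH, but only once.
    paths = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in paths:
        os.environ["PATH"] = os.pathsep.join([bin_dir] + paths)
    return bin_dir


# Intended usage (assumes pyspark is installed; not executed here):
# use_hadoop_home(r"C:\hadoop")
# from pyspark import SparkContext
# sc = SparkContext("local", "example")
```

If the notebook kernel already created a SparkContext, restart the kernel and run the helper first.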
Solution 2
I had a similar exception that reported a permission issue when loading a model that was built on another machine and copied to my Windows system, even though my HADOOP_HOME was set.
I fixed it by running the following command on my model folder:
winutils.exe chmod -R 777 model-path
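If you would rather run that command from the notebook than from a console, something like the following might work. It is a sketch, not a tested recipe: the helper names are made up for illustration, and it assumes winutils.exe sits in the bin folder under the Hadoop home, as in Solution 1.

```python
import os
import subprocess


def winutils_chmod_command(hadoop_home, target, mode="777"):
    """Build the argument list for: winutils.exe chmod -R <mode> <target>."""
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    return [winutils, "chmod", "-R", mode, target]


def open_permissions(hadoop_home, target):
    """Run the command; check_call raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(winutils_chmod_command(hadoop_home, target))


# Intended usage on Windows (not executed here):
# open_permissions(os.environ["HADOOP_HOME"], "model-path")
```

Mode 777 grants full access to everyone, so only use it on folders where that is acceptable.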
Updated on July 28, 2022
I am working in PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When attempting to execute idSums.saveAsTextFile("Output"), I receive the following error:
Py4JJavaError: An error occurred while calling o834.saveAsTextFile. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001
There shouldn't be any problem with the RDD object, in my opinion, because I'm able to execute other actions without error, e.g. executing idSums.collect() produces the correct output.
Furthermore, the Output directory is created (with all subdirectories) and the file part-00001 is created, but it is 0 bytes.