Spark Exception : Task failed while writing rows
Solution 1
Another possible reason is that you're hitting S3 request rate limits. If you look closely at your logs, you may see something like this:
AmazonS3Exception: Please reduce your request rate.
While the Spark UI will say
Task failed while writing rows
I doubt it's the reason you're getting this error, but it's a possible cause if you're running a highly intensive job, so I included it for completeness.
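If throttling does turn out to be the cause, one common mitigation is to retry the S3-heavy call from the driver with exponentially growing pauses. The following is only a minimal sketch of that idea: `Backoff` and `retry` are hypothetical names made up for illustration, not part of Spark or the AWS SDK, and the code retries on any exception, whereas real code would probably check for the throttling error specifically.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not a Spark or AWS API.
class Backoff {
    /**
     * Runs the action, retrying after a failure with a pause that doubles
     * each time, and rethrows the last exception once maxAttempts is reached.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Assumption: every exception is treated as retryable here.
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs *= 2; // exponential backoff
            }
        }
    }
}
```

You would pass whatever call triggers the S3 traffic as the `action`, for example `Backoff.retry(() -> doS3HeavyWork(), 5, 1000L)`, where `doS3HeavyWork` is a placeholder for your own code. Backing off only spreads the requests out over time; it does not reduce how many requests the job makes.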
Solution 2
I found that disabling speculation prevents this error from happening. I'm not sure why; it seems that speculative and non-speculative tasks conflict when writing Parquet rows.
sparkConf.set("spark.speculation","false")
Solution 3
In my case, I saw this error when I tried to overwrite an HDFS directory that belonged to a different user. Deleting the directory and letting my process write it from scratch solved it. So I guess more digging in the direction of user permissions on HDFS is appropriate.
Aditya Calangutkar
Updated on September 18, 2020
Comments
-
Aditya Calangutkar over 3 years
I am reading text files and converting them to Parquet files using Spark. But when I try to run the code, I get the following exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 9, XXXX.XXX.XXX.local): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArithmeticException: / by zero
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:101)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.<init>(newParquet.scala:83)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anon$4.newInstance(newParquet.scala:229)
    at org.apache.spark.sql.sources.DefaultWriterContainer.initWriters(commands.scala:470)
    at org.apache.spark.sql.sources.BaseWriterContainer.executorSideSetup(commands.scala:360)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:172)
    ... 8 more
I am trying to write the DataFrame as follows:
dataframe.write().parquet(Path)
Any help is highly appreciated.
-
Aditya Calangutkar over 7 years
It was not a problem with speculative execution. The schema was not being generated correctly, hence the / by zero.
-
Carlos Bribiescas over 5 years
Unfortunately, I rewrote the code to hit S3 less. Not a scalable solution... maybe you can request that the limit be increased?
-
Jérémy about 2 years
Not a Spark >= 3.1 solution