Read SAS sas7bdat data with Spark


Solution 1

It looks like the package was not imported correctly. You have to pass --packages saurfang:spark-sas7bdat:2.0.0-s_2.10 when running spark-submit or pyspark.

You could also download the JAR file from the package's page and run your pyspark or spark-submit command with --jars /path/to/jar
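As a rough sketch of the two options above (not the answerer's own code), the snippet below builds the spark-submit argument list with either --packages or --jars, and shows how the file might then be read. The helper names (`submit_args`, `build_session`, `read_sas`) and the script name `my_job.py` are hypothetical. The data source name `com.github.saurfang.sas.spark` is the one given in the spark-sas7bdat README. `spark.jars.packages` is the Spark configuration key that corresponds to --packages, and it only takes effect if it is set before the JVM starts. Note that the `_2.10` suffix is a Scala version, and it is assumed here that it must match the Scala version of your Spark build.

```python
# Package coordinate quoted in the answer above. The _2.10 suffix is the Scala
# version; assumption: it has to match the Scala version of your Spark build.
SAS_PACKAGE = "saurfang:spark-sas7bdat:2.0.0-s_2.10"

# Data source name used by spark-sas7bdat (per the package README).
SAS_FORMAT = "com.github.saurfang.sas.spark"


def submit_args(script, package=SAS_PACKAGE, jar=None):
    """Build a spark-submit argument list (hypothetical helper).

    Uses --jars when a local JAR path is given, otherwise --packages.
    """
    args = ["spark-submit"]
    if jar is not None:
        args += ["--jars", jar]
    else:
        args += ["--packages", package]
    args.append(script)
    return args


def build_session(package=SAS_PACKAGE):
    """Create a SparkSession that requests the package at startup.

    Requires pyspark; imported lazily so the rest of the sketch runs without it.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("read-sas7bdat")
        .config("spark.jars.packages", package)
        .getOrCreate()
    )


def read_sas(spark, path):
    """Load a sas7bdat file into a DataFrame using an existing session."""
    return spark.read.format(SAS_FORMAT).load(path)


if __name__ == "__main__":
    # Print the command line you would run for a script called my_job.py.
    print(" ".join(submit_args("my_job.py")))
```

With a working Spark installation, `read_sas(build_session(), "my_table.sas7bdat")` would be the equivalent of the load call in the question, assuming the package resolves successfully.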

Solution 2

I've just sorted this issue out in R, and I'd bet my two cents it's the same issue here. The problem seems to be that the correct repository for the package is unavailable or not specified. It should be

We're all stumped because the error printed to the console here (and in R) is only slightly informative and relates to this part:

Failed to find data source:

When I ran the R call to spark-submit in cmd (I'm using Windows), lo and behold, it showed the attempts to download the package from various caches and repos, but not the one referenced above.

Fixing that resulted in this call, which downloads the Spark package (using sparklyr in R):

spark-submit2.cmd --driver-memory 32G --name sparklyr --class sparklyr.Shell --packages "saurfang:spark-sas7bdat:2.0.0-s_2.11" --repositories "...\sparklyr\java\sparklyr-2.0-2.11.jar" 8880 41823


Author: Mateo Rod

Updated on June 04, 2022


  • Mateo Rod
    Mateo Rod almost 2 years

    I have a SAS table and I'm trying to read it with Spark. I tried to use this, but I couldn't get it to work.

    Here is the code:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)
    df = sqlContext.read.format("com.github.saurfang.sas.spark").load("my_table.sas7bdat")

    It returns this error:

    Py4JJavaError: An error occurred while calling o878.load.
    : java.lang.ClassNotFoundException: Failed to find data source: Please find packages at
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(
    at py4j.reflection.ReflectionEngine.invoke(
    at py4j.Gateway.invoke(
    at py4j.commands.AbstractCommand.invokeMethod(
    at py4j.commands.CallCommand.execute(
    at Source)
    Caused by: java.lang.ClassNotFoundException:
    at Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)...

    Any ideas?

  • Tom
    Tom almost 5 years
    How is this an answer to the question? It looks more like a new topic to me.
  • Akki
    Akki almost 5 years
    It seems that way only because you are required to create a text version of the corresponding sas7bdat file, but in the end it does solve the problem of reading a sas7bdat file with the same schema. Simply reading the sas7bdat data through saurfang didn't work correctly for me, but this does.
  • Etisha
    Etisha over 3 years
    Have you added any package to read a sas7bdat file?
  • Akki
    Akki over 3 years
    Yes, I loaded the saurfang package when launching spark-submit: spark-submit --packages saurfang:spark-sas7bdat:2.0.0-s_2.10
  • Matthew Son
    Matthew Son over 2 years
    Thanks, this and helped me to solve the issue.