"TypeError: an integer is required (got type bytes)" when importing pyspark on Python 3.8


You must downgrade your Python version from 3.8 to 3.7, because the current PySpark release (2.4.x) does not support Python 3.8. Python 3.8 support only arrives with Spark 3.0.0.
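For example, if you manage environments with conda (as in the question below), a minimal sketch of the downgrade looks like this; the environment name py37 is just an illustration:

conda create -y -n py37 python=3.7
conda activate py37
pip install pyspark
python -c "import pyspark"  # should now import without the TypeError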



Author: Dmitry Deryabin
Updated on June 04, 2022

Comments

  • Dmitry Deryabin
    Dmitry Deryabin over 1 year
    1. Created a conda environment:
    conda create -y -n py38 python=3.8
    conda activate py38
    
    2. Installed PySpark from pip:
    pip install pyspark
    # Successfully installed py4j-0.10.7 pyspark-2.4.5
    
    3. Tried to import pyspark:
    python -c "import pyspark"
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
        from pyspark.context import SparkContext
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
        from pyspark import accumulators
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
        from pyspark.serializers import read_int, PickleSerializer
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
        from pyspark import cloudpickle
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
        _cell_set_template_code = _make_cell_set_template_code()
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
        return types.CodeType(
    TypeError: an integer is required (got type bytes)
    
    

    It seems that PySpark ships a pre-packaged copy of the cloudpickle package that had issues on Python 3.8. These are now resolved (at least as of version 1.3.0) in the standalone pip release, but the copy bundled with PySpark is still broken. Has anyone faced the same issue, or had any luck resolving it?
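    A quick sketch to illustrate the split (the cloudpickle version constraint reflects the fix mentioned above): the standalone cloudpickle from pip imports cleanly on 3.8, while the copy bundled inside pyspark still fails.

    pip install "cloudpickle>=1.3.0"
    python -c "import cloudpickle; print(cloudpickle.__version__)"  # imports cleanly on Python 3.8
    python -c "from pyspark import cloudpickle"  # still raises the TypeError above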

    • 10465355
      10465355 almost 4 years
      Spark doesn't support Python 3.8 until 3.0.0
    • blackbishop
      blackbishop almost 4 years
    • Dmitry Deryabin
      Dmitry Deryabin almost 4 years
      @blackbishop No, unfortunately it doesn't, since downgrading is not an option for my use case.
    • blackbishop
      blackbishop almost 4 years
      @cricket_007 See this issue
    • OneCricketeer
      OneCricketeer almost 4 years
      @Dmitry Why not? Looks like you're creating your own env, so you're going to have to if you want to use pyspark
    • Dmitry Deryabin
      Dmitry Deryabin almost 4 years
      @cricket_007 Our library needs to support Python 3.8 and it also relies on PySpark. Python 3.7 is already supported :) So it seems clear that for now 3.8 is not an option (at least until Spark 3.0 is released; see the upgrade sketch at the end of this thread).
  • Megan
    Megan over 3 years
    Is there a way of downgrading to 3.7 for AWS EMR clusters? The docs only seem to cover Python 3.4 -> 3.6 transitions...
  • Paul Watson
    Paul Watson over 3 years
    Can confirm this was my issue too: Python 3.8 failing, Python 3.7.8 working.
  • brajesh jaishwal
    brajesh jaishwal over 3 years
    Doesn't work with Python 3.8; needs 3.7 to be installed.
  • Kubra Altun
    Kubra Altun over 2 years
    It did not work with Python 3.9, but worked with Python 3.7.
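As noted in the comments, Python 3.8 support arrives with Spark 3.0.0. Once that release is available on PyPI, a minimal upgrade sketch (assuming no other dependencies pin an older PySpark) is:

pip install --upgrade "pyspark>=3.0.0"
python -c "import pyspark; print(pyspark.__version__)"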