"TypeError: an integer is required (got type bytes)" when importing pyspark on Python 3.8


You must downgrade your Python version from 3.8 to 3.7, because the current PySpark release (2.4.x) does not support Python 3.8. Python 3.8 support only arrives with Spark 3.0.0.
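For example, if you manage environments with conda (as in the question below), a minimal sketch of the downgrade looks like this; the environment name py37 is just an illustration:

conda create -y -n py37 python=3.7
conda activate py37
pip install pyspark
python -c "import pyspark"  # should now import without the TypeError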



Author: Dmitry Deryabin
Updated on June 04, 2022

Comments

  • Dmitry Deryabin
    Dmitry Deryabin over 1 year
    1. Created a conda environment:
    conda create -y -n py38 python=3.8
    conda activate py38
    
    2. Installed PySpark from pip:
    pip install pyspark
    # Successfully installed py4j-0.10.7 pyspark-2.4.5
    
    3. Tried to import pyspark:
    python -c "import pyspark"
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
        from pyspark.context import SparkContext
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
        from pyspark import accumulators
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
        from pyspark.serializers import read_int, PickleSerializer
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
        from pyspark import cloudpickle
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
        _cell_set_template_code = _make_cell_set_template_code()
      File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
        return types.CodeType(
    TypeError: an integer is required (got type bytes)
    
    

    It seems that PySpark ships a pre-packaged copy of the cloudpickle package that had issues on Python 3.8. These are now resolved (at least as of version 1.3.0) in the standalone pip release, but the copy bundled with PySpark is still broken. Has anyone faced the same issue, or had any luck resolving it?
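    A quick sketch to illustrate the split (the cloudpickle version constraint reflects the fix mentioned above): the standalone cloudpickle from pip imports cleanly on 3.8, while the copy bundled inside pyspark still fails.

    pip install "cloudpickle>=1.3.0"
    python -c "import cloudpickle; print(cloudpickle.__version__)"  # imports cleanly on Python 3.8
    python -c "from pyspark import cloudpickle"  # still raises the TypeError above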

    • 10465355
      10465355 almost 4 years
      Spark doesn't support Python 3.8 until 3.0.0
    • blackbishop
      blackbishop almost 4 years
    • Dmitry Deryabin
      Dmitry Deryabin almost 4 years
      @blackbishop No, unfortunately it doesn't, since downgrading is not an option for my use case.
    • blackbishop
      blackbishop almost 4 years
      @cricket_007 See this issue
    • OneCricketeer
      OneCricketeer almost 4 years
      @Dmitry Why not? Looks like you're creating your own env, so you're going to have to if you want to use pyspark
    • Dmitry Deryabin
      Dmitry Deryabin almost 4 years
      @cricket_007 Our library needs to support Python 3.8 and it also relies on PySpark. Python 3.7 is already supported :) So it seems clear that for now 3.8 is not an option (at least until Spark 3.0 is released; see the upgrade sketch at the end of this thread).
  • Megan
    Megan over 3 years
    Is there a way of downgrading to 3.7 for AWS EMR clusters? The docs only seem to cover Python 3.4 -> 3.6 transitions...
  • Paul Watson
    Paul Watson over 3 years
    Can confirm this was my issue too: Python 3.8 failing, Python 3.7.8 working.
  • brajesh jaishwal
    brajesh jaishwal over 3 years
    Doesn't work with Python 3.8; needs 3.7 to be installed.
  • Kubra Altun
    Kubra Altun over 2 years
    It did not work with Python 3.9, but worked with Python 3.7.
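As noted in the comments, Python 3.8 support arrives with Spark 3.0.0. Once that release is available on PyPI, a minimal upgrade sketch (assuming no other dependencies pin an older PySpark) is:

pip install --upgrade "pyspark>=3.0.0"
python -c "import pyspark; print(pyspark.__version__)"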