JavaPackage object is not callable error: Pyspark


In the Zeppelin interpreter code,

java_import(gateway.jvm, "org.apache.spark.sql.*")

was not being executed. Adding this import fixed the issue.
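A minimal sketch of the fix, assuming access to the py4j gateway object (the helper name register_spark_sql is hypothetical, not part of Spark's API):

```python
def register_spark_sql(gateway):
    # Imported inside the function so the sketch can be read standalone;
    # in Zeppelin, py4j must already be on the PYTHONPATH.
    from py4j.java_gateway import java_import

    # java_import makes the Java-side classes resolvable as JavaClass
    # objects on gateway.jvm; without it, lookups such as
    # sc._jvm.functions fall back to a non-callable JavaPackage
    # placeholder, which is exactly the error seen below.
    java_import(gateway.jvm, "org.apache.spark.sql.*")
```

In a PySpark session the gateway is typically reachable as sc._gateway, though that is worth verifying against your Spark version.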

Author: Himaprasoon (Python Dev, Deep Learning Engineer)

Updated on July 11, 2022

Comments

  • Himaprasoon, almost 2 years ago

    Operations like dataframe.show() and sqlContext.read.json work fine, but most functions give a "'JavaPackage' object is not callable" error, e.g. when I do

    dataFrame.withColumn(field_name, monotonically_increasing_id())
    

    I get this error:

    File "/tmp/spark-cd423f35-9572-45ee-b159-1b2732afa2a6/userFiles-3a6e1729-95f4-468b-914c-c706369bf2a6/Transformations.py", line 64, in add_id_column
        self.dataFrame = self.dataFrame.withColumn(field_name, monotonically_increasing_id())
      File "/home/himaprasoon/apps/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/functions.py", line 347, in monotonically_increasing_id
        return Column(sc._jvm.functions.monotonically_increasing_id())
    TypeError: 'JavaPackage' object is not callable
    

    I am using apache-zeppelin interpreter and have added py4j to python path.

    When I do

    import py4j
    print(dir(py4j))
    

    the import succeeds

    ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'compat', 'finalizer', 'java_collections', 'java_gateway', 'protocol', 'version']
    

    When I run

    print(sc._jvm.functions)
    

    in the pyspark shell, it prints

    <py4j.java_gateway.JavaClass object at 0x7fdaf9727ba8>
    

    But when I try this in my interpreter, it prints

    <py4j.java_gateway.JavaPackage object at 0x7f07cc3f77f0>
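The difference between the two printouts is the whole story: JavaClass means py4j resolved the dotted name to a real Java class, while JavaPackage is the placeholder it returns when it could not. A small duck-typed check (a sketch; it inspects only the type name, so it runs even without py4j installed) can be used to verify the JVM wiring before calling any Spark SQL function:

```python
def jvm_name_resolved(obj):
    # py4j returns a JavaClass when a dotted name resolves to a real
    # Java class, and a JavaPackage placeholder when it does not.
    # Calling the placeholder raises:
    #   TypeError: 'JavaPackage' object is not callable
    return type(obj).__name__ == "JavaClass"

# Expected usage in a PySpark session (assumption, not run here):
#   jvm_name_resolved(sc._jvm.functions) should be True once the
#   java_import fix above is in place.
```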