DataFrame object is not callable in PySpark
Solution 1
You are missing a bracket, but some of the syntax also seems wrong. I assume this is what your code looked like before the bracket went missing:
fn.avgspark_df("average")
This is why you get the error: you are trying to call the DataFrame as a function. I believe you can achieve what you want with:
import pandas as pd
import pyspark.sql.functions as fn
from pyspark.sql import Window

# Build a small pandas DataFrame and convert it to a Spark DataFrame.
df = pd.DataFrame({'id': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
                   'time': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   'average': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
df = sqlContext.createDataFrame(df)

# Window frame: the current row plus one row before and one after, per id.
temp = Window.partitionBy("id").orderBy("time").rowsBetween(-1, 1)
df.withColumn("movingAvg", fn.avg("average").over(temp)).show()
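To see what the window computes without a Spark cluster, the same centered three-row moving average can be reproduced in plain pandas (a cross-check sketch, not part of the original answer; `rolling(3, center=True, min_periods=1)` is the pandas analogue of `rowsBetween(-1, 1)`):

```python
import pandas as pd

pdf = pd.DataFrame({'id': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
                    'time': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                    'average': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

# rolling(3, center=True, min_periods=1) mirrors rowsBetween(-1, 1):
# the current row plus one row before and one after, within each id group.
pdf['movingAvg'] = (pdf.groupby('id')['average']
                       .transform(lambda s: s.rolling(3, center=True, min_periods=1).mean()))
print(pdf)
```

At the edges of each group the window simply shrinks, e.g. the first row of id 0 averages only rows 1 and 2, giving 0.5.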
Solution 2
from pyspark.sql import SparkSession
from pyspark.sql import Window
from pyspark.sql.functions import max, min, avg

spark = SparkSession.builder.appName("Data Frame Example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

l = [("Alice", "2016-05-01", 50.00),
     ("Alice", "2016-05-03", 45.00),
     ("Alice", "2016-05-04", 55.00),
     ("Bob", "2016-05-01", 25.00),
     ("Bob", "2016-05-04", 29.00),
     ("Bob", "2016-05-06", 27.00)]
customers = spark.sparkContext.parallelize(l).toDF(["name", "date", "amountSpent"])

# With orderBy but no rowsBetween, the frame runs from the start of the
# partition up to the current row, so this is a running (cumulative) average.
temp = Window.partitionBy("name").orderBy("date")
customers.withColumn("movingAvg", avg("amountSpent").over(temp)).show()
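Note that this window has no `rowsBetween`, so Spark's default frame for an ordered window applies: everything from the start of the partition up to the current row, i.e. a cumulative rather than centered average. A pandas sketch of the same computation (for illustration only; `expanding().mean()` plays the role of the default frame):

```python
import pandas as pd

cust = pd.DataFrame({'name': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
                     'date': ['2016-05-01', '2016-05-03', '2016-05-04',
                              '2016-05-01', '2016-05-04', '2016-05-06'],
                     'amountSpent': [50.0, 45.0, 55.0, 25.0, 29.0, 27.0]})

# The rows are already sorted by date within each name, so expanding().mean()
# averages every row from the start of the group up to the current row.
cust['movingAvg'] = (cust.groupby('name')['amountSpent']
                         .transform(lambda s: s.expanding().mean()))
print(cust)
```

Alice's column, for example, runs 50.0, 47.5, 50.0: each value averages all of her spending seen so far.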
Author: xinlin li
Updated on June 04, 2022

Comments
-
xinlin li over 1 year
temp = Window.partitionBy("id").orderBy("time").rowsBetween(-5, 5) spark_df.withColumn("movingAvg",fn.avgspark_df("average")).over(temp)).show()
I'm getting this error on the last line:
dataframe object is not callable
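The error quoted above is easy to reproduce: putting parentheses after any DataFrame means calling it like a function, which raises exactly this TypeError. A minimal illustration using pandas (PySpark's DataFrame raises the analogous message):

```python
import pandas as pd

df = pd.DataFrame({'average': [0, 1, 2]})
try:
    df('average')          # parentheses: calling the DataFrame like a function
except TypeError as e:
    print(e)               # 'DataFrame' object is not callable
```

In the question's snippet, `fn.avgspark_df("average")` is parsed as `(fn.avg)(spark_df)("average")` would be, i.e. the DataFrame ends up being called; the fix is `fn.avg("average").over(temp)`.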
-
fgamess over 5 years: please read stackoverflow.com/help/how-to-ask
-
vvg over 5 years: something is wrong with your parentheses - too many closed
-
Florian over 5 years: @Rumoku or in this case, I think there are not enough opened ;)
-
xinlin li over 5 years: Thanks for your help! I can run your code successfully, but I can't do it in my code. Maybe it is because I use this to create my dataframe: spark = SparkSession.builder.appName(" ").getOrCreate(); file = "D:\project\HistoryData.csv"; lines = pd.read_csv(file); cc = lines.values.tolist(); spark_df = spark.createDataFrame(cc, ['time','average','max','min'])
-
xinlin li over 5 years: It tells me pyspark.sql.utils.AnalysisException.
-
Florian over 5 years: @xinlinli Does that happen before or after the calculation of the moving average? See e.g. stackoverflow.com/questions/41785342/… or stackoverflow.com/questions/42091575/… or stackoverflow.com/questions/43100458/… or stackoverflow.com/questions/39016440/…
-
xinlin li over 5 years: Now I have succeeded in solving the problem, but I have met a new strange problem: stackoverflow.com/questions/51316635/…
-
Florian over 5 years: @xinlinli Please consider accepting this answer if it helped you solve your issue. I will take a look at your new question.
-
xinlin li over 5 yearsOK,I will,Thank you!