Python: "Pandas data cast to numpy dtype of object. Check input data with np.asarray(data)."

12,033

Solution 1

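You need to make the "time_field" column the index of your data frame. In the ARIMA model, the date/time column should be set as the index of the data frame rather than passed in as data. Otherwise the frame mixes a time column with a numeric one, and converting it to a NumPy array produces the object dtype that the error message complains about.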

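from statsmodels.tsa.arima_model import ARIMA  # legacy API (fit(disp=0)), removed in statsmodels 0.13

frame = frame.set_index('time_field')  # set_index is a method: call it, don't subscript it
model = ARIMA(frame, order=(5, 1, 0))
model_fit = model.fit(disp=0)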
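To see why moving the time column into the index helps, here is a minimal sketch that uses only pandas and NumPy (no statsmodels). It assumes a small hypothetical frame shaped like the one in the question and applies the check the error message itself suggests, np.asarray(data): with the timedelta column still among the data columns the array comes out as object, and after set_index it is float64.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's frame: a timedelta column plus a float column
frame = pd.DataFrame({
    "time_field": pd.to_timedelta(["00:00:14", "00:01:14", "00:02:14"]),
    "value_field": [283.80, 271.97, 320.53],
})

# Timedelta and float columns share no common numeric dtype, so NumPy falls back to object
print(np.asarray(frame).dtype)  # object

# With time_field moved into the index, only the numeric column remains as data
indexed = frame.set_index("time_field")
print(np.asarray(indexed).dtype)  # float64
```

The same np.asarray check is a quick way to inspect whatever you pass to ARIMA before fitting.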

Note: when setting up the index column, you may run into errors if the index column has duplicate values. In that case, it's better to aggregate the duplicates with a group-by sum first:

frame = frame.groupby(['time_field']).agg({'value_field': 'sum'}) 
or
frame = frame.groupby(['time_field']).sum()
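A minimal sketch of the group-by step, on hypothetical data in which one timestamp appears twice: both forms sum the duplicate rows, and because time_field is the group key it ends up as a unique index, so no separate set_index call is needed afterwards.

```python
import pandas as pd

# Hypothetical data: the timestamp 00:01:14 appears twice
frame = pd.DataFrame({
    "time_field": pd.to_timedelta(["00:00:14", "00:01:14", "00:01:14", "00:02:14"]),
    "value_field": [283.80, 100.0, 171.5, 320.53],
})

# Both forms sum the duplicate rows; time_field becomes the (now unique) index
by_agg = frame.groupby(["time_field"]).agg({"value_field": "sum"})
by_sum = frame.groupby(["time_field"]).sum()

print(by_agg.index.is_unique)  # True
print(by_agg.equals(by_sum))   # True
print(len(by_agg))             # 3
```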

Solution 2

I had a similar problem; what worked for me was using a pandas Series instead of the DataFrame, with the timestamp column as the index:

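import pandas as pd
from statsmodels.tsa.arima_model import ARIMA  # legacy API (fit(disp=0)), removed in statsmodels 0.13
# .values skips index alignment, which would otherwise turn every value into NaN
data = pd.Series(frame['value_field'].values, index=frame['time_field'])
model = ARIMA(data, order=(5, 1, 0))
model_fit = model.fit(disp=0)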
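One detail worth knowing about this approach: when the first argument to pd.Series is itself a Series, pandas aligns it to the new index by label instead of simply attaching that index. The sketch below uses hypothetical sample values taken from the question's output to show that this turns every value into NaN, and that passing the underlying array (.values), as José's comment also does, avoids it.

```python
import pandas as pd

# Hypothetical sample mirroring the question's output
frame = pd.DataFrame({
    "time_field": pd.to_timedelta(["00:00:14", "00:01:14", "00:02:14"]),
    "value_field": [283.80, 271.97, 320.53],
})

# A Series plus a new index is reindexed by label: the RangeIndex labels 0, 1, 2
# never match the timedeltas, so every value becomes NaN
aligned = pd.Series(frame["value_field"], index=frame["time_field"])
print(aligned.isna().all())  # True

# Passing the raw NumPy array skips alignment and keeps the values
data = pd.Series(frame["value_field"].values, index=frame["time_field"])
print(data.tolist())  # [283.8, 271.97, 320.53]
```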
Author: Julian Almanzar

Updated on July 11, 2022

Comments

  • Julian Almanzar
    Julian Almanzar almost 2 years

    I'm trying to create an ARIMA model to forecast a time series with some data from my server, but I keep getting the error in the title and I don't know what type of object I need to pass. Here's the code:

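    import pandas as pd
    from statsmodels.tsa.arima_model import ARIMA  # legacy API, removed in statsmodels 0.13

    frame = pd.read_sql(query, con=connection)
    connection.close()
    frame['time_field'] = pd.to_timedelta(frame['time_field'])
    print(frame.head(10))
    # fitting: passing the whole frame (including time_field) raises the error in the title
    model = ARIMA(frame, order=(5, 1, 0))
    model_fit = model.fit(disp=0)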

    I've seen examples like this one: https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/

    where they use dates instead of times along with the respective values. This is the output of print(frame.head(10)):

    time_field   value_field
    0 00:00:14  283.80
    1 00:01:14  271.97
    2 00:02:14  320.53
    3 00:03:14  346.78
    4 00:04:14  280.72
    5 00:05:14  277.41
    6 00:06:14  308.65
    7 00:07:14  321.27
    8 00:08:14  320.68
    9 00:09:14  332.32
    
    • hd1
      hd1 over 6 years
      Why are you connecting to MySQL? Pandas abstracts this away.
    • Julian Almanzar
      Julian Almanzar over 6 years
      I'm connecting to MySQL because that's the only server I have available right now, and I'm formatting its output as a frame because that's the input format for the ARIMA function.
    • Rafael P. Miranda
      Rafael P. Miranda over 6 years
      Have you found the answer?
  • José
    José almost 4 years
    data = pd.Series(frame.value_field.values, index=frame.time_field) worked for me