dump and load a dill (pickle) in two different files

python pandas pickle dill

18,827

Solution 1

Thank you. It looks like the following can solve the problem.

import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()

import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(means, file)

with open('name_model.pkl', 'rb') as file:
    B = pickle.load(file)

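def name_score_function(record):
    # Query the reloaded object B rather than the original means,
    # so the lookup still works where only the pickle file is available.
    if record in B.index:
        return B.loc[record, 'score']

print(name_score_function("John"))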
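As a rough sketch of how the two-file split could look, the block below puts both halves in one script. It assumes the standard library pickle module in place of dill (dill exposes the same dump/load calls, so import dill as pickle should be a drop-in swap), a temporary directory for the file, and a hypothetical choice to return None for unknown names.

```python
import os
import pickle  # the answer uses `import dill as pickle`; same dump/load calls
import tempfile

import pandas as pd

# Hypothetical location so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")

# --- what the first file would do: compute the means and dump them ---
records = pd.DataFrame({
    "name": ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"],
    "score": [80, 90, 90, 92, 95, 100],
})
means = records.groupby("name").mean()
with open(path, "wb") as f:
    pickle.dump(means, f)

# --- what the second file would do: load, then define the lookup ---
with open(path, "rb") as f:
    B = pickle.load(f)

def name_model(record):
    # Only the loaded frame B is used, so nothing from the first file is needed.
    if record in B.index:
        return B.loc[record, "score"]
    return None  # assumption: unknown names yield None

print(name_model("John"))
```

The lookup checks B.index rather than the original names list, since that list is not part of the pickle file.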

Solution 2

Hmm. You need to read it the same way you wrote it, by nesting the load inside a with open(...) block, because load expects an open file object rather than a file name:

import dill as pickle
with open('name_model.pkl', 'rb') as f:
    B = pickle.load(f)
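To make the difference concrete, here is a minimal sketch that tries both ways of loading. It assumes the standard library pickle module rather than dill so that it runs without extra packages, a plain dict standing in for the means table, and a temporary file path. The exact exception raised for the string argument depends on the Python version and the library, so the sketch catches both likely types.

```python
import os
import pickle
import tempfile

# Stand-in for the grouped means from the question (plain dict, an assumption).
means = {"John": 87.5, "Mary": 90.0, "Suzanne": 96.0}

path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")

with open(path, "wb") as f:
    pickle.dump(means, f)

# Passing the file name fails: load wants an object with read/readline.
try:
    pickle.load(path)
    error = None
except (TypeError, AttributeError) as exc:
    error = exc

# Passing an open binary file works.
with open(path, "rb") as f:
    B = pickle.load(f)

print(type(error).__name__)
print(B["John"])
```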

Author by

yearntolearn

Updated on July 27, 2022

Comments

yearntolearn almost 2 years

This is probably basic for anyone who knows how to work with pickle, but I still can't get it right after trying for a few hours. I have the following code:

In the first file

import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()

def name_score_function(record):
    if record in names:
        return(means.loc[record, 'score'])

import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(means, file)

The second file

I would like to load what I have in the first file and make the score of a person (i.e. John, Mary, Suzanne) callable via a function name_model(record):

import dill as pickle
B = pickle.load('name_model.pkl')

def name_model(record):
    if record in names:
        return(means.loc[record, 'score'])

Here it shows the error:

File "names.py", line 21, in <module>
B = pickle.load('name_model.pkl')
File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 197, in load
pik = Unpickler(file)
File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 356, in __init__
StockUnpickler.__init__(self, *args, **kwds)
File "/opt/conda/lib/python2.7/pickle.py", line 847, in __init__
self.readline = file.readline
AttributeError: 'str' object has no attribute 'readline'

I know the error comes from my lack of understanding of pickle. I would humbly accept your opinions to improve this code. Thank you!!

UPDATE The more specific thing I would like to achieve:

I would like to be able to use the function that I write in the first file and dump it, and then read it in the second file and be able to use this function to query the mean score of any person in the records.

Here is what I have:

import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()

def name_score_function(record):
    if record in names:
        return(means.loc[record, 'score'])

B = name_score_function(record)

import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(B, file)

with open('name_model.pkl', 'rb') as file:
    B = pickle.load(f)

def name_model(record):
   return B(record)

print(name_model("John"))

When I execute this code, I get this error:

File "test.py", line 13, in <module>
B = name_score_function(record)
NameError: name 'record' is not defined

I highly appreciate your assistance and patience.
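A note on the NameError: record exists only as a parameter inside the function, so calling name_score_function(record) at module level fails before anything is dumped. Binding the function object itself, without calling it, avoids that. The sketch below shows only this difference. It assumes a plain dict in place of the grouped DataFrame and leaves the pickling step out, since whether an unpickled function can still see means in a second file depends on how the serializer captures globals.

```python
# Stand-in for the grouped means (plain dict, an assumption).
means = {"John": 87.5, "Mary": 90.0, "Suzanne": 96.0}

def name_score_function(record):
    if record in means:
        return means[record]
    return None

# Calling the function here needs an argument, and `record` is not
# defined at module level, so this reproduces the NameError.
try:
    B = name_score_function(record)  # noqa: F821
    failure = None
except NameError as exc:
    failure = exc

# Binding the function object (no parentheses) is what can be dumped and called later.
B = name_score_function

def name_model(record):
    return B(record)

print(name_model("John"))
```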

yearntolearn almost 8 years

Ah, I see. Thank you!!
yearntolearn almost 8 years

In the second file, however, when I run it, I get NameError: global name 'city' is not defined. Does this mean I should also pickle the dataframe?
eafit almost 8 years

@Hsun-YiHsieh what city?
yearntolearn almost 8 years

Hi, @eafit, I am sorry. It's my mistake. I would like to be able to query any possible name which occurs in means. Specifically, I hope to write a function in the second file with structure def name_model(record): return