dump and load a dill (pickle) in two different files

18,827

Solution 1

Thank you. It looks like the following can solve the problem.

import pandas as pd

names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()

import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(means, file)

with open('name_model.pkl', 'rb') as file:
    B = pickle.load(file)

def name_score_function(record):
    # Look the name up in the loaded object B, not in the original globals.
    if record in B.index:
        return B.loc[record, 'score']

print(name_score_function("John"))
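
For the two-file setup in the question, one option is to pickle only the data and rebuild the lookup function from whatever was loaded, so the function never relies on globals such as names or means that exist only in the first file. The following is a minimal sketch under stated assumptions: it uses a plain dict and the standard-library pickle module in place of the DataFrame and dill so that it runs without third-party packages, and make_name_model and the temporary path are hypothetical names chosen for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical location for the pickle file (assumption: a temp directory).
path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")

# --- first file: compute the mean score per name and dump it ---
names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]

grouped = {}
for name, score in zip(names, scores):
    grouped.setdefault(name, []).append(score)
means = {name: sum(vals) / len(vals) for name, vals in grouped.items()}

with open(path, "wb") as f:
    pickle.dump(means, f)

# --- second file: load the data and build the lookup from it ---
def make_name_model(loaded_means):
    """Return a lookup function bound to the loaded data only."""
    def name_model(record):
        # Returns None for names that are not in the pickled data.
        return loaded_means.get(record)
    return name_model

with open(path, "rb") as f:
    name_model = make_name_model(pickle.load(f))

print(name_model("John"))
```

Because name_model closes over the loaded object, the second file needs nothing from the first file except the pickle itself.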

Solution 2

You need to read it the same way you wrote it: inside a with open(...) block, passing the open file object (not the filename string) to pickle.load:

import dill as pickle
with open('name_model.pkl', 'rb') as f:
    B = pickle.load(f)
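
To see why the original call failed, the sketch below passes a filename string to load and then an open file object. It uses the standard-library pickle module and a plain dict as stand-ins (both are assumptions, made so the example has no third-party dependencies). The traceback in the question shows dill's load handing its argument on to the stock unpickler, so the same file-object convention applies there. The temporary path is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for the pickled means (assumption: a plain dict, not the DataFrame).
means = {"John": 87.5, "Mary": 90.0, "Suzanne": 96.0}

# Hypothetical location for the pickle file.
path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")

with open(path, "wb") as f:
    pickle.dump(means, f)

# Passing the filename fails: load() expects an object with read()/readline().
try:
    pickle.load(path)
    failed = False
except (TypeError, AttributeError) as exc:
    failed = True
    print("load(filename) failed:", type(exc).__name__)

# Passing an open binary file object works.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded["John"])
```

The exception type depends on the Python version (the question's Python 2.7 traceback shows AttributeError), which is why the sketch catches both.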
Author by

yearntolearn

Updated on July 27, 2022

Comments

  • yearntolearn
    yearntolearn almost 2 years

    I think this is basic for anyone who knows how to work with pickle, but I still can't get it right after trying for a few hours. I have the following code:

    In the first file

    import pandas as pd
    
    names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
    scores = [80, 90, 90, 92, 95, 100]
    
    records = pd.DataFrame({"name": names, "score": scores})
    means = records.groupby('name').mean()
    
    def name_score_function(record):
        if record in names:
            return(means.loc[record, 'score'])
    
    import dill as pickle
    with open('name_model.pkl', 'wb') as file:
        pickle.dump(means, file)
    

    The second file

    I would like to load what I have in the first file and make the score of a person (i.e. John, Mary, Suzanne) callable via a function name_model(record):

    import dill as pickle
    B = pickle.load('name_model.pkl')
    
    def name_model(record):
        if record in names:
            return(means.loc[record, 'score'])
    

    Here it shows the error:

      File "names.py", line 21, in <module>
        B = pickle.load('name_model.pkl')
      File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 197, in load
        pik = Unpickler(file)
      File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 356, in __init__
        StockUnpickler.__init__(self, *args, **kwds)
      File "/opt/conda/lib/python2.7/pickle.py", line 847, in __init__
        self.readline = file.readline
    AttributeError: 'str' object has no attribute 'readline'
    

    I know the error comes from my lack of understanding of pickle. I would humbly accept your opinions to improve this code. Thank you!!

    UPDATE The more specific thing I would like to achieve:

    I would like to be able to use the function that I write in the first file and dump it, and then read it in the second file and be able to use this function to query the mean score of any person in the records.

    Here is what I have:

    import pandas as pd
    
    names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
    scores = [80, 90, 90, 92, 95, 100]
    
    records = pd.DataFrame({"name": names, "score": scores})
    means = records.groupby('name').mean()
    
    def name_score_function(record):
        if record in names:
            return(means.loc[record, 'score'])
    
    B = name_score_function(record)
    
    import dill as pickle
    with open('name_model.pkl', 'wb') as file:
        pickle.dump(B, file)
    
    with open('name_model.pkl', 'rb') as file:
        B = pickle.load(f)
    
    def name_model(record):
       return B(record)
    
    print(name_model("John"))  
    

    When I execute this code, I get this error:

      File "test.py", line 13, in <module>
        B = name_score_function(record)
    NameError: name 'record' is not defined

    I highly appreciate your assistance and patience.

  • yearntolearn
    yearntolearn almost 8 years
    Ah, I see. Thank you!!
  • yearntolearn
    yearntolearn almost 8 years
    However, when I run the second file, it shows NameError: global name 'city' is not defined. Does this mean I should also pickle the dataframe?
  • eafit
    eafit almost 8 years
    @Hsun-YiHsieh what city?
  • yearntolearn
    yearntolearn almost 8 years
    Hi, @eafit, I am sorry. It's my mistake. I would like to be able to query any possible name which occurs in means. Specifically, I hope to write a function in the second file with structure def name_model(record): return