How to get data from pickle files into a pandas dataframe

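You can do this in three steps:

  1. Load each file with pd.read_pickle(filename).
  2. Append the result to a list.
  3. Combine the list with pd.concat(thelist), which expects pandas objects such as DataFrames.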
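
The three steps can be sketched as follows. This is a minimal sketch that assumes each .pkl file holds a pandas DataFrame; the `load_pickles` helper, the file names, and the demo data are hypothetical and only there to make the example runnable.

```python
import os
import tempfile

import pandas as pd


def load_pickles(paths):
    """Read each pickle file and stack the results into one DataFrame."""
    # Steps 1 and 2: load every file and collect the frames in a list.
    frames = [pd.read_pickle(path) for path in paths]
    # Step 3: concatenate; ignore_index gives a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Demo: write two small made-up frames to a temporary directory.
tmp_dir = tempfile.mkdtemp()
demo_paths = []
for day, text in [("2016-04-01", "first tweet"), ("2016-04-02", "second tweet")]:
    path = os.path.join(tmp_dir, day + ".pkl")
    pd.DataFrame({"date": [day], "text": [text]}).to_pickle(path)
    demo_paths.append(path)

combined = load_pickles(demo_paths)
print(combined)
```

If the files hold something other than DataFrames (for example plain Python lists), pd.concat will raise a TypeError, and the loaded objects need to be converted to DataFrames first.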
Author: Andrew Smith

Updated on June 21, 2022

Comments

  • Andrew Smith, almost 2 years

    I'm working on a social media sentiment analysis for a class. I have collected all of the tweets about the Kentucky Derby over a two-month period and saved them to .pkl files.

    My question is: how do I get all of these pickle dump files loaded into a dataframe?

    Here is my code:

    import pickle as pkl
    from datetime import date, timedelta

    import sklearn as sk
    import pandas as pd
    import got3

    def daterange(start_date, end_date):
        # Yield each date from start_date up to, but not including, end_date.
        for n in range(int((end_date - start_date).days)):
            yield start_date + timedelta(n)

    start_date = date(2016, 3, 31)
    end_date = date(2016, 6, 1)

    # Collect the dates as YYYY-MM-DD strings.
    dates = []

    for single_date in daterange(start_date, end_date):
        dates.append(single_date.strftime("%Y-%m-%d"))

    # Fetch one day of tweets at a time and save each day to its own file.
    for i in range(len(dates)-1):
        this_date = dates[i]
        tomorrow_date = dates[i+1]
        print("Getting tweets for " + tomorrow_date)
        tweetCriteria = got3.manager.TweetCriteria()
        tweetCriteria.setQuerySearch("Kentucky Derby")
        # Note: this second call appears to replace the query set above rather than add to it.
        tweetCriteria.setQuerySearch("KYDerby")
        tweetCriteria.setSince(this_date)
        tweetCriteria.setUntil(tomorrow_date)
        Kentucky_Derby_tweets = got3.manager.TweetManager.getTweets(tweetCriteria)
        # Use a context manager so the file is closed after writing.
        with open(tomorrow_date + ".pkl", "wb") as f:
            pkl.dump(Kentucky_Derby_tweets, f)
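
Because the code above pickles a list of tweet objects per day rather than a DataFrame, one way to load the files back is to unpickle each list and turn every tweet into a row. The following is a minimal sketch, assuming the tweet objects keep their fields as plain instance attributes so that `vars()` can read them. `SimpleNamespace` is used here as a hypothetical stand-in for the real tweet class, and the attribute names (`username`, `text`), the `tweets_to_frame` helper, and the `source_file` column are illustrative, not part of got3.

```python
import glob
import os
import pickle
import tempfile
from types import SimpleNamespace

import pandas as pd


def tweets_to_frame(pattern):
    """Load every pickle matching pattern and build one DataFrame of tweets."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            tweets = pickle.load(f)  # each file holds a list of objects
        for tweet in tweets:
            row = dict(vars(tweet))  # copy the object's attributes into a dict
            row["source_file"] = os.path.basename(path)  # remember the origin
            rows.append(row)
    return pd.DataFrame(rows)


# Demo with made-up data: two daily files containing stand-in tweet objects.
tmp_dir = tempfile.mkdtemp()
demo_days = {
    "2016-04-01": [SimpleNamespace(username="a", text="Derby soon")],
    "2016-04-02": [
        SimpleNamespace(username="b", text="KYDerby picks"),
        SimpleNamespace(username="c", text="Big hats"),
    ],
}
for day, tweets in demo_days.items():
    with open(os.path.join(tmp_dir, day + ".pkl"), "wb") as f:
        pickle.dump(tweets, f)

frame = tweets_to_frame(os.path.join(tmp_dir, "*.pkl"))
print(frame)
```

When loading the real files, the module that defines the tweet class (here got3) has to be importable, since pickle needs the class definition to rebuild the objects.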