AttributeError: module 'pandas' has no attribute 'to_csv'


to_csv is a method of a DataFrame object, not of the pandas module.

df = pd.DataFrame(CV_data.take(5), columns=CV_data.columns)

# whatever manipulations on df

df.to_csv(...)
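As a minimal sketch of that distinction, the snippet below uses a small made-up DataFrame (the column values are hypothetical stand-ins, not the asker's data) and writes to an in-memory buffer instead of a file, so no path needs to be assumed. It checks that the pandas module has no to_csv attribute, while a DataFrame instance does.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the rows taken from CV_data.
df = pd.DataFrame({"Churn": [1.0, 0.0], "Account length": [128, 107]})

# The pandas module itself exposes no to_csv function...
module_has_to_csv = hasattr(pd, "to_csv")

# ...but every DataFrame instance has a to_csv method.
instance_has_to_csv = hasattr(df, "to_csv")

# Write tab-separated text into an in-memory buffer rather than a file.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
csv_text = buffer.getvalue()

print(module_has_to_csv, instance_has_to_csv)
print(csv_text)
```

Passing a file path instead of the buffer as the first argument would write the same text to disk.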

You also have a line pd.DataFrame(CV_data.take(5), columns=CV_data.columns) in your code.

This line creates a dataframe and then discards it. Even if you had called to_csv successfully, none of your changes to CV_data would have been reflected in that dataframe (and therefore in the output CSV file).
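The point about the discarded dataframe can be illustrated with plain pandas. This is only a sketch under stated assumptions: the rows, the column names, and the variable names preview and result are hypothetical, and a list of tuples stands in for what CV_data.take(5) returns in the question.

```python
import pandas as pd

# Hypothetical stand-in rows; in the question these come from CV_data.take(5).
rows = [("Yes", 10), ("No", 20)]
columns = ["Churn", "Total day calls"]

# Snapshot taken before any transformation, kept under a name this time.
preview = pd.DataFrame(rows, columns=columns)

# Later transformation of the source data (mirrors the Yes/No -> 1.0/0.0 mapping).
binary_map = {"Yes": 1.0, "No": 0.0}
rows = [(binary_map[churn], calls) for churn, calls in rows]

# The earlier snapshot still holds the old values...
stale_values = preview["Churn"].tolist()

# ...so the dataframe to save has to be built from the transformed data.
result = pd.DataFrame(rows, columns=columns)
fresh_values = result["Churn"].tolist()

print(stale_values)
print(fresh_values)
```

In the question's Spark setting, the toPandas() method that the code already calls on the grouped counts should give a pandas dataframe of the transformed CV_data, so something along the lines of CV_data.toPandas().to_csv(...) with a file path of your choice is one possible route, assuming the data fits in driver memory.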

Inam
Author by

Inam

Updated on July 26, 2020

Comments

  • Inam
    Inam almost 4 years

    I took some rows from a CSV file like this:

    pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 
    

    and performed some operations on it. Now I want to save it as CSV again, but I get the error module 'pandas' has no attribute 'to_csv'. I am trying to save it like this:

    pd.to_csv(CV_data, sep='\t', encoding='utf-8') 
    

    Here is my full code. How can I save my resulting data as CSV or Excel?

    # Disable warnings, set Matplotlib inline plotting and load Pandas package
    import warnings
    warnings.filterwarnings('ignore')
    
    %matplotlib inline
    import pandas as pd
    pd.options.display.mpl_style = 'default' 
    
    CV_data = sqlContext.read.load('Downloads/data/churn-bigml-80.csv', 
                              format='com.databricks.spark.csv', 
                              header='true', 
                              inferSchema='true')
    
    final_test_data = sqlContext.read.load('Downloads/data/churn-bigml-20.csv', 
                              format='com.databricks.spark.csv', 
                              header='true', 
                              inferSchema='true')
    CV_data.cache()
    CV_data.printSchema() 
    
    pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 
    
    from pyspark.sql.types import DoubleType
    from pyspark.sql.functions import UserDefinedFunction
    
    binary_map = {'Yes':1.0, 'No':0.0, True:1.0, False:0.0} 
    toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())
    
    CV_data = CV_data.drop('State').drop('Area code') \
        .drop('Total day charge').drop('Total eve charge') \
        .drop('Total night charge').drop('Total intl charge') \
        .withColumn('Churn', toNum(CV_data['Churn'])) \
        .withColumn('International plan', toNum(CV_data['International plan'])) \
        .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan'])).cache()
    
    final_test_data = final_test_data.drop('State').drop('Area code') \
        .drop('Total day charge').drop('Total eve charge') \
        .drop('Total night charge').drop('Total intl charge') \
        .withColumn('Churn', toNum(final_test_data['Churn'])) \
        .withColumn('International plan', toNum(final_test_data['International plan'])) \
        .withColumn('Voice mail plan', toNum(final_test_data['Voice mail plan'])).cache()
    
    pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 
    
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import DecisionTree
    
    def labelData(data):
        # label: row[end], features: row[0:end-1]
        return data.map(lambda row: LabeledPoint(row[-1], row[:-1]))
    
    training_data, testing_data = labelData(CV_data).randomSplit([0.8, 0.2])
    
    model = DecisionTree.trainClassifier(training_data, numClasses=2, maxDepth=2,
                                         categoricalFeaturesInfo={1:2, 2:2},
                                         impurity='gini', maxBins=32)
    
    print (model.toDebugString())  
    print ('Feature 12:', CV_data.columns[12])
    print ('Feature 4: ', CV_data.columns[4] ) 
    
    from pyspark.mllib.evaluation import MulticlassMetrics
    
    def getPredictionsLabels(model, test_data):
        predictions = model.predict(test_data.map(lambda r: r.features))
        return predictions.zip(test_data.map(lambda r: r.label))
    
    def printMetrics(predictions_and_labels):
        metrics = MulticlassMetrics(predictions_and_labels)
        print ('Precision of True ', metrics.precision(1))
        print ('Precision of False', metrics.precision(0))
        print ('Recall of True    ', metrics.recall(1))
        print ('Recall of False   ', metrics.recall(0))
        print ('F-1 Score         ', metrics.fMeasure())
        print ('Confusion Matrix\n', metrics.confusionMatrix().toArray()) 
    
    predictions_and_labels = getPredictionsLabels(model, testing_data)
    
    printMetrics(predictions_and_labels)  
    
    CV_data.groupby('Churn').count().toPandas() 
    
    stratified_CV_data = CV_data.sampleBy('Churn', fractions={0: 388./2278, 1: 1.0}).cache()
    
    stratified_CV_data.groupby('Churn').count().toPandas() 
    
    pd.to_csv(CV_data, sep='\t', encoding='utf-8') 
    
  • wovano
    wovano almost 4 years
    Thanks for contributing to Stack Overflow. However, this answer does not seem to add anything new. Everything you mentioned is already explained in the accepted answer that was posted more than 4 years ago and has 11 upvotes at this time. When answering old questions, make sure to add something new. Also, you might want to improve the formatting of your posts. See the Markdown Editing Help if you don't know how to do that.