AttributeError: module 'pandas' has no attribute 'to_csv'
to_csv is a method of a DataFrame object, not of the pandas module.
df = pd.DataFrame(CV_data.take(5), columns=CV_data.columns)
# whatever manipulations on df
df.to_csv(...)
You also have the line pd.DataFrame(CV_data.take(5), columns=CV_data.columns) in your code.
This line creates a DataFrame and then discards it, because the result is never assigned to a variable. Even if you were calling to_csv successfully, none of your changes to CV_data would be reflected in that DataFrame (and therefore in the output CSV file).
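As a minimal sketch of both points, the snippet below builds a small pandas DataFrame, keeps a reference to it, modifies it, and writes it out by calling to_csv on the DataFrame itself. The rows and column names are hypothetical stand-ins for what CV_data.take(5) would return, and the output goes to an in-memory buffer rather than a file so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical stand-in for the rows returned by CV_data.take(5).
rows = [("KS", 128, "No"), ("OH", 107, "Yes")]
columns = ["State", "Account length", "Churn"]

# Keep a reference to the DataFrame; an unassigned pd.DataFrame(...) is discarded.
df = pd.DataFrame(rows, columns=columns)

# Changes must be made on df itself to appear in the saved output.
df["Churn"] = df["Churn"].map({"Yes": 1.0, "No": 0.0})

# to_csv is defined on the DataFrame, not on the pandas module.
module_has_to_csv = hasattr(pd, "to_csv")

# Write to an in-memory buffer here; pass a file path instead to write to disk.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

For the Spark DataFrame in the question, the toPandas() call that the question's code already uses returns a pandas DataFrame, so the same to_csv call should apply to its result, assuming the data fits in driver memory.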
Author: Inam
Updated on July 26, 2020
Comments
- Inam, almost 4 years ago:
I took some rows from a CSV file like this
pd.DataFrame(CV_data.take(5), columns=CV_data.columns)
and applied some functions to it. Now I want to save it as CSV again, but it gives the error
module 'pandas' has no attribute 'to_csv'
I am trying to save it like this: pd.to_csv(CV_data, sep='\t', encoding='utf-8')
Here is my full code. How can I save my resulting data to CSV or Excel?
# Disable warnings, set Matplotlib inline plotting and load Pandas package
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline
import pandas as pd
pd.options.display.mpl_style = 'default'

CV_data = sqlContext.read.load('Downloads/data/churn-bigml-80.csv',
                               format='com.databricks.spark.csv',
                               header='true',
                               inferSchema='true')

final_test_data = sqlContext.read.load('Downloads/data/churn-bigml-20.csv',
                                       format='com.databricks.spark.csv',
                                       header='true',
                                       inferSchema='true')
CV_data.cache()
CV_data.printSchema()

pd.DataFrame(CV_data.take(5), columns=CV_data.columns)

from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction

binary_map = {'Yes':1.0, 'No':0.0, True:1.0, False:0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

CV_data = CV_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(CV_data['Churn'])) \
    .withColumn('International plan', toNum(CV_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan'])).cache()

final_test_data = final_test_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(final_test_data['Churn'])) \
    .withColumn('International plan', toNum(final_test_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(final_test_data['Voice mail plan'])).cache()

pd.DataFrame(CV_data.take(5), columns=CV_data.columns)

from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

def labelData(data):
    # label: row[end], features: row[0:end-1]
    return data.map(lambda row: LabeledPoint(row[-1], row[:-1]))

training_data, testing_data = labelData(CV_data).randomSplit([0.8, 0.2])

model = DecisionTree.trainClassifier(training_data, numClasses=2, maxDepth=2,
                                     categoricalFeaturesInfo={1:2, 2:2},
                                     impurity='gini', maxBins=32)

print (model.toDebugString())
print ('Feature 12:', CV_data.columns[12])
print ('Feature 4: ', CV_data.columns[4] )

from pyspark.mllib.evaluation import MulticlassMetrics

def getPredictionsLabels(model, test_data):
    predictions = model.predict(test_data.map(lambda r: r.features))
    return predictions.zip(test_data.map(lambda r: r.label))

def printMetrics(predictions_and_labels):
    metrics = MulticlassMetrics(predictions_and_labels)
    print ('Precision of True ', metrics.precision(1))
    print ('Precision of False', metrics.precision(0))
    print ('Recall of True ', metrics.recall(1))
    print ('Recall of False ', metrics.recall(0))
    print ('F-1 Score ', metrics.fMeasure())
    print ('Confusion Matrix\n', metrics.confusionMatrix().toArray())

predictions_and_labels = getPredictionsLabels(model, testing_data)
printMetrics(predictions_and_labels)

CV_data.groupby('Churn').count().toPandas()

stratified_CV_data = CV_data.sampleBy('Churn', fractions={0: 388./2278, 1: 1.0}).cache()
stratified_CV_data.groupby('Churn').count().toPandas()

pd.to_csv(CV_data, sep='\t', encoding='utf-8')
- wovano, almost 4 years ago:
Thanks for contributing to Stack Overflow. However, this answer does not seem to add anything new. Everything you mentioned is already explained in the accepted answer that was posted more than 4 years ago and has 11 upvotes at this time. When answering old questions, make sure to add something new. Also, you might want to improve the formatting of your posts. See the Markdown Editing Help if you don't know how to do that.