How to load CSV data in scikit-learn and use it for Naive Bayes classification

The following should get you started; you will need pandas and numpy. Load your .csv into a data frame and use it as input to the model. You also need to define targets (0 for negatives and 1 for positives, assuming binary classification), depending on what you are trying to separate.

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np

# create a data frame containing your data; each column can be accessed
# by df['column name']
df = pd.read_csv('/your/path/yourFile.csv')

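# class labels, ordered so that index 0 = negatives and 1 = positives
target_names = np.array(['Negatives', 'Positives'])

# targets must be defined before use: one 0/1 value per row.
# Assumption for illustration: 'Marital Status' is the column to predict.
targets = np.array(df['Marital Status'] == 'Married').astype(int)

# add columns to your data frame
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
# pd.Factor no longer exists in pandas; Categorical.from_codes replaces it
df['Type'] = pd.Categorical.from_codes(targets, target_names)
df['Targets'] = targets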

# define training and test sets
train = df[df['is_train']]
test = df[~df['is_train']]

trainTargets = np.array(train['Targets']).astype(int)
testTargets = np.array(test['Targets']).astype(int)

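# columns you want to model (GaussianNB needs numeric features,
# so adjust this slice to the numeric columns of your own data)
features = df.columns[0:7]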

# call Gaussian Naive Bayesian class with default parameters
gnb = GaussianNB()

# train the model, then predict on the same training rows
y_gnb = gnb.fit(train[features], trainTargets).predict(train[features])
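
The last line of the answer predicts on the same rows the model was fit on, which tends to overstate how well it works. A minimal sketch of scoring on held-out rows instead is below. The data is synthetic, and the names `demo`, `feature_a` and `feature_b` are made up for illustration; in practice the frame would come from `pd.read_csv`.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for a CSV: two numeric features and a 0/1 target.
rng = np.random.default_rng(0)
n_rows = 200
targets = rng.integers(0, 2, size=n_rows)
demo = pd.DataFrame({
    "feature_a": rng.normal(loc=targets * 3.0, scale=1.0),
    "feature_b": rng.normal(loc=targets * -2.0, scale=1.0),
    "Targets": targets,
})

# Hold out 25% of the rows, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    demo[["feature_a", "feature_b"]],
    demo["Targets"],
    test_size=0.25,
    random_state=0,
    stratify=demo["Targets"],
)

# Fit on the training rows only, then score on rows the model has not seen.
model = GaussianNB().fit(X_train, y_train)
test_predictions = model.predict(X_test)
test_accuracy = accuracy_score(y_test, test_predictions)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

`train_test_split` does the same job as the random `is_train` column, but gives a fixed split size and a reproducible seed.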

Author: satish john

Updated on June 04, 2022

Comments

  • satish john
    satish john almost 2 years

    I am trying to load custom data to perform Naive Bayes classification in scikit-learn. I need help loading the sample data into scikit-learn and then running NB. How do I load categorical values for the target?

    Either use the same data for training and testing, or use a complete set just for testing.

    Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
    1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
    2,70000002,Bennie Ariana,Pennsylvania,6/24/1940,F,Single,,Caucasian,2002,66313,
    3,70000003,Brad Farley,Pennsylvania,12001,F,Married,4,African American,2002,98444,
    4,70000004,Daggoo Cece,Indiana,14032,F,Married,2,Hispanic,2001,41896,113481472.
    
  • satish john
    satish john over 10 years
    Thanks for the solution. How do I feed in the target, for example "Marital Status"? When I run the program I get an error that targets is undefined, at the line df['Type'] = pd.Factor(targets, target_names).
  • rlmlr
    rlmlr over 10 years
    You have to define the array targets before you call df['Type'] = pd.Factor(targets, target_names). It should be a single column containing 0's and 1's if you're doing binary classification. Can you give a little more information on your classification problem?
  • math_law
    math_law about 7 years
    The code above is explanatory but is missing the variable "targets". Could you add it?
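
One way to build the missing `targets` array from a categorical column such as "Marital Status" is sketched below. It uses a subset of the columns and rows from the question's sample; treating "Married" as the positive class is an assumption made only for illustration.

```python
import io

import pandas as pd

# A few columns and rows taken from the sample data in the question.
sample_csv = io.StringIO(
    "Sl No,Member ID,Marital Status,Children,Annual Income ($)\n"
    "1,70000001,Single,,0\n"
    "2,70000002,Single,,66313\n"
    "3,70000003,Married,4,98444\n"
    "4,70000004,Married,2,41896\n"
)
df = pd.read_csv(sample_csv)

# Binary target: 1 where the member is married, 0 otherwise (assumed encoding).
targets = (df["Marital Status"] == "Married").astype(int).to_numpy()

# For more than two classes, factorize gives integer codes plus a label lookup.
codes, labels = pd.factorize(df["Marital Status"])

print(targets)
print(codes, list(labels))
```

Either `targets` or `codes` can then be passed as the second argument to `GaussianNB().fit`, alongside numeric feature columns.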