train_test_split with multiple features
If you look at sklearn.model_selection.train_test_split, you can see it takes an *arrays argument. To split the first three of your variables, therefore, you could use:
CS_tr, CS_te, EN_tr, EN_te, SN_tr, SN_te = train_test_split(CS, EN, SN)
(of course, you can pass more arrays than that).
Note that in current versions of sklearn, train_test_split returns sparse matrices when given sparse matrices, so the one-hot encoded inputs stay sparse after the split.
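As a small illustration of the point above, the following sketch passes one sparse and two dense arrays to train_test_split in a single call. The data is made up (10 toy rows standing in for CS, EN and PW), and the test_size and random_state values are arbitrary choices, not something from the question.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the question's variables (10 rows each).
CS = np.arange(10) % 2                   # dense target labels
EN = sparse.csr_matrix(np.eye(10))       # sparse, one-hot-like feature
PW = np.linspace(50_000, 95_000, 10)     # dense numeric feature

# One call splits every array with the same row permutation.
CS_tr, CS_te, EN_tr, EN_te, PW_tr, PW_te = train_test_split(
    CS, EN, PW, test_size=3, random_state=0
)

print(type(EN_tr), EN_tr.shape)  # the sparse input comes back sparse
```

Because all arrays are shuffled with the same indices, row i of EN_tr still belongs to element i of CS_tr and PW_tr.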
Author by
Ekkasit Smithipanon
Updated on June 08, 2022

Comments
-
Ekkasit Smithipanon almost 2 years
I'm currently trying to train a decision tree classifier on a data set, but I couldn't get train_test_split to work.
In the code below, CS is the target output and EN, SN, JT, FT, PW, YR, LO and LA are the input features.
All variables that went through OHL are in sparse matrix format, whereas the others are arrays taken straight from the dataframe.
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def OHL(x, column):  # OneHotEncoder
    le = LabelEncoder()
    enc = OneHotEncoder()
    # Map the column's string values to integer codes, then one-hot encode them.
    Labeled = le.fit_transform(x[column].astype(str))
    return enc.fit_transform(Labeled.reshape(-1, 1))

###------------------------------------------------------------------------
df = pd.read_csv('h1b_kaggle.csv')
df = df.drop(['Unnamed: 0', 'WORKSITE'], axis=1)
###------------------------------------------------------------------------
CS = OHL(df, 'CASE_STATUS')
EN = OHL(df, 'EMPLOYER_NAME')
SN = OHL(df, 'SOC_NAME')
JT = OHL(df, 'JOB_TITLE')
FT = OHL(df, 'FULL_TIME_POSITION')
PW = np.array(df['PREVAILING_WAGE'])
YR = OHL(df, 'YEAR')
LO = np.array(df['lon'])
LA = np.array(df['lat'])
-
Ekkasit Smithipanon about 6 years
But after I do this and want to use tree.DecisionTreeClassifier, don't I have to group these into one variable? The fit function takes only one feature input and one target.
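One possible way to handle the grouping raised in this comment, sketched under assumptions: stack the feature blocks column-wise into a single sparse matrix with scipy.sparse.hstack, then split and fit on that. The data here is random toy data, the names X, y and clf are illustrative, and the target is assumed to be a 1-D label array (for example the LabelEncoder output) instead of the one-hot encoded CS.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 40

# Hypothetical stand-ins: one sparse one-hot block and two dense columns.
EN = sparse.csr_matrix(np.eye(4)[rng.integers(0, 4, n)])
PW = rng.uniform(40_000, 100_000, n)
LO = rng.uniform(-120, -70, n)
y = rng.integers(0, 2, n)  # assumed 1-D class labels, not a one-hot matrix

# Convert the dense columns to sparse, then stack all blocks side by side.
dense_part = sparse.csr_matrix(np.column_stack([PW, LO]))
X = sparse.hstack([EN, dense_part], format="csr")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# DecisionTreeClassifier accepts scipy sparse matrices as input.
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
```

Stacking before the split keeps the call down to two arrays, X and y, which matches what fit expects.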