train_test_split with multiple features

If you look at the signature of sklearn.model_selection.train_test_split, you can see that it accepts any number of arrays through its *arrays argument. To split the first three of your variables together, you could therefore use

CS_tr, CS_te, EN_tr, EN_te, SN_tr, SN_te = train_test_split(CS, EN, SN)

(of course, you can pass more arrays than that).

Note that current versions of sklearn return sparse matrices when given sparse matrices, so the one-hot-encoded inputs stay sparse after the split.
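As a minimal sketch of the call above, here is a toy version that mixes dense and sparse inputs. The data is made up for illustration (it is not the h1b data set), and `test_size` and `random_state` are my own choices, not something the question specifies.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the question's variables.
CS = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # dense 1-D target
EN = sparse.csr_matrix(np.eye(8))          # sparse feature, row i has a 1 in column i
PW = np.arange(8, dtype=float)             # dense numeric feature, PW[i] == i

# Each input array yields a (train, test) pair, in the order the arrays were passed.
CS_tr, CS_te, EN_tr, EN_te, PW_tr, PW_te = train_test_split(
    CS, EN, PW, test_size=0.25, random_state=0
)

# All arrays are split on the same rows: the column holding the 1 in each
# row of EN_tr matches the original row index stored in PW_tr.
same_rows = (EN_tr.toarray().argmax(axis=1) == PW_tr).all()
print(CS_tr.shape, EN_tr.shape, PW_tr.shape, same_rows)
print(sparse.issparse(EN_tr))
```

The point of passing everything in one call is that the same shuffled row indices are applied to every array, so features and target stay aligned.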

Author: Ekkasit Smithipanon

Updated on June 08, 2022

Comments

  • Ekkasit Smithipanon, almost 2 years ago

    I'm currently trying to train a decision tree classifier on a data set, but I can't get train_test_split to work.

    In the code below, CS is the target output and EN, SN, JT, FT, PW, YR, LO and LA are the input features.

    All variables that went through OHL are in sparse matrix format, whereas the others are NumPy arrays taken straight from the DataFrame.

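    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    
    def OHL(x, column):
        # One-hot encode a single column; returns a scipy sparse matrix.
        le = LabelEncoder()
        enc = OneHotEncoder()
        # Cast to str first so missing values and mixed types get their own label.
        Labeled = le.fit_transform(x[column].astype(str))
        return enc.fit_transform(Labeled.reshape(-1,1))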
    
    ###------------------------------------------------------------------------
    
    df = pd.read_csv('h1b_kaggle.csv')
    df = df.drop(['Unnamed: 0','WORKSITE'], axis=1)
    
    ###------------------------------------------------------------------------
    
    CS = OHL(df, 'CASE_STATUS')
    EN = OHL(df, 'EMPLOYER_NAME')
    SN = OHL(df, 'SOC_NAME')
    JT = OHL(df, 'JOB_TITLE')
    FT = OHL(df, 'FULL_TIME_POSITION')
    PW = np.array(df['PREVAILING_WAGE'])
    YR = OHL(df, 'YEAR')
    LO = np.array(df['lon'])
    LA = np.array(df['lat'])
    
  • Ekkasit Smithipanon, about 6 years ago
    But after I do this and want to use tree.DecisionTreeClassifier, don't I have to group these into one variable? The fit function takes only one feature array and one target.
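One way to handle the grouping asked about in the last comment is to stack the feature blocks column-wise into a single sparse matrix before splitting. The sketch below uses made-up toy data, and the names `employer`, `wage` and `status` are hypothetical. It also label-encodes the target into a 1-D array instead of one-hot encoding it, which is my assumption and not what the question's code does.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy columns standing in for the DataFrame in the question.
employer = np.array(["a", "b", "a", "c", "b", "a", "c", "b"])
wage = np.array([50.0, 60.0, 55.0, 80.0, 62.0, 51.0, 79.0, 61.0])
status = np.array(["ok", "no", "ok", "no", "no", "ok", "no", "no"])

EN = OneHotEncoder().fit_transform(employer.reshape(-1, 1))  # sparse, shape (8, 3)
PW = sparse.csr_matrix(wage.reshape(-1, 1))                  # dense column made sparse, shape (8, 1)
y = LabelEncoder().fit_transform(status)                     # 1-D integer labels

# Stack the feature blocks side by side into one feature matrix.
X = sparse.hstack([EN, PW], format="csr")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The tree estimators accept scipy sparse matrices for X.
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(X.shape, X_tr.shape, pred.shape)
```

With a single X and y, train_test_split returns just four arrays, which map directly onto fit(X_tr, y_tr) and predict(X_te).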