ValueError: x and y must be the same size

Solution 1

Print X_train.shape. What do you see? I'd bet X_train is 2-D (a matrix) while y_train is 1-D (a vector), so the two end up with different sizes.

I think using X_train[:, 0] for plotting (which is where the error originates) should solve the problem.
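
A minimal sketch of that check, assuming small made-up arrays in place of the question's data (the 5x3 one-hot matrix and the target values are hypothetical). The error message is about the total number of elements, so comparing `.size` as well as `.shape` shows where the mismatch comes from:

```python
import numpy as np

# Hypothetical stand-ins for the arrays in the question:
# 5 samples with 3 one-hot columns, and one target value per sample.
X_train = np.eye(3)[[0, 1, 2, 0, 1]]   # 2-D, shape (5, 3)
y_train = np.array([0, 1, 0, 1, 1])    # 1-D, shape (5,)

print(X_train.shape, y_train.shape)    # dimensions of each array
print(X_train.size, y_train.size)      # total element counts differ

# Selecting a single column gives a 1-D array with one value per sample,
# which matches y_train and can be passed to a scatter plot.
x_col = X_train[:, 0]
print(x_col.shape, x_col.size == y_train.size)
```

If the two `.size` values differ, plotting one column at a time (as with `x_col` here) is one way to make them match.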

Solution 2

Slicing with [:, :-1] will give you a 2-dimensional array (including all rows and all columns excluding the last column).

Slicing with [:, 1] will give you a 1-dimensional array (all rows of the second column). To make this array 2-dimensional as well, use [:, 1:2] or [:, 1].reshape(-1, 1) or [:, 1][:, None] instead of [:, 1]. This gives x and y the same number of dimensions.

An alternative to making both arrays 2-dimensional is making them both 1-dimensional. To do this, use [:, 0] (instead of [:, :1]) to select the first column and [:, 1] to select the second column.
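
The slicing variants above can be compared side by side. This is only a sketch on a made-up 6x2 array (`data` is a hypothetical stand-in for `data1.values`):

```python
import numpy as np

# Hypothetical 6-row, 2-column table standing in for the CSV contents.
data = np.arange(12).reshape(6, 2)

X_2d = data[:, :-1]                  # all columns but the last -> shape (6, 1)
y_1d = data[:, 1]                    # second column            -> shape (6,)

# Three equivalent ways to keep the second column 2-dimensional.
y_2d_a = data[:, 1:2]                # shape (6, 1)
y_2d_b = data[:, 1].reshape(-1, 1)   # shape (6, 1)
y_2d_c = data[:, 1][:, None]         # shape (6, 1)

# Or make both 1-dimensional instead.
x_1d = data[:, 0]                    # first column             -> shape (6,)

for name, arr in [("X_2d", X_2d), ("y_1d", y_1d), ("y_2d_a", y_2d_a),
                  ("y_2d_b", y_2d_b), ("y_2d_c", y_2d_c), ("x_1d", x_1d)]:
    print(name, arr.shape)
```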

Solution 3

Try this:

x_train=np.arange(0,len(x_train),1)

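It will make an evenly spaced array with one entry per row of x_train, so the sizes match and the error goes away. Note that this replaces the original x values with row indices (0, 1, 2, ...), so the plot then shows the sample index on the x-axis rather than the feature itself.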
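
To see what this line produces, here is a small sketch with made-up values (the numbers in `x_train` and `y_train` are hypothetical). It keeps the result in a separate name, `x_index`, so the original feature values are not overwritten:

```python
import numpy as np

# Hypothetical training data: 4 samples, one feature column.
x_train = np.array([[1.5], [3.2], [4.0], [7.1]])
y_train = np.array([10.0, 20.0, 25.0, 40.0])

# len() of a 2-D array is its number of rows, so this yields one index per sample.
x_index = np.arange(0, len(x_train), 1)

print(x_index)                       # [0 1 2 3]
print(x_index.size == y_train.size)  # True
```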

Author: user3521180

Updated on July 09, 2022

Comments

  • user3521180 almost 2 years
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as pt
    
    data1 = pd.read_csv('stage1_labels.csv')
    
    X = data1.iloc[:, :-1].values
    y = data1.iloc[:, 1].values
    
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    label_X = LabelEncoder()
    X[:,0] = label_X.fit_transform(X[:,0])
    encoder = OneHotEncoder(categorical_features = [0])
    X = encoder.fit_transform(X).toarray()
    
    from sklearn.cross_validation import train_test_split
    X_train, X_test, y_train,y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)
    
    #fitting Simple Regression to training set
    
    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    
    #predicting the test set results
    y_pred = regressor.predict(X_test)
    
    #Visualization of the training set results
    pt.scatter(X_train, y_train, color = 'red')
    pt.plot(X_train, regressor.predict(X_train), color = 'green')
    pt.title('salary vs yearExp (Training set)')
    pt.xlabel('years of experience')
    pt.ylabel('salary')
    pt.show()
    

    I need help understanding the error I get when executing the above code. The error is:

    "raise ValueError("x and y must be the same size")"

    I have a .csv file with 1398 rows and 2 columns. I have taken 40% of the data as the test set (test_size = 0.4), as shown in the code above.
