ValueError: x and y must be the same size

97,998

Solution 1

Print X_train shape. What do you see? I'd bet X_train is 2d (matrix with a single column), while y_train 1d (vector). In turn you get different sizes.

I think using X_train[:,0] for plotting (which is from where the error originates) should solve the problem

Solution 2

Slicing with [:, :-1] will give you a 2-dimensional array (including all rows and all columns excluding the last column).

Slicing with [:, 1] will give you a 1-dimensional array (including all rows from the second column). To make this array also 2-dimensional use [:, 1:2] or [:, 1].reshape(-1, 1) or [:, 1][:, None] instead of [:, 1]. This will make x and y comparable.

An alternative to making both arrays 2-dimensional is making them both one dimensional. For this one would do [:, 0] (instead of [:, :1]) for selecting the first column and [:, 1] for selecting the second column.

Solution 3

Try this:

x_train=np.arange(0,len(x_train),1)

It will make an evenly spaced array and your error will be gone permanently.

97,998

user3521180

Updated on July 09, 2022

Comments

user3521180 almost 2 years

import numpy as np
import pandas as pd
import matplotlib.pyplot as pt

data1 = pd.read_csv('stage1_labels.csv')

X = data1.iloc[:, :-1].values
y = data1.iloc[:, 1].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_X = LabelEncoder()
X[:,0] = label_X.fit_transform(X[:,0])
encoder = OneHotEncoder(categorical_features = [0])
X = encoder.fit_transform(X).toarray()

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)

#fitting Simple Regression to training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

#predecting the test set results
y_pred = regressor.predict(X_test)

#Visualization of the training set results
pt.scatter(X_train, y_train, color = 'red')
pt.plot(X_train, regressor.predict(X_train), color = 'green')
pt.title('salary vs yearExp (Training set)')
pt.xlabel('years of experience')
pt.ylabel('salary')
pt.show()

I need a help understanding the error in while executing the above code. Below is the error: