ValueError: x and y must be the same size
Solution 1
Print the shape of X_train. What do you see? I'd bet X_train is 2D (a matrix with a single column), while y_train is 1D (a vector), so the two end up with different sizes.
I think using X_train[:, 0] for plotting (which is where the error originates) should solve the problem.
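As a rough illustration of that check, here is a minimal sketch using small made-up arrays in place of the question's data (the values and the three-column shape are assumptions, not taken from the original CSV). It compares shapes and element counts, then selects a single column so that x has one value per sample.

```python
import numpy as np

# Hypothetical stand-ins for the question's arrays: 5 samples, 3 feature columns.
X_train = np.arange(15, dtype=float).reshape(5, 3)
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

print(X_train.shape)  # 2-D: (samples, columns)
print(y_train.shape)  # 1-D: (samples,)

# The element counts differ, which is the situation the error message describes.
sizes_match = X_train.size == y_train.size

# Selecting one column gives a 1-D array with exactly one value per sample.
x_col = X_train[:, 0]
col_matches = x_col.shape == y_train.shape
print(sizes_match, col_matches)
```

With a column selected this way, a call such as pt.scatter(X_train[:, 0], y_train) receives two arrays of equal length.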
Solution 2
Slicing with [:, :-1] will give you a 2-dimensional array (all rows, and all columns except the last).
Slicing with [:, 1] will give you a 1-dimensional array (all rows of the second column). To make this array 2-dimensional as well, use [:, 1:2] or [:, 1].reshape(-1, 1) or [:, 1][:, None] instead of [:, 1]. This will make x and y comparable.
An alternative to making both arrays 2-dimensional is to make them both 1-dimensional: use [:, 0] (instead of [:, :1]) to select the first column and [:, 1] to select the second column.
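The slicing variants above can be compared side by side. This is a small sketch on a made-up 4x2 array (the name data and its values are illustrative assumptions); it only records the shape each expression produces.

```python
import numpy as np

# Hypothetical 4-row, 2-column array standing in for the CSV contents.
data = np.array([[10, 0],
                 [20, 1],
                 [30, 0],
                 [40, 1]])

X_2d = data[:, :-1]               # all columns except the last -> shape (4, 1)
y_1d = data[:, 1]                 # second column only          -> shape (4,)

# Three equivalent ways to keep the second column 2-dimensional.
y_a = data[:, 1:2]                # shape (4, 1)
y_b = data[:, 1].reshape(-1, 1)   # shape (4, 1)
y_c = data[:, 1][:, None]         # shape (4, 1)

# The 1-dimensional alternative for the first column.
x_1d = data[:, 0]                 # shape (4,)

for name, arr in [("X_2d", X_2d), ("y_1d", y_1d), ("y_a", y_a),
                  ("y_b", y_b), ("y_c", y_c), ("x_1d", x_1d)]:
    print(name, arr.shape)
```

Either pairing works for plotting as long as both arrays end up with the same shape: X_2d with y_a, or x_1d with y_1d.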
Solution 3
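Try this:
x_train=np.arange(0,len(x_train),1)
This creates an evenly spaced array with one entry per sample, so x and y have the same size and the error goes away. Note that it replaces the original x values with the indices 0, 1, 2, ..., so the plot then shows y against sample position rather than against the feature itself.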
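To see what this assignment actually produces, here is a minimal sketch with made-up values (the arrays below are assumptions for illustration only). It shows that the result has one entry per row of x_train and holds row indices, not the original feature values.

```python
import numpy as np

# Hypothetical training data: 4 samples in a single 2-D column, plus targets.
x_train = np.array([[2.5], [7.0], [1.2], [9.9]])
y_train = np.array([30.0, 65.0, 21.0, 80.0])

# The suggested replacement: one evenly spaced integer per sample.
x_index = np.arange(0, len(x_train), 1)

print(x_index)                         # row indices, not the original values
print(x_index.shape == y_train.shape)  # shapes now agree
```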
user3521180
Updated on July 09, 2022
Comments
user3521180 almost 2 years
import numpy as np
import pandas as pd
import matplotlib.pyplot as pt

data1 = pd.read_csv('stage1_labels.csv')
X = data1.iloc[:, :-1].values
y = data1.iloc[:, 1].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_X = LabelEncoder()
X[:,0] = label_X.fit_transform(X[:,0])
encoder = OneHotEncoder(categorical_features = [0])
X = encoder.fit_transform(X).toarray()

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)

#fitting Simple Regression to training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

#predecting the test set results
y_pred = regressor.predict(X_test)

#Visualization of the training set results
pt.scatter(X_train, y_train, color = 'red')
pt.plot(X_train, regressor.predict(X_train), color = 'green')
pt.title('salary vs yearExp (Training set)')
pt.xlabel('years of experience')
pt.ylabel('salary')
pt.show()
I need help understanding the error I get when executing the code above. The error is:
raise ValueError("x and y must be the same size")
I have a .csv file with 1398 rows and 2 columns. I have taken 40% as the test set, as is visible in the code above.