How to decide the size of layers in Keras' Dense method?
Solution 1
Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split from your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keep the one with the lowest loss value or better accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: For each parameter, decide a range and steps into that range, like 8 to 64 neurons, in powers of two (8, 16, 32, 64), and try each combination of the parameters. This is obviously requires an exponential number of models to be trained and tested and takes a lot of time.
Random search: Do the same but just define a range for each parameter and try a random set of parameters, drawn from an uniform distribution over each range. You can try as many parameters sets you want, for as how long you can. This is just a informed random guess.
Unfortunately there is no other way to tune such parameters. About layers having different number of neurons, that could come from the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
Solution 2
There is no known way to determine a good network structure evaluating the number of inputs or outputs. It relies on the number of training examples, batch size, number of epochs, basically, in every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradient problems. On the other side, a lower number of units can cause a model to have high bias and low accuracy values. Once again, it depends on the size of data used for training.
Sadly it is trying some different values that give you the best adjustments. You may choose the combination that gives you the lowest loss and validation loss values, as well as the best accuracy for your dataset, as said in the previous post.
You could do some proportion on your number of units value, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above shows an example of a categorisation AI system. The num_classes are the number of different categories the system has to choose. For instance, in the iris dataset from Keras, we have:
- Iris Setosa
- Iris Versicolour
- Iris Virginica
num_classes = 3
However, this could lead to worse results than with other random values. We need to adjust the parameters to the training dataset by making some different tries and then analyse the results seeking for the best combination of parameters.
Related videos on Youtube
Danf
Updated on August 22, 2021Comments
-
Danf over 2 years
Below is the simple example of multi-class classification task with IRIS data.
import seaborn as sns import numpy as np from sklearn.cross_validation import train_test_split from keras.models import Sequential from keras.layers.core import Dense, Activation, Dropout from keras.regularizers import l2 from keras.utils import np_utils #np.random.seed(1335) # Prepare data iris = sns.load_dataset("iris") iris.head() X = iris.values[:, 0:4] y = iris.values[:, 4] # Make test and train set train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0) ################################ # Evaluate Keras Neural Network ################################ # Make ONE-HOT def one_hot_encode_object_array(arr): '''One hot encode a numpy array of objects (e.g. strings)''' uniques, ids = np.unique(arr, return_inverse=True) return np_utils.to_categorical(ids, len(uniques)) train_y_ohe = one_hot_encode_object_array(train_y) test_y_ohe = one_hot_encode_object_array(test_y) model = Sequential() model.add(Dense(16, input_shape=(4,), activation="tanh", W_regularizer=l2(0.001))) model.add(Dropout(0.5)) model.add(Dense(3, activation='sigmoid')) model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam') # Actual modelling # If you increase the epoch the accuracy will increase until it drop at # certain point. Epoch 50 accuracy 0.99, and after that drop to 0.977, with # epoch 70 hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1) score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0) print("Test fraction correct (NN-Score) = {:.2f}".format(score)) print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is how do people usually decide the size of layers? For example based on code above we have:
model.add(Dense(16, input_shape=(4,), activation="tanh", W_regularizer=l2(0.001))) model.add(Dense(3, activation='sigmoid'))
Where first parameter of
Dense
is 16 and second is 3.- Why two layers uses two different values for Dense?
- How do we choose what's the best value for Dense?