How to decide the size of layers in Keras' Dense method?

python machine-learning scikit-learn deep-learning keras


Solution 1

Basically, it is just trial and error. Layer sizes are hyperparameters and should be tuned on a validation set (split your original data into training, validation and test sets).

Tuning just means trying different combinations of parameters and keeping the one with the lowest loss or the best accuracy on the validation set, depending on the problem.
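A minimal sketch of the three-way split, assuming scikit-learn's `train_test_split` and its bundled copy of the Iris data (`load_iris`); the 60/20/20 proportions are an arbitrary choice for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First hold out 20% of the data as the final test set.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Then split the remainder 75/25 into training and validation sets,
# which gives 60/20/20 of the original data overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0, stratify=y_tmp)

print(X_train.shape, X_val.shape, X_test.shape)  # (90, 4) (30, 4) (30, 4)
```

Only the validation scores should guide the choice of layer sizes; the test set is kept aside for one final check.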

There are two basic methods:

Grid search: for each parameter, decide a range and the steps within that range, such as 8 to 64 neurons in powers of two (8, 16, 32, 64), and try every combination of the parameters. The number of models to train and evaluate grows exponentially with the number of parameters, so this takes a lot of time.
Random search: define only a range for each parameter and try random parameter sets, with each value drawn from a uniform distribution over its range. You can try as many parameter sets as you want, for as long as you can. This is essentially an informed random guess.
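Both strategies can be sketched in plain Python. The `validation_loss` function is a made-up stand-in (a toy function whose minimum is at 32 and 8 units) so the example runs on its own; in practice it would build and fit a Keras model with the given layer sizes and return its loss on the validation set.

```python
import itertools
import math
import random

def validation_loss(units_1, units_2):
    # HYPOTHETICAL stand-in: in practice, build a Keras model with these
    # layer sizes, fit it on the training split, return its validation loss.
    return (math.log2(units_1) - 5) ** 2 + (math.log2(units_2) - 3) ** 2

def grid_search(values_1, values_2):
    """Try every combination; return (best_loss, best_params, n_trials)."""
    results = [(validation_loss(a, b), (a, b))
               for a, b in itertools.product(values_1, values_2)]
    best_loss, best_params = min(results)
    return best_loss, best_params, len(results)

def random_search(low, high, n_trials, seed=0):
    """Draw layer sizes uniformly from [low, high]; keep the best pair."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        a, b = rng.randint(low, high), rng.randint(low, high)
        loss = validation_loss(a, b)
        if best is None or loss < best[0]:
            best = (loss, (a, b))
    return best

powers_of_two = [8, 16, 32, 64]
print(grid_search(powers_of_two, powers_of_two))  # (0.0, (32, 8), 16)
print(random_search(8, 64, n_trials=16))
```

With two parameters and four values each, the grid already needs 16 trainings; a third parameter with four values would make it 64, while random search lets you fix the budget (`n_trials`) up front.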

Unfortunately, there is no other way to tune such parameters. As for layers having different numbers of neurons, that can come out of the tuning process, or you can see it as dimensionality reduction: a smaller layer holds a compressed version of the previous layer.
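To see why a layer's size is the dimensionality of its output, here is a NumPy sketch of what a `Dense` layer computes, `activation(x @ W + b)`, using random weights in place of trained ones and the 4 → 16 → 3 layout from the question. The 16 → 3 step is the "compression" described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units, activation):
    """Sketch of a Dense layer: activation(x @ W + b) with random weights."""
    W = rng.normal(size=(x.shape[1], units))  # kernel: (input_dim, units)
    b = np.zeros(units)                       # bias: one value per unit
    return activation(x @ W + b)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(5, 4))   # a batch of 5 samples with 4 features
h = dense(x, 16, np.tanh)     # hidden layer: each sample becomes 16 numbers
out = dense(h, 3, softmax)    # output layer: 3 numbers, one per class

print(x.shape, h.shape, out.shape)  # (5, 4) (5, 16) (5, 3)
```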

Solution 2

There is no known way to determine a good network structure from the number of inputs or outputs alone. It depends on the number of training examples, the batch size, the number of epochs and, basically, every significant parameter of the network.

Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, too few units can give a model high bias and low accuracy. Once again, it depends on the size of the training data.
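One concrete way to see how the number of units affects model capacity is to count trainable weights: a `Dense` layer with `n_in` inputs and `units` outputs has `n_in * units` kernel weights plus `units` biases. A small sketch (the helper name `dense_param_count` is illustrative, not a Keras function):

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of Dense layers.

    layer_sizes = [input_dim, units_1, units_2, ...]; each layer has
    input_dim * units kernel weights plus units biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_param_count([4, 16, 3]))   # 131, the network from the question
print(dense_param_count([4, 256, 3]))  # 2051
```

These totals should match the "Total params" line of `model.summary()` for the same layers, since Dropout adds no parameters. With 75 training samples (the question's 50/50 split of 150 flowers), the 256-unit version has far more parameters than examples, which is where overfitting becomes a real risk.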

Sadly, the only way is to try different values and see which ones give the best results. You may choose the combination that gives you the lowest training and validation loss, as well as the best accuracy on your dataset, as said in the previous answer.

You could also set the number of units in proportion to the number of classes, something like:

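from keras.models import Sequential
from keras.layers import Dense, Dropout

num_classes = 3  # number of categories to predict (3 for Iris)
shape_value = 4  # number of input features (4 for Iris)

# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(num_classes * 4, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(num_classes * 2, activation='relu'))
model.add(Dropout(0.2))

# Output layer: one unit per class
model.add(Dense(num_classes, activation='softmax'))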
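The proportional rule can be written as a small helper so that the multipliers themselves become hyperparameters to tune. `proportional_layer_sizes` is a hypothetical name, and the default multipliers (8, 4, 2) are simply the ones used in the model above:

```python
def proportional_layer_sizes(num_classes, multipliers=(8, 4, 2)):
    """Hidden-layer widths as multiples of num_classes, plus the output layer."""
    return [num_classes * m for m in multipliers] + [num_classes]

print(proportional_layer_sizes(3))          # [24, 12, 6, 3]
print(proportional_layer_sizes(3, (4, 2)))  # [12, 6, 3]
```

Each entry would be passed, in order, as the first argument of a `Dense` layer.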

The model above is an example of a classification system. num_classes is the number of different categories the system has to choose between. For instance, the Iris dataset used in the question has:

Iris Setosa
Iris Versicolour
Iris Virginica

num_classes = 3
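Rather than hard-coding it, `num_classes` can be read off the labels. A sketch using scikit-learn's bundled Iris data (an assumption; the question loads the same data through seaborn):

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
num_classes = len(np.unique(iris.target))  # number of distinct labels

print(num_classes)                 # 3
print(iris.target_names.tolist())  # ['setosa', 'versicolor', 'virginica']
```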

However, this could lead to worse results than other, arbitrarily chosen values. We need to adapt the parameters to the training dataset by trying different values and then analysing the results to find the best combination.


Danf

Updated on August 22, 2021


Below is a simple example of a multi-class classification task with the Iris data.

import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2
from keras.utils import to_categorical


#np.random.seed(1335)

# Prepare data
iris = sns.load_dataset("iris")
X = iris.values[:, 0:4].astype("float32")  # the four measurements
y = iris.values[:, 4]                      # the species names


# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)


################################
# Evaluate Keras Neural Network
################################


# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return to_categorical(ids, len(uniques))

train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)


model = Sequential()
model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                kernel_regularizer=l2(0.001)))  # W_regularizer in old Keras
model.add(Dropout(0.5))
# softmax (rather than sigmoid) pairs with categorical_crossentropy
model.add(Dense(3, activation="softmax"))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')


# Actual modelling
# Increasing the number of epochs raises the accuracy up to a point, after
# which it drops: 50 epochs gave an accuracy of 0.99, 70 epochs gave 0.977.
hist = model.fit(train_X, train_y_ohe, verbose=0, epochs=100, batch_size=1)  # nb_epoch in old Keras


loss, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test loss (NN-Score) = {:.2f}".format(loss))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))

My question is: how do people usually decide the size of layers? For example, the code above has:

model.add(Dense(16, input_shape=(4,),
                activation="tanh",
                kernel_regularizer=l2(0.001)))
model.add(Dense(3, activation="softmax"))

where the first argument (the number of units) is 16 for the first Dense layer and 3 for the second.

Why do the two layers use different values for Dense?
How do we choose the best value for Dense?
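The two numbers in the question play different roles. The size of the last layer is fixed by the problem: the one-hot labels produced by `one_hot_encode_object_array` have one column per species, and the output layer must produce one value per column, while the hidden layer's 16 is a tunable choice. A NumPy-only sketch of that encoding, using `np.eye` in place of Keras' `to_categorical` (equivalent for these integer ids) and a few made-up labels:

```python
import numpy as np

labels = np.array(["setosa", "virginica", "setosa", "versicolor"], dtype=object)

# Same idea as one_hot_encode_object_array, with np.eye instead of to_categorical.
uniques, ids = np.unique(labels, return_inverse=True)
one_hot = np.eye(len(uniques))[ids]

print(uniques.tolist())  # ['setosa', 'versicolor', 'virginica']
print(one_hot.shape)     # (4, 3): the output Dense layer needs 3 units
```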
