How to decide the size of layers in Keras' Dense method?

Solution 1

Basically it is just trial and error. Those are called hyperparameters and should be tuned on a validation set (split your original data into train/validation/test sets).

Tuning just means trying different combinations of parameters and keeping the one with the lowest loss or the best accuracy on the validation set, depending on the problem.
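
For example, a minimal sketch of such a split with scikit-learn (X and y here stand for the features and labels, as in the Iris example in the question below):

from sklearn.model_selection import train_test_split

# Hold out 20% as a test set, then split the rest 75/25 into train/validation
train_val_X, test_X, train_val_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(train_val_X, train_val_y, test_size=0.25, random_state=0)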

There are two basic methods:

  • Grid search: For each parameter, choose a range and steps within that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested, and takes a lot of time.

  • Random search: Do the same, but just define a range for each parameter and try random sets of parameters, drawn from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can afford. This is just an informed random guess (a minimal sketch of both methods follows below).

Unfortunately, there is no other way to tune such parameters. As for layers having different numbers of neurons, that can come out of the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
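
As a minimal sketch of both search methods, assuming the split above with the labels one-hot encoded into train_y_ohe and val_y_ohe (for example with the helper from the question below), and an Iris-like problem with 4 features and 3 classes:

import itertools
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Hypothetical search space: hidden-layer width in powers of two, two dropout rates
widths = [8, 16, 32, 64]
dropouts = [0.2, 0.5]

best_acc, best_params = 0.0, None
for width, rate in itertools.product(widths, dropouts):  # grid search: every combination
    model = Sequential()
    model.add(Dense(width, input_shape=(4,), activation='relu'))
    model.add(Dropout(rate))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(train_X, train_y_ohe, epochs=50, batch_size=8, verbose=0)

    # Keep the combination with the best accuracy on the validation set
    _, acc = model.evaluate(val_X, val_y_ohe, verbose=0)
    if acc > best_acc:
        best_acc, best_params = acc, (width, rate)

print("Best (width, dropout) on the validation set:", best_params, best_acc)

For a random search you would keep the same loop body but draw each parameter at random (for example with random.choice) for a fixed number of trials, instead of enumerating every combination.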

Solution 2

There is no known way to determine a good network structure just from the number of inputs and outputs. It depends on the number of training examples, the batch size, the number of epochs, basically, on every significant parameter of the network.

Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, a lower number of units can cause the model to have high bias and low accuracy. Once again, it depends on the amount of data used for training.

Sadly, it is by trying different values that you find the best adjustments. As said in the previous answer, you may choose the combination that gives you the lowest loss and validation loss, as well as the best accuracy for your dataset.

You could make the number of units proportional to the number of classes, something like:

from keras.models import Sequential
from keras.layers import Dense, Dropout

# Build the model: hidden layers sized as multiples of the number of classes
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(num_classes * 4, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(num_classes * 2, activation='relu'))
model.add(Dropout(0.2))

# Output layer
model.add(Dense(num_classes, activation='softmax'))

The model above is an example of a classification system. num_classes is the number of different categories the system has to choose from. For instance, in the Iris dataset used in the question below, we have:

  • Iris Setosa
  • Iris Versicolour
  • Iris Virginica

num_classes = 3
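
As a small illustration (assuming y holds the raw class labels, as in the Iris example in the question below), num_classes can also be derived from the data instead of being hard-coded:

import numpy as np

num_classes = len(np.unique(y))  # 3 for Iris: setosa, versicolour, virginica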

However, this heuristic could still lead to worse results than other values. You need to adapt the parameters to your training dataset by making several different attempts and then analysing the results, looking for the best combination of parameters.
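
For instance, a minimal sketch of those attempts, reusing the imports and the proportional architecture above, and assuming the same hypothetical split and one-hot labels as in the first answer (train_X, train_y_ohe, val_X, val_y_ohe):

# Try a few multipliers for the first hidden layer and keep the best one
results = {}
for multiplier in (2, 4, 8, 16):
    model = Sequential()
    model.add(Dense(num_classes * multiplier, input_shape=(shape_value,), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(train_X, train_y_ohe, epochs=50, batch_size=8, verbose=0)
    _, results[multiplier] = model.evaluate(val_X, val_y_ohe, verbose=0)

best = max(results, key=results.get)
print("Best multiplier on the validation set:", best, results[best])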


Comments

  • Danf over 2 years

    Below is a simple example of a multi-class classification task with the Iris data.

    import seaborn as sns
    import numpy as np
    from sklearn.model_selection import train_test_split
    from keras.models import Sequential
    from keras.layers import Dense, Activation, Dropout
    from keras.regularizers import l2
    from keras.utils import to_categorical
    
    
    #np.random.seed(1335)
    
    # Prepare data
    iris = sns.load_dataset("iris")
    iris.head()
    X = iris.values[:, 0:4]
    y = iris.values[:, 4]
    
    
    # Make test and train set
    train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
    
    
    ################################
    # Evaluate Keras Neural Network
    ################################
    
    
    # Make ONE-HOT
    def one_hot_encode_object_array(arr):
        '''One hot encode a numpy array of objects (e.g. strings)'''
        uniques, ids = np.unique(arr, return_inverse=True)
        return to_categorical(ids, len(uniques))
    
    train_y_ohe = one_hot_encode_object_array(train_y)
    test_y_ohe = one_hot_encode_object_array(test_y)
    
    
    model = Sequential()
    model.add(Dense(16, input_shape=(4,),
          activation="tanh",
          kernel_regularizer=l2(0.001)))
    model.add(Dropout(0.5))
    model.add(Dense(3, activation='sigmoid'))
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
    
    
    # Actual modelling
    # If you increase the number of epochs, the accuracy will increase until it drops at a
    # certain point: epoch 50 gives accuracy 0.99, which drops to 0.977 by epoch 70.
    hist = model.fit(train_X, train_y_ohe, verbose=0, epochs=100, batch_size=1)
    
    
    score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
    print("Test fraction correct (NN-Score) = {:.2f}".format(score))
    print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
    

    My question is: how do people usually decide the size of the layers? For example, based on the code above, we have:

    model.add(Dense(16, input_shape=(4,),
          activation="tanh",
          kernel_regularizer=l2(0.001)))
    model.add(Dense(3, activation='sigmoid'))
    

    where the first parameter of Dense is 16 and the second is 3.

    • Why do the two layers use different values for Dense?
    • How do we choose the best value for Dense?