Pre-processing before digit recognition for NN & CNN trained with MNIST dataset


So what you are looking for is a generalised way of normalising your test data so that it can be compared against the MNIST training data. Perhaps you could first normalise the MNIST training data into a standard format, train your CNN on it, normalise your test data using the same process, and then apply the CNN for recognition.

Have you seen this paper? It uses moment-based image normalisation. It works at the word level, so it is not quite what you are doing, but it should be easy enough to implement.

Moment-based Image Normalization for Handwritten Text Recognition (Kozielski et al.)
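The idea of normalising by image moments can be sketched at the digit level as follows. This is a simplified illustration, not the procedure from the paper: it assumes a grayscale image with background 0, uses nearest-neighbour resampling to stay NumPy-only, and the function name `moment_normalise` and the default `target_sigma` are hypothetical choices.

```python
import numpy as np

def moment_normalise(img, out_size=28, target_sigma=4.5):
    """Translate and scale `img` so its first and second moments are standardised.

    img: 2-D array, background 0, ink > 0.
    target_sigma: assumed ink standard deviation (in output pixels) to aim for.
    """
    img = np.asarray(img, dtype=float)
    total = img.sum()
    if total == 0:
        return np.zeros((out_size, out_size))
    rows, cols = np.indices(img.shape)
    # First-order moments: centre of mass of the ink.
    r_mean = (rows * img).sum() / total
    c_mean = (cols * img).sum() / total
    # Second-order central moments: spread of the ink along each axis.
    r_sigma = np.sqrt((((rows - r_mean) ** 2) * img).sum() / total)
    c_sigma = np.sqrt((((cols - c_mean) ** 2) * img).sum() / total)
    # One scale factor for both axes, so the aspect ratio is preserved.
    sigma = max(r_sigma, c_sigma, 1e-6)
    scale = sigma / target_sigma  # source pixels per output pixel
    centre = (out_size - 1) / 2.0
    out_r, out_c = np.indices((out_size, out_size))
    # Inverse mapping: for each output pixel, look up the nearest source pixel.
    src_r = np.rint((out_r - centre) * scale + r_mean).astype(int)
    src_c = np.rint((out_c - centre) * scale + c_mean).astype(int)
    valid = ((src_r >= 0) & (src_r < img.shape[0]) &
             (src_c >= 0) & (src_c < img.shape[1]))
    out = np.zeros((out_size, out_size))
    out[valid] = img[src_r[valid], src_c[valid]]
    return out
```

The same function would be applied to every MNIST training image before training and to every test image before recognition, so that both sets share one position and scale convention.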

Author by

yasin.yazici

Updated on June 14, 2022

Comments

  • yasin.yazici
    yasin.yazici about 2 years

    I'm trying to classify handwritten digits, written by myself and a few friends, using an NN and a CNN. The networks are trained on the MNIST dataset. The problem is that an NN trained on MNIST does not give satisfactory test results on my dataset. I've used libraries in Python and MATLAB with the settings listed below.

    In Python I've used this code with the following settings:

    • 3-layer NN with # of inputs = 784, # of hidden neurons = 30, # of outputs = 10
    • Cost function = cross entropy
    • Number of Epochs = 30
    • Batch size = 10
    • Learning rate = 0.5

    It is trained on the MNIST training set, and the test results are as follows:

    test result on MNIST = 96%
    test result on my own dataset = 80%
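For orientation, the settings listed above can be written down as a small NumPy sketch. It is not the linked code: the sigmoid activations, the weight initialisation and the names `forward`, `cross_entropy` and `accuracy` are assumptions made for illustration, and training itself is omitted.

```python
import numpy as np

# Settings from the list above.
SIZES = [784, 30, 10]
EPOCHS = 30
BATCH_SIZE = 10
LEARNING_RATE = 0.5

rng = np.random.default_rng(0)
# Assumed initialisation: Gaussian weights scaled by 1/sqrt(fan-in), zero biases.
weights = [rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
           for n_in, n_out in zip(SIZES[:-1], SIZES[1:])]
biases = [np.zeros((n_out, 1)) for n_out in SIZES[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Propagate column vectors x (784 x N) through the network."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

def cross_entropy(a, y):
    """Cross-entropy cost for sigmoid outputs a and one-hot targets y."""
    a = np.clip(a, 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)))

def accuracy(images, labels):
    """Fraction of rows in images (N x 784) whose arg-max output equals the label."""
    outputs = forward(images.T)  # shape (10, N)
    return float(np.mean(np.argmax(outputs, axis=0) == labels))
```

Calling `accuracy` once on the MNIST test set and once on the custom dataset, with identical pre-processing, gives the two figures being compared here.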

    In MATLAB I've used the deep learning toolbox with various settings similar to the above, normalization included, and the best NN accuracy is around 75%. Both NN and CNN were tried in MATLAB.

    I've tried to make my own dataset resemble MNIST. The results above were obtained on the pre-processed dataset. Here are the pre-processing steps applied to my dataset:

    • Each digit is cropped separately and resized to 28 x 28 using bicubic interpolation
    • Patches are centered with the mean values in MNIST by using a bounding box in MATLAB
    • Background is 0 and the highest pixel value is 1, as in MNIST
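For comparison, the MNIST dataset description says the digits were size-normalised to fit a 20 x 20 box while preserving aspect ratio, then placed in a 28 x 28 image by centre of mass. The sketch below follows that layout. It is an illustration only: it uses nearest-neighbour resizing instead of bicubic interpolation to stay NumPy-only, and the function name `to_mnist_layout` is hypothetical.

```python
import numpy as np

def to_mnist_layout(img, box=20, out_size=28):
    """Crop to the ink, fit it into a box x box square, centre by mass."""
    img = np.asarray(img, dtype=float)
    ink = img > 0
    if not ink.any():
        return np.zeros((out_size, out_size))
    # Bounding box of all non-background pixels.
    rows = np.where(ink.any(axis=1))[0]
    cols = np.where(ink.any(axis=0))[0]
    crop = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = crop.shape
    # Scale so the longer side becomes `box`, keeping the aspect ratio.
    scale = box / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Nearest-neighbour resize (simplification; the question used bicubic).
    r_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    c_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    small = crop[np.ix_(r_idx, c_idx)]
    total = small.sum()
    if total == 0:
        return np.zeros((out_size, out_size))
    # Centre of mass of the resized digit.
    rr, cc = np.indices(small.shape)
    r_mean = (rr * small).sum() / total
    c_mean = (cc * small).sum() / total
    centre = (out_size - 1) / 2.0
    # Offset that puts the centre of mass at the image centre, clamped to fit.
    top = min(max(int(round(centre - r_mean)), 0), out_size - new_h)
    left = min(max(int(round(centre - c_mean)), 0), out_size - new_w)
    out = np.zeros((out_size, out_size))
    out[top:top + new_h, left:left + new_w] = small
    # Background stays 0; rescale so the highest pixel value is 1.
    return out / out.max()
```

One difference from the steps listed above is that the digit occupies at most 20 x 20 pixels rather than the full 28 x 28 frame, which changes the apparent stroke size and margin.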

    I don't know what else to do. There are still some differences, such as contrast, but my contrast-enhancement attempts did not increase the accuracy.

    Here are some digits from MNIST and my own dataset to compare them visually.

    MNIST digits

    my own dataset

    As you can see, there is a clear contrast difference. I think the accuracy problem is caused by the lack of similarity between MNIST and my own dataset. How can I handle this issue?

    There is a similar question here, but that dataset is a collection of printed digits, unlike mine.

    Edit: I've also tested a binarized version of my own dataset on NNs trained with binarized MNIST and with default MNIST. The binarization threshold is 0.05.

    Here is one example image in matrix form from the MNIST dataset and one from my own dataset. Both are the digit 5.
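A minimal sketch of that binarization step, assuming pixel values already scaled to [0, 1]. The helper `ink_fraction` is a hypothetical addition for comparing how much of the frame the strokes cover in each dataset.

```python
import numpy as np

def binarise(images, threshold=0.05):
    """Map every pixel above `threshold` to 1 and everything else to 0."""
    images = np.asarray(images, dtype=float)
    return (images > threshold).astype(float)

def ink_fraction(images, threshold=0.05):
    """Share of pixels counted as ink, a rough proxy for stroke thickness."""
    return float(binarise(images, threshold).mean())

# Pixel values taken from the MNIST matrix listed below.
print(binarise([0, 0.0118, 0.0706, 0.9922]))  # [0. 0. 1. 1.]
```

Comparing `ink_fraction` over the MNIST training set and over the custom dataset is one quick way to check whether stroke thickness, rather than contrast, separates the two.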

    MNIST:

      Columns 1 through 10
    
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0    0.1176    0.1412
             0         0         0         0         0         0         0    0.1922    0.9333    0.9922
             0         0         0         0         0         0         0    0.0706    0.8588    0.9922
             0         0         0         0         0         0         0         0    0.3137    0.6118
             0         0         0         0         0         0         0         0         0    0.0549
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0    0.0902    0.2588
             0         0         0         0         0         0    0.0706    0.6706    0.8588    0.9922
             0         0         0         0    0.2157    0.6745    0.8863    0.9922    0.9922    0.9922
             0         0         0         0    0.5333    0.9922    0.9922    0.9922    0.8314    0.5294
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
    
      Columns 11 through 20
    
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0    0.0118    0.0706    0.0706    0.0706    0.4941    0.5333    0.6863    0.1020
        0.3686    0.6039    0.6667    0.9922    0.9922    0.9922    0.9922    0.9922    0.8824    0.6745
        0.9922    0.9922    0.9922    0.9922    0.9922    0.9922    0.9922    0.9843    0.3647    0.3216
        0.9922    0.9922    0.9922    0.9922    0.7765    0.7137    0.9686    0.9451         0         0
        0.4196    0.9922    0.9922    0.8039    0.0431         0    0.1686    0.6039         0         0
        0.0039    0.6039    0.9922    0.3529         0         0         0         0         0         0
             0    0.5451    0.9922    0.7451    0.0078         0         0         0         0         0
             0    0.0431    0.7451    0.9922    0.2745         0         0         0         0         0
             0         0    0.1373    0.9451    0.8824    0.6275    0.4235    0.0039         0         0
             0         0         0    0.3176    0.9412    0.9922    0.9922    0.4667    0.0980         0
             0         0         0         0    0.1765    0.7294    0.9922    0.9922    0.5882    0.1059
             0         0         0         0         0    0.0627    0.3647    0.9882    0.9922    0.7333
             0         0         0         0         0         0         0    0.9765    0.9922    0.9765
             0         0         0         0    0.1804    0.5098    0.7176    0.9922    0.9922    0.8118
             0         0    0.1529    0.5804    0.8980    0.9922    0.9922    0.9922    0.9804    0.7137
        0.0941    0.4471    0.8667    0.9922    0.9922    0.9922    0.9922    0.7882    0.3059         0
        0.8353    0.9922    0.9922    0.9922    0.9922    0.7765    0.3176    0.0078         0         0
        0.9922    0.9922    0.9922    0.7647    0.3137    0.0353         0         0         0         0
        0.9922    0.9569    0.5216    0.0431         0         0         0         0         0         0
        0.5176    0.0627         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
    
      Columns 21 through 28
    
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
        0.6510    1.0000    0.9686    0.4980         0         0         0         0
        0.9922    0.9490    0.7647    0.2510         0         0         0         0
        0.3216    0.2196    0.1529         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
        0.2510         0         0         0         0         0         0         0
        0.0078         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
    

    My own dataset:

      Columns 1 through 10
    
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0    0.4000    0.5569
             0         0         0         0         0         0         0         0    0.9961    0.9922
             0         0         0         0         0         0         0         0    0.6745    0.9882
             0         0         0         0         0         0         0         0    0.0824    0.8745
             0         0         0         0         0         0         0         0         0    0.4784
             0         0         0         0         0         0         0         0         0    0.4824
             0         0         0         0         0         0         0         0    0.0824    0.8745
             0         0         0         0         0         0         0    0.0824    0.8392    0.9922
             0         0         0         0         0         0         0    0.2392    0.9922    0.6706
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0    0.4431    0.3608
             0         0         0         0         0         0         0    0.3216    0.9922    0.5922
             0         0         0         0         0         0         0    0.3216    1.0000    0.9922
             0         0         0         0         0         0         0         0    0.2784    0.5922
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
    
      Columns 11 through 20
    
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0    0.2000    0.5176    0.8392    0.9922    0.9961    0.9922    0.7961    0.6353
        0.7961    0.7961    0.9922    0.9882    0.9922    0.9882    0.5922    0.2745         0         0
        0.9569    0.7961    0.5569    0.4000    0.3216         0         0         0         0         0
        0.7961         0         0         0         0         0         0         0         0         0
        0.9176    0.1176         0         0         0         0         0         0         0         0
        0.9922    0.1961         0         0         0         0         0         0         0         0
        0.9961    0.3569    0.2000    0.2000    0.2000    0.0392         0         0         0         0
        0.9922    0.9882    0.9922    0.9882    0.9922    0.6745    0.3216         0         0         0
        0.7961    0.6353    0.4000    0.4000    0.7961    0.8745    0.9961    0.9922    0.2000    0.0392
             0         0         0         0         0    0.0784    0.4392    0.7529    0.9922    0.8314
             0         0         0         0         0         0         0         0    0.4000    0.7961
             0         0         0         0         0         0         0         0         0    0.0784
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0    0.0824    0.4000    0.4000    0.7176
        0.9176    0.5961    0.6000    0.7569    0.6784    0.9922    0.9961    0.9922    0.9961    0.8353
        0.5922    0.9098    0.9922    0.8314    0.7529    0.5922    0.5137    0.1961    0.1961    0.0392
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0         0         0
    
      Columns 21 through 28
    
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
        0.1608         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
        0.1608         0         0         0         0         0         0         0
        0.9176    0.2000         0         0         0         0         0         0
        0.8353    0.9098    0.3216         0         0         0         0         0
        0.2431    0.7961    0.9176    0.4392         0         0         0         0
             0    0.0784    0.8353    0.9882         0         0         0         0
             0         0    0.6000    0.9922         0         0         0         0
             0    0.1608    0.9137    0.8314         0         0         0         0
        0.1216    0.6784    0.9569    0.1569         0         0         0         0
        0.9137    0.8314    0.3176         0         0         0         0         0
        0.5569    0.0784         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
             0         0         0         0         0         0         0         0
    
  • yasin.yazici
    yasin.yazici over 9 years
    I've collected 120 digits from 4 subjects using a Samsung Note3 with its pen, so pen thickness didn't differ in general. Your suggestion will probably increase accuracy, because MNIST contains many types of digits and is not similar to my own dataset in general. However, my ultimate aim is to detect handwritten letters and digits accurately in any image, for example an image of a blackboard in a classroom. In that case, a CNN or NN trained with MNIST, or with a dataset collected via a smartphone pen, will yield lower accuracy again. I need a general pre-processing method that increases resemblance to the training set.
  • yasin.yazici
    yasin.yazici over 9 years
    That is the closest answer to my question. I will check that out.