Keras Classification - Object Detection


The machine learning model you built and the task you are trying to achieve are not the same: the model solves a classification task, while your goal is to detect an object inside the image, which is an object detection task.

Classification answers a boolean question (is there a dog in the image?), while the detection question has many more possible answers (where in the image is the dog?).

What can you do?

I can suggest three possibilities to try:


1. Use a sliding window combined with your model

Crop boxes of predefined sizes (e.g. from 20x20 to 160x160) and slide them across the image. For each window, predict the probability that it contains a dog, and finally take the window with the maximum predicted probability.

This will generate multiple candidates for the bounding box, and you choose the bounding box with the highest probability you got.

This might be slow, as we need to run a prediction on hundreds of windows or more (a batched variant that scores all windows in one call is sketched after the example below).

Another option is to implement an R-CNN or Faster R-CNN network on top of your network. These networks basically reduce the number of bounding box candidate windows that have to be evaluated.
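
If you want to experiment with the region-proposal idea without building the full R-CNN training pipeline, here is a minimal sketch of the proposal step only (assuming opencv-contrib-python is installed; the image path reuses the one from your own code). Selective search produces a few hundred to a few thousand candidate boxes, and you would then score only those crops with your classifier instead of every sliding window position:

import cv2

# selective search expects a color image of shape (height, width, 3)
img_bgr = cv2.imread('data/prediction/cat.155.jpg')

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img_bgr)
ss.switchToSelectiveSearchFast()

# each proposal is (x, y, w, h); crop these regions and feed them
# to your classifier instead of exhaustively sliding a window
proposals = ss.process()
print('%d candidate boxes' % len(proposals))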

Update - sliding window code example

The following code demonstrates the sliding window algorithm; you can change the parameters.

import random
import numpy as np

WINDOW_SIZES = list(range(20, 160, 20))


def get_best_bounding_box(img, predict_fn, step=10, window_sizes=WINDOW_SIZES):
    best_box = None
    best_box_prob = -np.inf

    # loop window sizes: 20x20, 40x40, ..., 140x140
    for win_size in window_sizes:
        for top in range(0, img.shape[0] - win_size + 1, step):
            for left in range(0, img.shape[1] - win_size + 1, step):
                # compute the (top, left, bottom, right) of the bounding box
                box = (top, left, top + win_size, left + win_size)

                # crop the original image
                cropped_img = img[box[0]:box[2], box[1]:box[3]]

                # predict how likely this cropped image is dog and if higher
                # than best save it
                print('predicting for box %r' % (box, ))
                box_prob = predict_fn(cropped_img)
                if box_prob > best_box_prob:
                    best_box = box
                    best_box_prob = box_prob

    return best_box


def predict_function(x):
    # example of prediction function for simplicity, you
    # should probably use `return model.predict(x)`
    random.seed(x[0][0])
    return random.random()


# dummy array of 256X256
img = np.arange(256 * 256).reshape((256, 256))

best_box = get_best_bounding_box(img, predict_function)
print('best bounding box %r' % (best_box, ))

Example output:

predicting for box (0, 0, 20, 20)
predicting for box (0, 10, 20, 30)
predicting for box (0, 20, 20, 40)
...
predicting for box (110, 100, 250, 240)
predicting for box (110, 110, 250, 250)
best bounding box (140, 80, 160, 100)
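
If the per-window loop above is too slow, a possible variant is to collect all crops first and score them in a single batched call. This is only a sketch: predict_batch_fn is an assumed helper that returns one probability per crop, e.g. by resizing each crop to the model input size, stacking them, and calling model.predict on the whole batch.

import numpy as np

WINDOW_SIZES = list(range(20, 160, 20))  # same sizes as above


def get_best_bounding_box_batched(img, predict_batch_fn, step=10,
                                  window_sizes=WINDOW_SIZES):
    # collect every candidate box and its crop first
    boxes, crops = [], []
    for win_size in window_sizes:
        for top in range(0, img.shape[0] - win_size + 1, step):
            for left in range(0, img.shape[1] - win_size + 1, step):
                box = (top, left, top + win_size, left + win_size)
                boxes.append(box)
                crops.append(img[box[0]:box[2], box[1]:box[3]])

    # one probability per crop, computed in a single batched call
    probs = predict_batch_fn(crops)
    return boxes[int(np.argmax(probs))]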


2. Train a new network for the object detection task

You can take a look at the Pascal VOC dataset, which contains 20 classes, two of them being cats and dogs.

The dataset contains the locations of the objects as the Y target.
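
As a rough sketch of what such a network could look like (the layer sizes here are assumptions, not tuned values), you can keep a convolutional base like your classifier's and replace the sigmoid output with a 4-value regression head for the box coordinates, trained with the Pascal box annotations as Y:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Activation

img_width, img_height = 150, 150

bbox_model = Sequential()
bbox_model.add(Convolution2D(32, 3, 3, input_shape=(3, img_width, img_height)))
bbox_model.add(Activation('relu'))
bbox_model.add(MaxPooling2D(pool_size=(2, 2)))

bbox_model.add(Convolution2D(64, 3, 3))
bbox_model.add(Activation('relu'))
bbox_model.add(MaxPooling2D(pool_size=(2, 2)))

bbox_model.add(Flatten())
bbox_model.add(Dense(64))
bbox_model.add(Activation('relu'))
bbox_model.add(Dense(4))  # (x, y, width, height) instead of a class probability

# regression loss on the box coordinates
bbox_model.compile(loss='mse', optimizer='rmsprop')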


3. Use an existing network for this task

Last but not least, you can reuse an existing network, or even do "knowledge transfer" (transfer learning) for your specific task.

Take a look at the convnets-keras lib.
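
For the "knowledge transfer" route, here is a minimal sketch (assuming keras.applications is available in your Keras version and TensorFlow dimension ordering; with Theano ordering, as in your code, the input shape would be (3, 150, 150)) that freezes a pretrained convolutional base and trains only a small head on your cats/dogs data:

from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

# pretrained convolutional base without the ImageNet classification head
base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
for layer in base.layers:
    layer.trainable = False  # keep the pretrained features fixed

model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))  # cat vs dog, as before

model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])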

So choose the method that suits you best and update us with the results.

Comments

  • Powisss over 1 year

    I am working on classification and then object detection with Keras and Python. I have classified cats/dogs with 80%+ accuracy, and I'm OK with the current result for now. My question is: how do I detect a cat or a dog in an input image? I'm completely confused. I want to use my own weights and not pretrained ones from the internet.

    Here is my code currently:

    from keras.preprocessing.image import ImageDataGenerator
    from keras.models import Sequential
    from keras.layers import Convolution2D, MaxPooling2D
    from keras.layers import Activation, Dropout, Flatten, Dense
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib
    
    from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
    
    #########################################################################################################
    #VALUES
    # dimensions of our images.
    img_width, img_height = 150, 150
    
    train_data_dir = 'data/train'
    validation_data_dir = 'data/validation'
    nb_train_samples = 2000 #1000 cats/dogs
    nb_validation_samples = 800 #400cats/dogs
    nb_epoch = 50
    #########################################################################################################
    
    #MODEL
    model = Sequential()
    model.add(Convolution2D(32, 3, 3, input_shape=(3, img_width, img_height)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    
    model.add(Convolution2D(32, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    
    model.add(Convolution2D(64, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    
    
    # this is the augmentation configuration we will use for training
    train_datagen = ImageDataGenerator(
            rescale=1./255,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True)
    ##########################################################################################################
    #TEST AUGMENTATION
    img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image
    x = img_to_array(img)  # this is a Numpy array with shape (3, 150, 150)
    x = x.reshape((1,) + x.shape)  # this is a Numpy array with shape (1, 3, 150, 150)
    
    # the .flow() command below generates batches of randomly transformed images
    # and saves the results to the `preview/` directory
    i = 0
    for batch in train_datagen.flow(x, batch_size=1,
                              save_to_dir='data/TEST AUGMENTATION', save_prefix='cat', save_format='jpeg'):
        i += 1
        if i > 20:
            break  # otherwise the generator would loop indefinitely
    ##########################################################################################################
    # this is the augmentation configuration we will use for testing:
    # only rescaling
    test_datagen = ImageDataGenerator(rescale=1./255)
    
    #PREPARE TRAINING DATA
    train_generator = train_datagen.flow_from_directory(
            train_data_dir, #data/train
            target_size=(img_width, img_height),  #RESIZE to 150/150
            batch_size=32,
            class_mode='binary')  #since we are using binary_crossentropy we need binary labels
    
    #PREPARE VALIDATION DATA
    validation_generator = test_datagen.flow_from_directory(
            validation_data_dir,  #data/validation
            target_size=(img_width, img_height), #RESIZE 150/150
            batch_size=32,
            class_mode='binary')
    
    
    #START model.fit
    history = model.fit_generator(
            train_generator, #train data
            samples_per_epoch=nb_train_samples,
            nb_epoch=nb_epoch,
            validation_data=validation_generator,  #validation data
            nb_val_samples=nb_validation_samples)
    
    
    model.save_weights('savedweights.h5')
    # list all data in history
    print(history.history.keys())
    
    #ACC VS VAL_ACC
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy ACC VS VAL_ACC')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
    # summarize history for loss
    #LOSS VS VAL_LOSS
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss LOSS vs VAL_LOSS')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
    
    
    model.load_weights('first_try.h5')
    

    So now, since I have classified cat and dog, how and what do I need to do to input an image and go through it to find a cat or a dog in it with a bounding box? I'm completely new to this and not even sure if I'm tackling this in the correct way? Thank you.

    UPDATE: Hi, sorry to post results so late, I was unable to work on this for a few days. I am importing an image and reshaping it to the shape 1,3,150,150, since the shape 150,150 gives an error:

    Exception: Error when checking : expected convolution2d_input_1 to have 4 dimensions, but got array with shape (150L, 150L)
    

    Importing image:

    #load test image
    img=load_img('data/prediction/cat.155.jpg')
    #reshape to 1,3,150,150
    img = np.arange(1 * 3 * 150 * 150).reshape((1, 3, 150, 150))
    #check shape
    print(img.shape)
    

    Then I have changed def predict_function(x) to:

    def predict_function(x):
        # example of prediction function for simplicity, you
        # should probably use `return model.predict(x)`
        # random.seed(x[0][0])
        # return random.random()
        return model.predict(img)
    

    Now when I run:

    best_box = get_best_bounding_box(img, predict_function)
    print('best bounding box %r' % (best_box, ))
    

    I get the output: best bounding box None

    So I ran just:

    model.predict(img)
    

    And get the following out:

    model.predict(img)
    Out[54]: array([[ 0.]], dtype=float32)
    

    So it is not checking at all whether it's a cat or a dog... Any ideas?

    NOTE: when def predict_function(x) is using:

    random.seed(x[0][0])
    return random.random()
    

    I do get output: it checks the boxes and gives the best one.